From: Hilmar L. <hl...@ne...> - 2012-02-21 17:24:59
Right. So ideally people can type in "homo sapiens" (w/o quotes) just as well as two identifiers separated by whitespace, and the system does the Right Thing in each case. You could see if your first two tokens are alphanumeric, and then try the whole thing as a (possibly qualified) species name. If you get zero results, you try them white-space tokenized and OR'ed.

	-hilmar

On Feb 21, 2012, at 12:18 PM, Rutger Vos wrote:

>> The only thing that people may not expect is the tokenization based on comma rather than whitespace, because it's different from what Google does (which is what everyone is used to).
>
> Except it isn't precisely what google does, because it weighs
> occurrences of words near each other higher, so that typing 'Homo
> sapiens' (no quotes) won't just return a bucket of pages that have the
> words 'Homo' and 'sapiens' anywhere in them. And maybe we don't want
> people to have to write 'Homo sapiens' (with quotes) every time they
> search for a species - which will be always once they find out we
> don't do higher taxa.
>
>> On Feb 21, 2012, at 11:57 AM, Rutger Vos wrote:
>>
>>> When it encounters an integer, it assumes it could be any of these predicates:
>>>
>>> - 'tb.identifier.ncbi',
>>> - 'tb.identifier.ubio',
>>> - 'tb.identifier.taxon',
>>> - 'tb.identifier.taxon.tb1'
>>>
>>> For things that look like TreeBASE id's (e.g. /^[A-Z][a-z]*\d+$/):
>>>
>>> - 'tb.identifier.taxon',
>>> - 'tb.identifier.taxon.tb1'
>>>
>>> Words:
>>>
>>> - 'tb.title.taxon',
>>> - 'tb.title.taxonLabel',
>>> - 'tb.title.taxonVariant'
>>>
>>> What users type in the search box is split on commas. In an earlier
>>> iteration I made it split on (quoted) words/white space - but that
>>> messes up what is probably the 80% use case of people entering "Genus
>>> species" (without quotes).
>>>
>>> If you now type "Homo sapiens, 9606" (without quotes), you will get
>>> hits that have 9606 as any identifier, and hits for Homo sapiens,
>>> including Homo sapiens neanderthalensis. By clicking on the advanced
>>> search button, users might be able to figure out (and influence) what
>>> exactly is searched on. Maybe we could have a screen cast that
>>> explains the search logic?
>>>
>>> On Tue, Feb 21, 2012 at 5:30 PM, Hilmar Lapp <hl...@ne...> wrote:
>>>> Nice! I didn't try it extensively. What does it recognize and/or
>>>> specifically search?
>>>>
>>>> -hilmar
>>>>
>>>> On Feb 21, 2012, at 6:30 AM, Rutger Vos wrote:
>>>>
>>>> Hi all,
>>>>
>>>> what do you think of this new, simplified search
>>>> box: http://treebase-dev.nescent.org/treebase-web/search/taxonSearch.html
>>>>
>>>> I've done most of the prep work to have the same box for matrix, tree and
>>>> study searching so that end users just type their search terms and we try to
>>>> be clever in expanding those into the context-appropriate search predicates.
>>>>
>>>> Rutger
>>>>
>>>> --
>>>> Dr. Rutger A. Vos
>>>> Bioinformaticist
>>>> NCB Naturalis
>>>> Visiting address: Office A109, Einsteinweg 2, 2333 CC, Leiden, the
>>>> Netherlands
>>>> Mailing address: Postbus 9517, 2300 RA, Leiden, the Netherlands
>>>> http://rutgervos.blogspot.com
>>>> _______________________________________________
>>>> Treebase-devel mailing list
>>>> Tre...@li...
>>>> https://lists.sourceforge.net/lists/listinfo/treebase-devel
>>>>
>>>> --
>>>> ===========================================================
>>>> : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :
>>>> ===========================================================
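
For readers following the thread, the search logic discussed above can be sketched roughly as follows. This is only an illustration, not the TreeBASE implementation: the predicate names and the id pattern come from the messages, while `predicates_for`, `expand_query`, `search_with_fallback` and the `run_search` callback are hypothetical names made up here. It also assumes that comma-separated terms are OR'ed (as the "Homo sapiens, 9606" example suggests), and it simplifies the suggested fallback by always trying the whole string first instead of first checking that the leading two tokens are alphanumeric.

```python
import re
from typing import Callable, Dict, List

# Predicate names as listed in the thread; the grouping constants are illustrative.
INTEGER_PREDICATES = [
    "tb.identifier.ncbi",
    "tb.identifier.ubio",
    "tb.identifier.taxon",
    "tb.identifier.taxon.tb1",
]
TB_ID_PREDICATES = ["tb.identifier.taxon", "tb.identifier.taxon.tb1"]
WORD_PREDICATES = ["tb.title.taxon", "tb.title.taxonLabel", "tb.title.taxonVariant"]

# Pattern quoted in the thread for things that look like TreeBASE ids.
TB_ID_PATTERN = re.compile(r"^[A-Z][a-z]*\d+$")


def predicates_for(term: str) -> List[str]:
    """Guess which predicates a single search term could refer to."""
    if re.fullmatch(r"[0-9]+", term):
        return INTEGER_PREDICATES
    if TB_ID_PATTERN.match(term):
        return TB_ID_PREDICATES
    return WORD_PREDICATES


def expand_query(raw: str) -> Dict[str, List[str]]:
    """Split the search box contents on commas and expand each term."""
    terms = [t.strip() for t in raw.split(",")]
    return {t: predicates_for(t) for t in terms if t}


def search_with_fallback(
    raw: str, run_search: Callable[[Dict[str, List[str]]], list]
) -> list:
    """Try the input as entered; if nothing is found and no commas were
    used, retry with whitespace-separated tokens as separate (OR'ed) terms.

    run_search is a hypothetical callback standing in for the real search.
    """
    hits = run_search(expand_query(raw))
    if hits or "," in raw:
        return hits
    tokens = raw.split()
    if len(tokens) < 2:
        return hits
    return run_search(expand_query(", ".join(tokens)))
```

With this sketch, `expand_query("Homo sapiens, 9606")` maps "Homo sapiens" to the three title predicates and "9606" to the four identifier predicates, while an input such as "9606 Tx1234" is first tried as one name and only split on whitespace if that returns nothing.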