From: Hilmar L. <hl...@ne...> - 2010-11-04 20:59:23
Smells like an abuse of the keyword field in TreeBASE to me, and should
therefore be fixed in TreeBASE. Bill?

-hilmar

On Nov 4, 2010, at 2:14 PM, Vladimir Gapeyev wrote:

> [I am cc'ing this thread to treebase-devel, per Hilmar's request]
>
> I will add to OAI record creation a keyword tokenizer that uses ","
> and ";" as delimiters.
>
> Here is another issue: Lots of entries in Treebase contain "in
> press" as the keyword, which Kevin's code on the Dryad side does not
> accept as such. Should this elimination of "in press" remain on the
> Dryad side? I.e., is "in press" a valid keyword from Treebase point
> of view? If not, should this be fixed by erasing "in press" values
> from the DB or by filtering them out of OAI records sent out?
>
> --Vladimir
>
>
> On Nov 3, 2010, at 5:10 PM, William Piel wrote:
>
>>
>> The solution of splitting the string using either commas or
>> semi-colons looks fine to me. (and why should it be a problem if the
>> string has a mix of the two? Seems to me that it should work by
>> splitting on either.) It's not impossible that some authors will use
>> other non-standard delimiters, such as the long hyphen (" — ") the
>> double hyphen ("--") and the bullet " • " but there's only so many
>> options for us to accommodate.
>>
>> bp
>>
>>
>>
>> On Nov 3, 2010, at 4:58 PM, Hilmar Lapp wrote:
>>
>>> I think Bill Piel will need to at least chime in here, and
>>> possibly others. Would you mind posting this to the treebase-devel
>>> list?
>>>
>>> -hilmar
>>>
>>> On Nov 3, 2010, at 2:56 PM, Vladimir Gapeyev wrote:
>>>
>>>> I've put this request for clarification in SF tracker, please
>>>> advise. --Vladimir
>>>>
>>>> Begin forwarded message:
>>>>
>>>>> From: "SourceForge.net" <no...@so...>
>>>>> Date: November 3, 2010 2:49:37 PM EDT
>>>>> To: no...@so...
>>>>> Subject: [Treebase-guts] [ treebase-Bugs-3079602 ] OAI records
>>>>> contain all subjects in a single field
>>>>>
>>>>>
>>>>> Initial Comment:
>>>>> In the OAI records, each <dc:subject> field contains many
>>>>> keywords, separated by commas, like this:
>>>>>
>>>>> <dc:subject>
>>>>> Ascomycota, Pezizomycotina, Dothideomyceta, fungal evolution,
>>>>> lichens, multigene phylogeny, phylogenomics, plant pathogens,
>>>>> saprobes, Tree of Life
>>>>> </dc:subject>
>>>>>
>>>>> It is best practice to put each keyword into a separate
>>>>> <dc:subject> field. This allows harvesting systems (like Dryad)
>>>>> to accurately separate the keywords, and not worry about
>>>>> keywords that may contain commas.
>>>>>
>>>>> ----------------------------------------------------------------------
>>>>>
>>>>>> Comment By: Vladimir Gapeyev (vgapeyev)
>>>>> Date: 2010-11-03 14:49
>>>>>
>>>>> Message:
>>>>> This is a request for clarification.
>>>>>
>>>>> Treebase UI offers a single field to enter keywords, text from
>>>>> which is stored in a single field in the database. From the data in
>>>>> treebase-dev I see that users used ',' or ';' to separate multiple
>>>>> keywords.
>>>>>
>>>>> Here is what I can do: Get Kevin's keyword-splitting code and
>>>>> place it on Treebase side, modifying if necessary to work with both
>>>>> ';' and ','. This would not work nicely if the user has a fancy to
>>>>> use comma-containing keywords separated by semicolons, or the other
>>>>> way around.
>>>>>
>>>>> Please confirm that this is what is needed.
>>
>>
>>
>>
>

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :
===========================================================
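
For reference, the tokenizer discussed in this thread could look roughly like the sketch below. It is only an illustration in Python, not the code that went into TreeBASE or Dryad: the function names (split_keywords, to_dc_subjects) and the NON_KEYWORDS set are made up here, and whether "in press" should be dropped at this stage is exactly the question left open above, so that part is an assumption.

```python
import re
from xml.sax.saxutils import escape

# Delimiters proposed in the thread: comma and semicolon.
DELIMITERS = re.compile(r"[,;]")

# Hypothetical filter list; "in press" is the only value named in the thread.
NON_KEYWORDS = {"in press"}


def split_keywords(raw, drop=NON_KEYWORDS):
    """Split a free-text keyword field into a list of individual keywords."""
    keywords = []
    for token in DELIMITERS.split(raw or ""):
        # Trim the token and collapse internal runs of whitespace.
        token = " ".join(token.split())
        if token and token.lower() not in drop:
            keywords.append(token)
    return keywords


def to_dc_subjects(raw):
    """Render one <dc:subject> element per keyword, XML-escaping the text."""
    return ["<dc:subject>%s</dc:subject>" % escape(k) for k in split_keywords(raw)]


if __name__ == "__main__":
    for line in to_dc_subjects("Ascomycota, lichens; Tree of Life, in press"):
        print(line)
```

As Vladimir notes, splitting on both characters cannot tell a delimiter from a comma or semicolon that is part of a keyword, so a keyword that contains either one would be split in two by this approach.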