From: SourceForge.net <no...@so...> - 2010-11-18 16:20:21
Bugs item #3079602, was opened at 2010-10-01 15:13
Message generated for change (Comment added) made by vgapeyev
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=1126676&aid=3079602&group_id=248804

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: APIs
Group: None
>Status: Closed
Priority: 7
Private: No
Submitted By: Ryan Scherle (rscherle)
Assigned to: Vladimir Gapeyev (vgapeyev)
Summary: OAI records contain all subjects in a single field

Initial Comment:
In the OAI records, each <dc:subject> field contains many keywords, separated by commas, like this:

<dc:subject>
Ascomycota, Pezizomycotina, Dothideomyceta, fungal evolution, lichens, multigene phylogeny, phylogenomics, plant pathogens, saprobes, Tree of Life
</dc:subject>

It is best practice to put each keyword into a separate <dc:subject> field. This allows harvesting systems (like Dryad) to accurately separate the keywords, and not worry about keywords that may contain commas.

----------------------------------------------------------------------

>Comment By: Vladimir Gapeyev (vgapeyev)
Date: 2010-11-18 11:20

Message:
Fixed in SVN 760: Treebase citation.keyword field is now split on both ',' and ';', with the results going into separate <dc:subject> elements.

'in press' values will show up as <dc:subject>in press</dc:subject> -- this is awaiting Bill's data cleaning on production.

----------------------------------------------------------------------

Comment By: Vladimir Gapeyev (vgapeyev)
Date: 2010-11-18 11:20

Message:
Your bug has been resolved. Thanks for the report.

----------------------------------------------------------------------

Comment By: Kevin S. Clarke (ksclarke)
Date: 2010-11-03 15:13

Message:
This was my concern as well with my workaround Dryad code -- that there may be repositories for whom the comma is significant and not a delimiter.

It seems that if TreeBASE wants to store all these in one field it might be good to prescribe that users use a semicolon as a delimiter (perhaps doing a db cleanup on records that are currently using a comma). Then the OAI code could rely on the semicolon as the split to break the string into separate metadata elements for output via OAI-PMH.

My code was very minimal for this just using a StringTokenizer(value, ";,")

cf. line 785 in
http://code.google.com/p/dryad/source/browse/trunk/dryad/dspace/modules/api/src/main/java/org/dspace/harvest/OAIHarvester.java

----------------------------------------------------------------------

Comment By: Vladimir Gapeyev (vgapeyev)
Date: 2010-11-03 14:49

Message:
This is a request for clarification.

Treebase UI offers a single field to enter keywords, text from which is stored in a single field in the database. From the data in treebase-dev I see that users used ',' or ';' to separate multiple keywords.

Here is what I can do:
Get Kevin's keyword-splitting code and place it on Treebase side, modifying if necessary to work with both ';' and ','. This would not work nicely if the user has a fancy to use comma-containing keywords separated by semicolons, or the other way around.

Please confirm that this is what is needed.

----------------------------------------------------------------------

Comment By: Vladimir Gapeyev (vgapeyev)
Date: 2010-11-03 14:49

Message:
Thanks for reporting this bug. We'll look into it as soon as possible.

----------------------------------------------------------------------

Comment By: Vladimir Gapeyev (vgapeyev)
Date: 2010-11-03 14:31

Message:
A few URLs that return records exhibiting the problem:

http://127.0.0.1:8080/treebase-web/top/oai?verb=GetRecord&metadataPrefix=oai_dc&identifier=TB:s1908
http://127.0.0.1:8080/treebase-web/top/oai?verb=GetRecord&metadataPrefix=oai_dc&identifier=TB:s10013
http://127.0.0.1:8080/treebase-web/top/oai?verb=GetRecord&metadataPrefix=oai_dc&identifier=TB:s1122
http://127.0.0.1:8080/treebase-web/top/oai?verb=GetRecord&metadataPrefix=oai_dc&identifier=TB:s994

Note that some separate keywords with ',' while others with ';'

----------------------------------------------------------------------

Comment By: Vladimir Gapeyev (vgapeyev)
Date: 2010-10-20 14:33

Message:
See https://www.nescent.org/wg_dryad/TreeBASE_OAI_Provider for examples of URLs that return these OAI records.

The record schema is at http://datadryad.org/profile/v3/dryad.xsd

Both data formats mentioned above formally conform to the schema, but the best practice is to have several <dc:subject> elements, one per term.

----------------------------------------------------------------------
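
For readers following the thread, the splitting behaviour described in the 2010-11-18 comment can be sketched roughly as below. This is only an illustration, assuming a StringTokenizer with the ";," delimiter set that Kevin mentions. The class name KeywordSplitter and its methods are hypothetical; this is neither the SVN 760 change nor the Dryad harvester code. As the thread points out, splitting on both characters will break any keyword that itself contains a comma or a semicolon.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

/** Hypothetical helper for illustration; not the actual TreeBASE or Dryad code. */
final class KeywordSplitter {

    private KeywordSplitter() {}

    /** Splits the raw keyword field on ',' and ';', trimming each keyword and dropping blanks. */
    static List<String> split(String raw) {
        List<String> keywords = new ArrayList<>();
        if (raw == null) {
            return keywords;
        }
        // Each character of ";," is treated as a separate delimiter.
        StringTokenizer tokens = new StringTokenizer(raw, ";,");
        while (tokens.hasMoreTokens()) {
            String keyword = tokens.nextToken().trim();
            if (!keyword.isEmpty()) {
                keywords.add(keyword);
            }
        }
        return keywords;
    }

    /** Renders one <dc:subject> element per keyword, escaping XML special characters. */
    static String toDcSubjects(String raw) {
        StringBuilder xml = new StringBuilder();
        for (String keyword : split(raw)) {
            xml.append("<dc:subject>").append(escape(keyword)).append("</dc:subject>\n");
        }
        return xml.toString();
    }

    /** Minimal escaping for element text; '&' must be replaced first. */
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Calling KeywordSplitter.toDcSubjects on the keyword string from the initial comment would yield ten separate <dc:subject> elements, one per term, which is the shape the report asks for.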