From: SourceForge.net <no...@so...> - 2009-06-19 16:59:41
Bugs item #2809146, was opened at 2009-06-19 12:59
Message generated for change (Tracker Item Submitted) made by sfrgpiel
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=1126676&aid=2809146&group_id=248804

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: data
Group: None
Status: Open
Priority: 7
Private: No
Submitted By: William Piel (sfrgpiel)
Assigned to: Mark Dominus (mjdominus)
Summary: Taxon validation fails with trinomials plus suffix

Initial Comment:
In submission S2440, you will notice that all binomial taxon labels with suffix codes (e.g. "Arianta schmidtii EG71B") validate automatically, but trinomial taxon labels with suffix codes (e.g. "Arianta arbustorum styriaca EG468") fail to validate (except where I have entered uBio IDs manually, such as with "Arianta arbustorum styriaca AT EG454").

I'm not sure where the problem lies (in fact, most if not all of these trinomials are already in TreeBASE, so the problem happens "in house" before uBio's web services are used), but I would suggest that the solution is to run a series of regular expressions on each taxon label, e.g.:

(1) First make sure that there is a space between species or subspecies names and suffix codes, assuming that a lowercase letter followed by an uppercase letter or a digit probably indicates a suffix code stuck to the end of a species or subspecies name, i.e. s/([a-z]{3,})([A-Z\d+]+)/\1 \2/

(2) Then test to see if there is a trinomial followed by a possible suffix, realizing that hyphens are allowed in species and subspecies names: m/^([A-Z][a-z]+) ([a-z\-]+) ([a-z\-]+)(.*)$/. If you get a hit, search the taxon_variants table for "$1 $2 $3", and if nothing is there, throw "$1 $2 $3" against uBio's web services.

If there is no hit, then (3) test to see if there is a binomial followed by a possible suffix: m/^([A-Z][a-z]+) ([a-z\-]+)(.*)$/. If you get a hit, search the taxon_variants table for "$1 $2", and if nothing is there, throw "$1 $2" at uBio's web services.
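
For illustration only, the three steps suggested above could be sketched roughly as follows. This is a Python translation of the Perl-style expressions in the report, not TreeBASE's actual code. The function names, and the two lookup callables standing in for the taxon_variants table and uBio's web services, are hypothetical. The sketch reads "no hit" as covering both a failed pattern match and a failed lookup, which is one possible interpretation. It also writes the suffix class as [A-Z\d], since a "+" inside a character class would match a literal plus sign.

```python
import re

# Step 1: a run of 3+ lowercase letters directly followed by capitals/digits.
# The report's class is [A-Z\d+]; the "+" inside it is dropped here because
# it would match a literal plus sign rather than act as a quantifier.
GLUED_SUFFIX = re.compile(r"([a-z]{3,})([A-Z\d]+)")
# Step 2: genus, species, subspecies (hyphens allowed), then anything.
TRINOMIAL = re.compile(r"^([A-Z][a-z]+) ([a-z\-]+) ([a-z\-]+)(.*)$")
# Step 3: genus, species (hyphens allowed), then anything.
BINOMIAL = re.compile(r"^([A-Z][a-z]+) ([a-z\-]+)(.*)$")


def split_glued_suffix(label):
    """Insert a space between a name and a suffix code stuck to its end."""
    # count=1 mirrors a Perl s/// without the /g flag (first match only).
    return GLUED_SUFFIX.sub(r"\1 \2", label, count=1)


def candidate_names(label):
    """Yield the trinomial candidate first, then the binomial fallback."""
    label = split_glued_suffix(label)
    match = TRINOMIAL.match(label)
    if match:
        yield " ".join(match.group(1, 2, 3))
    match = BINOMIAL.match(label)
    if match:
        yield " ".join(match.group(1, 2))


def resolve(label, in_taxon_variants, ask_ubio):
    """Return the first candidate found locally or remotely, else None.

    in_taxon_variants and ask_ubio are hypothetical callables that take a
    name and return True when the name is known. They stand in for the
    taxon_variants table lookup and the uBio web service call.
    """
    for name in candidate_names(label):
        # The local table is checked first; the remote call only runs
        # when the local lookup finds nothing.
        if in_taxon_variants(name) or ask_ubio(name):
            return name
    return None
```

With this sketch, candidate_names("Arianta arbustorum styriaca EG468") yields "Arianta arbustorum styriaca" and then "Arianta arbustorum", while "Arianta schmidtii EG71B" yields only "Arianta schmidtii", because the third token does not start with a lowercase letter or hyphen.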