
#56 Taxon validation uses wrong matching algorithm

piel
open
None
4
2009-03-30
2009-03-25
No

BP says:

> I did a taxon validation thing and got the attached results at the first
> pass.
>
> 3- When I follow one of the "validate by hand" links, I see a list
> like this one for Actinomucor elegans:
> <http://8ball.sdsc.edu:6666/treebase-web/user/editTaxonLabel.html?taxonlabelid=237178>
> but it lists three identical items to choose from, each having an
> identical ncbi_taxid. We should never really see a list of multiple
> hits each with the same ncbi_taxid, but we can see multiple hits on
> the same ubio_namebankid.
>

[ Bill's output page is attached below, and I reported the triple display of Actinomucor elegans as bug 2712234. However, Bill goes on to say: ]

> ... so on the face of it, it's just a display bug, and there is
> probably an easy fix to make each of the three options show a distinct
> taxon name and ncbi_taxid. However, I don't think this should be
> happening either. The label "Actinomucor elegans" ought to only match
> with one of the three. Is this happening because we have implemented a
> wildcard search? (i.e. LIKE 'Actinomucor elegans%'). If so, that
> doesn't seem right to me -- we should be doing something like this:
>
> 1- remove any suffixes from the taxon_label that don't look like they
> are part of the name string (i.e. remove suffixes that contain numbers
> or that have upper case letters or that have a very short length).
>
> 2- take what remains in the taxon_label and try to match it against
> the taxon_variant fullnamestring. If it hits, then count the number of
> related taxa. If there is more than one, then make the "match by hand"
> warning and list the multiple names and ncbi_taxids to choose from.
>
> 3- If you don't get a taxon_variant fullnamestring match, then SOAP over
> to uBIO and see if it's there & if so try to collect namebank and
> ncbi_taxids.
>
> Notice that this does not use any wildcard searching of the
> fullnamestring -- so in general "Actinomucor elegans" should not find a
> match with "Actinomucor elegans var. elegans" (etc).
>
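
The three steps Bill proposes can be sketched roughly as follows. This is only an illustration, not the TreeBASE code: the `TAXON_VARIANTS` dictionary stands in for the taxon_variant table, the names and ncbi_taxids in it are made up, `lookup_ubio` is a placeholder for the SOAP call to uBio, and the choices to split labels on underscores as well as spaces and to treat fewer than three characters as "very short" are assumptions.

```python
import re

# Hypothetical stand-in for the taxon_variant table:
# fullnamestring -> ncbi_taxids of the related taxa (all IDs are made up).
TAXON_VARIANTS = {
    "Actinomucor elegans": {1001},
    "Actinomucor elegans var. elegans": {1002},
    "Actinomucor elegans var. kuwaitiensis": {1003},
    "Examplea ambigua": {2001, 2002},  # invented name with two related taxa
}


def strip_suffixes(label, min_len=3):
    """Step 1: drop trailing tokens that do not look like part of a name.

    A trailing token is dropped if it contains a digit or an ASCII
    upper-case letter, or is shorter than min_len (an assumed threshold).
    The first token (the genus) is always kept.
    """
    tokens = re.split(r"[\s_]+", label.strip())
    while len(tokens) > 1:
        last = tokens[-1]
        if re.search(r"[0-9A-Z]", last) or len(last) < min_len:
            tokens.pop()
        else:
            break
    return " ".join(tokens)


def lookup_ubio(name):
    """Step 3 placeholder: the real system would query uBio over SOAP."""
    return None


def validate(label):
    """Return (status, cleaned_name, result) for one taxon label."""
    name = strip_suffixes(label)
    # Step 2: exact match on fullnamestring, no wildcard.
    taxids = TAXON_VARIANTS.get(name)
    if taxids is None:
        # Step 3: no local match, fall back to uBio.
        return ("ubio", name, lookup_ubio(name))
    if len(taxids) > 1:
        # More than one related taxon: the user must choose by hand.
        return ("validate_by_hand", name, sorted(taxids))
    return ("matched", name, sorted(taxids))
```

Because the lookup is an exact match, `validate("Actinomucor elegans")` resolves to the single entry for that name and never reaches the two "var." entries, which is the behaviour the wildcard search fails to give.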

Discussion

  • Mark Dominus

    Mark Dominus - 2009-03-25
     
  • Rutger Vos

    Rutger Vos - 2009-03-26
    • assigned_to: nobody --> rvos
     
  • Mark Dominus

    Mark Dominus - 2009-03-30

    Bill said this part can be done post-beta.

    It is the ultimate cause of #2712234, which is more urgent.

     
  • Mark Dominus

    Mark Dominus - 2009-03-30
    • priority: 5 --> 4
     
