From: Hilmar L. <hl...@du...> - 2009-04-07 06:24:37
At the risk of starting a fire, a few thoughts from me below.

On Apr 7, 2009, at 1:15 AM, Rutger Vos wrote:

> [...] when users submit any nexus file containing taxon labels (a
> tree file, a matrix file, or a combination thereof), TreeBASE
> initially flags these labels as not having been linked to an
> external taxonomy (i.e. uBio and NCBI). The user is then expected
> to go the page with taxon labels, check all labels flagged as not
> linked (they have a red cross mark next to them) and hit "validate".

I.e., if the user never goes to that page (or goes to the page but
never takes any action), the taxon labels will never be validated? In
other words, whether TreeBASE has taxon name service-validated taxon
labels for a record is in the hands of the user submitting the record?

> TreeBASE will then attempt to look up these labels in uBio - and from
> there in the NCBI taxonomy. In some cases, it will turn out that a
> label is a true homonym, i.e. multiple, actual taxa by that name exist
> - usually a plant taxon and an animal taxon (examples: Aotus,
> Abronia). TreeBASE will pick the first option of the list of homonyms
> and warn the user that it did this, urging the user to check by hand
> whether that choice is the right one.

So in case of homonyms, TreeBASE will basically pick one at random, and
leave it to the user to correct it where it picks wrong? I.e., if the
user doesn't make the corrections, TreeBASE will have incorrect data
after this step 50% or more of the time?

> [...] I have now implemented the following: a map (keys: taxon label
> IDs, values: taxon label strings) called "homonyms" is created
> during taxon label validation and carried around the session. Every
> time a homonym is manually resolved, that entry is deleted from the
> session.

That doesn't solve the above, does it? Specifically, because of:

> [...]
> These messages do NOT persist between sessions

Is the fact that some of the taxon labels are now wrong lost if the
user chooses not to complete the homonym corrections?

I don't know enough details, but if this is true, we can probably agree
that 1) TreeBASE shouldn't make data incorrect through the submission or
curation UI, let alone store data that it made possibly incorrect, and
2) if TreeBASE alters the user's data in non-trivial ways (i.e., other
than changing formatting etc.), then those changes should be traceable
and the original values retained, so that they can be reviewed and
corrected given better knowledge. (For example, a synonym assignment in
uBio might turn out wrong a month later.)

> [...] The only way to fix that would be to add a column to the taxon
> labels table that can persistently flag labels as unresolved
> homonyms. I personally think that that would be excessive, so I have
> closed the bug report. Agreed?

So, three thoughts here for TreeBASE's (as a project community)
consideration.

1) An issue is open until it is either fixed, or the stakeholders
decide to reject the issue for whatever reason. One of the reasons for
rejection might be excessive effort needed to fix the issue, but
developers shouldn't make that decision. (As a developer, you really
don't want to spend your time debating with users whether a certain
amount of effort is excessive or not.) Who are the TreeBASE
stakeholders for the purposes of development prioritization and issue
rejection?

2) Each feature needs clear and precise requirements. A bug normally
indicates that a certain requirement of the system is not met. A bug is
not fixed until the requirements violations being reported are fixed.
If the requirements of a feature aren't spelled out clearly to begin
with, it becomes difficult later to determine whether an issue reported
with the feature is really a new feature request or a requirements
violation.
(An issue report that really is a new feature request is still a valid
issue report. However, it is often prioritized differently than fixing
an incorrectly implemented feature whose requirements have previously
been agreed upon.)

3) Stakeholders sign off on requirements. (They may be suggested by,
but are not unilaterally declared by, developers.) In this case, if the
requirements for validating taxon labels against a taxon name service
have been agreed upon to include that the "resolved" taxon label being
stored not be wrong due to homonymy, then anything short of
guaranteeing that is an incorrect implementation that needs to be
fixed. So what are the requirements for that feature?

	-hilmar
-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
===========================================================