From: Arlin S. <ar...@um...> - 2011-06-06 14:48:37
On Jun 6, 2011, at 10:24 AM, Rutger Vos wrote:

> (3) We must always be well-grounded in the ways in which biologists
> actually work, not just how we would like them to work -- the software
> they use, the work flows that they use, etc. We know that in their
> analysis phase, they use codes and abbreviations for their taxon
> labels. . . .
> (4) The MIIDI minimum metadata editor
> (http://www.miidi.org:8080/orbeon/miidi-review/report?id=14) is
> totally cool . . . The problem is
> there is no way in hell that biologists will invest the time in this:
> can you imagine taking a 1,000-taxon tree, and for each 1,000 OTUs you
> have to click a set of nested boxes to enter the Genbank taxID number,

I agree with the thinking here. IMHO our proposal will fare better if we focus on solving user problems (in sexy ways, of course).

The main problem is that users need to archive (to comply with policies), but the crap they are poised to archive is not re-usable. (TreeGrabber exists because most authors publish and archive pictures of trees rather than logically encoded trees.) Archiving is going to happen, because it's being pushed by policies, but it won't have a huge impact on re-use until we make it easy for users to submit re-usable data.

To break this down into manageable chunks, the biggest problems I see are:

1) most users need to translate their data into formats better suited to archiving;
2) the OTU names don't match across the user's own files;
3) the data objects referenced in the files do not have GUIDs or accessions that can be machine-processed;
4) the record does not have sufficient metadata annotations for potential re-users to accurately judge the prospects for re-use.

The TreeBASE submission process doesn't help with #1, although Mesquite can help users load their data from other formats into NEXUS. The TB submission process exposes problem #2 but doesn't help the user fix it.
However, matching N things with N other things is a classic problem in computer science, known as "the marriage problem" (stable matching), and there are many established solutions. We just need to implement one and let the user accept or edit the suggested matching in a nice graphical way. If users have sequences, we can BLAST them to get both a suggested accession and a suggested species identifier. That solves #3 for molecular users. Support for #4 is already part of what the MIAPA people are proposing.

Arlin

-------
Arlin Stoltzfus (ar...@um...)
Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST
IBBR, 9600 Gudelsky Drive, Rockville, MD
tel: 240 314 6208; web: www.molevol.org
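As a rough illustration of the "suggested matching" step for problem #2, here is a minimal Python sketch of the Gale-Shapley algorithm for the stable marriage problem, pairing OTU labels from one file with labels from another. It is not anything TreeBASE or Mesquite actually implements. It assumes both files contain the same number of unique labels, and it uses difflib string similarity as a stand-in for whatever scoring a real tool would use. The function names `similarity` and `suggest_matching` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1].

    Assumed scoring rule for illustration only; a real tool might use
    taxonomic name resolution or sequence identity instead.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def suggest_matching(labels_a: list[str], labels_b: list[str]) -> dict[str, str]:
    """Gale-Shapley stable matching of two equal-sized lists of unique labels.

    Each label ranks the labels in the other list by similarity (most similar
    first). Labels in labels_a "propose"; the result is {label_a: label_b},
    meant as a suggestion the user can accept or edit.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("stable marriage needs two lists of equal length")

    # Preference list for each label in A: labels in B, most similar first.
    prefs_a = {a: sorted(labels_b, key=lambda b, a=a: -similarity(a, b))
               for a in labels_a}
    # rank_b[b][a] = position of a in b's preference list (lower is better).
    rank_b = {}
    for b in labels_b:
        ordered = sorted(labels_a, key=lambda a, b=b: -similarity(a, b))
        rank_b[b] = {a: i for i, a in enumerate(ordered)}

    next_choice = {a: 0 for a in labels_a}  # index of the next B label a proposes to
    engaged_to: dict[str, str] = {}         # B label -> currently accepted A label
    free = list(labels_a)                   # A labels without a partner
    while free:
        a = free.pop()
        b = prefs_a[a][next_choice[a]]
        next_choice[a] += 1
        current = engaged_to.get(b)
        if current is None:
            engaged_to[b] = a                # b was free: accept
        elif rank_b[b][a] < rank_b[b][current]:
            engaged_to[b] = a                # b prefers the new proposer
            free.append(current)             # previous partner becomes free again
        else:
            free.append(a)                   # rejected: a will try its next choice
    return {a: b for b, a in engaged_to.items()}
```

For example, `suggest_matching(["Hsap", "Mmus", "Dmel"], ["Homo_sapiens", "Mus_musculus", "Drosophila_melanogaster"])` returns a one-to-one pairing that a user interface could present for confirmation. Stable matching guarantees no two labels would both rather be paired with each other than with their assigned partners. If the goal were instead to maximize total similarity, the assignment problem (e.g., the Hungarian algorithm) would be the closer fit.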