From: Arlin S. <ar...@um...> - 2011-06-06 17:08:52
On Jun 6, 2011, at 12:44 PM, William Piel wrote:

> Just some minor commentary:
>
> - I've written scripts that take Genbank accessions numbers, extract
> metadata out of Genbank, and format it ready for ingest by TreeBASE
> -- but I'm surprised at the number of times that people submit
> alignments containing sequences that are still embargoed by Genbank.
> (arg...). A lot of people just pick the default one-year embargo
> period, not knowing how long it will take for their article to get
> through the publishing system. So at the time of submitting to
> TreeBASE, we can't take advantage of any automatic cross-walking
> with Genbank.

"any" only applies to the case of those newly determined sequences still subject to embargo, right? In other cases, sequences used in alignments are not embargoed, because they were published already, or because the author's embargo has expired. Do you know what fraction of cases are embargoed? Can TreeBASE periodically search post-submission to discover GenBank matches for any of its undocumented sequences (there would have to be some way to query the author to approve, I suppose)?

> - Unfortunately, BLAST frequently doesn't work in that it often
> produces false positives. At best, we should use BLAST to *assist*
> the submitter in preparing metadata, but human eyes have to
> supervise this process. This also assumes that Genbank is richly
> annotated, and unfortunately that's not true. For example, in a
> sample of 21,736 records in Genbank that are found in TreeBASE, only
> 373 of them were tagged with lat/long metadata. :-(

I agree. Automated methods to fill in the metadata blanks should be treated as suggestions, subject to the user's final approval.

> taken together, this weakens the statement that this "solves #3 for
> molecular users"

OK, I agree that it is weakened, but while we are waiting for all the other problems in the world to be solved so that we can achieve metadata perfection, does this approach at least solve 80% of problem #3 for molecular users? Or do you think it is a much smaller fraction than that?

Arlin

-------
Arlin Stoltzfus (ar...@um...)
Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST
IBBR, 9600 Gudelsky Drive, Rockville, MD
tel: 240 314 6208; web: www.molevol.org