From: Karen C. <kar...@ne...> - 2011-05-26 19:00:20
|
There has been some interest among various groups in an ABI proposal for development of phyloinformatics resources. This email is an attempt to connect those threads and move the process forward. The conversations that have been happening up to this point are:

1. The Phyloinformatics Research Foundation (phylofoundation.org, stewards of TreeBASE and ToLWeb) started a Google doc aimed at TreeBASE
2. MIAPA developers started a wiki page (https://www.nescent.org/sites/evoio/NSF_ABI_2011), recognizing the need for coordination with TreeBASE and other resources
3. NESCent (Todd, Hilmar and myself), as the current TreeBASE host and as a third party interested in coordinated development across resources started a third document (now added to the already mentioned Google doc)

If you are interested in this discussion and do not already have access to the Google doc entitled TreeBASE_ABI.doc, let me know and I can grant you access. Hilmar and I made some substantial edits earlier this morning. I point you specifically to the section at the end entitled "An attempt to re-think all of this". Briefly, we wanted to encourage some radical thinking and explore the idea of developing a PhyloCommons that incorporates both TreeBASE and ToLWeb into the proposal (as the data repository and the data sharing / dissemination / synthesis platform, respectively).

The ABI deadline is July 7, so we have a short period of time to pull this together. Here is a link to a Doodle poll for an initial teleconference.

http://doodle.com/zf2tz7sftyk3naxy

During this meeting, we hope to come to agreement on the broad direction of the grant, identify possible leaders of the various components and create a plan for getting this pulled together in time for the deadline. Please feel free to continue the conversation on the Google doc between now and the teleconference. If there are others who you think should be invited, feel free to do so. Not everyone who participates in this first phase will end up being named on the grant, but these resources require input from a much larger group.

Cheers,
Karen

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Karen Cranston
Training Coordinator and Informatics Project Manager
nescent.org
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
From: Rutger V. <R....@re...> - 2011-05-27 12:47:06
|
I'd like to second this initiative. I think the blurb at the end of the Google doc nicely puts the finger on the sore spots, e.g.:

"TreeBASE arguably falls into the latter category, with ingest and retrieval of data constituting the predominant uses, yet the design of its current version chiefly aims at a fully transactional database for frequent and concurrent updates. This legacy design choice places serious limitations on extensibility and robustness of the software, and has resulted in a degree of complexity of the code and deployment procedure that threatens its long-term sustainability."

The first half, i.e. "Ideas for New Development", has a lot of good ideas in it. Hopefully we can make it so that some of those ideas are the meat on the bones of the sentiment expressed in the second half.

Let's fill out the Doodle poll; its window is for next week.

Rutger

--
Dr. Rutger A. Vos
School of Biological Sciences
Philip Lyle Building, Level 4
University of Reading
Reading, RG6 6BX, United Kingdom
Tel: +44 (0) 118 378 7535
http://rutgervos.blogspot.com
|
From: Vision, T. J <tj...@bi...> - 2011-05-27 16:05:35
|
It is great to see these threads coming together! Sadly, I can't be directly involved because I will be submitting a proposal to ABI in the same round (for Dryad). But that does not disqualify other NESCentians from participating.

One word of caution. The document as it stands now harps a bit too much on the problems with the current system. As it gets transformed into a proposal, I would hope that the solutions take center stage, and that it emphasizes the potential for what lies ahead, rather than the limitations of what lay behind.

OK, a second word of caution. I would not take entirely at face value NSF's protestations that scientific goals are not important to the success of ABI Development proposals. They may not be as important, but they still matter. In particular, I think there's an opportunity here to really see what can be achieved by looking synthetically at a large body of phylogenetic trees. How much incongruence is there? Does incongruence decrease over time? How is it affected by the type of study (morphological vs molecular, amount of data, extent of taxonomic sampling, parsimony vs probabilistic, etc)? Where are the holes in phylogenetic knowledge? In which groups is there enough overlap to build supertrees or grafted trees? These kinds of questions can only be answered with a resource like TB, and emphasizing such synthetic outcomes would help motivate the whole endeavour.

cheers,
Todd
|
From: Karen C. <kar...@ne...> - 2011-05-31 16:20:10
|
Tomorrow morning (Wed, June 1) looks to be good for everyone, and sooner seems better than later. I propose we talk at 9:00 am EST. I will send connection information later today.

Cheers,
Karen
|
From: Karen C. <kar...@ne...> - 2011-06-01 12:35:46
|
The connection information for the call this morning is in the Google Doc. Let me know if you do not have access.
|
From: Westneat, M. <mwe...@fi...> - 2011-05-31 19:14:45
|
Hi all,

I am definitely interested in this, but can't join in tomorrow (synthesis meeting on Cybertaxonomy this week). Will catch up and join the next meeting-

Mark

--
Mark W. Westneat
Curator of Zoology
Robert A. Pritzker Director, Biodiversity Synthesis Center of the Encyclopedia of Life
Field Museum of Natural History
1400 S Lake Shore Dr., Chicago, IL 60605-2496
(312) 665-7734
My website: http://synthesis.eol.org/users/mwestneat<http://biosync.fieldmuseum.org/users/mwestneat>
BioSynC website: http://synthesis.eol.org
|
From: Arlin S. <ar...@um...> - 2011-06-03 19:39:02
|
Today is the deadline for our 1-page synopsis to pitch to an NSF program officer (before going further). Currently we seem to have 3 pitches. It is time now for some energetic person to consolidate this, so that we can move ahead.

Arlin

-------
Arlin Stoltzfus (ar...@um...)
Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST
IBBR, 9600 Gudelsky Drive, Rockville, MD
tel: 240 314 6208; web: www.molevol.org
|
From: Karen C. <kar...@ne...> - 2011-06-06 14:04:56
|
There are several pitches now in the Google doc, with a fair bit of overlap between them. I am willing to consolidate them into a single page and send it to NSF (Reed?) to see what he has to say about the various components. It seems like these components are:

1. some level of re-engineering of TreeBASE
2. further development of MIAPA, with annotation tools and TreeBASE integration
3. use of ToLWeb as a crowd sourcing and data synthesis platform
4. NeXML refinement and development

I don't think this one-pager needs to capture all of the ideas and details we currently have, but should instead give a general sense of what we are proposing and ask whether all or some of these ideas are potentially fundable.

Everyone in agreement? I will post the single page in the doc later today.

Karen
|
From: Arlin S. <ar...@um...> - 2011-06-06 14:16:42
|
Thanks, Karen. I'll take a look at it, too.

Arlin
|
From: Rutger V. <R....@re...> - 2011-06-06 14:24:46
|
Hi Karen,

In a separate discussion on this topic with Bill he had the following comments, reproduced below. This is to emphasize (as I also did in the Google doc) that my ideas for TreeBASE redevelopment are only my own blue-sky thinking. Bill favours a more gradual approach, and if that is something that could go into an ABI proposal it is probably the wiser option. Anyway, here are the remarks:

===============

I've been struggling a bit -- vacillating between Hilmar's phases (phases 1 - 3 in the doc). Some thoughts:

(1) Yes, this is mainly a document storage/retrieval system, but nonetheless there are still some very sexy queries that can be more easily implemented if at least some portion of the data are relational (such as the trees) -- such as functionalities that blend TreeBASE and ToLWeb (as desired by Karen). And no matter what ABI says re. just nuts/bolts/hammer stuff, sexy functionalities are still important because at the end of the day the grant reviewers will be biologists from outside of NSF -- so Vladimir's comments are still relevant.

(2) Phylogenetic data objects have complex components that must interdigitate to work properly. For example, TreeBASE's ability to verify that the sets of taxon labels in matrices and their daughter trees match up perfectly catches errors in the great majority of all submissions. Which is just to say that the sad truth is that people try to deposit broken crap whenever they can get away with it -- that's just a fact of life, and it highlights the fact that Dryad is a very poor solution for data sharing. That TreeBASE guarantees that all of our analysis downloads can be opened in Mesquite without error is fabulously important.
(In theory, of course, well-written document preparation software and validators can do the same thing as what TreeBASE currently does -- but that essentially shifts the problem to an earlier stage, such as writing a Mesquite plugin for data submission preparation, which itself contains all validation/error checking features, and then dumps rich NeXML for NoSQL-style storage. But if we do invest in developing a Mesquite plug-in, we won't be addressing how to ingest matrices and trees that exceed Mesquite's capabilities -- e.g. genomic-scale data -- so in a way we are just back to square one.)

(3) We must always be well-grounded in the ways in which biologists actually work, not just how we would like them to work -- the software they use, the work flows that they use, etc. We know that in their analysis phase, they use codes and abbreviations for their taxon labels. When it comes time to submit to TreeBASE, suddenly they have to upgrade their data (e.g. writing taxa in full), and that's when NEXUS files start to break. They often use software that produces poorly formed NEXUS. They often produce Newick trees that are incorrectly rooted/oriented (rooting in their figures being produced by special PAUP commands rather than the implicit order of parentheses). The idea that biologists will use a work-flow such that all metadata are nicely captured from the get-go, and therefore submission of metadata is trivially easy, is our fantasy of how we would like them to work, not how they actually work.

(4) The MIIDI minimum metadata editor (http://www.miidi.org:8080/orbeon/miidi-review/report?id=14) is totally cool in that it provides the ability to mark up almost any data package for submission/storage using tons of metadata with controlled vocabularies, and where the extent of metadata provided can be verified as to whether it meets minimum standards.
The problem is there is no way in hell that biologists will invest the time in this: can you imagine taking a 1,000-taxon tree, and for each of the 1,000 OTUs you have to click a set of nested boxes to enter the Genbank taxID number, the museum collection code, the lat-long, etc.? Ha! No f*cking way (pardon my language). Realistically, we have to think in terms of both our fantasy system (like this MIIDI editor) and in terms of what is likely to be the case for most biologists -- i.e. spreadsheets -- things where people can copy/paste from Excel, etc.

So... for a beefed up Hilmar phase 1 approach:

(a) continue solving bugs, but going deeper -- i.e. solve the deeper problems like the hanging queries, excess memory use, etc., that require frequent reboots, with the goal that the application will be stable for much longer stretches of time,
(b) fix some of our really dumb data-model problems -- e.g. fuse the submission table with the study table,
(c) soft-type all of our metadata for all objects: matrices, trees, nodes, etc.,
(d) provide alternative parsers for larger data imports,
(e) provide automated taxon intel tools that draw on data sources other than just uBio (e.g. GNI),
(f) pre-cache serializations for all major data objects so that mass downloads don't tax our memory and CPU,
(g) bring in the NCBI classification and/or connections with ToLWeb and provide sexy queries for questions of generic topology,
(h) integrate sequence data with a BLAST engine for yet another sexy query option,
(i) integrate the lat/long metadata with Google Earth or Maps for yet another sexy query option,
(j) totally redo the search interface to make it sexy and fun to use,
(k) expand out the API,
(l) modify the submission system for MIAPA compliance,
(m) provide a way to ingest MIAPA-compliant NeXML or submissions,
(n) export all TreeBASE data into CouchDB as an alternative way to access/distribute the data.
Now, granted, a huge problem is the service-layer bloat and the general headache of a fat and complex codebase. Can we solve this by putting programmers hard at work making major changes to the existing code, or must we start from scratch? And if we start from scratch, how do we know that we won't find ourselves back in the same situation five years hence? It is easier to justify starting from scratch if we are saying that we need a whole new platform/architecture (e.g. NoSQL) -- otherwise we don't sound so good if we have to admit that the code that we wrote is dying under its own weight. On the other hand, as long as we budget enough FTE programmer time into redoing it all from scratch, we might be able to avoid admitting that we are forced to redo from scratch (or blame all our problems on Hibernate and argue for some other MVC framework).

So one thing I'm saying is that sticking with SQL (but caching all data objects, and/or dumping to a JSON NoSQL server) would, I think, solve all the major performance/functionality issues while retaining the data integrity advantages and the ability to do certain fancy queries which are more easily done by an RDBMS. I don't think that an RDBMS is necessarily

An alternative is to build a Mesquite plugin that has a very rich interface, with all the data integrity checks, and with easy copy/paste spreadsheets for metadata, or metadata marked up directly on tree nodes and edges, etc., and then have this push rich NeXML on to a NoSQL document storage system. Certain sexy queries (phylogeographic queries, BLAST searching, topology searching) might be sacrificed. And we'd be dealing with Mesquite -- which has its own limitations, idiosyncrasies, code-bloat, etc.

--
Dr. Rutger A. Vos
School of Biological Sciences
Philip Lyle Building, Level 4
University of Reading
Reading, RG6 6BX, United Kingdom
Tel: +44 (0) 118 378 7535
http://rutgervos.blogspot.com |
From: Arlin S. <ar...@um...> - 2011-06-06 14:48:37
|
On Jun 6, 2011, at 10:24 AM, Rutger Vos wrote:

> (3) We must always be well-grounded in the ways in which biologists actually work, not just how we would like them to work -- the software they use, the work flows that they use, etc. We know that in their analysis phase, they use codes and abbreviations for their taxon labels. . . .
> (4) The MIIDI minimum metadata editor (http://www.miidi.org:8080/orbeon/miidi-review/report?id=14) is totally cool . . . The problem is there is no way in hell that biologists will invest the time in this: can you imagine taking a 1,000-taxon tree, and for each 1,000 OTUs you have to click a set of nested boxes to enter the Genbank taxID number,

I agree with the thinking here-- IMHO our proposal will fare better if we focus on solving user problems (in sexy ways, of course). The main problem is that users need to archive (to comply with policies) but the crap that they are poised to archive is not re-usable. (TreeGrabber exists because most authors publish and archive pictures of trees rather than logically encoded trees). Archiving is going to happen, because it's being pushed by policies, but this won't have a huge impact on re-use until we make it easy for users to submit re-usable data.

To break this down into manageable chunks, the biggest problems that I see are
1) most users need to translate their data into formats better suited to archiving;
2) the OTU names don't match within the user's own files;
3) the data objects referenced in the files do not have GUIDs or accessions that can be machine-processed;
4) the record does not have sufficient metadata annotations for potential re-users to judge accurately the prospects for re-use.

The TreeBASE submission process doesn't help with #1, although Mesquite actually can help users load up their data from other formats into NEXUS. The TB submission process exposes problem #2 but doesn't help the user to fix it.
However, matching N things with N other things is a classic problem in comp sci called "the marriage problem". There are many solutions. We just need to implement one and allow the user to accept or edit the suggested matching in a nice graphical way. If users have sequences, we can BLAST them and get both a suggested accession and a suggested species identifier. That solves #3 for molecular users. Support for #4 is already part of what the MIAPA people are proposing. Arlin ------- Arlin Stoltzfus (ar...@um...) Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST IBBR, 9600 Gudelsky Drive, Rockville, MD tel: 240 314 6208; web: www.molevol.org |
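The label-matching idea above (problem #2) can be sketched roughly as follows. This is only an illustration under stated assumptions, not how TreeBASE or Mesquite does it: the function names, the 0.5 threshold, and the use of plain string similarity are all hypothetical choices, and the pairing is a simple greedy best-score-first heuristic rather than a formal stable-marriage or optimal-assignment solver. The output is meant as a list of suggestions for the user to accept or edit.

```python
from difflib import SequenceMatcher


def normalize(label):
    """Lower-case a label and treat underscores as spaces for comparison."""
    return " ".join(label.replace("_", " ").lower().split())


def similarity(a, b):
    """String similarity in [0, 1] between two normalized labels."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def suggest_matching(matrix_labels, tree_labels, threshold=0.5):
    """Suggest one-to-one pairings for labels that do not match exactly.

    Returns (pairs, unmatched_matrix, unmatched_tree). Labels present in
    both inputs are treated as already matched and left out of the result.
    The threshold is an arbitrary, hypothetical cut-off.
    """
    left = sorted(set(matrix_labels) - set(tree_labels))
    right = sorted(set(tree_labels) - set(matrix_labels))

    # Score every candidate pair, best first (ties broken alphabetically).
    scored = sorted(
        ((similarity(a, b), a, b) for a in left for b in right),
        key=lambda t: (-t[0], t[1], t[2]),
    )

    pairs, used_left, used_right = [], set(), set()
    for score, a, b in scored:
        if score < threshold:
            break  # everything after this point scores even lower
        if a in used_left or b in used_right:
            continue  # each label may appear in at most one suggestion
        pairs.append((a, b, round(score, 2)))
        used_left.add(a)
        used_right.add(b)

    unmatched_matrix = [a for a in left if a not in used_left]
    unmatched_tree = [b for b in right if b not in used_right]
    return pairs, unmatched_matrix, unmatched_tree


if __name__ == "__main__":
    matrix = ["Homo_sapiens", "Pan troglodytes", "Gorilla gorilla", "Mus musculus"]
    tree = ["Homo sapiens", "P. troglodytes", "Gorilla gorilla", "Danio rerio"]
    for suggestion in suggest_matching(matrix, tree)[0]:
        print(suggestion)
```

Anything left in the two unmatched lists would still need the submitter's attention; a real tool would presumably show the suggestions side by side in a graphical editor, as proposed above.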
From: William P. <wil...@ya...> - 2011-06-06 16:45:02
|
On Jun 6, 2011, at 10:48 AM, Arlin Stoltzfus wrote:

> The TreeBASE submission process doesn't help with #1, although Mesquite actually can help users load up their data from other formats into NEXUS. The TB submission process exposes problem #2 but doesn't help the user to fix it. However, matching N things with N other things is a classic problem in comp sci called "the marriage problem". There are many solutions. We just need to implement one and allow the user to accept or edit the suggested matching in a nice graphical way. If users have sequences, we can BLAST them and get both a suggested accession and a suggested species identifier. That solves #3 for molecular users.
> Support for #4 is already part of what the MIAPA people are proposing.

Just some minor commentary:

- I've written scripts that take Genbank accession numbers, extract metadata out of Genbank, and format it ready for ingest by TreeBASE -- but I'm surprised at the number of times that people submit alignments containing sequences that are still embargoed by Genbank. (arg...). A lot of people just pick the default one-year embargo period, not knowing how long it will take for their article to get through the publishing system. So at the time of submitting to TreeBASE, we can't take advantage of any automatic cross-walking with Genbank.

- Unfortunately, BLAST frequently doesn't work in that it often produces false positives. At best, we should use BLAST to *assist* the submitter in preparing metadata, but human eyes have to supervise this process. This also assumes that Genbank is richly annotated, and unfortunately that's not true. For example, in a sample of 21,736 records in Genbank that are found in TreeBASE, only 373 of them were tagged with lat/long metadata. :-(

Taken together, this weakens the statement that this "solves #3 for molecular users".

bp |
From: Arlin S. <ar...@um...> - 2011-06-06 17:08:52
|
On Jun 6, 2011, at 12:44 PM, William Piel wrote:

> Just some minor commentary:
>
> - I've written scripts that take Genbank accession numbers, extract metadata out of Genbank, and format it ready for ingest by TreeBASE -- but I'm surprised at the number of times that people submit alignments containing sequences that are still embargoed by Genbank. (arg...). A lot of people just pick the default one-year embargo period, not knowing how long it will take for their article to get through the publishing system. So at the time of submitting to TreeBASE, we can't take advantage of any automatic cross-walking with Genbank.

"any" only applies to the case of those newly determined sequences still subject to embargo, right? In other cases, sequences used in alignments are not embargoed, because they were published already, or because the author's embargo has expired. Do you know what fraction of cases are embargoed? Can TreeBASE periodically search post-submission to discover GenBank matches for any of its undocumented sequences (there would have to be some way to query the author to approve, I suppose)?

> - Unfortunately, BLAST frequently doesn't work in that it often produces false positives. At best, we should use BLAST to *assist* the submitter in preparing metadata, but human eyes have to supervise this process. This also assumes that Genbank is richly annotated, and unfortunately that's not true. For example, in a sample of 21,736 records in Genbank that are found in TreeBASE, only 373 of them were tagged with lat/long metadata. :-(

I agree. Automated methods to fill in the metadata blanks should be treated as suggestions, subject to the user's final approval.
> taken together, this weakens the statement that this "solves #3 for > molecular users" OK, I agree that it is weakened, but while we are waiting for all the other problems in the world to be solved so that we can achieve metadata perfection, does this approach at least solve 80 % of problem #3 for molecular users? Or do you think it is a much smaller fraction than that? Arlin ------- Arlin Stoltzfus (ar...@um...) Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST IBBR, 9600 Gudelsky Drive, Rockville, MD tel: 240 314 6208; web: www.molevol.org |
From: William P. <wil...@ya...> - 2011-06-06 17:19:13
|
On Jun 6, 2011, at 1:08 PM, Arlin Stoltzfus wrote:

> OK, I agree that it is weakened, but while we are waiting for all the other problems in the world to be solved so that we can achieve metadata perfection, does this approach at least solve 80 % of problem #3 for molecular users? Or do you think it is a much smaller fraction than that?

Don't get me wrong -- I'm totally in favor of doing anything that improves metadata -- especially features that make it easier for the submitter to assemble and submit the data.

Another thing we could do, for example, is try to validate bio-collection codes with services at GBIF. E.g., if the researcher has deposited his/her specimens in the AMNH, we could search GBIF for each species (plus the AMNH museum code) and return a ranked list with the most recently deposited specimens at the top. If the user sees his/her specimen and clicks on it, all other Darwin Core metadata get sucked in. (Of course, we still have the chicken-egg problem of those who submit to TreeBASE before depositing specimens in a museum.)

bp |
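As a rough illustration of the ranked-list idea above, the sketch below works on specimen records that have already been fetched; it makes no call to GBIF. The record layout and every field name (`species`, `institution_code`, `catalog_number`, `deposited`, `lat`, `lon`) are hypothetical stand-ins, not GBIF's actual response format, and the rule that user-entered values win over imported ones is just one possible design choice.

```python
from datetime import date


def rank_candidates(records, species, institution_code):
    """Keep records for one species in one collection, newest deposit first."""
    wanted = species.strip().lower()
    hits = [
        r for r in records
        if r["species"].strip().lower() == wanted
        and r["institution_code"] == institution_code
    ]
    return sorted(hits, key=lambda r: r["deposited"], reverse=True)


def apply_choice(otu_metadata, chosen_record):
    """Pull the chosen specimen's fields into an OTU's metadata.

    Values the submitter already entered are kept; only blanks are filled.
    """
    merged = dict(chosen_record)
    merged.update(otu_metadata)
    return merged


if __name__ == "__main__":
    # Hypothetical, hand-made records standing in for a search result.
    records = [
        {"species": "Nephila clavipes", "institution_code": "AMNH",
         "catalog_number": "X-001", "deposited": date(2009, 3, 1),
         "lat": 9.15, "lon": -79.85},
        {"species": "Nephila clavipes", "institution_code": "AMNH",
         "catalog_number": "X-002", "deposited": date(2011, 1, 15),
         "lat": 10.43, "lon": -84.01},
        {"species": "Nephila clavipes", "institution_code": "MCZ",
         "catalog_number": "Y-007", "deposited": date(2011, 5, 2),
         "lat": 18.2, "lon": -66.5},
    ]
    ranked = rank_candidates(records, "Nephila clavipes", "AMNH")
    otu = {"label": "N_clavipes_CR", "species": "Nephila clavipes"}
    print(apply_choice(otu, ranked[0]))
```

The submitter still makes the final pick from the ranked list, in line with the point made earlier in the thread that automated suggestions need human approval.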