From: William P. <wil...@ya...> - 2011-09-27 02:31:39
|
Hi Carl,

Were you deciding to use our OAI service because it allowed search-by-date? If so, you might want to know that TreeBASE's PhyloWS API now supports search-by-date, such as:

/study/find?query=prism.creationDate>"2011-08-30T05:00:00Z"&format=rss1

This now works for prism.creationDate, prism.modificationDate, and prism.publicationDate. Use prism.publicationDate for searching on the publication year of the article -- hence prism.publicationDate>"2011-01-01T05:00:00Z" includes all papers with the year 2011 and later (but a second earlier and it would also include all papers with the year 2010, because in TreeBASE's world, time is EST, which is 5 hours behind London). Use prism.modificationDate to catch older papers that were since modified.

The returned data contains article DOIs like so:

<prism:doi xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">10.1006/mpev.2000.0752</prism:doi>

which presumably can be matched with article DOIs in Dryad.

In our last conversation, I think Ryan said that "For the metadata on the Dryad end, we still need to correct some issues with the relationships to TreeBASE objects." For the TreeBASE end, we had discussed whether TreeBASE should use "dcterms:source" or "dcterms:isPartOf" (prefixed with http://dx.doi.org/). I think we decided on "dcterms:isPartOf" -- I'll check that we're exposing Dryad DOIs like so. Note that we will be using the "data package" DOI (e.g. "10.5061/dryad.tf48r") rather than the "data set" DOI (e.g. "10.5061/dryad.tf48r/2") because only the former is equivalent to a TreeBASE study.

bp |
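For anyone scripting against this, here is a rough Python sketch of the search-by-date query and the DOI matching described in the message above. It only builds the URL and parses a response; it does not call the service. The function names and the `base_url` argument are illustrative assumptions (the message gives only the `/study/find` path, not the host), and the package-DOI helper assumes a data set DOI is always the data package DOI plus a trailing `/n`, as in the single example given.

```python
from urllib.parse import quote
import xml.etree.ElementTree as ET

# Namespace URI copied from the <prism:doi> example in the message.
PRISM_NS = "http://prismstandard.org/namespaces/1.2/basic/"

# The three date fields the message says are searchable.
DATE_FIELDS = (
    "prism.creationDate",
    "prism.modificationDate",
    "prism.publicationDate",
)


def build_date_query_url(base_url, field, timestamp, fmt="rss1"):
    """Build a /study/find URL for records with `field` later than `timestamp`.

    base_url is a placeholder for wherever the PhyloWS service lives
    (an assumption; the message does not state it).
    """
    if field not in DATE_FIELDS:
        raise ValueError("unsupported date field: %s" % field)
    query = '%s>"%s"' % (field, timestamp)
    # Percent-encode the query so '>', '"' and ':' survive transport.
    return "%s/study/find?query=%s&format=%s" % (
        base_url.rstrip("/"), quote(query, safe=""), fmt)


def extract_dois(xml_text):
    """Return the text of every prism:doi element in an XML response."""
    root = ET.fromstring(xml_text)
    tag = "{%s}doi" % PRISM_NS
    return [el.text.strip() for el in root.iter(tag) if el.text]


def dryad_package_doi(doi):
    """Reduce a Dryad data set DOI to its data package DOI.

    Assumes the 'prefix/dryad.xxxx/n' shape shown in the message;
    a DOI that is already a package DOI is returned unchanged.
    """
    parts = doi.split("/")
    return "/".join(parts[:2])
```

Typical use would be to build the URL with `prism.modificationDate` and the time of the last harvest, fetch it with whatever HTTP client is at hand, and pass the body to `extract_dois` before comparing against Dryad's article DOIs.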
From: Carl B. <cbo...@gm...> - 2011-09-26 20:43:18
|
Thanks Hilmar. Yup, I was referring to the OAI-PMH on TreeBASE.

On Mon, Sep 26, 2011 at 1:39 PM, Hilmar Lapp <hl...@ne...> wrote:

> Hi Carl -
>
> I'm copying this over to the Dryad developers list so that it catches Ryan's attention. Can you clarify which OAI-PMH interface you mean - that of Dryad, or that of TreeBASE?
>
> -hilmar
>
> On Sep 26, 2011, at 2:28 PM, Carl Boettiger wrote:
>
> Hi list,
>
> A bit ago there was some discussion about tightening the links between TreeBASE entries and Dryad entries. For instance, it would be useful to get at least the article DOI, if not a Dryad identifier, from the OAI-PMH API. Any thoughts on this?
>
> -Carl
>
> --
> Carl Boettiger
> UC Davis
> http://www.carlboettiger.info/
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :
> ===========================================================

--
Carl Boettiger
UC Davis
http://www.carlboettiger.info/ |
From: Hilmar L. <hl...@ne...> - 2011-09-26 20:39:19
|
Hi Carl -

I'm copying this over to the Dryad developers list so that it catches Ryan's attention. Can you clarify which OAI-PMH interface you mean - that of Dryad, or that of TreeBASE?

-hilmar

On Sep 26, 2011, at 2:28 PM, Carl Boettiger wrote:

> Hi list,
>
> A bit ago there was some discussion about tightening the links between TreeBASE entries and Dryad entries. For instance, it would be useful to get at least the article DOI, if not a Dryad identifier, from the OAI-PMH API. Any thoughts on this?
>
> -Carl
>
> --
> Carl Boettiger
> UC Davis
> http://www.carlboettiger.info/

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :
=========================================================== |
From: Carl B. <cbo...@gm...> - 2011-09-26 18:28:49
|
Hi list,

A bit ago there was some discussion about tightening the links between TreeBASE entries and Dryad entries. For instance, it would be useful to get at least the article DOI, if not a Dryad identifier, from the OAI-PMH API. Any thoughts on this?

-Carl

--
Carl Boettiger
UC Davis
http://www.carlboettiger.info/ |
From: Arlin S. <ar...@um...> - 2011-09-23 16:14:53
|
I'd like to get some feedback on an idea for a workshop project. This came up earlier today in a discussion with Jim Leebens-Mack, Enrico Pontelli & Maryam Panahiazar.

We are interested in annotating a set of phylogenetic studies (one of the suggested deliverables below). The process of annotation would help us to work out the metadata attributes needed to describe a study (in particular, describing phylogenetic methods and workflows is a thorny problem), and the resulting set of annotated studies could serve as a benchmark or training set for automated methods.

We talked about using random samples of phylogeny publications, or some set of canonical or exemplary publications. Then the thought occurred to us: if ultimately we are trying to facilitate data-sharing, why waste our time annotating publications for which we are not offering the supporting data for public re-use via some searchable interface?

Instead, why not annotate some studies that are already poised for re-use, with the data available via a searchable interface? An example would be the ~300 studies that were deposited last year in TreeBASE. If we work with the TreeBASE providers, perhaps we could provide a search interface that takes advantage of the extra annotations. Another important set of re-usable trees is the set of trees agglomerated into the APG tree, or the ToLWeb tree.

Any thoughts on this? Would this compromise our other goals of working out problems in annotation while creating a benchmark set of publications? What is the best set of publications to annotate?

Arlin

On Aug 15, 2011, at 10:36 AM, Nico Cellinese wrote:

> TDWG MIAPA Workshop
> Call For Participation: Steps towards a Minimum Information About a Phylogenetic Analysis (MIAPA) Standard
>
> Synopsis
>
> Many phylogenetic analysis results are published in ways that present serious barriers to their reuse in numerous research applications that would stand to benefit from them. While some of these barriers are well understood, such as issues with adherence to standard exchange formats, those centering on the associated metadata necessary for researchers to evaluate or reuse a published phylogeny have only recently begun to be articulated. One of the critical next steps towards formalizing these metadata requirements as a minimum reporting standard is to convene meetings of key stakeholder communities with the goal to identify information attributes necessary and desirable for facilitating reuse, and to build consensus on their priority. To this end, we are holding a workshop at the 2011 Biodiversity Information Standards (TDWG) Conference to determine how a future reporting standard for phylogenetic analyses can best serve biodiversity science and related research applications. We invite all interested colleagues to participate.
>
> Background
>
> The workshop of the Biodiversity Information Standards (TDWG) Phylogenetics Standards Interest Group held at the 2010 TDWG conference included a project focused on how to publish re-usable trees that can be linked into an emerging global web of data. Through follow-up work, this led to the following tangible results:
>
> - An online draft report of the 2010 TDWG workshop [1], and a corresponding manuscript on best practices for publishing phylogenetic trees (Stoltzfus et al. in preparation);
> - A 2011 iEvoBio presentation on “Publishing re-usable phylogenetic trees, in theory and in practice” [2];
> - A lightning talk presentation and Birds-of-a-Feather gathering at 2011 iEvoBio, and
> - A survey group that explored barriers to re-use and developed plans for a survey
>
> These activities have considerably clarified our understanding of the theory and practice of publishing re-usable phylogenetic trees: how many phylogenies are published each year, the (low) frequency of archiving, what archives and tools are available, what policies are in force, etc. We have identified a number of barriers to re-use involving such aspects as technology, standards, culture, and access.
>
> Many of these barriers can be interpreted as a consequence of the lack of a community-agreed standard for what constitutes a well documented phylogenetic record. In the absence of such a standard, trees are often archived as image files rather than in appropriate data exchange formats, and lack important accompanying information (metadata), such as externally meaningful identifiers, that would be needed to make them useful to others. The idea of a Minimum Information About a Phylogenetic Analysis (MIAPA) standard has been suggested [3], but so far there has not been a deliberate process to develop and disseminate a community standard. Meanwhile, a number of systematics and evolution journals have begun to require archiving of the data underlying published research findings [4]. The emerging cultural shift in data archiving and sharing promoted by this policy change offers a unique window of opportunity to move ahead with the development and actual specification of a MIAPA standard.
>
> Similar to other minimum reporting standards [5], the primary focus of a future MIAPA standard would be on defining a “checklist” of metadata information attributes that, at a minimum, needs to accompany an archived phylogenetic analysis, and to which standards values for these attributes would need to adhere. The key step in developing community consensus on these elements of the standard is to convene a series of meetings that collectively involve participants from all major groups of stakeholders who would be affected by such a standard, such as users, producers, publishers, or archivists of phylogenetic analyses. To aid this process, the Phylogenetics Standards Interest Group is holding a workshop at the 2011 TDWG conference, with the goal to obtain consensus requirements and priorities for a MIAPA checklist for the purposes of biodiversity science, taxonomy, museum collections, and related research applications.
>
> Goals and deliverables
>
> The main goal of the workshop is to develop a shared understanding of the role that a MIAPA standard could play in facilitating re-use of phylogenetic analyses for the biodiversity science and related communities, and what the standard would need to specify in order to best fill that role. Possible deliverables include
>
> - A draft set of information attributes that should or could be included in a provisional MIAPA checklist, with a level of consensus for each of them.
> - A database with use-cases based on exemplifying publications, that report phylogenies to elucidate a broad spectrum of questions relating to biodiversity science.
> - A refined MIAPA survey to be informed by biodiversity science cases for reuse.
> - A plan for further community engagement and consensus-building among biodiversity science stakeholders.
>
> Workshop format
>
> The workshop will start with a few presentations focused on (i) introducing MIAPA and its potential in facilitating reuse (J. Leebens-Mack); (ii) summarizing recent developments and current status of MIAPA-related efforts (A. Stoltzfus); and (iii) past experiences and resulting best practice recommendations on developing a minimum reporting checklist standard (D. Field). The rest of the workshop will be hands-on. Participants in the workshop will break out into groups to address separate issues according to the anticipated deliverables and best practice recommendations.
>
> The workshop will be 1.5 days in duration, and be held during the 2011 Biodiversity Information Standards (TDWG) conference, to take place Oct 17 to 21, 2011 in New Orleans, USA (http://www.tdwg.org/conference2011/). The workshop will start in the afternoon of Monday, Oct 17, and end on Tuesday, Oct 18.
>
> How to participate
>
> Participation in the workshop is open to everyone interested. However, space is limited, and we therefore ask that, if you are interested in attending, you please communicate your interest through the MIAPA discussion group [6]. This will also allow us to include you in pre-workshop planning. Since the workshop is part of the TDWG conference, participants will need to register either for the full conference, or for the days of the workshop.
>
> The organizers will provide an electronic venue for participants to share ideas and develop plans in advance of the workshop. After the initial presentations, participants will self-organize into task groups.
>
> Organizers
>
> - Nico Cellinese, University of Florida
> - Hilmar Lapp, NESCent
> - Jim Leebens-Mack, University of Georgia
> - Enrico Pontelli, New Mexico State University
> - Arlin Stoltzfus, NIST & University of Maryland
>
> References
>
> [1] Whitacre et al. (2010). Current Best Practices for Publishing Trees Electronically. http://wiki.tdwg.org/twiki/bin/view/Phylogenetics/LinkingTrees2010
> [2] O’Meara et al. (2011). Publishing re-usable phylogenetic trees, in theory and practice. Available from Nature Precedings <http://dx.doi.org/10.1038/npre.2011.6048.1>
> [3] Leebens-Mack, J., T. Vision, et al. (2006). "Taking the first steps towards a standard for reporting on phylogenies: Minimum Information About a Phylogenetic Analysis (MIAPA)." Omics 10(2): 231-7.
> [4] Whitlock, M., M. McPeek, M. Rausher, L. Rieseberg, and A. Moore (2010). Data Archiving (Editorial). The American Naturalist 175(2): 145.
> [5] Taylor, C.F., D. Field, S. Sansone, J. Aerts, R. Apweiler, M. Ashburner, C.A. Ball, et al. (2008). Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project. Nature Biotechnology 26(8): 889-96. doi:10.1038/nbt.1411
> [6] MIAPA discussion group: http://groups.google.com/group/miapa-discuss

-------
Arlin Stoltzfus (ar...@um...)
Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST
IBBR, 9600 Gudelsky Drive, Rockville, MD
tel: 240 314 6208; web: www.molevol.org |
From: Hilmar L. <hl...@ne...> - 2011-08-28 15:51:57
|
I spoke with the developer, Martin Gerner. He thought it might be well applicable to this task, even though the tool does a lot more than we possibly need here. For example, it tokenizes the input, and also is capable of applying some special "inference" rules (for instance, "HeLa cells" will be tagged with "Homo sapiens") that are quite useful if the purpose is linking of text to knowledge terms, but go beyond simple synonym matching (which it does, too, though). The dictionaries are pluggable, and apparently it is quite fast in principle.

-hilmar

On Aug 22, 2011, at 6:39 PM, Mark Holder wrote:

> Hi all,
> I just noticed that Hilmar tweeted a link to Linnaeus: http://linnaeus.sourceforge.net/ which seems relevant to this thread.
>
> all the best,
> Mark
>
> On Aug 19, 2011, at 11:06 AM, Arlin Stoltzfus wrote:
>
>> On Aug 15, 2011, at 4:09 AM, Roderic Page wrote:
>>
>>> Mapping tree names to matrix names could be formulated as a bipartite matching problem, where we have two lists of names and want to find the best matching. See http://iphylo.blogspot.com/2007/09/matching-names-in-phylogeny-data-files.html for more details.
>>
>> In computer science, this is called the "marriage problem" when the two lists are the same size. We have a set { X } and a set { Y } of elements with some properties. We have a function f( X_i, Y_j ) that computes a match score for each pair, using the properties. In our case, the only property is the name-string. The marriage problem is to find a pairwise mapping that is optimal in some way. If optimality means minimizing the cost of the worst match, then this is (apparently, to me) the same as the linear bottleneck assignment problem.
>>
>> An obvious function to use (not necessarily the best for our case) is the edit distance, i.e., the number of character-wise edit operations to convert X_i into Y_j. This is called the Levenshtein distance (http://en.wikipedia.org/wiki/Levenshtein_distance).
>>
>> But there is nothing to stop us from creating a distance function that is optimized to work well in phyloinformatics. We could test different functions using real cases such as the ones in my slideshow.
>>
>> One special condition is that, for us, the cost of a s/<underscore>/<space>/ edit is very low. Another special condition is reflected in Rod's longest-common-substring method of matching-- we often have pairs of matching names that have long matching substrings and differ by interruptions. Maybe we need a gap-open and gap-extend penalty like in sequence alignment algorithms.
>>
>> Arlin
>>
>>> This approach could be extended to, say, matching names in a NEXUS file to those in a publication, or a GenBank POPSET from a publication. For example, if we have a NEXUS file and a POPSET we could compute the best matching between the two sets of names. Or taxon names and/or accession numbers could be retrieved from the publication.
>>>
>>> This would also help provide the context to help avoid homonyms, such as matching animal names to plant names.
>>>
>>> Regards
>>>
>>> Rod
>>>
>>> On 15 Aug 2011, at 05:13, Rutger Vos wrote:
>>>
>>>>> this calls for easy-to-use NeXML editors. e.g. add the ability to enter Genbank accession numbers in Mesquite, and then save as NeXML, thus preserving "Homo_sapiens" consistently in all alignments and resulting trees, while still communicating the respective accession numbers for each locus. Summer-of-Code project here.
>>>>
>>>> Indeed.
>>>>
>>>>> C- The basic data model of matrix-rows-matching-with-tree-OTUs works for 99% of datasets, but a growing number of studies use BEAST species inference (and other similar methods) where the tree ends in species OTUs, but the alignment has many more haplotype OTUs. -- i.e. there is, on purpose, a complete mismatch between alignment row labels and tree OTUs. Mesquite can handle this using a taxon association table, though I don't know that this is formal NEXUS or just a Mesquite invention. I don't think that NeXML or PhyloML can handle this. This calls for expanding the capabilities of NeXML and PhyloML.
>>>>
>>>> Yes and no. Multiple matrix rows can reference the same otu, but that's not quite what we want. Multiple, separately annotatable matrix row segments would be a good feature to have, also for TreeBASE's needs.
>>>>
>>>> --
>>>> Dr. Rutger A. Vos
>>>> School of Biological Sciences
>>>> Philip Lyle Building, Level 4
>>>> University of Reading
>>>> Reading, RG6 6BX, United Kingdom
>>>> Tel: +44 (0) 118 378 7535
>>>> http://rutgervos.blogspot.com
>>>
>>> ---------------------------------------------------------
>>> Roderic Page
>>> Professor of Taxonomy
>>> Institute of Biodiversity, Animal Health and Comparative Medicine
>>> College of Medical, Veterinary and Life Sciences
>>> Graham Kerr Building
>>> University of Glasgow
>>> Glasgow G12 8QQ, UK
>>>
>>> Email: r....@bi...
>>> Tel: +44 141 330 4778
>>> Fax: +44 141 330 2792
>>> AIM: rod...@ai...
>>> Facebook: http://www.facebook.com/profile.php?id=1112517192
>>> Twitter: http://twitter.com/rdmpage
>>> Blog: http://iphylo.blogspot.com
>>> Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
>>
>> -------
>> Arlin Stoltzfus (ar...@um...)
>> Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST
>> IBBR, 9600 Gudelsky Drive, Rockville, MD
>> tel: 240 314 6208; web: www.molevol.org

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :
=========================================================== |
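The name-matching idea quoted in this thread (an edit distance in which an underscore-for-space swap is nearly free, used to pair up two equal-sized lists of names) can be sketched roughly as follows. This is an illustration only, not code from Rod's post or from Linnaeus: the function names and the 0.1 separator cost are made-up assumptions, it minimizes the total cost rather than the worst match Arlin mentions, and it omits gap-open/gap-extend penalties. The brute-force search over permutations is only sensible for a handful of names; a real tool would presumably use a proper assignment algorithm.

```python
from itertools import permutations


def name_distance(a, b, sep_cost=0.1):
    """Levenshtein-style distance where '_' <-> ' ' is a cheap substitution.

    Insertions, deletions and other substitutions cost 1.0; sep_cost
    (an arbitrary illustrative value) is charged for underscore/space swaps.
    """
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        cur = [float(i)]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                sub = 0.0
            elif {ca, cb} == {"_", " "}:
                sub = sep_cost
            else:
                sub = 1.0
            cur.append(min(prev[j] + 1.0,       # delete ca
                           cur[j - 1] + 1.0,    # insert cb
                           prev[j - 1] + sub))  # substitute or match
        prev = cur
    return prev[-1]


def best_matching(xs, ys):
    """Pair each name in xs with one in ys, minimizing the summed distance.

    Exhaustive search over all permutations: exponential, so for
    demonstration on short lists only.
    """
    if len(xs) != len(ys):
        raise ValueError("lists must be the same size")
    cost = [[name_distance(x, y) for y in ys] for x in xs]
    best = min(permutations(range(len(ys))),
               key=lambda p: sum(cost[i][p[i]] for i in range(len(xs))))
    return [(xs[i], ys[j]) for i, j in enumerate(best)]


tree_names = ["Homo_sapiens", "Pan_troglodytes", "Mus_musculus"]
matrix_names = ["Mus musculus", "Homo sapiens", "Pan troglodytes"]
for tree_name, matrix_name in best_matching(tree_names, matrix_names):
    print(tree_name, "<->", matrix_name)
```

Swapping `sum` for `max` in the key function would give the bottleneck variant, where the goal is to keep the worst single pairing as good as possible.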
From: Hilmar L. <hl...@ne...> - 2011-08-23 01:42:11
|
The developer of that is here at the BioHackathon, so let me know if I should pull him aside and ask questions.

-hilmar

Sent with a tap.

On Aug 22, 2011, at 6:39 PM, Mark Holder <mth...@gm...> wrote:

> Hi all,
> I just noticed that Hilmar tweeted a link to Linnaeus: http://linnaeus.sourceforge.net/ which seems relevant to this thread.
>
> all the best,
> Mark |
From: Mark H. <mth...@gm...> - 2011-08-22 09:40:07
|
Hi all, I just noticed that Hilmar tweeted a link to Linnaeus: http://linnaeus.sourceforge.net/ which seems relevant to this thread. all the best, Mark |
From: Arlin S. <ar...@um...> - 2011-08-19 15:06:51
|
On Aug 15, 2011, at 4:09 AM, Roderic Page wrote:

> Mapping tree names to matrix names could be formulated as a
> bipartite matching problem, where we have two lists of names and
> want to find the best matching. See http://iphylo.blogspot.com/2007/09/matching-names-in-phylogeny-data-files.html
> for more details.

In computer science, this is called the "marriage problem" when the two lists are the same size. We have a set { X } and a set { Y } of elements with some properties. We have a function f( X_i, Y_j ) that computes a match score for each pair, using the properties. In our case, the only property is the name-string. The marriage problem is to find a pairwise mapping that is optimal in some way. If optimality means minimizing the cost of the worst match, then this is (apparently, to me) the same as the linear bottleneck assignment problem.

An obvious function to use (not necessarily the best for our case) is the edit distance, i.e., the number of character-wise edit operations needed to convert X_i into Y_j. This is called the Levenshtein distance (http://en.wikipedia.org/wiki/Levenshtein_distance).

But there is nothing to stop us from creating a distance function that is optimized to work well in phyloinformatics. We could test different functions using real cases such as the ones in my slideshow.

One special condition is that, for us, the cost of a s/<underscore>/<space>/ edit is very low. Another special condition is reflected in Rod's longest-common-substring method of matching -- we often have pairs of matching names that have long matching substrings and differ by interruptions. Maybe we need a gap-open and gap-extend penalty like in sequence alignment algorithms.

Arlin

-------
Arlin Stoltzfus (ar...@um...)
Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST
IBBR, 9600 Gudelsky Drive, Rockville, MD
tel: 240 314 6208; web: www.molevol.org |
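The distance function and the "minimize the worst match" criterion discussed in the message above can be sketched roughly as follows. This is only an illustration under stated assumptions: the 0.1 cost for an underscore/space swap is an arbitrary placeholder, the function names are hypothetical, the gap-open/gap-extend idea is not implemented, and the brute-force search over permutations is only workable for very short lists.

```python
from itertools import permutations

# Assumed cost for swapping "_" and " "; an arbitrary placeholder, not a value from the thread.
CHEAP_SWAP = 0.1


def sub_cost(a: str, b: str) -> float:
    """Cost of substituting character a with character b."""
    if a == b:
        return 0.0
    if {a, b} == {"_", " "}:
        return CHEAP_SWAP
    return 1.0


def name_distance(x: str, y: str) -> float:
    """Levenshtein distance with a reduced underscore/space substitution cost."""
    prev = [float(j) for j in range(len(y) + 1)]
    for i, cx in enumerate(x, start=1):
        curr = [float(i)]
        for j, cy in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1.0,                   # delete cx
                curr[j - 1] + 1.0,               # insert cy
                prev[j - 1] + sub_cost(cx, cy),  # substitute cx -> cy
            ))
        prev = curr
    return prev[-1]


def bottleneck_assignment(xs, ys):
    """Brute-force pairing of equal-sized lists that minimizes the worst single match."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes lists of equal size")
    best_pairs, best_worst = None, float("inf")
    for perm in permutations(ys):
        worst = max((name_distance(x, y) for x, y in zip(xs, perm)), default=0.0)
        if worst < best_worst:
            best_pairs, best_worst = list(zip(xs, perm)), worst
    return best_pairs, best_worst
```

For realistic list sizes the permutation search would have to be replaced by a proper assignment algorithm; the point here is only to show how a domain-specific cost plugs into the matching criterion.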
From: Roderic P. <r....@bi...> - 2011-08-15 09:01:34
|
There are some additional tricks that could be used.

Mapping tree names to matrix names could be formulated as a bipartite matching problem, where we have two lists of names and want to find the best matching. See http://iphylo.blogspot.com/2007/09/matching-names-in-phylogeny-data-files.html for more details.

This approach could be extended to, say, matching names in a NEXUS file to those in a publication, or a GenBank POPSET from a publication. For example, if we have a NEXUS file and a POPSET we could compute the best matching between the two sets of names. Or taxon names and/or accession numbers could be retrieved from the publication.

This would also help provide the context to help avoid homonyms, such as matching animal names to plant names.

Regards

Rod

---------------------------------------------------------
Roderic Page
Professor of Taxonomy
Institute of Biodiversity, Animal Health and Comparative Medicine
College of Medical, Veterinary and Life Sciences
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK

Email: r....@bi...
Tel: +44 141 330 4778
Fax: +44 141 330 2792
AIM: rod...@ai...
Facebook: http://www.facebook.com/profile.php?id=1112517192
Twitter: http://twitter.com/rdmpage
Blog: http://iphylo.blogspot.com
Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html |
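As a rough illustration of matching two name lists, the sketch below scores every tree-name/matrix-name pair by the length of their longest common substring and then pairs names greedily, best score first. It is not the method from the linked blog post: the greedy pairing is a simplification of a true bipartite matching, the function names are hypothetical, and the example labels are made up for illustration.

```python
from difflib import SequenceMatcher


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common substring of a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


def greedy_match(tree_names, matrix_names):
    """Pair names greedily by descending longest-common-substring length.

    Greedy pairing is an assumption made for brevity; it is not guaranteed
    to find the optimal bipartite matching.
    """
    scored = sorted(
        ((lcs_length(t.lower(), m.lower()), t, m)
         for t in tree_names for m in matrix_names),
        key=lambda item: -item[0],
    )
    used_tree, used_matrix, pairs = set(), set(), []
    for score, t, m in scored:
        if score > 0 and t not in used_tree and m not in used_matrix:
            pairs.append((t, m, score))
            used_tree.add(t)
            used_matrix.add(m)
    return pairs
```

Called with tree names such as "Homo_sapiens" and matrix names such as "Homo_sapiens_AJ23423", the shared prefix dominates the score, so labels that differ only by an appended accession number end up paired.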
From: Rutger V. <R....@re...> - 2011-08-15 04:13:44
|
> this calls for easy-to-use NeXML editors. e.g. add the ability to enter
> Genbank accession numbers in Mesquite, and then save as NeXML, thus
> preserving "Homo_sapiens" consistently in all alignments and resulting
> trees, while still communicating the respective accession numbers for each
> locus. Summer-of-Code project here.

Indeed.

> C- The basic data model of matrix-rows-matching-with-tree-OTUs works for 99%
> of datasets, but a growing number of studies use BEAST species inference
> (and other similar methods) where the tree ends in species OTUs, but the
> alignment has many more haplotype OTUs. -- i.e. there is, on purpose, a
> complete mismatch between alignment row labels and tree OTUs. Mesquite can
> handle this using a taxon association table, though I don't know that this
> is formal NEXUS or just a Mesquite invention. I don't think that NeXML or
> PhyloML can handle this. This calls for expanding the capabilities of NeXML
> and PhyloML.

Yes and no. Multiple matrix rows can reference the same otu, but that's not quite what we want. Multiple, separately annotatable matrix row segments would be a good feature to have, also for TreeBASE's needs.

--
Dr. Rutger A. Vos
School of Biological Sciences
Philip Lyle Building, Level 4
University of Reading
Reading, RG6 6BX, United Kingdom
Tel: +44 (0) 118 378 7535
http://rutgervos.blogspot.com |
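The many-rows-to-one-OTU situation described above can be pictured as a simple association table. The sketch below is only a conceptual illustration in plain Python; it does not reproduce the NeXML schema or Mesquite's taxon association format, and all labels and function names are hypothetical.

```python
from collections import defaultdict


def rows_by_otu(association):
    """Group alignment row labels under the tree OTU each one is associated with."""
    grouped = defaultdict(list)
    for row, otu in association.items():
        grouped[otu].append(row)
    return dict(grouped)


def check_association(row_labels, tip_labels, association):
    """Return (rows with no OTU, tree tips with no rows) for a given association table."""
    unmapped_rows = [r for r in row_labels if r not in association]
    covered = {association[r] for r in row_labels if r in association}
    tips_without_rows = [t for t in tip_labels if t not in covered]
    return unmapped_rows, tips_without_rows


# Hypothetical haplotype rows mapped to species-level tree tips.
association = {
    "Homo_sapiens_hap1": "Homo_sapiens",
    "Homo_sapiens_hap2": "Homo_sapiens",
    "Pan_troglodytes_hap1": "Pan_troglodytes",
}
```

A check like this separates the deliberate mismatch (several haplotype rows per species tip) from genuine errors such as a row with no associated tip.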
From: William P. <wil...@ya...> - 2011-08-13 15:26:32
|
Thanks Arlin. Indeed, this is a big issue. I'd say that there are two major sub-issues:

1. Taxon label consistency among objects within a submission/study.

I gather that this is mostly what Arlin et al.'s PPT was addressing: if the set of taxon labels in the alignment doesn't match the labels in the tree(s), users can't do much with the data until this is fixed. Some minor comments to add:

A- One source of the error comes from different programs having different levels of compliance with NEXUS format. For example, open your tree file in Dendroscope and then save it and you'll find that the rules regarding illegal punctuation and underscore usage will have changed, creating a mismatch with the original NEXUS alignment. Likewise, MacClade automatically converts *all* underscores to spaces even if they are single quoted, whereas Mesquite "hard codes" underscores if the token has single quotes around it. Like Dendroscope, Archaeopteryx saves Newick and NEXUS trees so that the labels change (Christian wasn't aware of the arcane tokenization rules -- we just recently discussed this with him, so this may be fixed soon). This does call for smart algorithms that can read improperly tokenized files (i.e. the "relaxed" setting in PAUP) -- which is tough, seeing as the program has to guess at the meaning of "," or "(" in a Newick string -- is it a new node or a token that was not quoted? And it calls for the ability to synonymize as needed, e.g. automatically recognizing that 'Homo_sapiens_x-2' in one file = 'Homo sapiens x-2' in another file.

B- Mismatches sometimes arise when users try to indicate the GenBank accession numbers for separate locus alignments, but the tree is the result of simultaneous analysis. i.e., one alignment will use "Homo_sapiens_AJ23423", another uses "Homo_sapiens_AJ564667", and the tree uses "Homo_sapiens". It's laudable that they want to include this valuable metadata, but it would be better to code it as metadata in a NeXML file. And this calls for easy-to-use NeXML editors. e.g. add the ability to enter GenBank accession numbers in Mesquite, and then save as NeXML, thus preserving "Homo_sapiens" consistently in all alignments and resulting trees, while still communicating the respective accession numbers for each locus. Summer-of-Code project here.

C- The basic data model of matrix-rows-matching-with-tree-OTUs works for 99% of datasets, but a growing number of studies use BEAST species inference (and other similar methods) where the tree ends in species OTUs, but the alignment has many more haplotype OTUs. -- i.e. there is, on purpose, a complete mismatch between alignment row labels and tree OTUs. Mesquite can handle this using a taxon association table, though I don't know that this is formal NEXUS or just a Mesquite invention. I don't think that NeXML or PhyloML can handle this. This calls for expanding the capabilities of NeXML and PhyloML.

2. Taxon labels not mapped or not mappable to external authorities or standards.

This issue is not really the focus of Arlin et al.'s PPT, but is what Brian was addressing below. Yet it's equally important for data sharing, if not more so. Some comments:

A- Until taxon concepts are truly identifiable/citable, the mapping of taxon labels to "taxa" will always be imprecise (with precise taxonomic circumscription, usage, and meaning epistemologically impossible to communicate), but at least gross homonyms need to be addressed. This is a challenge for automated services -- the iPlant TNRS has some advantage given that it does not (yet) include animal or bacterial names, but even within a code there are inter-rank homonyms (e.g. "Drosophila" the genus or subgenus?). A "smart" service would resolve the gross homonym based on the topology of the submitted tree -- i.e. ((Aotus,Homo),Lemur) should cause the service to pick Aotus the monkey instead of Aotus the Eudicot.

B- Abbreviations in the taxon labels make it very difficult to do a smart TNRS lookup. Some of the examples of "resolved" labels in the PPT are nonetheless unacceptable with respect to TNRS resolution. Even something as ubiquitous as "E. coli" could refer to (or be confused with) Entamoeba coli (Grassi, 1879) instead of Escherichia coli (Migula 1895).

C- Another source of homonyms is virus names. This is a big problem for TreeBASE because TreeBASE's semi-automated name service starts by ignoring trailing strings that start with capital letters or that contain numbers -- e.g. the assumption is that the third part of "Homo_sapiens_AJ23423" is not part of the name, whereas the third part of "Homo_sapiens_sapiens" is part of the name. Yet, while "Neodiprion abietis" is a sawfly, "Neodiprion abietis NPV" is a gammabaculovirus that happens to infect the sawfly -- naturally, TreeBASE first tries to match the beginning part of the virus name to the host name, and the submitter needs to be sharp enough to notice and correct the problem. I'm going to guess that iPlant's TNRS will map "Ammi majus latent virus" to bishop's-weed, A. majus, instead of to a Potyvirus.

bp |
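Two of the ideas in the message above lend themselves to a small sketch: synonymizing labels that differ only in quoting and underscore/space usage, and the described heuristic of ignoring trailing strings that start with a capital letter or contain a number. The code below is written from that prose description alone and is not TreeBASE's actual implementation; the function names are hypothetical, and treating quoted underscores as spaces is a deliberate simplification that departs from strict NEXUS tokenization.

```python
def normalize_label(label: str) -> str:
    """Strip one pair of single quotes, turn underscores into spaces, collapse whitespace.

    Assumption: quoted underscores are treated like spaces, which is looser
    than strict NEXUS tokenization rules.
    """
    s = label.strip()
    if len(s) >= 2 and s[0] == "'" and s[-1] == "'":
        s = s[1:-1].replace("''", "'")
    return " ".join(s.replace("_", " ").split())


def name_part(label: str) -> str:
    """Keep leading tokens; stop at the first later token that starts with a
    capital letter or contains a digit (the heuristic as described in the thread)."""
    tokens = normalize_label(label).split(" ")
    kept = tokens[:1]  # always keep the first token (the genus, if present)
    for tok in tokens[1:]:
        if tok[0].isupper() or any(ch.isdigit() for ch in tok):
            break
        kept.append(tok)
    return " ".join(kept)
```

With this sketch, "Homo_sapiens_AJ23423" reduces to "Homo sapiens" while "Homo_sapiens_sapiens" is kept whole, and "Neodiprion abietis NPV" collapses to the host name "Neodiprion abietis", which reproduces the virus pitfall described above rather than solving it.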
From: Brian O'M. <bo...@ut...> - 2011-08-13 06:27:20
|
I agree that name matching is a problem. There is some recent work that might be of interest:

iPlant has done something similar to handle just the name match-up between two files in their discovery environment. Select a data file and a tree file, and it will find the names that match and then present the remainder to allow manual matching (there was talk of using fuzzy matching to get good preliminary guesses, but I don't know if that's implemented yet). It has a very similar interface to the one outlined in the slides.

However, the long-term solution might be automatic name matching. For 30 taxa, doing fuzzy matching with user curation can work, but there are now trees with tens of thousands of taxa. Having the names in two different files matched to a standard taxonomy [sadly, one has to say "a standard taxonomy" rather than "the standard taxonomy"] will allow them to be paired together as well as connected to existing information. There's a fairly new tool at http://tnrs.iplantcollaborative.org/ that does much of this now. It takes a list of plant names and matches it to a set of names from the Tropicos database. It can correct typos in names, deal with changes in taxonomy [something being moved to a different genus], etc. Due to its current database, it's limited to plants, but it's supposed to be written so that someone else can substitute a different names database. You can set it to automatically select the best match or return a set of possible matches. It also has an API that is pretty easy to use: I wrote a function to call it from within R to convert names on a phylogeny to standardized names (see code here: https://r-forge.r-project.org/scm/viewvc.php/pkg/R/resolveNames.R?view=markup&revision=180&root=omearalab) and it worked on a tree of 50K species.

Brian

_______________________________________
Brian O'Meara
Assistant Professor
Dept. of Ecology & Evolutionary Biology
U. of Tennessee, Knoxville
http://www.brianomeara.info

Students wanted: Applications due Dec. 15, annually
Postdoc collaborators wanted: Check NIMBioS' website
Funding wanted: Want to collaborate on a grant? |
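The workflow described above (pair off the exact matches, then offer fuzzy guesses for the remainder so a person can confirm them) might look roughly like the sketch below. It uses Python's standard difflib rather than the iPlant tools or the TNRS API, neither of which is reproduced here; the function name and the 0.8 similarity cutoff are assumptions chosen for illustration.

```python
from difflib import get_close_matches


def match_names(data_names, tree_names, cutoff=0.8):
    """Return (exact matches, fuzzy suggestions for the unmatched data names).

    The cutoff is an assumed similarity threshold; suggestions are meant to be
    confirmed by a curator, not applied automatically.
    """
    tree_set, data_set = set(tree_names), set(data_names)
    exact = [name for name in data_names if name in tree_set]
    leftover_data = [name for name in data_names if name not in tree_set]
    leftover_tree = [name for name in tree_names if name not in data_set]
    suggestions = {
        name: get_close_matches(name, leftover_tree, n=3, cutoff=cutoff)
        for name in leftover_data
    }
    return exact, suggestions
```

An empty suggestion list for a name signals that it needs fully manual matching, which is the case the interactive interface in the slides is meant to handle.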
From: Arlin S. <ar...@um...> - 2011-08-12 17:44:01
|
Dear all-- A common problem with data sharing in phylogenetics is that OTU names do not match between files, e.g., between the alignment and the tree from the same study. I think I heard it from Bill that this is a common problem in TreeBASE submissions. I have encountered it many times and have thought about how to design software to deal with the problem. After discussing this with Vivek, I decided to make a more formal description of the problem which is available here (sorry about the pptx format): http://dl.dropbox.com/u/7727158/name_matching.pptx This includes real examples of mismatched names collected in the wild, an explanation of why the problem occurs, mock-ups of interactive user sessions, and implementation notes. Vivek already started playing with some of the concepts and put an app on appspot (the link is in the presentation). Comments are welcome. If implemented as described, how well would this tool serve the community need for name-matching? What would make it better? Arlin ------- Arlin Stoltzfus (ar...@um...) Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST IBBR, 9600 Gudelsky Drive, Rockville, MD tel: 240 314 6208; web: www.molevol.org |
From: Shyket, H. <har...@ya...> - 2011-07-29 14:01:42
|
Hi David, Was it an issue with the code or the deployment? Thanks, Harry Shyket Digital Media Specialist Yale University Peabody Museum ph. 203-436-9428 har...@ya... |
From: David P. <dav...@du...> - 2011-07-29 14:00:18
|
We have manually deployed the latest version from Development to Production. Let us know if you have questions. Thanks, David |
From: David P. <dav...@du...> - 2011-07-29 13:05:57
|
We have restored the state prior to the upgrade. The TreeBASE production system is again online. Thanks, David From: David Palmer Sent: Friday, July 29, 2011 8:50 AM To: tre...@li... Cc: Mattison Ward (mw225); Hilmar Lapp Subject: RE: Scheduling Production Build The production TreeBASE build failed this morning. We are actively working on restoring services to TreeBASE production. Thanks, David From: David Palmer Sent: Thursday, July 28, 2011 11:08 AM To: tre...@li... Cc: Mattison Ward (mw225) Subject: Scheduling Production Build A request has been made to re-build the Production instance of TreeBASE. This update is being scheduled for Friday, July 29, 2011. Please let me know if you have questions/concerns. Thanks, David Palmer NESCent 919 668 6520 Thu Jul 28 00:22:51 2011 William Piel - Ticket created (id=11506) Subject: Important bug fix for TreeBASE Date: Thu, 28 Jul 2011 00:22:46 -0400 To: "NESCent Issue Tracker (help)" <he...@ne...> From: William Piel <wil...@ya...> Hi, Harry made an important fix to revision 934. The bug is preventing people from removing trees from analyses, and it also seems capable of corrupting data. I've tested Harry's fix on dev and it seems to solve the problem. So if we could have a push to production, that would be great. thanks, Bill |
From: David P. <dav...@du...> - 2011-07-29 12:49:42
|
The production TreeBASE build failed this morning. We are actively working on restoring services to TreeBASE production. Thanks, David From: David Palmer Sent: Thursday, July 28, 2011 11:08 AM To: tre...@li... Cc: Mattison Ward (mw225) Subject: Scheduling Production Build A request has been made to re-build the Production instance of TreeBASE. This update is being scheduled for Friday, July 29, 2011. Please let me know if you have questions/concerns. Thanks, David Palmer NESCent 919 668 6520 Thu Jul 28 00:22:51 2011 William Piel - Ticket created (id=11506) Subject: Important bug fix for TreeBASE Date: Thu, 28 Jul 2011 00:22:46 -0400 To: "NESCent Issue Tracker (help)" <he...@ne...> From: William Piel <wil...@ya...> Hi, Harry made an important fix to revision 934. The bug is preventing people from removing trees from analyses, and it also seems capable of corrupting data. I've tested Harry's fix on dev and it seems to solve the problem. So if we could have a push to production, that would be great. thanks, Bill |
From: David P. <dav...@ne...> - 2011-07-28 15:29:30
|
A request has been made to re-build the Production instance of TreeBASE. This update is being scheduled for Friday, July 29, 2011. Please let me know if you have questions/concerns. Thanks, David Palmer NESCent 919 668 6520 Thu Jul 28 00:22:51 2011 William Piel - Ticket created (id=11506) Subject: Important bug fix for TreeBASE Date: Thu, 28 Jul 2011 00:22:46 -0400 To: "NESCent Issue Tracker (help)" <he...@ne...> From: William Piel <wil...@ya...> Hi, Harry made an important fix to revision 934. The bug is preventing people from removing trees from analyses, and it also seems capable of corrupting data. I've tested Harry's fix on dev and it seems to solve the problem. So if we could have a push to production, that would be great. thanks, Bill |
From: Enrico P. <epo...@cs...> - 2011-07-13 14:25:25
|
I agree. Some of the components are taking shape or are already well defined (MIAPA 0.1/1.0, NeXML). I hope that the standard record will have a semantic foundation (let it be MIAPA?). It would be good to start formulating the structure of such an annotated record and perhaps move forward with the creation of a PhyloWS service that at least, as a start, provides validation (and possibly completion) of the record. Enrico -- Dept. Computer Science, New Mexico State University MSC CS, Box 30001, Las Cruces, NM 88003 Voice: 575-646-6239 Fax: 575-646-1002 On 7/13/11 8:00 AM, "Arlin Stoltzfus" <ar...@um...> wrote: >I think we should consider some ways to publicize elements of the >approaches we all have been advocating in various ways, which depend >on open standards and web services and so on, e.g., the idea that any >tool should be able to submit an annotated record to an archive using >a standard protocol. There might be other people around the world who >would cooperate with us if they knew what we were thinking. Of course >we are discussing these things on public lists, but still I would >guess that many people who would be interested in this have no idea >what we are considering. > >Arlin > >On Jun 14, 2011, at 9:28 AM, Rutger Vos wrote: > >> I just shared some sketches (thinking out loud about TreeBASE3) with >> various people on these lists (Arlin, Hilmar, Karen, Bill, Harry). >> MIAPA might play a role in the automated submission process, so if >> anyone else is interested in seeing these documents please let me or >> any of the other people with access know and we can share them with you. 
>> >> On Mon, Jun 13, 2011 at 1:22 PM, Arlin Stoltzfus <ar...@um...> >> wrote: >>> After our telecon, which suggested that splitting out the MIAPA >>> part was a better strategy, I started a separate doc for this here: >>> >>> https://docs.google.com/document/d/16bno1sB3gBHHnew5TnoCLawScuoydG-i5LCPcB30OZY/edit?hl=en_US >>> >>> The focus of this, as currently conceived, is to combine >>> problem-solving with development of a draft standard. The problem-solving >>> attempts to address relevant user needs (e.g., helping users to >>> create a properly formatted and annotated archive submission). >>> This way, we will be developing technology support at the same time >>> as the draft standard (which, ideally, will encourage the broader >>> community to try it out and work with us). >>> >>> If you are interested, please take a look at the proposal, help us >>> to identify problems to address and possible strategies to address >>> them by leveraging available technologies and resources. Those who >>> are interested will need to solidify partnerships as soon as >>> possible, as there is only a month left to formulate the plan and >>> write the proposal. >>> >>> Arlin >>> ________________________________________ >>> From: mia...@go... [miapa- >>> di...@go...] On Behalf Of Karen Cranston >>>[kar...@ne... >>> ] >>> Sent: Thursday, June 09, 2011 12:58 PM >>> To: ph...@go...; MIAPA; TreeBASE devel >>> Subject: Re: ABI proposal for phyloinformatics >>> >>> Hilmar and I talked to Anne Maglia from NSF this morning. The notes >>> are on the "Pitches for TreeBASE_ABI" document (which is now editable >>> by anyone with the link, BTW). She did not see any major issues and >>> had plenty of advice on how to avoid common pitfalls when writing for >>> the ABI panel. >>> >>> Summary: >>> 1. Making the MIAPA component into a separate Innovation proposal is >>> probably a good idea. >>> 2. 
The TreeBASE / ToLWeb piece is well-suited for a Development >>> proposal, and we can discuss MIAPA in this proposal as long as we >>> have >>> a concrete contingency plan for the possibility that this gets funded >>> and the MIAPA proposal does not. >>> 3. There is no general rule about incremental improvement vs major >>> re-engineering, but the goals of the proposal must be novel in some >>> way and have intellectual merit. A re-engineering proposal could be >>> computationally novel, while a proposal with only incremental >>> improvements must instead have novel interface components or strong >>> biological motivations. >>> 4. There seems to be an empty niche for proposals that include novel >>> front-end as well as back-end development, but we need to make sure >>> we >>> have the appropriate expertise for the former. >>> 5. She suggests sharing the draft with someone from BIO (perhaps >>> Maureen Kearney) to get the user community perspective >>> >>> Please fill out the doodle poll so that we can plan the next course >>> of action! >>> >>> Cheers, >>> Karen >>> >>> On Wed, Jun 8, 2011 at 4:24 PM, Hilmar Lapp <hl...@ne...> >>> wrote: >>>> It looks like a response from NSF is still pending. There is not a >>>> lot of >>>> time left until the submission deadline, and I'll be out of >>>> commission for >>>> at least 7 days during that time starting Wed next week. So I >>>> suggest we >>>> start planning and get together independently of the NSF response >>>> to hash >>>> out over a conference call possible contributions and commitments. >>>> Here's a >>>> Doodle poll for scheduling. >>>> >>>> http://www.doodle.com/8zvwbidtxm9gzxcp >>>> >>>> To make sure that we can have a relatively targeted discussion, my >>>> suggestion would be that everyone who is willing to play a role in >>>> this >>>> proposal enter their availability, and come prepared for the >>>> following >>>> questions: >>>> >>>> 1. 
What aims would a proposal need to have for you to commit to >>>> be part >>>> of it, and conversely, what aims should it not have? (Ideally, the >>>> aims >>>> would be from either pitch A or pitch B that Karen sent to NSF for >>>> feedback.) >>>> >>>> 2. What aims, expertise, and partners are we missing from the >>>> group? Do you >>>> have suggestions for how to pull those in? >>>> >>>> 3. What role are you interested in playing, for which aim(s)? What >>>> kind and >>>> how many resources do you anticipate requiring support for to >>>> accomplish >>>> those aims? >>>> >>>> At the end of this, ideally we have a concrete sense for whether >>>> there are >>>> 0, 1, or 2 proposals that are viably going to come together, what >>>> size of >>>> proposal(s) we are talking about, who would take responsibility >>>> for what, >>>> and who else we need to reach out to. >>>> >>>> Comments / suggestions / additional items for the enumeration >>>> above welcome. >>>> >>>> -hilmar >>>> -- >>>> =========================================================== >>>> : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org : >>>> =========================================================== >>>> >>>> >>>> >>>> >>> >>> >>> >>> -- >>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>> Karen Cranston, PhD >>> Training Coordinator and Informatics Project Manager >>> nescent.org >>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>> >>> -- >>> You received this message because you are subscribed to the Google >>> Groups "MIAPA" group. >>> For more options, visit this group at >>> http://groups.google.com/group/miapa-discuss?hl=en >>> >> >> >> >> -- >> Dr. Rutger A. 
Vos >> School of Biological Sciences >> Philip Lyle Building, Level 4 >> University of Reading >> Reading, RG6 6BX, United Kingdom >> Tel: +44 (0) 118 378 7535 >> http://rutgervos.blogspot.com > >------- >Arlin Stoltzfus (ar...@um...) >Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST >IBBR, 9600 Gudelsky Drive, Rockville, MD >tel: 240 314 6208; web: www.molevol.org > >-- >You received this message because you are subscribed to the Google >Groups "MIAPA" group. >For more options, visit this group at >http://groups.google.com/group/miapa-discuss?hl=en |
From: Arlin S. <ar...@um...> - 2011-07-13 14:00:39
|
I think we should consider some ways to publicize elements of the approaches we all have been advocating in various ways, which depend on open standards and web services and so on, e.g., the idea that any tool should be able to submit an annotated record to an archive using a standard protocol. There might be other people around the world who would cooperate with us if they knew what we were thinking. Of course we are discussing these things on public lists, but still I would guess that many people who would be interested in this have no idea what we are considering. Arlin On Jun 14, 2011, at 9:28 AM, Rutger Vos wrote: > I just shared some sketches (thinking out loud about TreeBASE3) with > various people on these lists (Arlin, Hilmar, Karen, Bill, Harry). > MIAPA might play a role in the automated submission process, so if > anyone else is interested in seeing these documents please let me or > any of the other people with access know and we can share them with you. > > On Mon, Jun 13, 2011 at 1:22 PM, Arlin Stoltzfus <ar...@um...> > wrote: >> After our telecon, which suggested that splitting out the MIAPA >> part was a better strategy, I started a separate doc for this here: >> >> https://docs.google.com/document/d/16bno1sB3gBHHnew5TnoCLawScuoydG-i5LCPcB30OZY/edit?hl=en_US >> >> The focus of this, as currently conceived, is to combine >> problem-solving with development of a draft standard. The problem-solving >> attempts to address relevant user needs (e.g., helping users to >> create a properly formatted and annotated archive submission). >> This way, we will be developing technology support at the same time >> as the draft standard (which, ideally, will encourage the broader >> community to try it out and work with us). >> >> If you are interested, please take a look at the proposal, help us >> to identify problems to address and possible strategies to address >> them by leveraging available technologies and resources. 
Those who >> are interested will need to solidify partnerships as soon as >> possible, as there is only a month left to formulate the plan and >> write the proposal. >> >> Arlin >> ________________________________________ >> From: mia...@go... [miapa- >> di...@go...] On Behalf Of Karen Cranston [kar...@ne... >> ] >> Sent: Thursday, June 09, 2011 12:58 PM >> To: ph...@go...; MIAPA; TreeBASE devel >> Subject: Re: ABI proposal for phyloinformatics >> >> Hilmar and I talked to Anne Maglia from NSF this morning. The notes >> are on the "Pitches for TreeBASE_ABI" document (which is now editable >> by anyone with the link, BTW). She did not see any major issues and >> had plenty of advice on how to avoid common pitfalls when writing for >> the ABI panel. >> >> Summary: >> 1. Making the MIAPA component into a separate Innovation proposal is >> probably a good idea. >> 2. The TreeBASE / ToLWeb piece is well-suited for a Development >> proposal, and we can discuss MIAPA in this proposal as long as we >> have >> a concrete contingency plan for the possibility that this gets funded >> and the MIAPA proposal does not. >> 3. There is no general rule about incremental improvement vs major >> re-engineering, but the goals of the proposal must be novel in some >> way and have intellectual merit. A re-engineering proposal could be >> computationally novel, while a proposal with only incremental >> improvements must instead have novel interface components or strong >> biological motivations. >> 4. There seems to be an empty niche for proposals that include novel >> front-end as well as back-end development, but we need to make sure >> we >> have the appropriate expertise for the former. >> 5. She suggests sharing the draft with someone from BIO (perhaps >> Maureen Kearney) to get the user community perspective >> >> Please fill out the doodle poll so that we can plan the next course >> of action! 
>> >> Cheers, >> Karen >> >> On Wed, Jun 8, 2011 at 4:24 PM, Hilmar Lapp <hl...@ne...> >> wrote: >>> It looks like a response from NSF is still pending. There is not a >>> lot of >>> time left until the submission deadline, and I'll be out of >>> commission for >>> at least 7 days during that time starting Wed next week. So I >>> suggest we >>> start planning and get together independently of the NSF response >>> to hash >>> out over a conference call possible contributions and commitments. >>> Here's a >>> Doodle poll for scheduling. >>> >>> http://www.doodle.com/8zvwbidtxm9gzxcp >>> >>> To make sure that we can have a relatively targeted discussion, my >>> suggestion would be that everyone who is willing to play a role in >>> this >>> proposal enter their availability, and come prepared for the >>> following >>> questions: >>> >>> 1. What aims would a proposal need to have for you to commit to >>> be part >>> of it, and conversely, what aims should it not have? (Ideally, the >>> aims >>> would be from either pitch A or pitch B that Karen sent to NSF for >>> feedback.) >>> >>> 2. What aims, expertise, and partners are we missing from the >>> group? Do you >>> have suggestions for how to pull those in? >>> >>> 3. What role are you interested in playing, for which aim(s)? What >>> kind and >>> how many resources do you anticipate requiring support for to >>> accomplish >>> those aims? >>> >>> At the end of this, ideally we have a concrete sense for whether >>> there are >>> 0, 1, or 2 proposals that are viably going to come together, what >>> size of >>> proposal(s) we are talking about, who would take responsibility >>> for what, >>> and who else we need to reach out to. >>> >>> Comments / suggestions / additional items for the enumeration >>> above welcome. 
>>> >>> -hilmar >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org : >>> =========================================================== >>> >>> >>> >>> >> >> >> >> -- >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> Karen Cranston, PhD >> Training Coordinator and Informatics Project Manager >> nescent.org >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> >> -- >> You received this message because you are subscribed to the Google >> Groups "MIAPA" group. >> For more options, visit this group at >> http://groups.google.com/group/miapa-discuss?hl=en >> > > > > -- > Dr. Rutger A. Vos > School of Biological Sciences > Philip Lyle Building, Level 4 > University of Reading > Reading, RG6 6BX, United Kingdom > Tel: +44 (0) 118 378 7535 > http://rutgervos.blogspot.com ------- Arlin Stoltzfus (ar...@um...) Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST IBBR, 9600 Gudelsky Drive, Rockville, MD tel: 240 314 6208; web: www.molevol.org |
From: Hilmar L. <hl...@ne...> - 2011-06-24 18:26:47
|
Laurel - you need to set your Sf.net account to accept incoming email. The default is that it does not. You can have it forward to one of our existing email addresses. -hilmar On Jun 24, 2011, at 2:05 PM, David Palmer wrote: > Forwarding bounced email: > > From: Mail Delivery Subsystem <MAI...@pa...> > Date: Fri, 24 Jun 2011 13:12:36 -0400 > To: David Palmer <dav...@ne...> > Subject: Undeliverable: Build failed in Hudson: Treebase-dev #184 > > Delivery has failed to these recipients or distribution lists: > lol...@us... > The recipient's e-mail address was not found in the recipient's e- > mail system. Microsoft Exchange will not try to redeliver this > message for you. Please check the e-mail address and try resending > this message, or provide the following diagnostic text to your > system administrator. > The following organization rejected your message: mx.sourceforge.net. > > > > > > > Diagnostic information for administrators: > Generating server: pakicetus.nescent.org > lol...@us... > mx.sourceforge.net #<mx.sourceforge.net #5.1.1 SMTP; 550 unknown > user> #SMTP# > Original message headers: > Return-Path: <it...@ne...> > Received: from pakicetus.nescent.org (localhost.localdomain > [127.0.0.1]) by pakicetus.nescent.org (8.13.8/8.13.8) with ESMTP id > p5OHCPp7017159; Fri, 24 > Jun 2011 13:12:25 -0400 > Date: Fri, 24 Jun 2011 13:12:25 -0400 > From: IT Admin <it...@ne...> > To: <vga...@ne...>, <lol...@us...> > Message-ID: <724...@pa... > > > Subject: Build failed in Hudson: Treebase-dev #184 > MIME-Version: 1.0 > Content-Type: text/plain; charset="UTF-8" > Content-Transfer-Encoding: 7bit > > From: IT Admin <it...@ne...> > Date: June 24, 2011 1:12:25 PM EDT > To: "vga...@ne..." <vga...@ne...>, "lol...@us... 
> " <lol...@us...> > Subject: Build failed in Hudson: Treebase-dev #184 > > > See <http://hudson.nescent.org/job/Treebase-dev/184/changes> > > Changes: > > [loloyohe] Modified NexmlMatrixConverter() fromTreeBaseToXml() > methods so that they have charset functionality based on the logic > from the unit tests in NexmlMatrixConverterTest. > > ------------------------------------------ > Started by an SCM change > Updating https://treebase.svn.sourceforge.net/svnroot/treebase/trunk > U treebase-core/src/main/java/org/cipres/treebase/domain/ > nexus/nexml/NexmlMatrixConverter.java > At revision 923 > [trunk] $ /home/hudson/tools/Maven-2.2.1/bin/mvn clean package - > Dmaven.test.skip=true > [INFO] Scanning for projects... > [INFO] Reactor build order: > [INFO] Treebase > [INFO] treebase-core > [INFO] treebase-web > [INFO] > ------------------------------------------------------------------------ > [INFO] Building Treebase > [INFO] task-segment: [clean, package] > [INFO] > ------------------------------------------------------------------------ > [INFO] [clean:clean {execution: default-clean}] > [INFO] [site:attach-descriptor {execution: default-attach-descriptor}] > [INFO] > ------------------------------------------------------------------------ > [INFO] Building treebase-core > [INFO] task-segment: [clean, package] > [INFO] > ------------------------------------------------------------------------ > [INFO] [clean:clean {execution: default-clean}] > [INFO] Deleting directory <http://hudson.nescent.org/job/Treebase-dev/ws/trunk/treebase-core/target > > > [INFO] [resources:resources {execution: default-resources}] > [WARNING] Using platform encoding (UTF-8 actually) to copy filtered > resources, i.e. build is platform dependent! 
> [INFO] Copying 16 resources > [INFO] snapshot org.nexml.model:nexml:1.5-SNAPSHOT: checking for > updates from repository.jboss.org > [WARNING] repository metadata for: 'snapshot org.nexml.model:nexml: > 1.5-SNAPSHOT' could not be retrieved from repository: > repository.jboss.org due to an error: Authorization failed: Access > denied to: http://repository.jboss.org/maven2/org/nexml/model/nexml/1.5-SNAPSHOT/maven-metadata.xml > [INFO] Repository 'repository.jboss.org' will be blacklisted > [INFO] snapshot org.nexml.model:nexml:1.5-SNAPSHOT: checking for > updates from maven2 > [INFO] snapshot org.nexml.model:nexml:1.5-SNAPSHOT: checking for > updates from m2.remote.repos > [INFO] snapshot org.nexml.model:nexml:1.5-SNAPSHOT: checking for > updates from m2.nexml.repos > Downloading: http://repo1.maven.org/maven2/org/nexml/model/nexml/1.5-SNAPSHOT/nexml-1.5-SNAPSHOT.pom > [INFO] Unable to find resource 'org.nexml.model:nexml:pom:1.5- > SNAPSHOT' in repository maven2 (http://repo1.maven.org/maven2) > Downloading: http://treebase.sourceforge.net/maven2/org/nexml/model/nexml/1.5-SNAPSHOT/nexml-1.5-SNAPSHOT.pom > [INFO] Unable to find resource 'org.nexml.model:nexml:pom:1.5- > SNAPSHOT' in repository m2.remote.repos (http://treebase.sourceforge.net/maven2 > ) > Downloading: http://nexml-dev.nescent.org/.m2/repository/org/nexml/model/nexml/1.5-SNAPSHOT/nexml-1.5-SNAPSHOT.pom > [INFO] Unable to find resource 'org.nexml.model:nexml:pom:1.5- > SNAPSHOT' in repository m2.nexml.repos (http://nexml-dev.nescent.org/.m2/repository > ) > Downloading: http://repo1.maven.org/maven2/org/nexml/model/nexml/1.5-SNAPSHOT/nexml-1.5-SNAPSHOT.jar > [INFO] Unable to find resource 'org.nexml.model:nexml:jar:1.5- > SNAPSHOT' in repository maven2 (http://repo1.maven.org/maven2) > Downloading: http://treebase.sourceforge.net/maven2/org/nexml/model/nexml/1.5-SNAPSHOT/nexml-1.5-SNAPSHOT.jar > [INFO] Unable to find resource 'org.nexml.model:nexml:jar:1.5- > SNAPSHOT' in repository m2.remote.repos 
(http://treebase.sourceforge.net/maven2 > ) > Downloading: http://nexml-dev.nescent.org/.m2/repository/org/nexml/model/nexml/1.5-SNAPSHOT/nexml-1.5-SNAPSHOT.jar > [INFO] [compiler:compile {execution: default-compile}] > [INFO] Compiling 378 source files to <http://hudson.nescent.org/job/Treebase-dev/ws/trunk/treebase-core/target/classes > > > [INFO] > ------------------------------------------------------------------------ > [ERROR] BUILD FAILURE > [INFO] > ------------------------------------------------------------------------ > [INFO] Compilation failure > <http://hudson.nescent.org/job/Treebase-dev/ws/trunk/treebase-core/src/main/java/org/cipres/treebase/domain/nexus/nexml/NexmlMatrixConverter.java > >:[8,22] package junit.framework does not exist > > > [INFO] > ------------------------------------------------------------------------ > [INFO] For more information, run Maven with the -e switch > [INFO] > ------------------------------------------------------------------------ > [INFO] Total time: 29 seconds > [INFO] Finished at: Fri Jun 24 13:12:24 EDT 2011 > [INFO] Final Memory: 48M/330M > [INFO] > ------------------------------------------------------------------------ > > > > _______________________________________________ > Treebase-devel mailing list > Tre...@li... > https://lists.sourceforge.net/lists/listinfo/treebase-devel -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org : =========================================================== |
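Editorial note on the Hudson log quoted above: the compile step fails with "package junit.framework does not exist" at line 8 of NexmlMatrixConverter.java, a file under src/main/java. That is the usual symptom of a JUnit import in main (non-test) code when JUnit is only visible to the test classpath; whether that is the configuration here is an assumption, since the POM is not shown in the thread. A minimal sketch of one way around it is to replace a JUnit-style assertion with a plain Java check. The class and method names below are hypothetical and are not taken from the TreeBASE source.

```java
import java.util.Objects;

/**
 * Hypothetical stand-in for a main-source converter class; this is NOT the
 * real NexmlMatrixConverter. It shows a precondition check that compiles
 * without any junit.framework import.
 */
final class ConverterPreconditionSketch {

    /** Returns the trimmed label, or throws if the label is null or blank. */
    static String requireLabel(String label) {
        // Plain JDK checks instead of junit.framework.Assert.assertNotNull(...)
        Objects.requireNonNull(label, "label must not be null");
        String trimmed = label.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("label must not be blank");
        }
        return trimmed;
    }

    public static void main(String[] args) {
        // Illustrative input only; "charset1" is a made-up label.
        System.out.println(requireLabel("  charset1 "));
    }
}
```

With checks like this in main code, JUnit can stay a test-only dependency and a build run with -Dmaven.test.skip=true no longer needs it to compile the main sources.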
From: David P. v. R. <he...@ne...> - 2011-06-20 12:28:18
|
On Mon Jun 20 08:08:37 2011, david.palmer wrote: > Build #20 completed successfully at approximately 8:06 am on 6/20/11. Please > call me for the password for treebase-app. > > Thanks, David > 919 668 6520 > > On Sat Jun 18 17:49:27 2011, william.piel wrote: > > > > Please rebuild production at next convenience. > > > > Also, please send me the credentials for accessing the dev PostgreSQL > > database -- treebasedb-dev. > > > > thanks, > > > > Bill > > > > For more info: Ticket <URL: https://help.nescent.org/Ticket/Display.html?id=11301 > |
From: Shyket, H. <har...@ya...> - 2011-06-17 17:01:38
|
No issues here. Any bug fixes that need to be tested on prod? Rutger Vos <R....@re...> wrote: No objections. On Fri, Jun 17, 2011 at 5:20 PM, William Piel <wil...@ya...> wrote: > > Does anyone object to a new production build today? > > Although the last one was 15 days ago, I think it's worth doing a new one today because Google is in the midst of indexing TreeBASE, and it was an oversight not to include the words "phylogeny" and "phylogenetic" together with each study's taxon list -- and this was just recently fixed. That way people googling a taxon name together with the word "phylogeny" will hit TreeBASE records. > > bp > > > > _______________________________________________ > Treebase-devel mailing list > Tre...@li... > https://lists.sourceforge.net/lists/listinfo/treebase-devel > -- Dr. Rutger A. Vos School of Biological Sciences Philip Lyle Building, Level 4 University of Reading Reading, RG6 6BX, United Kingdom Tel: +44 (0) 118 378 7535 http://rutgervos.blogspot.com _______________________________________________ Treebase-devel mailing list Tre...@li... https://lists.sourceforge.net/lists/listinfo/treebase-devel |
From: Rutger V. <R....@re...> - 2011-06-17 16:39:08
|
No objections. On Fri, Jun 17, 2011 at 5:20 PM, William Piel <wil...@ya...> wrote: > > Does anyone object to a new production build today? > > Although the last one was 15 days ago, I think it's worth doing a new one today because Google is in the midst of indexing TreeBASE, and it was an oversight not to include the words "phylogeny" and "phylogenetic" together with each study's taxon list -- and this was just recently fixed. That way people googling a taxon name together with the word "phylogeny" will hit TreeBASE records. > > bp > > > > _______________________________________________ > Treebase-devel mailing list > Tre...@li... > https://lists.sourceforge.net/lists/listinfo/treebase-devel > -- Dr. Rutger A. Vos School of Biological Sciences Philip Lyle Building, Level 4 University of Reading Reading, RG6 6BX, United Kingdom Tel: +44 (0) 118 378 7535 http://rutgervos.blogspot.com |
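Editorial sketch of the indexing change discussed in this thread: the words "phylogeny" and "phylogenetic" are added alongside a study's taxon list so that a search for a taxon name plus "phylogeny" can match the record. The class name, method name, and comma-separated output are assumptions for illustration; the thread does not show how TreeBASE actually renders this list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; not part of the TreeBASE code base. */
final class StudyKeywordsSketch {

    /** Joins the non-blank taxon labels and appends the two search words. */
    static String keywordsFor(List<String> taxonLabels) {
        List<String> words = new ArrayList<String>();
        for (String label : taxonLabels) {
            // Skip missing or blank labels so the output has no empty entries.
            if (label != null && !label.trim().isEmpty()) {
                words.add(label.trim());
            }
        }
        words.add("phylogeny");
        words.add("phylogenetic");
        return String.join(", ", words);
    }

    public static void main(String[] args) {
        // Taxon names here are arbitrary examples, not data from the thread.
        System.out.println(keywordsFor(Arrays.asList("Homo sapiens", "Pan troglodytes")));
    }
}
```

Whether the words are placed in visible page text, a title, or metadata is a separate design choice that the messages above do not settle.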