From: Hilmar L. <hl...@ne...> - 2011-04-14 01:36:40
|
The trees in S11267 seem to silently fail to render in PhyloWidget. All I get is a single dot. At least try the first three:

http://www.treebase.org/treebase-web/search/study/trees.html?id=11267

Is this a temporary glitch, a problem with the reconstructed file, or something else? Should I file this as a bug in the bug tracker?

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :
===========================================================
|
From: William P. <wil...@ya...> - 2011-04-14 03:20:18
|
Thanks for the alert. This is actually a bug in PhyloWidget -- if the word "tree" appears in the title of the tree block, PhyloWidget confuses this word with the TREE command, and so goofs up the parsing.

I have removed the word "tree" from the title of the tree block, so it works now.

However, you might need to refresh the PhyloWidget window after loading to get the tree to pull through. TreeBASE feels quite slow lately; our API must be getting hit by someone (Rod?).

bp
|
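The failure mode described above can be illustrated with a small sketch. PhyloWidget's actual parser is not shown in this thread, so both functions below are hypothetical: the first mimics a parser that treats any standalone word "tree" as the start of a TREE command, and the second only accepts TREE when it is the first token of a semicolon-terminated statement. The sample block and its title are made up.

```python
import re

# Made-up TREES block; the quoted title deliberately contains the word "tree".
SAMPLE_BLOCK = """BEGIN TREES;
    TITLE 'Best tree from ML search';
    TREE con_50 = ((A,B),(C,D));
END;"""


def naive_find_trees(block):
    """Hypothetical buggy parser: any standalone word 'tree' starts a tree."""
    found = []
    # Splitting on ';' ignores quoted semicolons; good enough for a sketch.
    for statement in block.split(";"):
        match = re.search(r"\btree\b", statement, flags=re.IGNORECASE)
        if match:
            # Everything after the keyword is taken as "name = newick".
            found.append(statement[match.end():].strip())
    return found


def strict_find_trees(block):
    """Only accept TREE when it is the first token of a statement."""
    found = []
    for statement in block.split(";"):
        tokens = statement.strip().split(None, 1)
        if len(tokens) == 2 and tokens[0].upper() == "TREE":
            found.append(tokens[1].strip())
    return found


if __name__ == "__main__":
    print(naive_find_trees(SAMPLE_BLOCK))   # includes a bogus entry from the title
    print(strict_find_trees(SAMPLE_BLOCK))  # only the real TREE statement
```

With the naive version, the tail of the title is picked up as if it were a tree definition, which is the kind of confusion that could plausibly leave a viewer with nothing sensible to draw.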
From: Roderic P. <r....@bi...> - 2011-04-14 11:52:16
|
TreeBASE performance has nothing to do with me folks, I pretty much gave up trying to download data from it a few weeks back. Someone really, really needs to rethink the way TreeBASE works, because it's virtually unusable.

Regards

Rod

---------------------------------------------------------
Roderic Page
Professor of Taxonomy
Institute of Biodiversity, Animal Health and Comparative Medicine
College of Medical, Veterinary and Life Sciences
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK

Email: r....@bi...
Tel: +44 141 330 4778
Fax: +44 141 330 2792
AIM: rod...@ai...
Facebook: http://www.facebook.com/profile.php?id=1112517192
Twitter: http://twitter.com/rdmpage
Blog: http://iphylo.blogspot.com
Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
|
From: Rutger V. <R....@re...> - 2011-04-14 12:24:56
|
I've been thinking about a redesign lately, and here's what I would do:

- make sure we can export all the metadata from TreeBASE in (nexml) files, i.e. Laurel's GSoC project

- get all the files for all the studies and create a simple folder structure with those files, more or less along the lines of the phylows urls (e.g. phylows/tree/TB2/nexml/T1312), in all file formats we can think of (json, nexus, phylip, newick, phyloxml, fasta, static html, etc.)

- create some mod_rewrite rules to map our urls onto the folder structure (so going from phylows/tree/TB2:T1312?format=nexml to the actual file). Performance will obviously be much, much better.

- also allow harvesting of those files using rsync and ftp. Now we have data dumps.

- create a simple plug-in architecture where, given a query string, a remote web service returns a list of hits which it somehow generates from its local, harvested data dump

Here's a use case: Laurel wants to implement BLASTing into TreeBASE. So she periodically harvests everything in phylows/matrix/TB2/fasta with an rsync cron job. She runs formatdb on the fasta sequences, creating a standalone BLAST database. She then writes a simple cgi script that accepts a target string (e.g. myblast.cgi?tb2.blast=acgctcgcatcgcatcgactacgac) and returns a list of phylows matrix urls from the matching results.

On the TreeBASE side, we simply add a search widget that delegates the query to Laurel's service and integrates it into our interfaces (graphical, web services).

With an architecture like that it would be so much easier for anyone to add functionality in whatever programming language they like without having to deal with a massive database schema. Of course any of those remote services might have its own little database, but it'd be more along the lines of a three-table SQLite database to store the ITIS taxonomy structure such that we can find all TreeBASE taxa subtended by a given ITIS higher taxon ID (for example).
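A minimal sketch of the three-table idea, using Python's built-in sqlite3 module. The table names, column names, and all IDs below are made up for illustration; the thread does not say which three tables are meant, and nothing here reflects the actual ITIS or TreeBASE schemas.

```python
import sqlite3

# Hypothetical three-table layout: taxon names, parent links, and a
# mapping from TreeBASE taxa onto taxonomy nodes.
SCHEMA = """
CREATE TABLE itis_taxon     (tsn INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE itis_parent    (child_tsn INTEGER PRIMARY KEY, parent_tsn INTEGER NOT NULL);
CREATE TABLE treebase_taxon (tb_taxon_id TEXT PRIMARY KEY, tsn INTEGER NOT NULL);
"""

# Walk down from the given higher taxon, then join onto TreeBASE taxa.
DESCENDANTS_SQL = """
WITH RECURSIVE clade(tsn) AS (
    SELECT ?
    UNION
    SELECT p.child_tsn
    FROM itis_parent AS p JOIN clade AS c ON p.parent_tsn = c.tsn
)
SELECT t.tb_taxon_id
FROM treebase_taxon AS t JOIN clade AS c ON t.tsn = c.tsn
ORDER BY t.tb_taxon_id
"""


def build_demo_db():
    """Create an in-memory database filled with placeholder data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO itis_taxon VALUES (?, ?)",
        [(1, "Family X"), (2, "Genus A"), (3, "Species A1"),
         (4, "Genus B"), (5, "Species B1"),
         (9, "Family Y"), (10, "Species Y1")],
    )
    conn.executemany(
        "INSERT INTO itis_parent VALUES (?, ?)",
        [(2, 1), (3, 2), (4, 1), (5, 4), (10, 9)],
    )
    conn.executemany(
        "INSERT INTO treebase_taxon VALUES (?, ?)",
        [("Tx1", 3), ("Tx2", 5), ("Tx3", 10)],
    )
    return conn


def taxa_under(conn, higher_tsn):
    """Return TreeBASE taxon IDs subtended by the given higher taxon."""
    return [row[0] for row in conn.execute(DESCENDANTS_SQL, (higher_tsn,))]


if __name__ == "__main__":
    db = build_demo_db()
    print(taxa_under(db, 1))
```

The recursive query keeps the lookup inside SQLite, so a plug-in written in any language with an SQLite binding could reuse the same statement.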
We'd have to implement some core plug-ins ourselves, notably one that extracts the metadata (using RDFA2RDFXML) and sticks that in a triple store so we can search on, say, author names, journals, etc. I think that's scalable because it's only a few dozen triples for each study.

The idea is a little bit inspired by DAS, which seems to work quite well: http://www.biodas.org/wiki/Main_Page

(I'm leaving out the submission part as an exercise for the reader. Presumably there would have to be a restricted area where properly formatted files are uploaded and made available to reviewers.)

Rutger

p.s. I did some harvesting a few weeks back, but unless that created queries that are still running, I'm innocent.

--
Dr. Rutger A. Vos
School of Biological Sciences
Philip Lyle Building, Level 4
University of Reading
Reading RG6 6BX
United Kingdom
Tel: +44 (0) 118 378 7535
http://www.nexml.org
http://rutgervos.blogspot.com
|
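The URL-to-folder mapping in the redesign proposal above would be done with mod_rewrite rules, but the intended translation can be sketched in a few lines of Python. This is an illustration only: the function name, the default format, and the character whitelist are assumptions, and only the two URL shapes quoted in the proposal are taken from the thread.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import parse_qs, urlsplit

# Assumed whitelist so that a request cannot escape the data folder.
SAFE_NAME = re.compile(r"[A-Za-z0-9]+")


def phylows_url_to_path(url, default_format="nexml"):
    """Map e.g. phylows/tree/TB2:T1312?format=nexml
    onto the static file path phylows/tree/TB2/nexml/T1312."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    start = segments.index("phylows")
    # Expect: phylows / <kind> / <namespace>:<identifier>
    kind, qualified_id = segments[start + 1:start + 3]
    namespace, _, identifier = qualified_id.partition(":")
    # default_format is an assumption; the thread does not define one.
    fmt = parse_qs(parts.query).get("format", [default_format])[0]
    for piece in (kind, namespace, identifier, fmt):
        if not SAFE_NAME.fullmatch(piece):
            raise ValueError(f"unexpected URL component: {piece!r}")
    return PurePosixPath("phylows", kind, namespace, fmt, identifier)


if __name__ == "__main__":
    print(phylows_url_to_path("phylows/tree/TB2:T1312?format=nexml"))
```

Because the result is just a relative path under a folder of pre-generated files, the same layout could be served directly by the web server and mirrored with rsync or ftp, as the proposal suggests.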
From: William P. <wil...@ya...> - 2011-04-14 13:26:03
|
Whoops... forgot to reply to <TreeBASE Devel>.

At any rate, I just noticed that someone from the Czech Republic has uploaded >20 files, each with >60,000 sequences. This might be causing Java heap size issues.

bp

On Apr 14, 2011, at 8:39 AM, William Piel wrote:

> Hi Rod,
>
> Some performance measures should have improved since your earlier efforts with the API. Morphological datasets, for example, should download much faster, in part because the database has shrunk from ~200GB to 3GB in size.
>
> But of course you're right -- the API needs to be much more efficient, e.g. refactored so that the CQL translates more directly into HQL instead of just hitching in to our web SearchController.
>
> Regarding the user interface usability -- as you know, this is an unfunded open source project -- so you're very welcome to make improvements!
>
> bp
|
From: Richard R. <rr...@fi...> - 2011-04-14 14:01:47
|
Hi folks,

Just joined the list, as I am interested in using TB to develop collaborative methods for synthesizing plant phylogeny, e.g., by grafting clades together and other kinds of agglomeration. So, I am particularly interested in the API and harvesting, especially with respect to taxonomic names and classifications, and GenBank identifiers of sequences.

Speaking of which, how would I go about harvesting the GenBank numbers for a given study and associating them with their alignments? Is that possible with the current API?

I like Rutger's plug-in ideas. In general, it's easier for someone like me to provide a simple web service, rather than contribute directly to TB development.

-Rick
|
From: Rutger V. <R....@re...> - 2011-04-14 14:22:38
|
Hi Rick,

if you mean whether we currently write out accession numbers for sequences then no, that's currently not possible. You can do more with NCBI taxonomy identifiers, though, under the current API.

Rutger
|
From: Hilmar L. <hl...@ne...> - 2011-04-14 14:31:53
|
Could we not, similar to what NCBI includes in their dumps, create simple downloadable mapping tables? For example, one with (at least) two columns, one being the GenBank accession and the other being the TB matrix ID (and to make it more useful, columns for TB study, TB taxon, and citation could be added).

If we create a script that extracts that out of the database once a week, that might be pretty useful to people. NCBI produces a variety of these, to map accession to taxon, gene ID, etc., and people use them a lot.

-hilmar
Vos > School of Biological Sciences > Philip Lyle Building, Level 4 > University of Reading > Reading > RG6 6BX > United Kingdom > Tel: +44 (0) 118 378 7535 > http://www.nexml.org > http://rutgervos.blogspot.com > > ------------------------------------------------------------------------------ > Benefiting from Server Virtualization: Beyond Initial Workload > Consolidation -- Increasing the use of server virtualization is a top > priority.Virtualization can reduce costs, simplify management, and > improve > application availability and disaster protection. Learn more about > boosting > the value of server virtualization. http://p.sf.net/sfu/vmware-sfdev2dev > _______________________________________________ > Treebase-devel mailing list > Tre...@li... > https://lists.sourceforge.net/lists/listinfo/treebase-devel > > ------------------------------------------------------------------------------ > Benefiting from Server Virtualization: Beyond Initial Workload > Consolidation -- Increasing the use of server virtualization is a top > priority.Virtualization can reduce costs, simplify management, and > improve > application availability and disaster protection. Learn more about > boosting > the value of server virtualization. http://p.sf.net/sfu/vmware-sfdev2dev_______________________________________________ > Treebase-devel mailing list > Tre...@li... > https://lists.sourceforge.net/lists/listinfo/treebase-devel -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org : =========================================================== |
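A rough sketch of the weekly mapping-table dump suggested above, in Python. The column names, the sort order, and the placeholder rows are assumptions for illustration only; the thread does not specify a file layout, and the database query that would produce the rows is not shown.

```python
import csv
import io

# Hypothetical column names: the message only asks for "GenBank accession"
# and "TB matrix ID", with study, taxon and citation as optional extras.
COLUMNS = ["genbank_accession", "tb_matrix_id", "tb_study_id", "tb_taxon", "citation"]


def write_mapping_table(rows, out):
    """Write rows (dicts keyed by COLUMNS) to `out` as a tab-separated table.

    Missing optional columns become empty fields, and rows are sorted by
    accession so that successive weekly dumps are easy to diff.
    """
    writer = csv.DictWriter(out, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n", restval="")
    writer.writeheader()
    for row in sorted(rows, key=lambda r: r["genbank_accession"]):
        writer.writerow(row)


# Placeholder rows: these accessions and IDs are made up, not real records.
example_rows = [
    {"genbank_accession": "XX000002", "tb_matrix_id": "M0002", "tb_study_id": "S0001"},
    {"genbank_accession": "XX000001", "tb_matrix_id": "M0001", "tb_study_id": "S0001",
     "tb_taxon": "Example taxon"},
]

buffer = io.StringIO()
write_mapping_table(example_rows, buffer)
mapping_tsv = buffer.getvalue()
```

In practice the rows would come from whatever query the weekly script runs against the database, and the output would go to a file on the download server rather than an in-memory buffer.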
From: William P. <wil...@ya...> - 2011-04-14 14:33:46
|
On Apr 14, 2011, at 9:39 AM, Richard Ree wrote:

> Hi folks,
>
> Just joined the list, as I am interested in using TB to develop collaborative methods for synthesizing plant phylogeny, e.g., by grafting clades together and other kinds of agglomeration. So, I am particularly interested in the API and harvesting, especially with respect to taxonomic names and classifications, and GenBank identifiers of sequences.
>
> Speaking of which, how would I go about harvesting the GenBank numbers for a given study and associating them with their alignments? Is that possible with the current API?
>
> I like Rutger's plug-in ideas. In general, it's easier for someone like me to provide a simple web service, rather than contribute directly to TB development.
>
> -Rick

I guess there are various options for this. For example, you could start by finding studies published by a guy named "Ree":

http://purl.org/phylo/treebase/phylows/study/find?query=dcterms.contributor=Ree&format=rss1

Out of that list, pick the first item (S10145), and you could ask for a list of matrices:

http://purl.org/phylo/treebase/phylows/study/find?query=tb.identifier.study=S10145&format=rss1&recordSchema=matrix

And then if you pick one matrix (e.g. M4388), you could ask for the NeXML serialization of it:

http://purl.org/phylo/treebase/phylows/matrix/TB2:M4388?format=nexml

And in the OTU section, you'll find a mapping between "Ruta graveolens" and NCBI's taxid 37565:

<otu about="#otu21609" id="otu21609" label="Ruta graveolens">
<meta href="http://purl.uniprot.org/taxonomy/37565" id="meta21613" rel="skos:closeMatch" xsi:type="nex:ResourceMeta"/>

Alternatively, you could ask for a list of trees:

http://purl.org/phylo/treebase/phylows/study/find?query=tb.identifier.study=S10145&format=rss1&recordSchema=tree

And then serialize one of the trees:

http://purl.org/phylo/treebase/phylows/tree/TB2:Tr6161?format=nexml

... with the same annotation for Ruta graveolens.

bp
|
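The walk-through above could be scripted roughly as follows. This is only a sketch: the helper names are made up, the URL patterns are copied from the message, and the sample fragment wraps the OTU snippet in a hypothetical `otus` element with an `xsi` namespace declaration so that it parses on its own. Elements are matched by local name, so no assumption is made about the namespace a real NeXML file declares.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Base URL as it appears in the message above.
PHYLOWS = "http://purl.org/phylo/treebase/phylows"


def study_find_url(query, record_schema=None):
    """Build a study/find URL such as the ones quoted in the message."""
    params = {"query": query, "format": "rss1"}
    if record_schema:
        params["recordSchema"] = record_schema
    # Keep "=" unescaped so the query reads the same as in the message.
    return PHYLOWS + "/study/find?" + urlencode(params, safe="=")


def local_name(tag):
    """Strip any '{namespace}' prefix that ElementTree adds to a tag."""
    return tag.rsplit("}", 1)[-1]


def otu_close_matches(nexml_text):
    """Map each OTU label to the hrefs of its skos:closeMatch annotations."""
    root = ET.fromstring(nexml_text)
    matches = {}
    for element in root.iter():
        if local_name(element.tag) != "otu":
            continue
        matches[element.get("label")] = [
            meta.get("href")
            for meta in element
            if local_name(meta.tag) == "meta" and meta.get("rel") == "skos:closeMatch"
        ]
    return matches


# The OTU snippet from the message, wrapped so it is well-formed by itself.
SAMPLE = """<otus xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <otu about="#otu21609" id="otu21609" label="Ruta graveolens">
    <meta href="http://purl.uniprot.org/taxonomy/37565" id="meta21613"
          rel="skos:closeMatch" xsi:type="nex:ResourceMeta"/>
  </otu>
</otus>"""

close_matches = otu_close_matches(SAMPLE)
taxid = close_matches["Ruta graveolens"][0].rsplit("/", 1)[-1]
```

Fetching the real documents (for example with `urllib.request.urlopen`) is left out so that the sketch runs without network access.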
From: Roderic P. <r....@bi...> - 2011-04-14 16:04:51
|
So I guess I'd do the following:

1. Separate data entry from data access. SQL may have a place for data entry, but that's it. And MySQL is fine, really.

2. The data access end is a document database like CouchDB which stores metadata (and trees) as JSON

3. Simple query API that more or less wraps CouchDB queries, search by taxon, identifier, geography, or full text.

4. Store data on disk in original format, as well as derived formats as Rutger suggests. Being able to grab dumps in various formats is handy, especially if the data can be reliably obtained.

5. Have a web interface that's simple, easy to use, supports search without asking user whether something is a number or not, use SVG for trees, enable users to log in using Facebook/Twitter/Mendeley

6. Devolve as much editing as possible to other places, e.g. Mendeley for bibliographic stuff

7. Never, ever mention RDF. Bonus points for not mentioning XML.

My sense as an outside observer is that much of the current iteration of TreeBASE has been driven by technology (Postgresql, Tomcat, RDF, Java, XML), not usability. I understand the rationale for the choices (I think), but at the end of the day TreeBASE should be about the trees. It's not about publications, it's not about sequences, it's not really about data (OK, a little bit about data), it's about trees. I should be able to find my trees, find trees from a paper, find trees for a taxon, find trees from a given part of the world, find trees that use a given sequence, find trees that look like my trees.

Read Michael Wolfe's answer to the question "Why is Dropbox more popular than other programs with similar functionality?" and you'll see where I'm coming from

http://www.quora.com/Dropbox/Why-is-Dropbox-more-popular-than-other-programs-with-similar-functionality

Regards

Rod

---------------------------------------------------------
Roderic Page
Professor of Taxonomy
Institute of Biodiversity, Animal Health and Comparative Medicine
College of Medical, Veterinary and Life Sciences
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK

Email: r....@bi...
Tel: +44 141 330 4778
Fax: +44 141 330 2792
AIM: rod...@ai...
Facebook: http://www.facebook.com/profile.php?id=1112517192
Twitter: http://twitter.com/rdmpage
Blog: http://iphylo.blogspot.com
Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
|
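To make points 2 and 3 of the list above a little more concrete, here is a minimal Python sketch of what a namespace-free JSON document for one tree might look like, plus a stand-in for a "search by taxon" query. Everything except the `_id` key (which CouchDB uses as the document identifier) is an assumption: the field names, the placeholder IDs, the taxa, and the Newick strings are invented, and the in-memory search only imitates what a CouchDB view would do.

```python
import json


def tree_document(tree_id, study_id, taxa, newick):
    """Build a flat, namespace-free JSON-ready dict for one tree.

    Field names other than "_id" are hypothetical.
    """
    return {
        "_id": tree_id,
        "type": "tree",
        "study": study_id,
        "taxa": sorted(taxa),
        "newick": newick,
    }


def find_by_taxon(documents, name):
    """Return the IDs of tree documents that contain the named taxon.

    A stand-in for a database query; matching is case-insensitive.
    """
    wanted = name.casefold()
    return [
        doc["_id"]
        for doc in documents
        if doc.get("type") == "tree" and any(t.casefold() == wanted for t in doc["taxa"])
    ]


# Placeholder records: IDs, taxa and trees are made up for illustration.
docs = [
    tree_document("Tr0001", "S0001", ["Taxon b", "Taxon a"], "(Taxon_a,Taxon_b);"),
    tree_document("Tr0002", "S0002", ["Taxon c", "Taxon a"], "(Taxon_a,Taxon_c);"),
]

serialized = json.dumps(docs[0], sort_keys=True)
```

A document of this shape could be stored as-is in a document database and fetched back without any schema or vocabulary lookup, which seems to be the point of the proposal.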
From: Rutger V. <R....@re...> - 2011-04-14 16:31:38
|
I tried loading the JSON files I shared with you the other day in CouchDB and it turned out I would have to recompile it with more memory or else it chokes. So the idea is that it's just the metadata that goes into a database, and you like that to be CouchDB as opposed to a triple store, right? And you like the idea of distributed editing, so does that mean you also like the idea of distributed searching, along the plug-in idea? With the various projects you've done to annotate/correct/taxon map TreeBASE data, it would be great if those could be plugged into a common, easy-to-use front end.

--
Dr. Rutger A. Vos
School of Biological Sciences
Philip Lyle Building, Level 4
University of Reading
Reading
RG6 6BX
United Kingdom
Tel: +44 (0) 118 378 7535
http://www.nexml.org
http://rutgervos.blogspot.com
|
From: Roderic P. <r....@bi...> - 2011-04-14 17:05:41
|
Dear Rutger,

Not sure why CouchDB choked, I have databases with millions of JSON documents and it works OK. That said, the JSON you sent was pretty ugly (namespaces, brrr!).

It seems to me that much of what we do is pass around objects, and JSON (without namespaces) is a light-weight way to do this that we can also operate on very simply.

I lost patience with triple stores, partly because there's the overhead of having RDF vocabularies, and the interesting queries are not supported (e.g., spatial queries). I get the idea behind RDF, and was once an enthusiast, but I think there are too many practical issues that make it less than useful.

Distributed editing makes sense to me in that for most things there's always a bigger fish. Does anybody think they can do a better job of providing online management of bibliographic metadata than Zotero or Mendeley? If not, why bother recreating that, just enable people to use those tools and harvest the edits.

So, I'd want to focus on the one thing TreeBASE has that nobody else has, namely the trees. Although, having said that, if Phylota had a decent interface it would be awesome.

Regards

Rod
|
From: Rutger V. <R....@re...> - 2011-04-14 17:57:07
|
> Distributed editing makes sense to me in that for most things there's always a bigger fish. Does anybody think they can do a better job of providing online management of bibliographic metadata than Zotero or Mendeley? If not, why bother recreating that, just enable people to use those tools and harvest the edits.

Yes, I think that is key. Focus on the core business and delegate everything else.
|
From: William P. <wil...@ya...> - 2011-04-14 20:00:21
|
On Apr 14, 2011, at 12:04 PM, Roderic Page wrote:
> It's not about publications, it's not about sequences, it's not really about data (OK, a little bit about data), it's about trees

From where I sit, alignments are an important resource for the community. Nobody emails me asking for a tree that is missing from TreeBASE, but I'm always being asked for an alignment that should be in TreeBASE but is not. Typically it is because the author started, but never finished, a submission. Earlier this year I had a case where an author wanted her alignments embargoed for a year post-publication. After several people independently emailed me to request access, I contacted the journal, they convened the board, and they passed a resolution stating that all data must be released immediately. So these are not without value. Alignments are collections of hypotheses of homology (NCHAR of them per alignment!) that are often difficult to rebuild from scratch -- trees are merely blended summary diagrams of these hypotheses. Plus, retyping a morphological dataset after OCR'ing a PDF is an enormous pain.

On Apr 14, 2011, at 12:04 PM, Roderic Page wrote:
> So I guess I'd do the following:

Is there anything to stop anyone from doing exactly this? And you could have your CouchDB updated periodically by doing a cron on TreeBASE's OAI-PMH to get the IDs of all new or modified studies (e.g., since April 12th, GMT: http://treebase.org/treebase-web/top/oai?verb=ListIdentifiers&metadataPrefix=oai_dc&from=2011-04-12T00:00:00Z), and then pull down the NeXML for just the trees, convert to JSON, populate the CouchDB, etc. It should be relatively easy to maintain a CouchDB mirror.

bp

PS - Hmm... Rod, do you know you do have a way of loving-and-then-hating things? :-)

On Apr 14, 2011, at 12:04 PM, Roderic Page wrote:
> 7. Never, ever mention RDF.

On Apr 14, 2011, at 1:05 PM, Roderic Page wrote:
> I ... was once an enthusiast [of RDF]

On Apr 14, 2011, at 12:04 PM, Roderic Page wrote:
> Bonus points for not mentioning XML.

On May 20, 2004, at 7:37 PM, Roderic D. M. Page wrote:
> [I think TreeBASE should] store data (say, the character states for a taxon) as an XML formatted BLOB.

On Jan 15, 2006, at 4:15 PM, Roderic Page wrote:
> once one of the major providers adopts LSIDs (my money is on uBio), whatever they adopt will drive standards

On Apr 1, 2009, at 9:20 AM, Roderic D. M. Page wrote:
> I think that [LSIDs have] been the Achilles heel of biodiversity informatics.

[etc..]
|
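The first step of the mirroring recipe above (ask OAI-PMH which studies are new or modified since a given date) might look like this in Python. It is a sketch under stated assumptions: the endpoint and parameters are copied from the URL in the message, the XML namespace is the standard OAI-PMH 2.0 one, and the sample response, including its identifiers, is invented for illustration rather than taken from TreeBASE.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Endpoint as quoted in the message above.
OAI_ENDPOINT = "http://treebase.org/treebase-web/top/oai"
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_identifiers_url(since):
    """Build a ListIdentifiers request for records changed since `since` (UTC)."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": "oai_dc", "from": since}
    # Keep ":" unescaped so the timestamp reads the same as in the message.
    return OAI_ENDPOINT + "?" + urlencode(params, safe=":")


def parse_identifiers(xml_text):
    """Return the identifiers of all non-deleted records in a response."""
    root = ET.fromstring(xml_text)
    identifiers = []
    for header in root.iterfind(".//oai:header", OAI_NS):
        if header.get("status") == "deleted":
            continue  # deleted records have nothing to mirror
        identifiers.append(header.findtext("oai:identifier", namespaces=OAI_NS))
    return identifiers


# Made-up response: the identifier values are placeholders, not TreeBASE's.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>example:study-1</identifier>
      <datestamp>2011-04-12</datestamp>
    </header>
    <header status="deleted">
      <identifier>example:study-2</identifier>
      <datestamp>2011-04-13</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>"""

changed = parse_identifiers(SAMPLE_RESPONSE)
```

A real harvester would also need to follow the protocol's `resumptionToken` when a response is split across several pages, and then fetch and convert the NeXML for each changed study.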
From: Vladimir G. <vga...@ne...> - 2011-04-14 21:13:44
|
On Apr 14, 2011, at 4:00 PM, William Piel wrote: > PS - Hmm... Rod, do you know you do have a way of loving-and-then- > hating things? :-) > > On Apr 14, 2011, at 12:04 PM, Roderic Page wrote: >> 7. Never, ever mention RDF. > > On Apr 14, 2011, at 1:05 PM, Roderic Page wrote: >> I ... was once an enthusiast [of RDF] > > On Apr 14, 2011, at 12:04 PM, Roderic Page wrote: >> Bonus points for not mentioning XML. > > > On May 20, 2004, at 7:37 PM, Roderic D. M. Page wrote: >> [I think TreeBASE should] store data (say, the character states for >> a taxon) as an XML formatted BLOB. > > On Jan 15, 2006, at 4:15 PM, Roderic Page wrote: >> once one of the major providers adopts LSIDs (my money is on uBio), >> whatever they adopt will drive standards > > On Apr 1, 2009, at 9:20 AM, Roderic D. M. Page wrote: >> I think that [LSIDs have] been the Achilles heel of biodiversity >> informatics. > > [etc..] > Evolution? Someone will build a tree from this! -V |
From: Rutger V. <R....@re...> - 2011-04-14 22:27:00
|
Ok, that was priceless.
|
From: Roderic P. <r....@bi...> - 2011-04-15 08:37:33
|
Touché!
|
From: Roderic P. <r....@bi...> - 2011-04-15 08:50:00
|
Dear Bill, Yes, I was being a little glib, and I'm all for the data being available. But I wonder whether, given the rate at which new sequence data is being acquired, many people will redo analyses using more data, rather than reanalyse an older data set. My comment about trees is that, at the end of the day, it's the one thing that makes TreeBASE unique. From my no doubt biased perspective, it could/should be the place I go to find out "what do we know about the phylogeny of group x?" Why not just make a CouchDB database right now? I tried, believe me I tried, but a big chunk of TreeBASE didn't make it down the wire. For large studies the Nexml generation simply times out, so I gave up. I guess this is what Rutger's suggestion of having file dumps would address. If every study had a Nexml file sitting on the server then I could just fetch those, rather than hammer the database and get frustrated when it times out. Given that I have the attention span of a gnat, if something doesn't work I tend to drop it for a while and go on to something else. If I could reliably get all TreeBASE studies in Nexml, I'd make the CouchDB version in a flash. Regards Rod On 14 Apr 2011, at 21:00, William Piel wrote: > > > On Apr 14, 2011, at 12:04 PM, Roderic Page wrote: >> It's not about publications, it's not about sequences, it's not really about data (OK, a little bit about data), it's about trees > > From where I sit, alignments are an important resource for the community. Nobody emails me asking for a tree that is missing from TreeBASE, but I'm always being asked for an alignment that should be in TreeBASE but is not. Typically it is because the author started, but never finished, a submission. Earlier this year I had a case where an author wanted her alignments embargoed for a year post-publication.
After several people independently emailed me to request access, I contacted the journal, they convened the board, and they passed a resolution stating that all data must be released immediately. So these are not without value. Alignments are collections of hypotheses of homology (NCHAR of them per alignment!) that are often difficult to rebuild from scratch -- trees are merely blended summary diagrams of these hypotheses. Plus, retyping a morphological dataset after OCR'ing a PDF is an enormous pain. > > On Apr 14, 2011, at 12:04 PM, Roderic Page wrote: >> So I guess I'd do the following: > > Is there anything to stop anyone from doing exactly this? > > And you could have your CouchDB updated periodically by doing a cron on TreeBASE's OAI-PMH to get the IDs of all new or modified studies (e.g., since April 12th, GMT: http://treebase.org/treebase-web/top/oai?verb=ListIdentifiers&metadataPrefix=oai_dc&from=2011-04-12T00:00:00Z), and then pull down the NeXML for just the trees, convert to JSON, populate the CouchDB, etc. It should be relatively easy to maintain a CouchDB mirror. > > bp > > > PS - Hmm... Rod, do you know you do have a way of loving-and-then-hating things? :-) > > On Apr 14, 2011, at 12:04 PM, Roderic Page wrote: >> 7. Never, ever mention RDF. > > On Apr 14, 2011, at 1:05 PM, Roderic Page wrote: >> I ... was once an enthusiast [of RDF] > > On Apr 14, 2011, at 12:04 PM, Roderic Page wrote: >> Bonus points for not mentioning XML. > > On May 20, 2004, at 7:37 PM, Roderic D. M. Page wrote: >> [I think TreeBASE should] store data (say, the character states for a taxon) as an XML formatted BLOB. > > On Jan 15, 2006, at 4:15 PM, Roderic Page wrote: >> once one of the major providers adopts LSIDs (my money is on uBio), whatever they adopt will drive standards > > On Apr 1, 2009, at 9:20 AM, Roderic D. M. Page wrote: >> I think that [LSIDs have] been the Achilles heel of biodiversity informatics. > > [etc..] 
|
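The mirroring approach Bill describes in the quoted message (poll OAI-PMH for studies changed since a date, then pull NeXML for each one) could be sketched roughly as follows. This is a minimal sketch, not a documented TreeBASE client: the endpoint and query parameters are copied from the example URL in the thread, the namespace is the standard OAI-PMH 2.0 one, and `fetch` is a hypothetical injectable function so the parsing can be tried without hitting the server. Resumption tokens and the per-study NeXML download are deliberately left out.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Endpoint taken from the example URL quoted in the thread.
OAI_ENDPOINT = "http://treebase.org/treebase-web/top/oai"
# Standard OAI-PMH 2.0 XML namespace.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_identifiers_url(since, endpoint=OAI_ENDPOINT):
    """Build a ListIdentifiers request for records changed since `since`."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": "oai_dc", "from": since}
    return endpoint + "?" + urllib.parse.urlencode(params)


def parse_identifiers(xml_text):
    """Return identifiers of non-deleted records in a ListIdentifiers response."""
    root = ET.fromstring(xml_text)
    identifiers = []
    for header in root.iterfind(".//oai:header", OAI_NS):
        if header.get("status") == "deleted":
            continue  # record was withdrawn; nothing to mirror
        ident = header.find("oai:identifier", OAI_NS)
        if ident is not None and ident.text:
            identifiers.append(ident.text.strip())
    return identifiers


def http_get(url, timeout=600):
    """Plain HTTP GET; the generous timeout is an assumption, not a tested value."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def changed_studies(since, fetch=http_get):
    """Identifiers of studies added or modified since the given UTC timestamp."""
    return parse_identifiers(fetch(list_identifiers_url(since)))
```

A cron job would call `changed_studies("2011-04-12T00:00:00Z")`, fetch the NeXML for each identifier returned, convert it to JSON and write it to CouchDB.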
From: William P. <wil...@ya...> - 2011-04-15 12:43:01
|
On Apr 15, 2011, at 4:14 AM, Roderic Page wrote: > For large studies the Nexml generation simply times out, so I gave up. If you still have some ID numbers for those big ones, I'd be happy to test it again. It may have been solved because of some recent changes. But, indeed, I'd like access to a dump too. bp |
From: Roderic P. <r....@bi...> - 2011-04-18 12:04:10
|
I've started trying again to harvest individual Nexml files, and it's still unbelievably slow. We're talking minutes for a study in some cases. The XML for S2012 took about 5 minutes to fetch and is 13.7 MB in size(!). The NEXUS file is 164 KB. Need I say more...? Regards Rod |
From: Rutger V. <R....@re...> - 2011-04-18 12:40:28
|
Yeah, I know, some of the studies are serialized incorrectly, especially the ones with "mixed" data containing both DNA and categorical data in the same matrix, or with state definitions that are unusual in some other way. This results in a character state set definition being written out for every matrix column, and that takes up most of the file. Another thing is that we're now using owl:sameAs statements to specify the TreeBASE ID for every character. There are a number of these issues, they're bugs, I'm recording them - it's one of the things we should be fixing during Laurel's project. A correctly formatted NeXML file is going to be bigger than the equivalent NEXUS file, but perhaps by a factor of ten or so at most, depending on the amount of metadata (i.e. on the order of 1 MB for S2012). That is a trade-off that is worth it because it will allow us to export all the metadata in a single file. 13.7 MB is obviously wrong. |
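As a quick check of the numbers in this exchange, the arithmetic below uses the two sizes Rod reported for S2012 and the rough ten-fold ceiling suggested in the reply. Decimal units are assumed, and the function and constant names are purely illustrative.

```python
def size_ratio(nexml_bytes, nexus_bytes):
    """How many times larger the NeXML export is than the NEXUS file."""
    return nexml_bytes / nexus_bytes


NEXUS_S2012 = 164_000       # 164 KB, as reported in the thread
NEXML_S2012 = 13_700_000    # 13.7 MB, as reported in the thread
MAX_EXPECTED_RATIO = 10     # "a factor of ten or so" ceiling from the reply

ratio = size_ratio(NEXML_S2012, NEXUS_S2012)
expected_ceiling = NEXUS_S2012 * MAX_EXPECTED_RATIO

print(f"observed ratio: {ratio:.1f}x")
print(f"expected ceiling: {expected_ceiling / 1e6:.2f} MB")
print("bloated" if ratio > MAX_EXPECTED_RATIO else "plausible")
```

The observed export is more than eighty times the NEXUS size, well past the ten-fold ceiling, which is consistent with the serialization bugs described above rather than with ordinary XML overhead.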
From: Rutger V. <R....@re...> - 2011-04-18 12:47:30
|
To give an example of how things should be: I've also done a NeXML dump and split all harvested studies into their constituent trees, matrices and taxa blocks. The largest NeXML tree file (with taxa block) in TreeBASE is 365 KB for a 585-taxon tree. To me that seems a reasonable size. The bulk of a matrix file for that set of taxa should be <seq> elements with raw character state sequences, preceded by a taxa block and an nchar-long list of <char> elements. You can imagine that that's not going to be 13.7 MB once things are working correctly. |
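One way to see where the bytes in an oversized export go is to tally elements by name: in a healthy matrix file the sequence rows should dominate, not repeated state-set definitions. The sketch below assumes only that the file is well-formed XML. The `states`, `char` and `seq` names in the usage note are illustrative and should be checked against the actual NeXML schema, and `S2012.xml` is a hypothetical local file name.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Drop a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def tally_elements(source):
    """Count elements by local name.

    `source` is a file name or a binary file object. The document is
    parsed incrementally and each element is cleared once counted, which
    keeps memory use down on large exports.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        counts[local_name(elem.tag)] += 1
        elem.clear()
    return counts
```

Running `tally_elements("S2012.xml").most_common(5)` on a downloaded export would show at a glance whether state-set definitions outnumber the `<char>` and `<seq>` elements they are supposed to serve.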
From: Roderic P. <r....@bi...> - 2011-04-19 10:26:00
|
Below is the list of TreeBASE studies that have failed to output Nexml when I've tried to harvest them with a timeout of 10 minutes. Any way to get hold of these? Rod S131 S132 S134 S202 S613 S1085 S1158 S1183 S1197 S1302 S1303 S1306 S1307 S1308 S1309 S1310 S1311 S1312 S1313 S1314 S1315 S1316 S1317 S1318 S1319 S1320 S1321 S1322 S1326 S1330 S1936 S2039 S2078 S2372 S2373 S2376 S2377 S9993 S9997 S9998 S9999 S10071 S10287 S10316 S10335 S10433 S10507 S10508 S10511 S10541 S10603 S10613 S10635 S10665 S10689 S10736 S10888 S10917 S10940 S11032 S11080 |
From: Rutger V. <R....@re...> - 2011-04-19 10:45:07
|
If you've tried to harvest them in a batch, long-running queries from aborted downloads will accumulate so you get more failures later on. I've downloaded several of these, so one thing you can do is simply try again. |
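Since aborted downloads can leave long-running queries behind on the server, a harvester is probably better off fetching one study at a time and pausing before it retries a failure, rather than firing off a whole batch. A minimal sketch of such a loop follows; `fetch_study` is a hypothetical callable standing in for whatever actually downloads one study's NeXML, and the attempt count and delays are arbitrary starting points, not tuned values.

```python
import time


def harvest(study_ids, fetch_study, max_attempts=3, base_delay=30.0, sleep=time.sleep):
    """Fetch studies sequentially; return (results, failed_ids).

    `fetch_study(study_id)` should return the document or raise OSError
    (which covers timeouts and urllib's URLError) on failure.
    """
    results, failed = {}, []
    for study_id in study_ids:
        for attempt in range(1, max_attempts + 1):
            try:
                results[study_id] = fetch_study(study_id)
                break
            except OSError:
                if attempt == max_attempts:
                    failed.append(study_id)
                else:
                    # Wait longer after each failure so the server has a
                    # chance to finish or drop the abandoned query.
                    sleep(base_delay * 2 ** (attempt - 1))
    return results, failed
```

The IDs returned in `failed` can simply be fed back into `harvest` later, which is the "try again" step in script form.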