From: Jon A. <jon...@ne...> - 2011-04-19 14:06:38
Treebase is back up. It had nothing to do with Rutger's commits. Treebase seems to get hammered more in the wee hours of the morning...

-Jon

On Apr 19, 2011, at 7:31 AM, Rutger Vos wrote:

> Mmmm, don't know what's up with that. Presumably that will be dealt
> with once North Carolina wakes up.
>
> Anyway, I committed some code that makes the produced NeXML somewhat
> more concise and quicker to generate. This doesn't yet fix the issue
> with the mixed data types in the same matrix, though. So the very,
> very large files are still very, very large. Maybe we can see this
> code on the dev server at some point?
>
> On Tue, Apr 19, 2011 at 12:21 PM, Roderic Page <r....@bi...> wrote:
>> Well, now TreeBASE has crashed...
>>
>>
>> On 19 Apr 2011, at 11:45, Rutger Vos wrote:
>>
>>> If you've tried to harvest them in a batch, long-running queries from
>>> aborted downloads will accumulate so you get more failures later on.
>>> I've downloaded several of these, so one thing you can do is simply
>>> try again.
>>>
>>> On Tue, Apr 19, 2011 at 11:25 AM, Roderic Page <r....@bi...> wrote:
>>>> Below is the list of TreeBASE studies that have failed to output Nexml when I've tried to harvest them with a timeout of 10 minutes. Any way to get hold of these?
>>>>
>>>> Rod
>>>>
>>>> S131
>>>> S132
>>>> S134
>>>> S202
>>>> S613
>>>> S1085
>>>> S1158
>>>> S1183
>>>> S1197
>>>> S1302
>>>> S1303
>>>> S1306
>>>> S1307
>>>> S1308
>>>> S1309
>>>> S1310
>>>> S1311
>>>> S1312
>>>> S1313
>>>> S1314
>>>> S1315
>>>> S1316
>>>> S1317
>>>> S1318
>>>> S1319
>>>> S1320
>>>> S1321
>>>> S1322
>>>> S1326
>>>> S1330
>>>> S1936
>>>> S2039
>>>> S2078
>>>> S2372
>>>> S2373
>>>> S2376
>>>> S2377
>>>> S9993
>>>> S9997
>>>> S9998
>>>> S9999
>>>> S10071
>>>> S10287
>>>> S10316
>>>> S10335
>>>> S10433
>>>> S10507
>>>> S10508
>>>> S10511
>>>> S10541
>>>> S10603
>>>> S10613
>>>> S10635
>>>> S10665
>>>> S10689
>>>> S10736
>>>> S10888
>>>> S10917
>>>> S10940
>>>> S11032
>>>> S11080
>>>>
>>>>
>>>> On 18 Apr 2011, at 13:47, Rutger Vos wrote:
>>>>
>>>>> To give an example of how things should be: I've also done a NeXML
>>>>> dump and split all harvested studies into their constituent trees,
>>>>> matrices and taxa blocks. The largest NeXML tree file (with taxa
>>>>> block) in TreeBASE is 365Kb for a 585 taxon tree. To me that
>>>>> seems a reasonable size. The bulk of a matrix file for that set of
>>>>> taxa should be <seq> elements with raw character state sequences,
>>>>> preceded by a taxa block and an nchar list of <char> elements. You can
>>>>> imagine that that's not going to be 13.7 Mb once things are working
>>>>> correctly.
>>>>>
>>>>> On Mon, Apr 18, 2011 at 1:40 PM, Rutger Vos <R....@re...> wrote:
>>>>>> Yeah, I know, some of the studies are serialized incorrectly,
>>>>>> especially the ones with "mixed" data containing both DNA and
>>>>>> categorical data in the same matrix, or unusual state definitions in
>>>>>> some other way. This results in a character state set definition being
>>>>>> written out for every matrix column, and that takes up most of the
>>>>>> file. Another thing is that we're now using owl:sameAs statements to
>>>>>> specify the TreeBASE ID for every character.
>>>>>>
>>>>>> There are a number of these issues, they're bugs, I'm recording them -
>>>>>> it's one of the things we should be fixing during Laurel's project. A
>>>>>> correctly formatted NeXML file is going to be bigger than the
>>>>>> equivalent NEXUS file, but perhaps like a factor of ten or so max,
>>>>>> depending on the amount of metadata (i.e. on the order of 1Mb for
>>>>>> S2012). That is a trade-off that is worth it because it will allow us
>>>>>> to export all the metadata in a single file. 13.7 Mb is obviously
>>>>>> wrong.
>>>>>>
>>>>>> On Mon, Apr 18, 2011 at 1:03 PM, Roderic Page <r....@bi...> wrote:
>>>>>>> I've started trying again to harvest individual Nexml files, and it's still unbelievably slow. We're talking minutes for a study in some cases. The XML for S2012 took about 5 minutes to fetch and is 13.7 Mb in size(!). The NEXUS file is 164Kb.
>>>>>>>
>>>>>>> Need I say more...?
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> Rod
>>>>>>>
>>>>>>> On 15 Apr 2011, at 13:42, William Piel wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> On Apr 15, 2011, at 4:14 AM, Roderic Page wrote:
>>>>>>>>
>>>>>>>>> For large studies the Nexml generation simply times out, so I gave up.
>>>>>>>>
>>>>>>>> If you still have some ID numbers for those big ones, I'd be happy to test it again. It may have been solved because of some recent changes.
>>>>>>>>
>>>>>>>> But, indeed, I'd like access to a dump too.
>>>>>>>>
>>>>>>>> bp
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>> Benefiting from Server Virtualization: Beyond Initial Workload
>>>>>>>> Consolidation -- Increasing the use of server virtualization is a top
>>>>>>>> priority.Virtualization can reduce costs, simplify management, and improve
>>>>>>>> application availability and disaster protection. Learn more about boosting
>>>>>>>> the value of server virtualization.
>>>>>>>> http://p.sf.net/sfu/vmware-sfdev2dev
>>>>>>>> _______________________________________________
>>>>>>>> Treebase-devel mailing list
>>>>>>>> Tre...@li...
>>>>>>>> https://lists.sourceforge.net/lists/listinfo/treebase-devel
>>>>>>>>
>>>>>>>
>>>>>>> ---------------------------------------------------------
>>>>>>> Roderic Page
>>>>>>> Professor of Taxonomy
>>>>>>> Institute of Biodiversity, Animal Health and Comparative Medicine
>>>>>>> College of Medical, Veterinary and Life Sciences
>>>>>>> Graham Kerr Building
>>>>>>> University of Glasgow
>>>>>>> Glasgow G12 8QQ, UK
>>>>>>>
>>>>>>> Email: r....@bi...
>>>>>>> Tel: +44 141 330 4778
>>>>>>> Fax: +44 141 330 2792
>>>>>>> AIM: rod...@ai...
>>>>>>> Facebook: http://www.facebook.com/profile.php?id=1112517192
>>>>>>> Twitter: http://twitter.com/rdmpage
>>>>>>> Blog: http://iphylo.blogspot.com
>>>>>>> Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Dr. Rutger A. Vos
>>>>>> School of Biological Sciences
>>>>>> Philip Lyle Building, Level 4
>>>>>> University of Reading
>>>>>> Reading, RG6 6BX, United Kingdom
>>>>>> Tel: +44 (0) 118 378 7535
>>>>>> http://rutgervos.blogspot.com
>>>>>>

-------------------------------------------------------
Jon Auman
Systems Administrator
National Evolutionary Synthesis Center
Duke University
http:www.nescent.org
jon...@ne...
------------------------------------------------------