From: Rutger V. <R....@re...> - 2011-04-19 14:12:03
On Tue, Apr 19, 2011 at 3:06 PM, Jon Auman <jon...@ne...> wrote:
> Treebase is back up. It had nothing to do with Rutger's commits. Treebase seems to get hammered more in the wee hours of the morning... Or, harvest o'clock in GMT.
> On Apr 19, 2011, at 7:31 AM, Rutger Vos wrote:
>
>> Mmmm, don't know what's up with that. Presumably that will be dealt
>> with once North Carolina wakes up.
>>
>> Anyway, I committed some code that makes the produced NeXML somewhat
>> more concise and quicker to generate. This doesn't yet fix the issue
>> with the mixed data types in the same matrix, though. So the very,
>> very large files are still very, very large. Maybe we can see this
>> code on the dev server at some point?
>>
>> On Tue, Apr 19, 2011 at 12:21 PM, Roderic Page <r....@bi...> wrote:
>>> Well, now TreeBASE has crashed...
>>>
>>>
>>> On 19 Apr 2011, at 11:45, Rutger Vos wrote:
>>>
>>>> If you've tried to harvest them in a batch, long-running queries from
>>>> aborted downloads will accumulate so you get more failures later on.
>>>> I've downloaded several of these, so one thing you can do is simply
>>>> try again.
>>>>
>>>> On Tue, Apr 19, 2011 at 11:25 AM, Roderic Page <r....@bi...> wrote:
>>>>> Below is the list of TreeBASE studies that have failed to output Nexml when I've tried to harvest them with a timeout of 10 minutes. Any way to get hold of these?
>>>>>
>>>>> Rod
>>>>>
>>>>> S131
>>>>> S132
>>>>> S134
>>>>> S202
>>>>> S613
>>>>> S1085
>>>>> S1158
>>>>> S1183
>>>>> S1197
>>>>> S1302
>>>>> S1303
>>>>> S1306
>>>>> S1307
>>>>> S1308
>>>>> S1309
>>>>> S1310
>>>>> S1311
>>>>> S1312
>>>>> S1313
>>>>> S1314
>>>>> S1315
>>>>> S1316
>>>>> S1317
>>>>> S1318
>>>>> S1319
>>>>> S1320
>>>>> S1321
>>>>> S1322
>>>>> S1326
>>>>> S1330
>>>>> S1936
>>>>> S2039
>>>>> S2078
>>>>> S2372
>>>>> S2373
>>>>> S2376
>>>>> S2377
>>>>> S9993
>>>>> S9997
>>>>> S9998
>>>>> S9999
>>>>> S10071
>>>>> S10287
>>>>> S10316
>>>>> S10335
>>>>> S10433
>>>>> S10507
>>>>> S10508
>>>>> S10511
>>>>> S10541
>>>>> S10603
>>>>> S10613
>>>>> S10635
>>>>> S10665
>>>>> S10689
>>>>> S10736
>>>>> S10888
>>>>> S10917
>>>>> S10940
>>>>> S11032
>>>>> S11080
>>>>>
>>>>>
>>>>> On 18 Apr 2011, at 13:47, Rutger Vos wrote:
>>>>>
>>>>>> To give an example of how things should be: I've also done a NeXML
>>>>>> dump and split all harvested studies into their constituent trees,
>>>>>> matrices and taxa blocks. The largest NeXML tree file (with taxa
>>>>>> block) in TreeBASE is 365Kb for a 585 taxon tree. To me that
>>>>>> seems a reasonable size. The bulk of a matrix file for that set of
>>>>>> taxa should be <seq> elements with raw character state sequences,
>>>>>> preceded by a taxa block and an nchar list of <char> elements. You can
>>>>>> imagine that that's not going to be 13.7 Mb once things are working
>>>>>> correctly.
>>>>>>
>>>>>> On Mon, Apr 18, 2011 at 1:40 PM, Rutger Vos <R....@re...> wrote:
>>>>>>> Yeah, I know, some of the studies are serialized incorrectly,
>>>>>>> especially the ones with "mixed" data containing both DNA and
>>>>>>> categorical data in the same matrix, or unusual state definitions in
>>>>>>> some other way. This results in a character state set definition being
>>>>>>> written out for every matrix column, and that takes up most of the
>>>>>>> file.
>>>>>>> Another thing is that we're now using owl:sameAs statements to
>>>>>>> specify the TreeBASE ID for every character.
>>>>>>>
>>>>>>> There are a number of these issues, they're bugs, I'm recording them -
>>>>>>> it's one of the things we should be fixing during Laurel's project. A
>>>>>>> correctly formatted NeXML file is going to be bigger than the
>>>>>>> equivalent NEXUS file, but perhaps like a factor of ten or so max,
>>>>>>> depending on the amount of metadata (i.e. on the order of 1Mb for
>>>>>>> S2012). That is a trade-off that is worth it because it will allow us
>>>>>>> to export all the metadata in a single file. 13.7 Mb is obviously
>>>>>>> wrong.
>>>>>>>
>>>>>>> On Mon, Apr 18, 2011 at 1:03 PM, Roderic Page <r....@bi...> wrote:
>>>>>>>> I've started trying again to harvest individual Nexml files, and it's still unbelievably slow. We're talking minutes for a study in some cases. The XML for S2012 took about 5 minutes to fetch and is 13.7 Mb in size(!). The NEXUS file is 164Kb.
>>>>>>>>
>>>>>>>> Need I say more...?
>>>>>>>>
>>>>>>>> Regards
>>>>>>>>
>>>>>>>> Rod
>>>>>>>>
>>>>>>>> On 15 Apr 2011, at 13:42, William Piel wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Apr 15, 2011, at 4:14 AM, Roderic Page wrote:
>>>>>>>>>
>>>>>>>>>> For large studies the Nexml generation simply times out, so I gave up.
>>>>>>>>>
>>>>>>>>> If you still have some ID numbers for those big ones, I'd be happy to test it again. It may have been solved because of some recent changes.
>>>>>>>>>
>>>>>>>>> But, indeed, I'd like access to a dump too.
>>>>>>>>>
>>>>>>>>> bp
>>>>>>>>>
>>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>>> Benefiting from Server Virtualization: Beyond Initial Workload
>>>>>>>>> Consolidation -- Increasing the use of server virtualization is a top
>>>>>>>>> priority.Virtualization can reduce costs, simplify management, and improve
>>>>>>>>> application availability and disaster protection.
>>>>>>>>> Learn more about boosting the value of server virtualization. http://p.sf.net/sfu/vmware-sfdev2dev
>>>>>>>>> _______________________________________________
>>>>>>>>> Treebase-devel mailing list
>>>>>>>>> Tre...@li...
>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/treebase-devel
>>>>>>>>>
>>>>>>>>
>>>>>>>> ---------------------------------------------------------
>>>>>>>> Roderic Page
>>>>>>>> Professor of Taxonomy
>>>>>>>> Institute of Biodiversity, Animal Health and Comparative Medicine
>>>>>>>> College of Medical, Veterinary and Life Sciences
>>>>>>>> Graham Kerr Building
>>>>>>>> University of Glasgow
>>>>>>>> Glasgow G12 8QQ, UK
>>>>>>>>
>>>>>>>> Email: r....@bi...
>>>>>>>> Tel: +44 141 330 4778
>>>>>>>> Fax: +44 141 330 2792
>>>>>>>> AIM: rod...@ai...
>>>>>>>> Facebook: http://www.facebook.com/profile.php?id=1112517192
>>>>>>>> Twitter: http://twitter.com/rdmpage
>>>>>>>> Blog: http://iphylo.blogspot.com
>>>>>>>> Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Dr. Rutger A. Vos
>>>>>>> School of Biological Sciences
>>>>>>> Philip Lyle Building, Level 4
>>>>>>> University of Reading
>>>>>>> Reading, RG6 6BX, United Kingdom
>>>>>>> Tel: +44 (0) 118 378 7535
>>>>>>> http://rutgervos.blogspot.com
>
> -------------------------------------------------------
> Jon Auman
> Systems Administrator
> National Evolutionary Synthesis Center
> Duke University
> http:www.nescent.org
> jon...@ne...
> ------------------------------------------------------
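The matrix layout described in the thread (an nchar list of <char> elements followed by one <seq> of raw states per taxon), and the bug where a character state set is written out for every column, can be illustrated with a minimal Python sketch. It is not TreeBASE's serializer: element and attribute names follow my reading of the NeXML schema, the ids (m1, taxa1, s0, c0, r_t0) and the toy alignment are hypothetical, and the <otus> taxa block, mixed data types and owl:sameAs metadata are left out.

```python
# Sketch only: element/attribute names follow my reading of the NeXML
# schema; ids and data are hypothetical, not TreeBASE's serializer output.
import xml.etree.ElementTree as ET

NEX = "http://www.nexml.org/2009"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("nex", NEX)
ET.register_namespace("xsi", XSI)

DNA_SYMBOLS = "ACGT"


def q(tag):
    """Qualify a tag name with the NeXML namespace."""
    return f"{{{NEX}}}{tag}"


def add_state_set(fmt, set_id):
    """Append one <states> set declaring the four DNA symbols."""
    states = ET.SubElement(fmt, q("states"), {"id": set_id})
    for sym in DNA_SYMBOLS:
        ET.SubElement(states, q("state"), {"id": f"{set_id}_{sym}", "symbol": sym})


def build_characters(seqs, per_column_states=False):
    """Build a <characters> block for aligned DNA sequences.

    seqs maps a (hypothetical) OTU id to its aligned sequence string.
    per_column_states=True mimics the bug discussed in the thread:
    a separate state set is written out for every matrix column.
    """
    nchar = len(next(iter(seqs.values())))
    chars = ET.Element(q("characters"), {
        "id": "m1", "otus": "taxa1", f"{{{XSI}}}type": "nex:DnaSeqs"})
    fmt = ET.SubElement(chars, q("format"))
    if per_column_states:
        set_ids = [f"s{i}" for i in range(nchar)]
        for set_id in set_ids:
            add_state_set(fmt, set_id)
    else:
        add_state_set(fmt, "s0")
        set_ids = ["s0"] * nchar
    # The nchar list of <char> elements, each pointing at a state set.
    for i, set_id in enumerate(set_ids):
        ET.SubElement(fmt, q("char"), {"id": f"c{i}", "states": set_id})
    # The bulk of a correct file: one <seq> of raw states per taxon.
    matrix = ET.SubElement(chars, q("matrix"))
    for otu, seq in seqs.items():
        row = ET.SubElement(matrix, q("row"), {"id": f"r_{otu}", "otu": otu})
        ET.SubElement(row, q("seq")).text = seq
    return chars


def serialized_size(elem):
    """Number of bytes in the serialized XML for elem."""
    return len(ET.tostring(elem))


if __name__ == "__main__":
    # Hypothetical toy alignment: 3 taxa x 1,000 columns.
    toy = {f"t{i}": "ACGT" * 250 for i in range(3)}
    compact = serialized_size(build_characters(toy))
    bloated = serialized_size(build_characters(toy, per_column_states=True))
    print(f"shared state set: {compact} bytes")
    print(f"per-column state sets: {bloated} bytes")
```

On that toy alignment the per-column variant comes out several times larger, because every column adds its own <states> block of four <state> elements; the shared-set variant is dominated by the <char> list and the raw <seq> text, which is the shape a correctly serialized matrix file is expected to have.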