From: Roderic P. <r....@bi...> - 2011-04-19 11:21:28
Well, now TreeBASE has crashed...

On 19 Apr 2011, at 11:45, Rutger Vos wrote:

> If you've tried to harvest them in a batch, long-running queries from
> aborted downloads will accumulate so you get more failures later on.
> I've downloaded several of these, so one thing you can do is simply
> try again.
>
> On Tue, Apr 19, 2011 at 11:25 AM, Roderic Page <r....@bi...> wrote:
>> Below is the list of TreeBASE studies that have failed to output Nexml when I've tried to harvest them with a timeout of 10 minutes. Any way to get hold of these?
>>
>> Rod
>>
>> S131
>> S132
>> S134
>> S202
>> S613
>> S1085
>> S1158
>> S1183
>> S1197
>> S1302
>> S1303
>> S1306
>> S1307
>> S1308
>> S1309
>> S1310
>> S1311
>> S1312
>> S1313
>> S1314
>> S1315
>> S1316
>> S1317
>> S1318
>> S1319
>> S1320
>> S1321
>> S1322
>> S1326
>> S1330
>> S1936
>> S2039
>> S2078
>> S2372
>> S2373
>> S2376
>> S2377
>> S9993
>> S9997
>> S9998
>> S9999
>> S10071
>> S10287
>> S10316
>> S10335
>> S10433
>> S10507
>> S10508
>> S10511
>> S10541
>> S10603
>> S10613
>> S10635
>> S10665
>> S10689
>> S10736
>> S10888
>> S10917
>> S10940
>> S11032
>> S11080
>>
>>
>> On 18 Apr 2011, at 13:47, Rutger Vos wrote:
>>
>>> To give an example of how things should be: I've also done a NeXML
>>> dump and split all harvested studies in their constituent trees,
>>> matrices and taxa blocks. The largest NeXML tree file (with taxa
>>> block) in TreeBASE is 365Kb for a 585 taxon tree. To me that
>>> seems a reasonable size. The bulk of a matrix file for that set of
>>> taxa should be <seq> elements with raw character state sequences,
>>> preceded by a taxa block and an nchar list of <char> elements. You can
>>> imagine that that's not going to be 13.7 Mb once things are working
>>> correctly.
>>>
>>> On Mon, Apr 18, 2011 at 1:40 PM, Rutger Vos <R....@re...> wrote:
>>>> Yeah, I know, some of the studies are serialized incorrectly,
>>>> especially the ones with "mixed" data containing both DNA and
>>>> categorical data in the same matrix, or unusual state definitions in
>>>> some other way. This results in a character state set definition being
>>>> written out for every matrix column, and that takes up most of the
>>>> file. Another thing is that we're now using owl:sameAs statements to
>>>> specify the TreeBASE ID for every character.
>>>>
>>>> There are a number of these issues, they're bugs, I'm recording them -
>>>> it's one of the things we should be fixing during Laurel's project. A
>>>> correctly formatted NeXML file is going to be bigger than the
>>>> equivalent NEXUS file, but perhaps like a factor of ten or so max,
>>>> depending on the amount of metadata (i.e. on the order of 1Mb for
>>>> S2012). That is a trade-off that is worth it because it will allow us
>>>> to export all the metadata in a single file. 13.7 Mb is obviously
>>>> wrong.
>>>>
>>>> On Mon, Apr 18, 2011 at 1:03 PM, Roderic Page <r....@bi...> wrote:
>>>>> I've started trying again to harvest individual Nexml files, and it's still unbelievably slow. We're talking minutes for a study in some cases. The XML for S2012 took about 5 minutes to fetch and is 13.7 Mb in size(!). The NEXUS file is 164Kb.
>>>>>
>>>>> Need I say more...?
>>>>>
>>>>> Regards
>>>>>
>>>>> Rod
>>>>>
>>>>> On 15 Apr 2011, at 13:42, William Piel wrote:
>>>>>
>>>>>>
>>>>>> On Apr 15, 2011, at 4:14 AM, Roderic Page wrote:
>>>>>>
>>>>>>> For large studies the Nexml generation simply times out, so I gave up.
>>>>>>
>>>>>> If you still have some ID numbers for those big ones, I'd be happy to test it again. It may have been solved because of some recent changes.
>>>>>>
>>>>>> But, indeed, I'd like access to a dump too.
>>>>>>
>>>>>> bp
>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------------------------
>>>>>> Benefiting from Server Virtualization: Beyond Initial Workload
>>>>>> Consolidation -- Increasing the use of server virtualization is a top
>>>>>> priority.Virtualization can reduce costs, simplify management, and improve
>>>>>> application availability and disaster protection. Learn more about boosting
>>>>>> the value of server virtualization. http://p.sf.net/sfu/vmware-sfdev2dev
>>>>>> _______________________________________________
>>>>>> Treebase-devel mailing list
>>>>>> Tre...@li...
>>>>>> https://lists.sourceforge.net/lists/listinfo/treebase-devel
>>>>>>
>>>>>
>>>>> ---------------------------------------------------------
>>>>> Roderic Page
>>>>> Professor of Taxonomy
>>>>> Institute of Biodiversity, Animal Health and Comparative Medicine
>>>>> College of Medical, Veterinary and Life Sciences
>>>>> Graham Kerr Building
>>>>> University of Glasgow
>>>>> Glasgow G12 8QQ, UK
>>>>>
>>>>> Email: r....@bi...
>>>>> Tel: +44 141 330 4778
>>>>> Fax: +44 141 330 2792
>>>>> AIM: rod...@ai...
>>>>> Facebook: http://www.facebook.com/profile.php?id=1112517192
>>>>> Twitter: http://twitter.com/rdmpage
>>>>> Blog: http://iphylo.blogspot.com
>>>>> Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
>>>>>
>>>>
>>>> --
>>>> Dr. Rutger A. Vos
>>>> School of Biological Sciences
>>>> Philip Lyle Building, Level 4
>>>> University of Reading
>>>> Reading, RG6 6BX, United Kingdom
>>>> Tel: +44 (0) 118 378 7535
>>>> http://rutgervos.blogspot.com
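
A possible way to act on the "simply try again" advice in this thread, without harvesting in a batch, is to request the failed studies one at a time and retry each a few times with a pause in between. The sketch below is only an illustration under stated assumptions: `harvest_sequentially`, `max_attempts`, and `pause_seconds` are hypothetical names, and `fetch` stands in for whatever caller-supplied function actually downloads the NeXML for one study ID (nothing here is TreeBASE's own API).

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple


def harvest_sequentially(
    study_ids: Iterable[str],
    fetch: Callable[[str], bytes],
    max_attempts: int = 3,
    pause_seconds: float = 0.0,
) -> Tuple[Dict[str, bytes], List[str]]:
    """Fetch studies one at a time, retrying each failed study a few times.

    `fetch` is a hypothetical, caller-supplied function that returns the
    NeXML bytes for one study ID or raises an exception on timeout/failure.
    Returns (harvested, failed): a dict of ID -> bytes and a list of IDs
    that still failed after `max_attempts` tries.
    """
    harvested: Dict[str, bytes] = {}
    failed: List[str] = []
    for study_id in study_ids:
        for attempt in range(1, max_attempts + 1):
            try:
                harvested[study_id] = fetch(study_id)
                break  # success: move on to the next study
            except Exception:
                # Pause before the next try (longer after each failure) so
                # further requests are not sent immediately after an abort.
                if attempt < max_attempts:
                    time.sleep(pause_seconds * attempt)
        else:
            # for-else: reached only when no attempt succeeded (no break).
            failed.append(study_id)
    return harvested, failed
```

The returned list of failures can be fed back into a later run, for example once the server has recovered; whether that actually avoids the accumulation of long-running queries described above is an assumption, not something this thread confirms.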