From: Roderic P. <r....@bi...> - 2011-04-19 10:26:00
Below is the list of TreeBASE studies that have failed to output NeXML when I've tried to harvest them with a timeout of 10 minutes. Any way to get hold of these?

Rod

S131 S132 S134 S202 S613 S1085 S1158 S1183 S1197 S1302
S1303 S1306 S1307 S1308 S1309 S1310 S1311 S1312 S1313 S1314
S1315 S1316 S1317 S1318 S1319 S1320 S1321 S1322 S1326 S1330
S1936 S2039 S2078 S2372 S2373 S2376 S2377 S9993 S9997 S9998
S9999 S10071 S10287 S10316 S10335 S10433 S10507 S10508 S10511 S10541
S10603 S10613 S10635 S10665 S10689 S10736 S10888 S10917 S10940 S11032
S11080

On 18 Apr 2011, at 13:47, Rutger Vos wrote:

> To give an example of how things should be: I've also done a NeXML
> dump and split all harvested studies into their constituent trees,
> matrices and taxa blocks. The largest NeXML tree file (with taxa
> block) in TreeBASE is 365Kb for a 585-taxon tree. To me that
> seems a reasonable size. The bulk of a matrix file for that set of
> taxa should be <seq> elements with raw character state sequences,
> preceded by a taxa block and an nchar list of <char> elements. You can
> imagine that that's not going to be 13.7 Mb once things are working
> correctly.
>
> On Mon, Apr 18, 2011 at 1:40 PM, Rutger Vos <R....@re...> wrote:
>> Yeah, I know, some of the studies are serialized incorrectly,
>> especially the ones with "mixed" data containing both DNA and
>> categorical data in the same matrix, or unusual state definitions in
>> some other way. This results in a character state set definition being
>> written out for every matrix column, and that takes up most of the
>> file. Another thing is that we're now using owl:sameAs statements to
>> specify the TreeBASE ID for every character.
>>
>> There are a number of these issues, they're bugs, I'm recording them -
>> it's one of the things we should be fixing during Laurel's project. A
>> correctly formatted NeXML file is going to be bigger than the
>> equivalent NEXUS file, but perhaps like a factor of ten or so max,
>> depending on the amount of metadata (i.e. on the order of 1Mb for
>> S2012). That is a trade-off that is worth it because it will allow us
>> to export all the metadata in a single file. 13.7 Mb is obviously
>> wrong.
>>
>> On Mon, Apr 18, 2011 at 1:03 PM, Roderic Page <r....@bi...> wrote:
>>> I've started trying again to harvest individual NeXML files, and it's still unbelievably slow. We're talking minutes for a study in some cases. The XML for S2012 took about 5 minutes to fetch and is 13.7 Mb in size(!). The NEXUS file is 164Kb.
>>>
>>> Need I say more...?
>>>
>>> Regards
>>>
>>> Rod
>>>
>>> On 15 Apr 2011, at 13:42, William Piel wrote:
>>>
>>>> On Apr 15, 2011, at 4:14 AM, Roderic Page wrote:
>>>>
>>>>> For large studies the NeXML generation simply times out, so I gave up.
>>>>
>>>> If you still have some ID numbers for those big ones, I'd be happy to test it again. It may have been solved because of some recent changes.
>>>>
>>>> But, indeed, I'd like access to a dump too.
>>>>
>>>> bp
>>>>
>>>> _______________________________________________
>>>> Treebase-devel mailing list
>>>> Tre...@li...
>>>> https://lists.sourceforge.net/lists/listinfo/treebase-devel
>>>
>>> ---------------------------------------------------------
>>> Roderic Page
>>> Professor of Taxonomy
>>> Institute of Biodiversity, Animal Health and Comparative Medicine
>>> College of Medical, Veterinary and Life Sciences
>>> Graham Kerr Building
>>> University of Glasgow
>>> Glasgow G12 8QQ, UK
>>>
>>> Email: r....@bi...
>>> Tel: +44 141 330 4778
>>> Fax: +44 141 330 2792
>>> AIM: rod...@ai...
>>> Facebook: http://www.facebook.com/profile.php?id=1112517192
>>> Twitter: http://twitter.com/rdmpage
>>> Blog: http://iphylo.blogspot.com
>>> Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
>>
>> --
>> Dr. Rutger A. Vos
>> School of Biological Sciences
>> Philip Lyle Building, Level 4
>> University of Reading
>> Reading, RG6 6BX, United Kingdom
>> Tel: +44 (0) 118 378 7535
>> http://rutgervos.blogspot.com
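
For anyone wanting to reproduce this kind of harvest, here is a minimal sketch of fetching studies with a timeout and keeping a list of the ones that fail, plus a helper for the NeXML/NEXUS size comparison discussed in the thread (13.7 Mb against 164Kb is roughly an 80-fold difference, well over the "factor of ten or so" expected). It is not the script used for the list above. URL_TEMPLATE is a placeholder and an assumption, not a real TreeBASE endpoint, and the names fetch_nexml, harvest and size_ratio are made up for illustration.

```python
import urllib.request
from typing import Callable, Dict, Iterable, List, Tuple

# ASSUMPTION: placeholder URL pattern for illustration only; substitute the
# real TreeBASE NeXML endpoint before running a harvest.
URL_TEMPLATE = "http://example.org/treebase/study/{study_id}?format=nexml"

# The 10-minute timeout mentioned in the thread, in seconds.
TIMEOUT_SECONDS = 600


def fetch_nexml(study_id: str, timeout: float = TIMEOUT_SECONDS) -> bytes:
    """Fetch the NeXML for one study.

    Note: urlopen's timeout applies to each blocking socket operation,
    not to the total wall-clock time of the download.
    """
    url = URL_TEMPLATE.format(study_id=study_id)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def harvest(
    study_ids: Iterable[str],
    fetch: Callable[[str], bytes] = fetch_nexml,
) -> Tuple[Dict[str, bytes], List[Tuple[str, str]]]:
    """Try every study; return (successful downloads, failures with reasons)."""
    results: Dict[str, bytes] = {}
    failed: List[Tuple[str, str]] = []
    for study_id in study_ids:
        try:
            results[study_id] = fetch(study_id)
        except OSError as exc:
            # urllib.error.URLError and socket timeouts are OSError subclasses.
            failed.append((study_id, repr(exc)))
    return results, failed


def size_ratio(nexml_bytes: int, nexus_bytes: int) -> float:
    """How many times larger the NeXML file is than the NEXUS file."""
    return nexml_bytes / nexus_bytes
```

Passing the fetch function in as a parameter keeps the loop testable without network access, and the failed list can be posted as-is when asking for help with the studies that did not come through.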