From: Rutger V. <R....@re...> - 2011-04-18 12:47:30
To give an example of how things should be: I've also done a NeXML dump and split all harvested studies into their constituent trees, matrices and taxa blocks. The largest NeXML tree file (with taxa block) in TreeBASE is 365Kb for a 585 taxon tree. To me that seems a reasonable size. The bulk of a matrix file for that set of taxa should be <seq> elements with raw character state sequences, preceded by a taxa block and a list of nchar <char> elements. You can imagine that that's not going to be 13.7 Mb once things are working correctly.

On Mon, Apr 18, 2011 at 1:40 PM, Rutger Vos <R....@re...> wrote:
> Yeah, I know, some of the studies are serialized incorrectly,
> especially the ones with "mixed" data containing both DNA and
> categorical data in the same matrix, or unusual state definitions in
> some other way. This results in a character state set definition being
> written out for every matrix column, and that takes up most of the
> file. Another thing is that we're now using owl:sameAs statements to
> specify the TreeBASE ID for every character.
>
> There are a number of these issues, they're bugs, I'm recording them -
> it's one of the things we should be fixing during Laurel's project. A
> correctly formatted NeXML file is going to be bigger than the
> equivalent NEXUS file, but perhaps like a factor of ten or so max,
> depending on the amount of metadata (i.e. on the order of 1Mb for
> S2012). That is a trade-off that is worth it because it will allow us
> to export all the metadata in a single file. 13.7 Mb is obviously
> wrong.
>
> On Mon, Apr 18, 2011 at 1:03 PM, Roderic Page <r....@bi...> wrote:
>> I've started trying again to harvest individual Nexml files, and it's still unbelievably slow. We're talking minutes for a study in some cases. The XML for S2012 took about 5 minutes to fetch and is 13.7 Mb in size(!). The NEXUS file is 164Kb.
>>
>> Need I say more...?
>>
>> Regards
>>
>> Rod
>>
>> On 15 Apr 2011, at 13:42, William Piel wrote:
>>
>>> On Apr 15, 2011, at 4:14 AM, Roderic Page wrote:
>>>
>>>> For large studies the Nexml generation simply times out, so I gave up.
>>>
>>> If you still have some ID numbers for those big ones, I'd be happy to test it again. It may have been solved because of some recent changes.
>>>
>>> But, indeed, I'd like access to a dump too.
>>>
>>> bp
>>>
>>> _______________________________________________
>>> Treebase-devel mailing list
>>> Tre...@li...
>>> https://lists.sourceforge.net/lists/listinfo/treebase-devel
>>
>> ---------------------------------------------------------
>> Roderic Page
>> Professor of Taxonomy
>> Institute of Biodiversity, Animal Health and Comparative Medicine
>> College of Medical, Veterinary and Life Sciences
>> Graham Kerr Building
>> University of Glasgow
>> Glasgow G12 8QQ, UK
>>
>> Email: r....@bi...
>> Tel: +44 141 330 4778
>> Fax: +44 141 330 2792
>> AIM: rod...@ai...
>> Facebook: http://www.facebook.com/profile.php?id=1112517192
>> Twitter: http://twitter.com/rdmpage
>> Blog: http://iphylo.blogspot.com
>> Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
>
> --
> Dr. Rutger A. Vos
> School of Biological Sciences
> Philip Lyle Building, Level 4
> University of Reading
> Reading, RG6 6BX, United Kingdom
> Tel: +44 (0) 118 378 7535
> http://rutgervos.blogspot.com
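
One way to check the diagnosis in this thread (a state set definition written out for every matrix column) is to tally the elements in a downloaded file and see which ones dominate. The following is only a rough sketch, not part of TreeBASE or any NeXML tooling: the function name `count_elements` is made up for illustration, and the embedded sample is a hand-written, simplified NeXML-like fragment whose element names (`states`, `state`, `char`, `seq`) are assumed here rather than taken from a real export.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hand-written, simplified fragment (NOT real TreeBASE output).
# It mimics the reported bug: one <states> block per <char> column.
SAMPLE = b"""<nexml xmlns="http://www.nexml.org/2009">
  <characters>
    <format>
      <states id="s1"><state id="a1"/><state id="c1"/></states>
      <states id="s2"><state id="a2"/><state id="c2"/></states>
      <char id="col1" states="s1"/>
      <char id="col2" states="s2"/>
    </format>
    <matrix>
      <row id="r1"><seq>AC</seq></row>
    </matrix>
  </characters>
</nexml>"""


def count_elements(source):
    """Tally elements by local tag name, ignoring the XML namespace.

    `source` is a file name or a binary file-like object.
    """
    counts = Counter()
    # iterparse streams the document, so large files are not loaded whole.
    for _event, elem in ET.iterparse(source, events=("end",)):
        local_name = elem.tag.rsplit("}", 1)[-1]  # strip "{namespace}"
        counts[local_name] += 1
        elem.clear()  # discard the element's content once it is counted
    return counts


counts = count_elements(io.BytesIO(SAMPLE))
for name, n in counts.most_common():
    print(name, n)
```

If the count of state set definitions in a real file turns out to be close to the number of matrix columns, that would be consistent with the serialization bug described above; a file whose state sets are shared across columns should show only a handful. To run it on a downloaded study, pass the file path instead of the in-memory sample.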