From: Rutger V. <R....@re...> - 2011-04-18 12:40:28
Yeah, I know, some of the studies are serialized incorrectly, especially the ones with "mixed" data (DNA and categorical data in the same matrix) or with unusual state definitions of some other kind. For these, a character state set definition is written out for every matrix column, and that takes up most of the file. Another thing adding to the size is that we're now using owl:sameAs statements to specify the TreeBASE ID for every character. There are a number of these issues. They're bugs, and I'm recording them. Fixing them is one of the things we should do during Laurel's project.

A correctly formatted NeXML file is going to be bigger than the equivalent NEXUS file, but by perhaps a factor of ten at most, depending on the amount of metadata (i.e. on the order of 1 Mb for S2012). That trade-off is worth it because it will allow us to export all the metadata in a single file. 13.7 Mb is obviously wrong.

On Mon, Apr 18, 2011 at 1:03 PM, Roderic Page <r....@bi...> wrote:
> I've started trying again to harvest individual Nexml files, and it's still unbelievably slow. We're talking minutes for a study in some cases. The XML for S2012 took about 5 minutes to fetch and is 13.7 Mb in size(!). The NEXUS file is 164Kb.
>
> Need I say more...?
>
> Regards
>
> Rod
>
> On 15 Apr 2011, at 13:42, William Piel wrote:
>
>>
>> On Apr 15, 2011, at 4:14 AM, Roderic Page wrote:
>>
>>> For large studies the Nexml generation simply times out, so I gave up.
>>
>> If you still have some ID numbers for those big ones, I'd be happy to test it again. It may have been solved because of some recent changes.
>>
>> But, indeed, I'd like access to a dump too.
>>
>> bp
>>
>> _______________________________________________
>> Treebase-devel mailing list
>> Tre...@li...
>> https://lists.sourceforge.net/lists/listinfo/treebase-devel
>>
>
> ---------------------------------------------------------
> Roderic Page
> Professor of Taxonomy
> Institute of Biodiversity, Animal Health and Comparative Medicine
> College of Medical, Veterinary and Life Sciences
> Graham Kerr Building
> University of Glasgow
> Glasgow G12 8QQ, UK
>
> Email: r....@bi...
> Tel: +44 141 330 4778
> Fax: +44 141 330 2792
> AIM: rod...@ai...
> Facebook: http://www.facebook.com/profile.php?id=1112517192
> Twitter: http://twitter.com/rdmpage
> Blog: http://iphylo.blogspot.com
> Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html

--
Dr. Rutger A. Vos
School of Biological Sciences
Philip Lyle Building, Level 4
University of Reading
Reading, RG6 6BX, United Kingdom
Tel: +44 (0) 118 378 7535
http://rutgervos.blogspot.com
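The reply above says that writing out a character state set definition for every matrix column takes up most of the file. The minimal Python sketch below illustrates why that happens. It is a simplified, hypothetical model: the element names are loosely based on NeXML's `format`/`states`/`char` layout, it is not TreeBASE's serializer, and the column count and symbol alphabet are arbitrary assumptions. It builds the same block twice, once with a single shared state set and once with one identical set per column, and then compares the serialized sizes.

```python
import xml.etree.ElementTree as ET
from typing import Sequence


def build_format(n_chars: int, symbols: Sequence[str], shared: bool) -> ET.Element:
    """Build a simplified, hypothetical NeXML-like <format> block.

    shared=True: emit one <states> set and point every <char> at it.
    shared=False: emit an identical <states> set for every column, which
    mimics the per-column serialization problem described in the thread.
    """
    fmt = ET.Element("format")
    n_sets = 1 if shared else n_chars
    for s in range(n_sets):
        states = ET.SubElement(fmt, "states", id=f"states{s}")
        for i, sym in enumerate(symbols):
            ET.SubElement(states, "state", id=f"states{s}_{i}", symbol=sym)
    for c in range(n_chars):
        # Each column references either the single shared set or its own copy.
        set_id = "states0" if shared else f"states{c}"
        ET.SubElement(fmt, "char", id=f"c{c}", states=set_id)
    return fmt


def serialized_size(elem: ET.Element) -> int:
    """Return the number of bytes in the UTF-8 serialization of elem."""
    return len(ET.tostring(elem, encoding="utf-8"))


if __name__ == "__main__":
    symbols = [str(d) for d in range(10)]  # assumed 10-symbol alphabet
    n_chars = 2000                         # assumed matrix width
    shared = serialized_size(build_format(n_chars, symbols, shared=True))
    per_column = serialized_size(build_format(n_chars, symbols, shared=False))
    print(f"shared: {shared} bytes, per-column: {per_column} bytes, "
          f"ratio: {per_column / shared:.1f}x")
```

With a shared set, the state definitions are written once, however wide the matrix is. With per-column sets, they grow linearly with the number of columns. That pattern is consistent with the state set definitions dominating the file, although the real ratio for a study such as S2012 depends on details this sketch does not model, such as the owl:sameAs annotations.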