Re: [Treebase-devel] NEXML export really, really slow

On Wed, Dec 15, 2010 at 7:59 AM, Roderic Page <r....@bi...> wrote:
>
>
> On 15 Dec 2010, at 10:45, Rutger Vos wrote:
>
>> Can I quote that in the endorsement section on the NeXML website? i.e.:
>>
>> "pretty unpleasant to deal with" -- Rod Page, Glasgow
>
> XML is one way to serialise data, and it can be verbose and ugly. OK, pretty
> much anytime it's used things get verbose and ugly. I totally get the
> rationale for using it, and NeXML has some nice features, I'm just not a big
> fan of XML. It makes sense in document mark-up where structure and order
> matter (e.g., the NLM mark-up for journal articles) but for moving objects
> around it is a pain. JSON makes life a lot easier (and by this I don't mean
> some JSON version of an XML document).

I was just teasing, not expecting a serious response. I'm well aware
of the pros and cons of XML; there are certainly some weird, weird
design choices in there (for starters, why do we need closing tags
anyway?) but for many use cases, on balance, it's the best we have
right now. In the context you're interested in, with couchdb and
things happening inside a browser, JSON is obviously much nicer to
work with.

>> On the matter at hand, a reverse proxy that caches anything static
>> would be great to have, agreed. Is squid still the thing to use?
>
> Call me old-fashioned, but why not just dump the NeXML files to disk,
> updating if and when source data changes? You've a couple of thousand
> documents, most will be rarely used, but it's the first time the document is
> created that is the killer. I've no experience with reverse proxies, but
> presumably the document has to exist before it can be cached, and the issue
> here is the time it takes to create the document, not serve it.

Yeah, that's probably easier. Generate them all once a month (+make
any new ones from new submissions as they are encountered) and return
those on request. Also, zip them up, generate an MD5 checksum and
that's your dump.
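
To make the idea concrete, here is a rough Python sketch of the "generate once, serve from disk, zip and checksum" approach. It is only an illustration under assumptions, not TreeBASE code: the function names (cached_nexml, invalidate, build_dump), the one-file-per-study layout, and the generate callback standing in for the slow export are all hypothetical.

```python
import hashlib
import zipfile
from pathlib import Path


def cached_nexml(study_id, generate, cache_dir):
    """Return the NeXML text for a study, generating it only on a cache miss.

    `generate` is a hypothetical callback standing in for the slow export.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{study_id}.xml"
    if not path.exists():
        # Slow path: run the expensive export once and keep the result on disk.
        path.write_text(generate(study_id), encoding="utf-8")
    return path.read_text(encoding="utf-8")


def invalidate(study_id, cache_dir):
    """Drop a cached file so it is rebuilt after the source data changes."""
    (Path(cache_dir) / f"{study_id}.xml").unlink(missing_ok=True)


def build_dump(cache_dir, zip_path):
    """Zip every cached file and return the MD5 hex digest of the archive."""
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for xml_file in sorted(Path(cache_dir).glob("*.xml")):
            archive.write(xml_file, arcname=xml_file.name)
    return hashlib.md5(zip_path.read_bytes()).hexdigest()
```

A monthly job would call cached_nexml for every study and then build_dump; a new or changed submission would call invalidate followed by cached_nexml for just that study. Keep the zip outside the cache directory so the archive does not end up inside itself.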

>> On Wednesday, December 15, 2010, Roderic Page <r....@bi...>
>> wrote:
>>>
>>> I've been downloading NEXML files from TreeBASE with a view to making
>>> a local copy in CouchDB. TreeBASE NEXML is pretty unpleasant to deal
>>> with, but it does give me a complete summary of a study.
>>>
>>> However, generating the NEXML file can take what seems like an age,
>>> particularly for large data sets. I've written a script to harvest
>>> NEXML for each study, but this regularly times out. Is there any way
>>> this could be sped up, perhaps by having TreeBASE cache NEXML files
>>> so users are grabbing a text document, not forcing live queries to the
>>> database?
>>>
>>> Regards
>>>
>>> Rod
>>> ---------------------------------------------------------
>>> Roderic Page
>>> Professor of Taxonomy
>>> Institute of Biodiversity, Animal Health and Comparative Medicine
>>> College of Medical, Veterinary and Life Sciences
>>> Graham Kerr Building
>>> University of Glasgow
>>> Glasgow G12 8QQ, UK
>>>
>>> Email: r....@bi...
>>> Tel: +44 141 330 4778
>>> Fax: +44 141 330 2792
>>> AIM: rod...@ai...
>>> Facebook: http://www.facebook.com/profile.php?id=1112517192
>>> Twitter: http://twitter.com/rdmpage
>>> Blog: http://iphylo.blogspot.com
>>> Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
>>> _______________________________________________
>>> Treebase-devel mailing list
>>> Tre...@li...
>>> https://lists.sourceforge.net/lists/listinfo/treebase-devel
>>>
>

-- 
Dr. Rutger A. Vos
School of Biological Sciences
Philip Lyle Building, Level 4
University of Reading
Reading
RG6 6BX
United Kingdom
Tel: +44 (0) 118 378 7535
http://www.nexml.org
http://rutgervos.blogspot.com