From: William F. D. <wil...@th...> - 2005-04-15 16:14:04
On Sat, 2005-03-26 at 16:38 +0100, Martin Quinson wrote:
> On Fri, Mar 25, 2005 at 03:44:23PM -0500, William F. Dowling wrote:
> > Hi Martin,
> >
> > I have a question for you. Have you ever used xslt? What do you think
> > of it? I am working on a presentation (sort of a sales pitch) for some
> > people I work with, on flexml. Part of my talk will be a comparison
> > with xslt.

FWIW, I did a comparison between flexml and xsltproc. The test was to
extract the pcdata from a couple of different elements. The DTD has about
100 elements, a similar total number of attributes, and (I don't have a
specific measurement here) is not very complex -- generally simple content
models. The "document" was 128MB; the root element in the DTD is

  <!ELEMENT db (header | issue | item | ref)+>

and there are about 150K (header | issue | item | ref) elements in the
test document.

Two points: xsltproc was more than 200 times slower on my box for this
task than the flexml-generated program, and the flexml-generated program
was 20 times slower than an egrep scan. Regarding the latter point, I
suspect the problem is inherent in the flexml parse model. That *I assume*
is LL(1), correct? Why else would I have had to increase my stack size to
20M (10M was not big enough for my file)? Is there an LR or LALR parser
for XML out there?

Will

Stats
-----

# get a baseline -- scan the file with egrep to find authors
egrep '<(primary)?author[^s]' /proj/data/WoS.2004000109 > xxx
0.38s user 0.38s system 93% cpu 0.814 total

# use the flexml-generated 'authors' program
./authors < /proj/data/WoS.2004000109 > xxx
7.15s user 0.85s system 93% cpu 8.521 total

# use a comparable xslt stylesheet
xsltproc -o xxx testauth.xsl /proj/data/WoS.2004000109
1618.41s user 11.26s system 63% cpu 42:52.44 total

# How big was that input, anyway?
wc /proj/data/WoS.2004000109
3098636 10465719 128457959 /proj/data/WoS.2004000109

--
William F Dowling
wil...@th...
www.isinet.com
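[Editor's note: the testauth.xsl stylesheet used for the xsltproc timing
above is not included in the message. As a rough, hypothetical sketch of
what a comparable stylesheet could look like, the following XSLT 1.0
stylesheet prints the pcdata of author elements, one per line; the element
names 'author' and 'primaryauthor' are only inferred from the egrep
pattern in the stats, not confirmed by the poster.]

<?xml version="1.0"?>
<!-- Hypothetical sketch; not the actual testauth.xsl from the message. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Print the character data of each (primary)author element,
       followed by a newline. Element names are assumptions inferred
       from the egrep pattern '<(primary)?author[^s]'. -->
  <xsl:template match="author | primaryauthor">
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Suppress the built-in rule that would copy all other text nodes,
       so only the matched pcdata reaches the output. -->
  <xsl:template match="text()"/>
</xsl:stylesheet>

[Run, for example, as: xsltproc -o xxx authors.xsl input.xml -- where
authors.xsl and input.xml are placeholder file names.]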