Re: [pyxb-users] Performance issue
From: Tim C. <tim...@gm...> - 2009-12-02 14:50:24
Well, I do not think it is unfortunate at this point. As a new user I
would rather have accuracy than speed (though I respect that others may
be in a different situation).

-Tim

On Wed, 2009-12-02 at 06:33 -0700, Peter A. Bigot wrote:
> Unfortunately, performance took a back seat to validation for the
> current implementation. The example in examples/tmsxtvd has been the
> sole performance benchmark so far. On my machine, it shows:
>
> vmfed9[26]$ python dumpsample.py
> Generating binding from tmsdatadirect_sample.xml with minidom
> minidom first callSign at None
> Generating binding from tmsdatadirect_sample.xml with SAXDOM
> SAXDOM first callSign at tmsdatadirect_sample.xml[5:0]
> Generating binding from tmsdatadirect_sample.xml with SAX
> SAXER first callSign at tmsdatadirect_sample.xml[5:0]
> DOM-based read 0.000962, parse 0.391175, bind 10.292386, total 10.683561
> SAXDOM-based parse 1.658077, bind 10.178704, total 11.836781
> SAX-based read 0.000112, parse and bind 10.605082, total 10.605194
>
> These are using three different XML back ends to parse the document, but
> the same generated bindings and runtime support. As you can see, the
> bulk of the time is in checking all the content and putting the values
> into Python objects. The test document here is 205 KB in 10 seconds, so
> a 6MB document in 90 seconds is faster than I'd thought it might be.
>
> However, performance is unacceptable for certain applications. There
> are a couple of approaches. One specifically that I have in mind is to
> implement an optimized back end that stores values like integers and
> strings in native Python form rather than in the subclasses that support
> validation. In that case, validation would become a second, optional,
> step that you'd have to invoke specifically on each object. The
> following is the same program, same bindings, but with:
>
> pyxb.RequireValidWhenParsing(False)
>
> set at the top of the script.
> That option provides an extremely crude
> validation bypass, and I can't say it will work correctly in all
> situations. However, the results are promising (and better than I'd
> expected):
>
> Generating binding from tmsdatadirect_sample.xml with minidom
> minidom first callSign at None
> Generating binding from tmsdatadirect_sample.xml with SAXDOM
> SAXDOM first callSign at tmsdatadirect_sample.xml[5:0]
> Generating binding from tmsdatadirect_sample.xml with SAX
> SAXER first callSign at tmsdatadirect_sample.xml[5:0]
> DOM-based read 0.001482, parse 0.398322, bind 2.947036, total 3.345358
> SAXDOM-based parse 1.677429, bind 2.689278, total 4.366707
> SAX-based read 0.000217, parse and bind 3.052327, total 3.052544
>
> The separate validation step would be something like:
>
> pyxb.RequireValidWhenParsing(True)
> dom_instance.validateBinding()
>
> (You must reset the RequireValidWhenParsing flag, or the validateBinding
> method will immediately succeed.) With this, I get the following
> additional time for validation:
>
> DOM-based validate 1.676465
> SAXDOM-based validate 1.699026
> SAX-based validate 1.710580
>
> The fact that generation plus validation is half the time of generation
> with validation leaves me skeptical that this is working correctly.
>
> However, if that option meets your immediate performance needs, and you
> can live with either no validation or a second pass, possibly incorrect,
> validation, that's the best solution I have right now. If you try it,
> please let us know how it affected the speed; and if it breaks please
> file a ticket on: http://sourceforge.net/apps/trac/pyxb/
>
> I have hopes that a proper optimized back end, with or without
> validation, will be available in about three months, but I need to see
> whether the folks I originally developed this for are interested in
> funding it.
>
> Peter
>
> Romain CHANU wrote:
> > Hi,
> >
> > Regarding my last email to the mailing list, I was trying to decide
> > whether to use PyXB or generateDS.
> >
> > As a matter of fact, generateDS does not perform any validation
> > against XML schema and had some issues in the creation of the bindings
> > for complex schemas.
> >
> > I am now facing a performance issue with PyXB: I parse and validate a
> > 6 MB file containing XML data. This step takes about 90 seconds...
> >
> > Is this normal? Any hints to improve this?
> >
> > Thank you.
> >
> > Romain Chanu
> > ------------------------------------------------------------------------
> >
> > ------------------------------------------------------------------------------
> > Join us December 9, 2009 for the Red Hat Virtual Experience,
> > a free event focused on virtualization and cloud computing.
> > Attend in-depth sessions from your desk. Your couch. Anywhere.
> > http://p.sf.net/sfu/redhat-sfdev2dev
> > ------------------------------------------------------------------------
> >
> > _______________________________________________
> > pyxb-users mailing list
> > pyx...@li...
> > https://lists.sourceforge.net/lists/listinfo/pyxb-users

--
***************************************************************
Timothy Cook, MSc

LinkedIn Profile: http://www.linkedin.com/in/timothywaynecook
Skype ID == (upon request)
Academic.Edu Profile: http://uff.academia.edu/TimothyCook

You may get my Public GPG key from popular keyservers or from this link
http://timothywayne.cook.googlepages.com/home
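
For anyone wanting to check the arithmetic behind the timings quoted in this thread, here is a minimal sketch. It uses only the totals reported above; the dictionary keys (`validating`, `bypass`, `validate`) are labels made up for this example, and the 6 MB projection assumes 1 MB = 1024 KB and that run time scales linearly with document size, which the thread does not establish.

```python
# Totals in seconds, copied from the runs quoted in this thread.
#   validating: total with validation during parsing (the default)
#   bypass:     total with pyxb.RequireValidWhenParsing(False)
#   validate:   the separate validateBinding() pass afterwards
TIMINGS = {
    "DOM":    {"validating": 10.683561, "bypass": 3.345358, "validate": 1.676465},
    "SAXDOM": {"validating": 11.836781, "bypass": 4.366707, "validate": 1.699026},
    "SAX":    {"validating": 10.605194, "bypass": 3.052544, "validate": 1.710580},
}


def summarize(t):
    """Return (speedup from the bypass, two-pass time as a fraction of one-pass)."""
    speedup = t["validating"] / t["bypass"]
    two_pass_fraction = (t["bypass"] + t["validate"]) / t["validating"]
    return speedup, two_pass_fraction


SUMMARY = {name: summarize(t) for name, t in TIMINGS.items()}

# Naive linear projection of the validating SAX run from the 205 KB sample
# to a 6 MB document (assumption: 1 MB = 1024 KB, time proportional to size).
SAMPLE_KB = 205
DOC_KB = 6 * 1024
projected_seconds = TIMINGS["SAX"]["validating"] * DOC_KB / SAMPLE_KB

for name, (speedup, fraction) in SUMMARY.items():
    print(f"{name}: bypass speedup {speedup:.2f}x, "
          f"two-pass total is {fraction:.0%} of the validating run")
print(f"Linear projection for 6 MB: {projected_seconds:.0f} s")
```

On these numbers the two-pass totals come out at roughly half of the validating runs for all three back ends, which matches the "half the time" observation, and the linear projection for 6 MB lands well above the reported 90 seconds.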