From: Jimmy Z. <cra...@co...> - 2006-07-31 19:24:08
Well, the problem with the streaming approach is that you will need to parse and then reserialize, both of which are CPU intensive. With VTD-XML it becomes a lot more efficient, but you need to load the document into memory first, so there has to be enough memory available. On the other hand, with a streaming API like SAX or pull, you will need to read in the document piecewise anyway, so overall I think VTD-XML should win quite significantly.

My view of VTD-XML is that it is just like DOM: you can jump back and forth as often as you want, yet it parses a lot faster than DOM.

----- Original Message -----
From: "Tatu Saloranta" <cow...@ya...>
To: "Din Sush" <di...@ya...>; <vtd...@li...>
Sent: Monday, July 31, 2006 11:53 AM
Subject: Re: [Vtd-xml-users] VTD-XML Query

> --- Din Sush <di...@ya...> wrote:
>
>> Here is my requirement
>>
>> I need to split really big XML files (1 GB plus) into
>> smaller files.
>> I am in the process of evaluating different
>> approaches:
>> 1. Use VTD-XML, parse and split.
>> 2. Use Perl XML::Twig split function.
>> 3. Write my own parser in Perl on top of
>> XML::Parser, which uses expat.
>> 4. Use libxml2.
>
> To me, this does sound like you would be better off
> using a streaming approach (SAX, StAX or XmlPull, or the
> .NET equivalent of the last two; StAX and XmlPull are
> Java things). I don't know if there are Perl-based
> streaming equivalents, but I think expat and libxml2
> have streaming SAX interfaces (or similar).
>
> There doesn't seem to be much need for random access,
> nor a need to keep any portions in memory. Streaming
> approaches have no problem with files of any size
> (certainly no problems with 1 GB), and for splitting I
> personally do not think VTD-XML would be faster than
> the alternatives. This is because all the content has to
> be accessed -- VTD-XML is fastest when you need to
> access as little data as possible.
>
> -+ Tatu +-
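For reference, the streaming split Tatu describes (and the parse-then-reserialize cost Jimmy points out) might look roughly like the sketch below. It is Python rather than Perl, uses the standard library's `xml.sax` (which runs on top of expat by default), and assumes the records to split are the direct children of the root element. `SplitHandler`, `split_xml`, `per_chunk` and `open_chunk` are hypothetical names made up for illustration, not part of any library mentioned in this thread.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


class SplitHandler(xml.sax.ContentHandler):
    """Streams the root's child elements ("records") into chunks of
    `per_chunk` records, wrapping each chunk in a copy of the root element.

    Hypothetical helper: `open_chunk` is any callable that returns a new
    writable text stream for the next output chunk.
    """

    def __init__(self, per_chunk, open_chunk):
        super().__init__()
        self.per_chunk = per_chunk
        self.open_chunk = open_chunk
        self.depth = 0      # current nesting depth (root element = 1)
        self.root = None    # (name, attrs) of the root, repeated in every chunk
        self.out = None     # XMLGenerator for the chunk being written, if any
        self.count = 0      # records written to the current chunk

    def _start_chunk(self):
        self.out = XMLGenerator(self.open_chunk(), encoding="utf-8")
        self.out.startDocument()
        self.out.startElement(*self.root)

    def _end_chunk(self):
        self.out.endElement(self.root[0])
        self.out.endDocument()
        self.out, self.count = None, 0

    def startElement(self, name, attrs):
        self.depth += 1
        if self.depth == 1:
            # Remember the root so each chunk is a well-formed document.
            self.root = (name, dict(attrs))
            return
        if self.depth == 2 and self.out is None:
            self._start_chunk()
        self.out.startElement(name, attrs)  # re-serialize the parsed event

    def endElement(self, name):
        if self.depth >= 2:
            self.out.endElement(name)
        if self.depth == 2:
            self.count += 1
            if self.count == self.per_chunk:
                self._end_chunk()
        elif self.depth == 1 and self.out is not None:
            self._end_chunk()  # flush the last, partially filled chunk
        self.depth -= 1

    def characters(self, content):
        # Only text inside records is copied; whitespace between records is dropped.
        if self.depth >= 2:
            self.out.characters(content)


def split_xml(source, per_chunk):
    """Split `source` (a path or binary file object); return chunks as strings."""
    chunks = []

    def open_chunk():
        buf = io.StringIO()
        chunks.append(buf)
        return buf

    xml.sax.parse(source, SplitHandler(per_chunk, open_chunk))
    return [buf.getvalue() for buf in chunks]


data = b"<orders id='x'><order n='1'>a</order><order n='2'>b</order><order n='3'>c</order></orders>"
for part in split_xml(io.BytesIO(data), per_chunk=2):
    print(part)
```

The handler never holds a whole record in memory, so memory use does not grow with file size; the trade-off is that every byte is parsed and written out again, which is the CPU cost being debated above. To produce real files, `open_chunk` could return a newly opened numbered file instead of a `StringIO`; closing those files is left out of this sketch.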