From: Tatu S. <cow...@ya...> - 2006-08-03 18:19:06
--- Din Sush <di...@ya...> wrote:

> I tried woodstox parser, it seems to be working and
> for a 1 GB file it is taking around 11 mins to split
> the file in multiple 1 MB files.

Hmm. That sounds a bit slow for typical disks (with maybe 30 MBps
read speed, and a bit higher write speed). I'd expect it to take
roughly a minute or so.

Can you share the code? I would profile it (just using
'java -Xrunhprof:cpu=samples', running the code for a minute or so).

> Thanks for your suggestion!! I was just wondering if I
> can make it any faster, I am using
> "copyEventFromEventMethod" to write to the file.

I guess it all depends on the code in question (the file in question
might also affect speed a bit, though it shouldn't matter very much).
Can you send the code? I could test it against test files I have
created.

-+ Tatu +-

> Thanks again.
>
> --- Tatu Saloranta <cow...@ya...> wrote:
>
> > --- Din Sush <di...@ya...> wrote:
> >
> > > Well I only need to split the document and don't need
> > > to go back to parsed document, and I don't need DOM
> > > like functionality.
> > >
> > > Will VTD-XML be still better in this scenario?
> >
> > I would suggest that if you do have time, you
> > investigate both VTD-XML and a Stax implementation
> > (such as http://woodstox.codehaus.org).
> > My feeling is that it all comes down to which API
> > you feel more comfortable with, or perhaps whether you
> > have to use an XML-compliant, standards-based solution
> > or not.
> > Both can perform well enough, assuming you are not
> > limited by VTD-XML's main memory requirements.
> > Stax memory usage does not grow linearly with document
> > length, so there are no practical input size limitations.
> >
> > If you do end up trying both approaches, it would be
> > very nice to get the performance numbers, since this
> > would be an actual real-world use case, instead of
> > benchmarks. Plus, if the code is simple enough, perhaps
> > it could become a benchmark for these types of
> > operations?
> >
> > > Secondly as the entire document needs to be loaded in
> > > the memory, the whole idea of splitting is that I am
> > > getting "Out of Memory" error. Won't I get the same
> > > error when I am using VTD-XML? Then it kind of defeats
> > > the purpose. Correct me if I am wrong in the
> > > interpretation as I have never used VTD.
> >
> > You are correct here. While the limit is much higher
> > than with, say, DOM (2x or perhaps 3x), there is a limit.
> >
> > -+ Tatu +-
>
> _______________________________________________
> Vtd-xml-users mailing list
> Vtd...@li...
> https://lists.sourceforge.net/lists/listinfo/vtd-xml-users
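
For reference, the kind of streaming split discussed in this thread could be sketched roughly as below. This is not Din's code (which was not posted to the list), only a minimal illustration under several assumptions: it uses the standard javax.xml.stream event API (which Woodstox implements) rather than any Woodstox-specific copy method, it assumes the input is one root element holding many sibling "record" elements, and it splits by record count instead of by output size. The names XmlSplitSketch, split, recordsPerPart and sink are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical sketch: split <root><rec/>...<rec/></root> into several
// documents, each repeating the root element around a chunk of records.
class XmlSplitSketch {

    // sink maps a part index (0, 1, ...) to the Writer that receives that part.
    // Returns the number of parts written.
    static int split(Reader in, int recordsPerPart, IntFunction<Writer> sink)
            throws XMLStreamException, IOException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        XMLEventFactory eventFactory = XMLEventFactory.newInstance();

        StartElement root = null;   // remembered so every part can repeat it
        XMLEventWriter out = null;  // writer for the part currently being filled
        Writer target = null;
        int depth = 0;
        int parts = 0;
        int recordsInPart = 0;

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                depth++;
                if (depth == 1) {            // the root itself is not copied directly
                    root = event.asStartElement();
                    continue;
                }
                if (depth == 2 && out == null) {  // first record of a new part
                    target = sink.apply(parts++);
                    out = outputFactory.createXMLEventWriter(target);
                    out.add(root);
                    recordsInPart = 0;
                }
            }
            // Copy everything that lies inside a record; content directly
            // under the root (e.g. whitespace between records) is dropped.
            if (out != null && depth >= 2) {
                out.add(event);
            }
            if (event.isEndElement()) {
                depth--;
                if (depth == 1 && ++recordsInPart == recordsPerPart) {
                    closePart(out, target, root, eventFactory);
                    out = null;
                }
            }
        }
        if (out != null) {                   // last, partially filled part
            closePart(out, target, root, eventFactory);
        }
        reader.close();
        return parts;
    }

    private static void closePart(XMLEventWriter out, Writer target, StartElement root,
                                  XMLEventFactory eventFactory)
            throws XMLStreamException, IOException {
        out.add(eventFactory.createEndElement(root.getName(), root.getNamespaces()));
        out.close();     // flushes, but does not close the underlying Writer
        target.close();
    }

    public static void main(String[] args) throws Exception {
        List<StringWriter> results = new ArrayList<>();
        String xml = "<items><item>a</item><item>b</item><item>c</item></items>";
        int n = split(new StringReader(xml), 2, i -> {
            StringWriter w = new StringWriter();
            results.add(w);
            return w;
        });
        System.out.println(n + " parts");
        for (StringWriter w : results) {
            System.out.println(w);
        }
    }
}
```

For real files the sink would hand out buffered file writers instead of StringWriters; unbuffered output is one plausible (but unconfirmed) cause of the kind of slowness described above, which is why profiling the actual code is the better first step.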