From: Jimmy Z. <cra...@co...> - 2006-08-03 16:17:52
How much memory do you have on your machine? Can you increase the heap size to 1500m?

----- Original Message -----
From: "Din Sush" <di...@ya...>
To: "Jimmy Zhang" <cra...@co...>; <vtd...@li...>
Sent: Thursday, August 03, 2006 4:32 AM
Subject: Re: [Vtd-xml-users] VTD-XML Query

> I increased the JVM heap size, but even after that I was getting an
> "out of memory" error with 800 MB of data.
>
> --- Jimmy Zhang <cra...@co...> wrote:
>
>> For parsing large XML files, make sure you set the maximum JVM heap
>> size to a big enough value.
>>
>> I think the command is "java -server -Xmx600m <yourclass>".
>>
>> ----- Original Message -----
>> From: "Din Sush" <di...@ya...>
>> To: "Jimmy Zhang" <cra...@co...>; <vtd...@li...>
>> Sent: Tuesday, August 01, 2006 1:54 AM
>> Subject: Re: [Vtd-xml-users] VTD-XML Query
>>
>>> Hi,
>>>
>>> We are having problems parsing an XML file of size 250 MB.
>>>
>>> I got an "Out of memory" error for
>>>
>>>     // open a file and read the content into a byte array
>>>     File f = new File("./servers.xml");
>>>     FileInputStream fis = new FileInputStream(f);
>>>
>>> on this line:
>>>
>>>     byte[] b = new byte[(int) f.length()];
>>>
>>> This code can be found at
>>> http://vtd-xml.sourceforge.net/codeSample/cs1.html
>>>
>>> It is not related to the file splitting; it fails during
>>> initialization itself.
>>>
>>> Any thoughts on that?
>>>
>>> --- Jimmy Zhang <cra...@co...> wrote:
>>>
>>>> Usually the out-of-memory error happens when you parse the file
>>>> into a DOM tree...
>>>>
>>>> Assuming that you have enough memory to hold the document, VTD-XML
>>>> should compare very favorably against SAX or Pull in terms of
>>>> coding effort and performance...
>>>> even if you don't need to go back to the parsed document and don't
>>>> care about DOM-like functionalities.
>>>>
>>>> ----- Original Message -----
>>>> From: "Din Sush" <di...@ya...>
>>>> To: "Jimmy Zhang" <cra...@co...>; "Tatu Saloranta"
>>>> <cow...@ya...>; <vtd...@li...>
>>>> Sent: Monday, July 31, 2006 8:20 PM
>>>> Subject: Re: [Vtd-xml-users] VTD-XML Query
>>>>
>>>>> Well, I only need to split the document; I don't need to go back
>>>>> to the parsed document, and I don't need DOM-like functionality.
>>>>>
>>>>> Will VTD-XML still be better in this scenario?
>>>>>
>>>>> Secondly, the entire document needs to be loaded in memory, and
>>>>> the whole reason for splitting is that I am getting an "Out of
>>>>> Memory" error. Won't I get the same error when I am using VTD-XML?
>>>>> Then it kind of defeats the purpose. Correct me if I am wrong in
>>>>> this interpretation, as I have never used VTD.
>>>>>
>>>>> --- Jimmy Zhang <cra...@co...> wrote:
>>>>>
>>>>>> Well, the problem with the streaming approach is that you will
>>>>>> need to parse and then reserialize, both CPU intensive. With
>>>>>> VTD-XML it becomes a lot more efficient, but you need to load the
>>>>>> document into memory first, so there first has to be enough
>>>>>> memory available... On the other hand, using a streaming API like
>>>>>> SAX or Pull, you will need to read in the document piecewise
>>>>>> anyway, so overall I think VTD-XML should win quite
>>>>>> significantly...
>>>>>>
>>>>>> My view of VTD-XML is that it is just like DOM: you can jump back
>>>>>> and forth as often as you want... yet it parses a lot faster than
>>>>>> DOM...
>>>>>> ----- Original Message -----
>>>>>> From: "Tatu Saloranta" <cow...@ya...>
>>>>>> To: "Din Sush" <di...@ya...>; <vtd...@li...>
>>>>>> Sent: Monday, July 31, 2006 11:53 AM
>>>>>> Subject: Re: [Vtd-xml-users] VTD-XML Query
>>>>>>
>>>>>>> --- Din Sush <di...@ya...> wrote:
>>>>>>>
>>>>>>>> Here is my requirement:
>>>>>>>>
>>>>>>>> I need to split really big XML files (1 GB plus) into smaller
>>>>>>>> files. I am in the process of evaluating different approaches:
>>>>>>>> 1. Use VTD-XML: parse and split.
>>>>>>>> 2. Use the Perl XML::Twig split function.
>>>>>>>> 3. Write my own parser in Perl on top of XML::Parser, which
>>>>>>>>    uses expat.
>>>>>>>> 4. Use libxml2.
>>>>>>>
>>>>>>> To me, this does sound like you would be better off using a
>>>>>>> streaming approach (SAX, StAX, or XmlPull, or the .NET
>>>>>>> equivalents of the last two; StAX and XmlPull are Java things).
>>>>>>> I don't know if there are Perl-based streaming equivalents, but
>>>>>>> I think expat and libxml2 have streaming SAX interfaces (or
>>>>>>> similar).
>>>>>>>
>>>>>>> There doesn't seem to be much need for random access, nor any
>>>>>>> need to keep portions in memory. Streaming approaches have no
>>>>>>> problem with files of any size (certainly no problem with 1 GB),
>>>>>>> and for splitting I personally do not think VTD-XML would be
>>>>>>> faster than the alternatives. This is because all the content
>>>>>>> has to be accessed -- VTD-XML is fastest when you need to access
>>>>>>> as little data as possible.
>>>>>>>
>>>>>>> -+ Tatu +-
>>>>>>>
>>>>>>> __________________________________________________
>>>>>>> Do You Yahoo!?
>>>>>>> Tired of spam? Yahoo! Mail has the best spam protection around
>>>>>>> http://mail.yahoo.com
>>>
>>> === message truncated ===
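[Editor's note] The code sample referenced in the thread allocates a single byte array the size of the whole file, which is why it fails during initialization: the array must fit in contiguous JVM heap space, so -Xmx has to comfortably exceed the file size (an 800 MB file cannot fit under a default or small heap). A single InputStream.read() call may also return fewer bytes than requested, so a read loop is needed. A minimal sketch (class and helper names are illustrative, not from the thread; the demo uses a small temp file in place of servers.xml):

```java
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class ReadWhole {
    // Read an entire file into one byte array. The array needs
    // contiguous heap space, so run with e.g. "java -Xmx1500m ..."
    // for files in the hundreds of megabytes. read() may return
    // fewer bytes than requested, hence the loop.
    static byte[] readAll(File f) throws IOException {
        byte[] b = new byte[(int) f.length()];
        FileInputStream fis = new FileInputStream(f);
        try {
            int off = 0;
            while (off < b.length) {
                int n = fis.read(b, off, b.length - off);
                if (n < 0) throw new EOFException("unexpected end of file");
                off += n;
            }
        } finally {
            fis.close();
        }
        return b;
    }

    public static void main(String[] args) throws IOException {
        // Demo with a small temporary file; "servers.xml" from the
        // thread would be read the same way.
        File f = File.createTempFile("servers", ".xml");
        FileOutputStream out = new FileOutputStream(f);
        out.write("<servers><server id=\"1\"/></servers>".getBytes("UTF-8"));
        out.close();
        byte[] b = readAll(f);
        System.out.println("read " + b.length + " bytes");
        f.delete();
    }
}
```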
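[Editor's note] Tatu's streaming suggestion is not spelled out in the thread; one way to sketch it is with StAX (javax.xml.stream, bundled with the JDK since Java 6), copying events from a reader to per-chunk writers so the whole input is never held in memory. The element names ("servers"/"server") and the "chunk" wrapper are hypothetical; a real splitter would also need to copy namespaces, comments, and CDATA events:

```java
import javax.xml.stream.*;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;

public class StreamSplit {
    // Split a document whose root holds many repeated record elements
    // into smaller documents of at most perChunk records each.
    static List<String> split(Reader in, String record, int perChunk)
            throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        XMLOutputFactory outF = XMLOutputFactory.newInstance();
        List<String> chunks = new ArrayList<String>();
        StringWriter buf = null;
        XMLStreamWriter w = null;
        int inChunk = 0, depth = 0;
        while (r.hasNext()) {
            int ev = r.next();
            if (ev == XMLStreamConstants.START_ELEMENT) {
                // A record at depth 1 starts a new chunk if none is open.
                if (depth == 1 && r.getLocalName().equals(record) && w == null) {
                    buf = new StringWriter();
                    w = outF.createXMLStreamWriter(buf);
                    w.writeStartElement("chunk"); // new root per output chunk
                }
                if (w != null) {
                    w.writeStartElement(r.getLocalName());
                    for (int i = 0; i < r.getAttributeCount(); i++)
                        w.writeAttribute(r.getAttributeLocalName(i),
                                         r.getAttributeValue(i));
                }
                depth++;
            } else if (ev == XMLStreamConstants.CHARACTERS) {
                if (w != null) w.writeCharacters(r.getText());
            } else if (ev == XMLStreamConstants.END_ELEMENT) {
                depth--;
                if (w != null && depth >= 1) w.writeEndElement();
                if (w != null && depth == 1 && r.getLocalName().equals(record)
                        && ++inChunk == perChunk) {
                    w.writeEndElement(); // close the chunk root
                    w.writeEndDocument();
                    w.close();
                    chunks.add(buf.toString());
                    w = null;
                    inChunk = 0;
                }
            }
        }
        if (w != null) { // flush a partial final chunk
            w.writeEndElement();
            w.writeEndDocument();
            w.close();
            chunks.add(buf.toString());
        }
        return chunks;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<servers><server id=\"1\"/><server id=\"2\"/>"
                   + "<server id=\"3\"/></servers>";
        for (String part : split(new StringReader(xml), "server", 2))
            System.out.println(part);
    }
}
```

Because only the current record's events are buffered, heap use stays roughly constant regardless of input size, which is the property the 1 GB requirement calls for.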