From: Jimmy Z. <cra...@co...> - 2006-08-03 16:17:52
How much memory do you have on your machine? Can you increase the heap size to 1500m?

----- Original Message -----
From: "Din Sush" <di...@ya...>
To: "Jimmy Zhang" <cra...@co...>; <vtd...@li...>
Sent: Thursday, August 03, 2006 4:32 AM
Subject: Re: [Vtd-xml-users] VTD-XML Query

> I increased the JVM heap size, but even after that I was getting an
> "out of memory" error with 800 MB of data.
>
> --- Jimmy Zhang <cra...@co...> wrote:
>
>> For parsing large XML files, make sure you set the maximum JVM heap
>> size to a big enough value.
>>
>> I think the command is "java -server -Xmx600m <yourclass>".
>>
>> ----- Original Message -----
>> From: "Din Sush" <di...@ya...>
>> To: "Jimmy Zhang" <cra...@co...>; <vtd...@li...>
>> Sent: Tuesday, August 01, 2006 1:54 AM
>> Subject: Re: [Vtd-xml-users] VTD-XML Query
>>
>>> Hi,
>>>
>>> We are having problems parsing an XML file of size 250 MB.
>>>
>>> I got an "Out of memory" error for
>>>
>>>     // open a file and read the content into a byte array
>>>     File f = new File("./servers.xml");
>>>     FileInputStream fis = new FileInputStream(f);
>>>
>>> on this line:
>>>
>>>     byte[] b = new byte[(int) f.length()];
>>>
>>> This code can be found at
>>> http://vtd-xml.sourceforge.net/codeSample/cs1.html
>>>
>>> It is not related to the file splitting; it fails during
>>> initialization itself.
>>>
>>> Any thoughts on that?
>>>
>>> --- Jimmy Zhang <cra...@co...> wrote:
>>>
>>>> Usually the out-of-memory error happens when you parse the file
>>>> into a DOM tree...
>>>>
>>>> Assuming that you have enough memory to hold the document, VTD-XML
>>>> should compare very favorably against SAX or Pull in terms of
>>>> coding effort and performance...
>>>> even if you don't need to go back to the parsed document and don't
>>>> care about DOM-like functionalities.
>>>>
>>>> ----- Original Message -----
>>>> From: "Din Sush" <di...@ya...>
>>>> To: "Jimmy Zhang" <cra...@co...>; "Tatu Saloranta"
>>>> <cow...@ya...>; <vtd...@li...>
>>>> Sent: Monday, July 31, 2006 8:20 PM
>>>> Subject: Re: [Vtd-xml-users] VTD-XML Query
>>>>
>>>>> Well, I only need to split the document; I don't need to go back
>>>>> to the parsed document, and I don't need DOM-like functionality.
>>>>>
>>>>> Will VTD-XML still be better in this scenario?
>>>>>
>>>>> Secondly, the entire document needs to be loaded in memory, and
>>>>> the whole reason for splitting is that I am getting an "Out of
>>>>> Memory" error. Won't I get the same error when I am using VTD-XML?
>>>>> Then it kind of defeats the purpose. Correct me if I am wrong in
>>>>> this interpretation, as I have never used VTD.
>>>>>
>>>>> --- Jimmy Zhang <cra...@co...> wrote:
>>>>>
>>>>>> Well, the problem with the streaming approach is that you will
>>>>>> need to parse and then reserialize, both CPU intensive. With
>>>>>> VTD-XML it becomes a lot more efficient, but you need to load the
>>>>>> document into memory first, so there first has to be enough
>>>>>> memory available... On the other hand, using a streaming API like
>>>>>> SAX or Pull, you will need to read in the document piecewise
>>>>>> anyway, so overall I think VTD-XML should win quite
>>>>>> significantly...
>>>>>>
>>>>>> My view of VTD-XML is that it is just like DOM: you can jump back
>>>>>> and forth as often as you want... yet it parses a lot faster than
>>>>>> DOM...
>>>>>> ----- Original Message -----
>>>>>> From: "Tatu Saloranta" <cow...@ya...>
>>>>>> To: "Din Sush" <di...@ya...>; <vtd...@li...>
>>>>>> Sent: Monday, July 31, 2006 11:53 AM
>>>>>> Subject: Re: [Vtd-xml-users] VTD-XML Query
>>>>>>
>>>>>>> --- Din Sush <di...@ya...> wrote:
>>>>>>>
>>>>>>>> Here is my requirement:
>>>>>>>>
>>>>>>>> I need to split really big XML files (1 GB plus) into smaller
>>>>>>>> files. I am in the process of evaluating different approaches:
>>>>>>>> 1. Use VTD-XML: parse and split.
>>>>>>>> 2. Use the Perl XML::Twig split function.
>>>>>>>> 3. Write my own parser in Perl on top of XML::Parser, which
>>>>>>>>    uses expat.
>>>>>>>> 4. Use libxml2.
>>>>>>>
>>>>>>> To me, this does sound like you would be better off using a
>>>>>>> streaming approach (SAX, StAX, or XmlPull, or the .NET
>>>>>>> equivalents of the last two; StAX and XmlPull are Java things).
>>>>>>> I don't know if there are Perl-based streaming equivalents, but
>>>>>>> I think expat and libxml2 have streaming SAX interfaces (or
>>>>>>> similar).
>>>>>>>
>>>>>>> There doesn't seem to be much need for random access, nor any
>>>>>>> need to keep portions in memory. Streaming approaches have no
>>>>>>> problem with files of any size (certainly no problem with 1 GB),
>>>>>>> and for splitting I personally do not think VTD-XML would be
>>>>>>> faster than the alternatives. This is because all the content
>>>>>>> has to be accessed -- VTD-XML is fastest when you need to access
>>>>>>> as little data as possible.
>>>>>>>
>>>>>>> -+ Tatu +-
>>>>>>>
>>>>>>> __________________________________________________
>>>>>>> Do You Yahoo!?
>>>>>>> Tired of spam? Yahoo! Mail has the best spam protection around
>>>>>>> http://mail.yahoo.com
>>>
>>> === message truncated ===
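[Editor's note] The code sample referenced in the thread allocates a single byte array the size of the whole file, which is why it fails during initialization: the array must fit in contiguous JVM heap space, so -Xmx has to comfortably exceed the file size (an 800 MB file cannot fit under a default or small heap). A single InputStream.read() call may also return fewer bytes than requested, so a read loop is needed. A minimal sketch (class and helper names are illustrative, not from the thread; the demo uses a small temp file in place of servers.xml):

```java
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class ReadWhole {
    // Read an entire file into one byte array. The array needs
    // contiguous heap space, so run with e.g. "java -Xmx1500m ..."
    // for files in the hundreds of megabytes. read() may return
    // fewer bytes than requested, hence the loop.
    static byte[] readAll(File f) throws IOException {
        byte[] b = new byte[(int) f.length()];
        FileInputStream fis = new FileInputStream(f);
        try {
            int off = 0;
            while (off < b.length) {
                int n = fis.read(b, off, b.length - off);
                if (n < 0) throw new EOFException("unexpected end of file");
                off += n;
            }
        } finally {
            fis.close();
        }
        return b;
    }

    public static void main(String[] args) throws IOException {
        // Demo with a small temporary file; "servers.xml" from the
        // thread would be read the same way.
        File f = File.createTempFile("servers", ".xml");
        FileOutputStream out = new FileOutputStream(f);
        out.write("<servers><server id=\"1\"/></servers>".getBytes("UTF-8"));
        out.close();
        byte[] b = readAll(f);
        System.out.println("read " + b.length + " bytes");
        f.delete();
    }
}
```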
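[Editor's note] Tatu's streaming suggestion is not spelled out in the thread; one way to sketch it is with StAX (javax.xml.stream, bundled with the JDK since Java 6), copying events from a reader to per-chunk writers so the whole input is never held in memory. The element names ("servers"/"server") and the "chunk" wrapper are hypothetical; a real splitter would also need to copy namespaces, comments, and CDATA events:

```java
import javax.xml.stream.*;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;

public class StreamSplit {
    // Split a document whose root holds many repeated record elements
    // into smaller documents of at most perChunk records each.
    static List<String> split(Reader in, String record, int perChunk)
            throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        XMLOutputFactory outF = XMLOutputFactory.newInstance();
        List<String> chunks = new ArrayList<String>();
        StringWriter buf = null;
        XMLStreamWriter w = null;
        int inChunk = 0, depth = 0;
        while (r.hasNext()) {
            int ev = r.next();
            if (ev == XMLStreamConstants.START_ELEMENT) {
                // A record at depth 1 starts a new chunk if none is open.
                if (depth == 1 && r.getLocalName().equals(record) && w == null) {
                    buf = new StringWriter();
                    w = outF.createXMLStreamWriter(buf);
                    w.writeStartElement("chunk"); // new root per output chunk
                }
                if (w != null) {
                    w.writeStartElement(r.getLocalName());
                    for (int i = 0; i < r.getAttributeCount(); i++)
                        w.writeAttribute(r.getAttributeLocalName(i),
                                         r.getAttributeValue(i));
                }
                depth++;
            } else if (ev == XMLStreamConstants.CHARACTERS) {
                if (w != null) w.writeCharacters(r.getText());
            } else if (ev == XMLStreamConstants.END_ELEMENT) {
                depth--;
                if (w != null && depth >= 1) w.writeEndElement();
                if (w != null && depth == 1 && r.getLocalName().equals(record)
                        && ++inChunk == perChunk) {
                    w.writeEndElement(); // close the chunk root
                    w.writeEndDocument();
                    w.close();
                    chunks.add(buf.toString());
                    w = null;
                    inChunk = 0;
                }
            }
        }
        if (w != null) { // flush a partial final chunk
            w.writeEndElement();
            w.writeEndDocument();
            w.close();
            chunks.add(buf.toString());
        }
        return chunks;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<servers><server id=\"1\"/><server id=\"2\"/>"
                   + "<server id=\"3\"/></servers>";
        for (String part : split(new StringReader(xml), "server", 2))
            System.out.println(part);
    }
}
```

Because only the current record's events are buffered, heap use stays roughly constant regardless of input size, which is the property the 1 GB requirement calls for.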