From: Chinh H. <ho_...@ho...> - 2006-08-02 20:08:23
Well, you might read my question below:
>>>
>>>>Hi Jimmy,
>>>>You said that the VTD-XML currently support maximum file size of 2GB.
>>>>What version of the VTD-XML so that I could try to explore the large XML
>>>>file.
>>>>

I entered a bug on the SourceForge website about this, but I did not state it very clearly. The malloc call in the benchmark is passed the file size cast to an int. An int is 4 bytes (32 bits), so the largest value it can hold is 2147483647, a ten-digit number. A file of 2 GB (2147483648 bytes) or more does not fit, so the size overflows when the program tries to allocate the memory. I think the 20 MB file will work OK.

>From: "Jimmy Zhang" <cra...@co...>
>To: "Chinh Ho" <ho_...@ho...>,<vtd...@li...>
>Subject: Re: [Vtd-xml-users] VTD-XML Query
>Date: Wed, 2 Aug 2006 12:22:57 -0700
>
>Can you first try to parse a smaller document like 20MB to see it works ok
>or not?
>
>I suspect that the file size is getting too big so that it overflows the
>32-bit integers,
>causing it to interpret it as a negative value...
>----- Original Message ----- From: "Chinh Ho" <ho_...@ho...>
>To: <cra...@co...>
>Cc: <vtd...@li...>
>Sent: Wednesday, August 02, 2006 10:19 AM
>Subject: Re: [Vtd-xml-users] VTD-XML Query
>
>
>>I used version 1.6. Here are the steps that I did:
>>1/ Download the ximpleware_1.6_c_light and extract them out to a folder
>>named "ximpleware".
>>2/ Open MS Visual Studio 2005.
>>3/ Open new empty C++ Win32 Console Application project.
>>4/ Open the "ximpleware" folder. Select all files. Drag and drop them to
>>the Solution Explorer window in MS 2005.
>>5/ Click build solution.
>>6/ Copy the 2 Gig xml file to the debug folder.
>>7/ Open the benchmark_vtdxml.c . Comment out the int argc and char
>>*argv[] in main()
>>8/ In the line: f = fopen(argv[1], "r"), replace the argv[1] with the xml
>>file name.
>>9/ Replace the argv[1] in the next line with the same xml file name:
>>(stat(argv[1], &s))
>>10/ Put breakpoints at the line f = fopen("foo.xml", "r");
>> and xml = (UByte *)malloc(sizeof(UByte)*(int)s.st_size);
>>11/ Press F5.
>>12/ On the Autos window, it shows the s.st_size is -858993460.
>> On the cmd window, it shows the same size of the file : "size of the
>>file is -858993460"
>>13/ Press F10 twice.
>>14/ A MS Visual C++ Debug Library appears. It says: "Debug Assertion
>>Failed! Program:... File: fread.c Line: 93 Expression: (buffer
>>!= NULL)
>>
>>Please let me know what step(s) that I did wrong. Also, how do you turn
>>off the namespace support when parsing.
>>I could not use the ximpleware_1.6_c because of the .l and .y files. I
>>think these files are for the Unix version, aren't they?
>>
>>
>>>From: "Jimmy Zhang" <cra...@co...>
>>>To: "Chinh Ho" <ho_...@ho...>
>>>CC: <vtd...@li...>
>>>Subject: Re: [Vtd-xml-users] VTD-XML Query
>>>Date: Mon, 31 Jul 2006 11:52:48 -0700
>>>
>>>Version 1.6, when you turn off namespace support when parsing...
>>>the max is 1GB when namespace enabled...
>>>
>>>also don't forget to CC vtd-xml-user to keep a record
>>>----- Original Message ----- From: "Chinh Ho" <ho_...@ho...>
>>>To: <cra...@co...>
>>>Sent: Monday, July 31, 2006 11:50 AM
>>>Subject: Re: [Vtd-xml-users] VTD-XML Query
>>>
>>>
>>>>Hi Jimmy,
>>>>You said that the VTD-XML currently support maximum file size of 2GB.
>>>>What version of the VTD-XML so that I could try to explore the large XML
>>>>file.
>>>>
>>>>
>>>>>From: "Jimmy Zhang" <cra...@co...>
>>>>>To: <vtd...@li...>
>>>>>CC: Din Sush <di...@ya...>
>>>>>Subject: Re: [Vtd-xml-users] VTD-XML Query
>>>>>Date: Mon, 31 Jul 2006 08:59:35 -0700
>>>>>
>>>>>I think VTD-XML should have a couple of distinct advantages for
>>>>>splitting
>>>>>XML, performance
>>>>>probably being the biggest reason... currently VTD-XML's file size
>>>>>support
>>>>>is 2GB, and you need
>>>>>to have enough memory to hold the document in memory...
>>>>>
>>>>>I haven't tried other approaches, but they seem like SAX based, and may
>>>>>be
>>>>>slower and less flexible
>>>>>(SAX is forward only),
>>>>>
>>>>>Let me know if there are any questions...
you are welcome to share your
>>>>>experience with us
>>>>>
>>>>>----- Original Message -----
>>>>>From: "Din Sush" <di...@ya...>
>>>>>To: <vtd...@li...>
>>>>>Sent: Monday, July 31, 2006 5:23 AM
>>>>>Subject: [Vtd-xml-users] VTD-XML Query
>>>>>
>>>>>
>>>>> > Here is my requirement
>>>>> >
>>>>> > I need to split really big XML files(1 GB plus) into
>>>>> > smaller sized files.
>>>>> > I am in the process of evaluating different
>>>>> > approaches.
>>>>> > 1. Use Vtd-XML, parse and split.
>>>>> > 2. Use Perl XML::Twig split function
>>>>> > 3. Writing my own parser in perl on top of
>>>>> > XML::Parser,
>>>>> > which uses expat.
>>>>> > 4. Use libxml2.
>>>>> >
>>>>> > I am not sure if this is the right place to post this
>>>>> > question, but would like to know the best approach to
>>>>> > get the job done effectively.
>>>>> >
>>>>> > I would like to know the pros/cons and limitations of
>>>>> > my proposed solutions.
>>>>> >
>>>>> > _______________________________________________
>>>>> > Vtd-xml-users mailing list
>>>>> > Vtd...@li...
>>>>> > https://lists.sourceforge.net/lists/listinfo/vtd-xml-users
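
A note on the numbers in steps 10 to 14: -858993460 is the bit pattern 0xCCCCCCCC read as a signed 32-bit integer, which is the value the Visual C++ debug runtime writes into uninitialized stack variables. That suggests stat() may not have filled in the structure at all, although nothing in the thread confirms this. The sketch below shows one way to guard the load: check the return value of stat(), validate the size before handing it to malloc (without the int cast), and check the malloc result before fread. The helper names as_int32, size_for_malloc, and load_file are hypothetical and are not part of VTD-XML. It assumes a C99 toolchain with <stdint.h> and a stat() whose st_size is wide enough for the file; on Windows a 64-bit variant such as _stat64 would be the likely substitute.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: reinterpret a 32-bit pattern as a signed value. */
static int32_t as_int32(uint32_t bits)
{
    int32_t v;
    assert(sizeof v == sizeof bits);
    memcpy(&v, &bits, sizeof v);
    return v;
}

/* Hypothetical helper: accept a file size only if it is positive and
 * fits in size_t. Returns 0 and stores the size in *out on success,
 * -1 otherwise. Empty files are rejected as well. */
static int size_for_malloc(long long file_size, size_t *out)
{
    if (file_size <= 0)
        return -1;
    if ((unsigned long long)file_size > (unsigned long long)SIZE_MAX)
        return -1;
    *out = (size_t)file_size;
    return 0;
}

/* Hypothetical helper: read a whole file into a malloc'd buffer.
 * Returns NULL on any failure; the caller frees the result. */
static unsigned char *load_file(const char *path, size_t *len)
{
    struct stat s;
    unsigned char *buf;
    FILE *f;
    size_t got;

    if (stat(path, &s) != 0) {      /* on failure, s must not be used */
        perror("stat");
        return NULL;
    }
    if (size_for_malloc((long long)s.st_size, len) != 0)
        return NULL;

    buf = malloc(*len);             /* no cast to int */
    if (buf == NULL)
        return NULL;

    f = fopen(path, "rb");          /* binary mode: bytes are read unchanged */
    if (f == NULL) {
        free(buf);
        return NULL;
    }
    got = fread(buf, 1, *len, f);
    fclose(f);
    if (got != *len) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

With these checks, a failed stat() or a size that does not fit is reported before fread is reached, instead of surfacing as the "buffer != NULL" assertion. Note that in a 32-bit process a 2 GB allocation can still fail even when the size is computed correctly, so the NULL check on malloc stays necessary.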