From: Jimmy Z. <cra...@co...> - 2006-08-02 21:02:47
Yeah, I responded to the request. Can you somehow split the XML file into smaller chunks?
----- Original Message -----
From: "Chinh Ho" <ho_...@ho...>
To: <cra...@co...>; <vtd...@li...>
Sent: Wednesday, August 02, 2006 1:08 PM
Subject: Re: [Vtd-xml-users] VTD-XML Query

> Well, please see my question below:
>>>>
>>>>>Hi Jimmy,
>>>>>You said that VTD-XML currently supports a maximum file size of 2GB.
>>>>>Which version of VTD-XML supports that, so that I can try it on a large
>>>>>XML file?
>>>>>
>
> I entered a bug on the SourceForge website about this. However, I did not
> state it very clearly. The benchmark casts the file size to an int before
> passing it to malloc. An int is 32 bits (4 bytes), so the largest value
> it can hold is 2,147,483,647 (ten digits).
>
> A 2 Gig file is right at (or past) that limit.
>
> So the size overflows when it tries to allocate the memory.
>
> I think 20 MB will work OK.
>
>>From: "Jimmy Zhang" <cra...@co...>
>>To: "Chinh Ho"
>><ho_...@ho...>,<vtd...@li...>
>>Subject: Re: [Vtd-xml-users] VTD-XML Query
>>Date: Wed, 2 Aug 2006 12:22:57 -0700
>>
>>Can you first try to parse a smaller document, say 20MB, to see whether
>>it works OK?
>>
>>I suspect that the file size is getting so big that it overflows the
>>32-bit integers, causing it to be interpreted as a negative value...
>>----- Original Message -----
>>From: "Chinh Ho" <ho_...@ho...>
>>To: <cra...@co...>
>>Cc: <vtd...@li...>
>>Sent: Wednesday, August 02, 2006 10:19 AM
>>Subject: Re: [Vtd-xml-users] VTD-XML Query
>>
>>>I used version 1.6. Here are the steps that I followed:
>>>1/ Download ximpleware_1.6_c_light and extract it to a folder
>>>named "ximpleware".
>>>2/ Open MS Visual Studio 2005.
>>>3/ Open a new empty C++ Win32 Console Application project.
>>>4/ Open the "ximpleware" folder. Select all files. Drag and drop them
>>>into the Solution Explorer window in Visual Studio 2005.
>>>5/ Click Build Solution.
>>>6/ Copy the 2 Gig XML file to the Debug folder.
>>>7/ Open benchmark_vtdxml.c. Comment out the int argc and char
>>>*argv[] parameters of main().
>>>8/ In the line f = fopen(argv[1], "r"), replace argv[1] with the XML
>>>file name.
>>>9/ In the next line, (stat(argv[1], &s)), replace argv[1] with the same
>>>XML file name.
>>>10/ Put breakpoints at the lines f = fopen("foo.xml", "r");
>>> and xml = (UByte *)malloc(sizeof(UByte)*(int)s.st_size);
>>>11/ Press F5.
>>>12/ The Autos window shows that s.st_size is -858993460.
>>> The cmd window prints the same file size: "size of
>>> the file is -858993460"
>>>13/ Press F10 twice.
>>>14/ A Microsoft Visual C++ Debug Library dialog appears. It says: "Debug
>>>Assertion Failed! Program:... File: fread.c Line: 93 Expression:
>>>(buffer != NULL)"
>>>
>>>Please let me know which step(s) I did wrong. Also, how do you turn
>>>off namespace support when parsing?
>>>I could not use ximpleware_1.6_c because of the .l and .y files. I
>>>think these files are for the Unix version, aren't they?
>>>
>>>>From: "Jimmy Zhang" <cra...@co...>
>>>>To: "Chinh Ho" <ho_...@ho...>
>>>>CC: <vtd...@li...>
>>>>Subject: Re: [Vtd-xml-users] VTD-XML Query
>>>>Date: Mon, 31 Jul 2006 11:52:48 -0700
>>>>
>>>>Version 1.6, when you turn off namespace support when parsing...
>>>>the max is 1GB with namespace support enabled...
>>>>
>>>>Also, don't forget to CC vtd-xml-users to keep a record.
>>>>----- Original Message -----
>>>>From: "Chinh Ho" <ho_...@ho...>
>>>>To: <cra...@co...>
>>>>Sent: Monday, July 31, 2006 11:50 AM
>>>>Subject: Re: [Vtd-xml-users] VTD-XML Query
>>>>
>>>>>Hi Jimmy,
>>>>>You said that VTD-XML currently supports a maximum file size of 2GB.
>>>>>Which version of VTD-XML supports that, so that I can try it on a large
>>>>>XML file?
>>>>>
>>>>>>From: "Jimmy Zhang" <cra...@co...>
>>>>>>To: <vtd...@li...>
>>>>>>CC: Din Sush <di...@ya...>
>>>>>>Subject: Re: [Vtd-xml-users] VTD-XML Query
>>>>>>Date: Mon, 31 Jul 2006 08:59:35 -0700
>>>>>>
>>>>>>I think VTD-XML should have a couple of distinct advantages for
>>>>>>splitting XML, performance probably being the biggest one...
>>>>>>Currently VTD-XML supports files up to 2GB, and you need enough
>>>>>>memory to hold the whole document in memory...
>>>>>>
>>>>>>I haven't tried the other approaches, but they seem to be SAX-based,
>>>>>>and may be slower and less flexible (SAX is forward-only).
>>>>>>
>>>>>>Let me know if there are any questions... you are welcome to share
>>>>>>your experience with us.
>>>>>>
>>>>>>----- Original Message -----
>>>>>>From: "Din Sush" <di...@ya...>
>>>>>>To: <vtd...@li...>
>>>>>>Sent: Monday, July 31, 2006 5:23 AM
>>>>>>Subject: [Vtd-xml-users] VTD-XML Query
>>>>>>
>>>>>> > Here is my requirement:
>>>>>> >
>>>>>> > I need to split really big XML files (1 GB plus) into
>>>>>> > smaller files.
>>>>>> > I am in the process of evaluating different
>>>>>> > approaches:
>>>>>> > 1. Use VTD-XML, parse and split.
>>>>>> > 2. Use the Perl XML::Twig split function.
>>>>>> > 3. Write my own parser in Perl on top of
>>>>>> > XML::Parser, which uses expat.
>>>>>> > 4. Use libxml2.
>>>>>> >
>>>>>> > I am not sure if this is the right place to post this
>>>>>> > question, but I would like to know the best approach to
>>>>>> > get the job done effectively.
>>>>>> >
>>>>>> > I would like to know the pros/cons and limitations of
>>>>>> > my proposed solutions.
>>>>>> >
>>>>>> > _______________________________________________
>>>>>> > Vtd-xml-users mailing list
>>>>>> > Vtd...@li...
>>>>>> > https://lists.sourceforge.net/lists/listinfo/vtd-xml-users
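A note on the failure in steps 10-14 of Chinh's report: -858993460 is the bit pattern 0xCCCCCCCC read as a signed 32-bit int, and 0xCC is the byte MSVC debug builds use to fill uninitialized stack variables. That suggests stat() failed (for example because the file is too large for plain stat() on that compiler, or the file name did not match) and s was never filled in. The benchmark then passed that garbage, cast to int, to malloc; converted back to an unsigned 32-bit size it asks for 3,435,973,836 bytes, malloc returns NULL, and fread trips the (buffer != NULL) assertion. The sketch below is a hypothetical helper, load_whole_file(), and is not part of VTD-XML: it checks every return value, keeps the size in its own type instead of casting to int, and opens the file in binary mode ("rb"). It assumes C99 and a POSIX-style stat(); on MSVC, _stat64() is likely the better choice for multi-gigabyte files. Even with a correct size, a 32-bit process will usually not be able to malloc a single 2 GB buffer, which is one more reason to split the XML file into smaller chunks.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Reproduces the value seen in the debugger: four 0xCC fill bytes
 * (MSVC's debug pattern for uninitialized stack memory) read as an
 * int32_t. int32_t is two's complement, so this yields -858993460. */
static int32_t debug_fill_as_int32(void)
{
    unsigned char fill[sizeof(int32_t)];
    int32_t value;

    memset(fill, 0xCC, sizeof fill);
    memcpy(&value, fill, sizeof value);
    return value;
}

/* Hypothetical helper, not part of VTD-XML: read a whole file into a
 * malloc'd buffer. On any failure it prints a message, sets *len to 0
 * and returns NULL instead of letting a bad size reach malloc/fread.
 * The caller must free() the returned buffer. */
static unsigned char *load_whole_file(const char *path, size_t *len)
{
    struct stat s;
    FILE *f;
    unsigned char *buf;
    size_t n;

    assert(path != NULL && len != NULL);
    *len = 0;

    /* If stat() fails, s is left untouched and s.st_size is garbage. */
    if (stat(path, &s) != 0) {
        fprintf(stderr, "stat(%s) failed: %s\n", path, strerror(errno));
        return NULL;
    }

    /* Keep the size in size_t; never squeeze it through an int. */
    if (s.st_size < 0 || (uintmax_t)s.st_size > (uintmax_t)SIZE_MAX) {
        fprintf(stderr, "%s: file size does not fit in size_t\n", path);
        return NULL;
    }
    n = (size_t)s.st_size;

    /* Binary mode: text mode ("r") would translate CRLF on Windows. */
    f = fopen(path, "rb");
    if (f == NULL) {
        fprintf(stderr, "fopen(%s) failed: %s\n", path, strerror(errno));
        return NULL;
    }

    buf = malloc(n > 0 ? n : 1); /* malloc(0) may legally return NULL */
    if (buf == NULL) {
        fprintf(stderr, "%s: cannot allocate a buffer for the file\n", path);
        fclose(f);
        return NULL;
    }

    if (fread(buf, 1, n, f) != n) {
        fprintf(stderr, "%s: short read\n", path);
        free(buf);
        fclose(f);
        return NULL;
    }

    fclose(f);
    *len = n;
    return buf;
}
```

Called as load_whole_file("foo.xml", &len), a missing or oversized file produces a message on stderr and a NULL return, rather than a debug assertion inside fread.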