From: Jimmy Z. <cra...@co...> - 2006-08-02 19:23:14
Can you first try to parse a smaller document, say 20MB, to see whether it works? I suspect the file is so big that its size overflows a 32-bit integer, causing it to be interpreted as a negative value...

----- Original Message -----
From: "Chinh Ho" <ho_...@ho...>
To: <cra...@co...>
Cc: <vtd...@li...>
Sent: Wednesday, August 02, 2006 10:19 AM
Subject: Re: [Vtd-xml-users] VTD-XML Query

>I used version 1.6. Here are the steps that I did:
> 1/ Download the ximpleware_1.6_c_light and extract them out to a folder
> named "ximpleware".
> 2/ Open MS Visual Studio 2005.
> 3/ Open new empty C++ Win32 Console Application project.
> 4/ Open the "ximpleware" folder. Select all files. Drag and drop them to
> the Solution Explorer window in MS 2005.
> 5/ Click build solution.
> 6/ Copy the 2 Gig xml file to the debug folder.
> 7/ Open the benchmark_vtdxml.c . Comment out the int argc and char
> *argv[] in main()
> 8/ In the line: f = fopen(argv[1], "r"), replace the argv[1] with the xml
> file name.
> 9/ Replace the argv[1] in the next line with the same xml file name:
> (stat(argv[1], &s))
> 10/ Put breakpoints at the line f = fopen("foo.xml", "r");
> and xml = (UByte *)malloc(sizeof(UByte)*(int)s.st_size);
> 11/ Press F5.
> 12/ On the Autos window, it shows the s.st_size is -858993460.
> On the cmd window, it shows the same size of the file : "size of the
> file is -858993460"
> 13/ Press F10 twice.
> 14/ A MS Visual C++ Debug Library appears. It says: "Debug Assertion
> Failed! Program:... File: fread.c Line: 93 Expression: (buffer
> != NULL)
>
> Please let me know what step(s) that I did wrong. Also, how do you turn
> off the namespace support when parsing.
> I could not use the ximpleware_1.6_c because of the .l and .y files. I
> think these files are for the Unix version, aren't they?
>
>
>>From: "Jimmy Zhang" <cra...@co...>
>>To: "Chinh Ho" <ho_...@ho...>
>>CC: <vtd...@li...>
>>Subject: Re: [Vtd-xml-users] VTD-XML Query
>>Date: Mon, 31 Jul 2006 11:52:48 -0700
>>
>>Version 1.6, when you turn off namespace support when parsing...
>>the max is 1GB when namespace enabled...
>>
>>also don't forget to CC vtd-xml-user to keep a record
>>----- Original Message -----
>>From: "Chinh Ho" <ho_...@ho...>
>>To: <cra...@co...>
>>Sent: Monday, July 31, 2006 11:50 AM
>>Subject: Re: [Vtd-xml-users] VTD-XML Query
>>
>>>Hi Jimmy,
>>>You said that the VTD-XML currently support maximum file size of 2GB.
>>>What version of the VTD-XML so that I could try to explore the large XML
>>>file.
>>>
>>>>From: "Jimmy Zhang" <cra...@co...>
>>>>To: <vtd...@li...>
>>>>CC: Din Sush <di...@ya...>
>>>>Subject: Re: [Vtd-xml-users] VTD-XML Query
>>>>Date: Mon, 31 Jul 2006 08:59:35 -0700
>>>>
>>>>I think VTD-XML should have a couple of distinct advantages for
>>>>splitting XML, performance probably being the biggest reason...
>>>>currently VTD-XML's file size support is 2GB, and you need
>>>>to have enough memory to hold the document in memory...
>>>>
>>>>I haven't tried other approaches, but they seem like SAX based, and
>>>>may be slower and less flexible (SAX is forward only),
>>>>
>>>>Let me know if there are any questions... you are welcome to share
>>>>your experience with us
>>>>
>>>>----- Original Message -----
>>>>From: "Din Sush" <di...@ya...>
>>>>To: <vtd...@li...>
>>>>Sent: Monday, July 31, 2006 5:23 AM
>>>>Subject: [Vtd-xml-users] VTD-XML Query
>>>>
>>>> > Here is my requirement
>>>> >
>>>> > I need to split really big XML files(1 GB plus) into
>>>> > smaller sized files.
>>>> > I am in the process of evaluating different
>>>> > approaches.
>>>> > 1. Use Vtd-XML, parse and split.
>>>> > 2. Use Perl XML::Twig split function
>>>> > 3. Writing my own parser in perl on top of
>>>> > XML::Parser,
>>>> > which uses expat.
>>>> > 4. Use libxml2.
>>>> >
>>>> > I am not sure if this is the right place to post this
>>>> > question, but would like to know the best approach to
>>>> > get the job done effectively.
>>>> >
>>>> > I would like to know the pros/cons and limitations of
>>>> > my proposed solutions.
>>>> >
>>>> > _______________________________________________
>>>> > Vtd-xml-users mailing list
>>>> > Vtd...@li...
>>>> > https://lists.sourceforge.net/lists/listinfo/vtd-xml-users
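
A side note on the size reported in step 12 of the quoted message: -858993460 is the bit pattern 0xCCCCCCCC read as a signed 32-bit integer, which is the value MSVC debug builds use to fill uninitialized stack variables. So besides a 32-bit overflow, one possibility worth ruling out is that stat() failed and `s` was never written. The following is only a rough sketch of a more defensive size check, not the code shipped in benchmark_vtdxml.c. The helper names `checked_file_size` and `as_signed32` and the return codes are hypothetical and introduced here for illustration. It assumes the size is eventually cast to `int`, as in the `(int)s.st_size` expression quoted above.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper (not part of VTD-XML): reinterpret the low 32 bits
 * of a value as a signed 32-bit integer, to show how a large size or a
 * debug fill pattern ends up printed as a negative number. */
static long long as_signed32(unsigned long bits)
{
    bits &= 0xFFFFFFFFUL;
    if (bits & 0x80000000UL) {
        return (long long)bits - 4294967296LL; /* subtract 2^32 */
    }
    return (long long)bits;
}

/* Hypothetical helper (not part of VTD-XML): look up the size of `path`.
 * Returns  0 and stores the size in *out on success,
 *         -1 if stat() fails (so *out is left untouched),
 *         -2 if the size does not fit in an int. */
static int checked_file_size(const char *path, int *out)
{
    struct stat s;

    assert(path != NULL && out != NULL);

    if (stat(path, &s) != 0) {
        perror("stat"); /* the struct is not filled in on failure */
        return -1;
    }
    if ((long long)s.st_size < 0 || (long long)s.st_size > (long long)INT_MAX) {
        fprintf(stderr, "file size does not fit in a 32-bit int\n");
        return -2;
    }
    *out = (int)s.st_size;
    return 0;
}
```

In a program like the benchmark, the idea would be to call `checked_file_size` before the malloc and to stop if it returns anything other than 0, so that fread is never handed a NULL buffer. For example, `as_signed32(0xCCCCCCCCUL)` gives -858993460, and `as_signed32(0x80000000UL)` (exactly 2GB) gives -2147483648. On MSVC, `_stati64` with `struct _stati64` reports a 64-bit `st_size`; it is left out of the sketch to keep it portable.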