From: Jimmy Z. <cra...@co...> - 2006-07-31 18:53:39
|
Version 1.6, with namespace support turned off when parsing. The
maximum is 1GB when namespace support is enabled.

Also, don't forget to CC vtd-xml-users to keep a record.

----- Original Message -----
From: "Chinh Ho" <ho_...@ho...>
To: <cra...@co...>
Sent: Monday, July 31, 2006 11:50 AM
Subject: Re: [Vtd-xml-users] VTD-XML Query

> Hi Jimmy,
> You said that VTD-XML currently supports a maximum file size of 2GB.
> Which version of VTD-XML is that, so that I can try it on a large XML
> file?
>
>>From: "Jimmy Zhang" <cra...@co...>
>>To: <vtd...@li...>
>>CC: Din Sush <di...@ya...>
>>Subject: Re: [Vtd-xml-users] VTD-XML Query
>>Date: Mon, 31 Jul 2006 08:59:35 -0700
>>
>>I think VTD-XML should have a couple of distinct advantages for
>>splitting XML, performance probably being the biggest reason...
>>currently VTD-XML's file size support is 2GB, and you need to have
>>enough memory to hold the document in memory...
>>
>>I haven't tried the other approaches, but they seem to be SAX based,
>>and may be slower and less flexible (SAX is forward only).
>>
>>Let me know if there are any questions... you are welcome to share
>>your experience with us.
>>
>>----- Original Message -----
>>From: "Din Sush" <di...@ya...>
>>To: <vtd...@li...>
>>Sent: Monday, July 31, 2006 5:23 AM
>>Subject: [Vtd-xml-users] VTD-XML Query
>>
>> > Here is my requirement:
>> >
>> > I need to split really big XML files (1 GB plus) into
>> > smaller files.
>> > I am in the process of evaluating different approaches:
>> > 1. Use VTD-XML, parse and split.
>> > 2. Use the Perl XML::Twig split function.
>> > 3. Write my own parser in Perl on top of XML::Parser,
>> >    which uses expat.
>> > 4. Use libxml2.
>> >
>> > I am not sure if this is the right place to post this
>> > question, but I would like to know the best approach to
>> > get the job done effectively.
>> >
>> > I would like to know the pros/cons and limitations of
>> > my proposed solutions.
>> >
>> > _______________________________________________
>> > Vtd-xml-users mailing list
>> > Vtd...@li...
>> > https://lists.sourceforge.net/lists/listinfo/vtd-xml-users
|
From: Jimmy Z. <cra...@co...> - 2006-07-31 21:41:47
|
----- Original Message -----
From: "Tatu Saloranta" <cow...@ya...>
To: <vtd...@li...>
Sent: Monday, July 31, 2006 1:15 PM
Subject: Re: [Vtd-xml-users] VTD-XML Query

> --- Jimmy Zhang <cra...@co...> wrote:
>
>> Well, the problem with streaming approach is that
>> you will need to parse then reserialize,
>> both CPU intensive, with VTD-XML it becomes a lot
>> more efficient, but you
>
> For small files, perhaps, but I would expect that
> splitting a _1 gig file_ will be much much slower with
> VTD-XML. Why? Because of the huge memory allocations,
> and non-locality of the content. Having to read it all
> in memory first, and then traversing it again second
> time will not be as efficient as doing it in chunks
> like streaming parsers do.

For big files it actually works better, because memory allocation
performance is a relative term: would you allocate large blocks
(VTD-XML) or many discrete objects (like SAX or DOM)? So the bigger the
file, the more superior VTD-XML's memory allocation strategy becomes.
This is in fact what we have seen in our benchmark results with large
files. VTD-XML is just like DOM, except faster and leaner.

> ...
>> other hand, using streaming API like SAX or PULL, you
>> will need to read in the document
>> piecewise anyway, so overall I think VTD-XML should
>
> Sure. But reading (and parsing) piece by piece, not as
> a huge memory consuming chunk, will actually be faster
> due to caching issues.

Cache doesn't help in general, because there will be data chunks that
overflow the cache, so fetching data from main memory sets the baseline
performance. But the bottom line is that because VTD-XML's memory
utilization is higher than SAX's, the chance of a cache miss with
VTD-XML is considerably lower.

> Maybe I should write a simple test case to demonstrate
> that. I could start with simple tests I have for just
> parsing, and accessing all information needed.
>
> Now, the simple token indexing that VTD-XML does seems to be up
> to twice as fast as that of SAX parsers, at least for
> small to medium-sized files. That is, assuming no data
> is used for anything. Accessing data, for example
> reconstructing another tree model, seems to get speeds
> down to about equivalent level on my basic tests.

VTD-XML's basic concept is to avoid tree construction, so building a
DOM tree pretty much defeats the purpose of VTD-XML in the first place.

Using data is a little tricky; the FAQ has a paragraph that explains
some of this. But it is actually not much of a problem, for the
following reasons:

1. A lot of metadata, i.e. tag and attribute names, is mostly used for
   navigation purposes, so it doesn't need to be converted into string
   objects.
2. VTD-XML converts VTD records to primitive data types without
   converting them into strings.
3. In DOM the biggest overhead is creating node objects, which VTD-XML
   avoids completely.
4. Even if one has to extract data into strings, string allocation is
   in fact quite fast when the length of the string is known
   beforehand. If the length is not known, a string buffer
   implementation potentially does a lot of copying and discarding
   whenever the string outgrows the allocated buffer, which can be
   inefficient. A VTD record encodes the token length, which improves
   string allocation performance.

> -+ Tatu +-
|
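Point 4 above (a known token length allows a single exact-size allocation) can be illustrated with a small C sketch. The `Token` struct and `token_to_string` are hypothetical names made up for this illustration; they are not VTD-XML's actual record layout or API, which packs its records differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token record: NOT VTD-XML's actual record layout. */
typedef struct {
    size_t offset;  /* where the token starts in the document buffer */
    size_t length;  /* token length in bytes */
} Token;

/* Copy one token into a freshly allocated, NUL-terminated string.
 * Because the length is known up front, there is exactly one allocation
 * and one copy, with no grow-and-recopy step. The caller frees the result. */
char *token_to_string(const char *doc, size_t doc_len, Token t)
{
    assert(doc != NULL);
    if (t.offset > doc_len || t.length > doc_len - t.offset)
        return NULL;                 /* token falls outside the buffer */
    char *s = malloc(t.length + 1);
    if (s == NULL)
        return NULL;
    memcpy(s, doc + t.offset, t.length);
    s[t.length] = '\0';
    return s;
}
```

For the buffer `<a>hello</a>`, a token with offset 3 and length 5 yields the string `hello`. A growable buffer that does not know the length in advance may instead reallocate and copy several times, which is the cost the point above refers to.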
From: Tatu S. <cow...@ya...> - 2006-07-31 23:33:51
|
..
> > For small files, perhaps, but I would expect that
> > splitting a _1 gig file_ will be much much slower
>
> For big file, it actually works better, because
> memory allocations performance is a relative term,
> would you allocate large blocks (VTD-XML) or many
> discrete objects (like SAX or DOM)? so the bigger the file,

SAX does not allocate many small objects. Let's forget about DOM for a
while -- it has no relevance to this particular case.

Allocating a contiguous (virtual) chunk of one gigabyte may not be slow
(if one can be found), but using it will be. This is a very basic
property of RAM access and caching behavior.

> the more superior VTD-XML's
> memory allocation strategy will become...
> this is in fact what we have seen with our benchmark
> results with large files...
...
> Cache doesn't help in general, because there will be
> data chunks that overflow
> the cache, so fetching data from main memory is the
> bottomline performance...

No, not for SAX or StAX. They operate on their own buffers, and there
cache locality does help with decoding, tokenization, parsing, and
copying (if Strings need to be built). With VTD-XML, scanning through
the contents of the file, in memory, will be somewhat sub-optimal,
although not horrible (at least it's linear and not random access, as
may be the case with DOM trees).

> but the bottom line is that because VTD-XML's memory
> utilization is higher than SAX
> the chance of cache miss with VTD-XML is
> considerably lower...

Huh? No it isn't.

...
> The VTD-XML's basic concept is to avoid tree
> construction, so building
> a DOM tree pretty much defeat the purpose of VTD-XML
> in the first place

Sure. But for the purposes of splitting a file, you do need to access
all the data: so to test how this particular task performs, one needs
to access all the data. So a basic algorithm that fetches the data that
would be needed to build the tree (without building it) would be the
simplest way (short of writing the task itself) to test it.

> using data is a little tricky, in FAQ there is a
> paragraph that does some explaining...
>
> No, that is actually no much of a problem. For the
> following reasons:
>
> 1.. A lot of metadata, i.e. tags and attribute
> names, are mostly used for
> navigation purposes, so they don't need to be
> converted into string objects

Which is effectively what SAX and StAX parsers do too. They just use
symbol tables, reusing these names. And the benefit is that when you do
actually need them, they will be available, without having to be
constructed each and every time as VTD-XML does.

> 2.. VTD-XML converts VTD records to primitive data
> types without converting them into strings.

Which SAX also doesn't have to do, unless the data is actually needed
as Strings.

> 3.. In DOM the biggest overhead is creating node
> objects, which VTD-XML completely avoids.

Yes... at least, until you need the data. But if you do, modifying the
tree is easier with DOM, albeit slower. Trying to modify a VTD-XML tree
gets progressively harder.

> 4.. Even one has to extract data into strings, if
> he knows beforehand the
> length of the string, the string allocation is in
> fact quite fast. If string
> length is not known, the string buffer
> implementation potentially makes a
> lot of copy and discard if the string length exceeds
> allocated buffer
> length, which can be inefficient. A VTD record
> encodes the token length,
> which improve string allocation performance.

I assume you haven't ever/lately looked inside the code of modern SAX
or StAX parsers? They do not blindly construct StringBuffers or
StringBuilders without that knowledge either. And unlike VTD-XML, they
do not need to do decoding twice, just once, if the data is actually
needed.

Now, don't get me wrong -- there are benefits to quick-and-dirty
int-index based parsers. I wrote something rather similar 4 years ago,
for internal use at Sun, for purposes of HTML page scraping and
replacement. It did in fact work fast, at about 2x the speed of Xerces
at the time. But its optimal use area is quite limited, and it is quite
hard to make such an approach both fast and XML compliant. Handling of
entities is hard; dealing correctly with namespaces is complicated as
well.

-+ Tatu +-
|
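The "operate on their own buffers" pattern described above can be sketched in C as a scan that reuses one small fixed buffer. This is only an illustration of the access pattern, under the assumption of a plain byte scan; `count_byte_chunked` is a made-up helper, not code from any SAX, StAX, or VTD-XML implementation, and it makes no claim about which approach is faster in practice.

```c
#include <assert.h>
#include <stdio.h>

/* Count occurrences of `target` in a stream, reading through one small
 * fixed buffer so memory use stays constant regardless of file size.
 * Returns -1 if a read error occurred. */
long long count_byte_chunked(FILE *f, unsigned char target)
{
    unsigned char buf[4096];   /* reused for every chunk */
    long long count = 0;
    size_t n;

    assert(f != NULL);
    while ((n = fread(buf, 1, sizeof buf, f)) > 0) {
        for (size_t i = 0; i < n; i++)
            if (buf[i] == target)
                count++;
    }
    return ferror(f) ? -1 : count;
}
```

Calling it with `'<'` on a stream containing `<a><b/></a>` counts three tag openers. The same few kilobytes are touched on every iteration, which is the locality argument made above; whether that outweighs a single linear pass over a fully loaded document is exactly what the thread disputes and would need measuring.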
From: Jimmy Z. <cra...@co...> - 2006-07-31 21:42:31
|
What problem do you see when compiling the C version? I was able to
compile it without much trouble.

I think *if* you have a large amount of memory available, malloc will
allocate a large chunk... after all, that is what malloc does.

----- Original Message -----
From: "Chinh Ho" <ho_...@ho...>
To: <cra...@co...>
Cc: <vtd...@li...>
Sent: Monday, July 31, 2006 1:57 PM
Subject: Re: [Vtd-xml-users] VTD-XML Query

> Sorry for a dummy question.
> I've been in C for 3 years. Could you tell me how to compile it, I
> mean the C version?
>
> With the C_light version, I've been able to compile and run it.
> However, I could not compile the C version under MSVC++ 6.0 or MS2005.
> In the C_light version, I saw this code in benchmark_vtdxml.c:
>     xml = (UByte *)malloc(sizeof(UByte)*(int)s.st_size);
>     k=(int)fread(xml,sizeof(UByte),s.st_size,f);
> I believe that the malloc function takes an integer as its input
> parameter. When we have a 2.0 gig file, how does malloc handle
> 2,000,000,000 (2 gig)? The struct s, particularly st_size, breaks down
> when it holds this number. In the debug window, it shows -... (a
> negative number), which suggests that the value has overflowed.
>
> I don't know about the C# version. I am going to give it a try.

Jimmy Zhang
(408) 835-2267
XimpleWare
http://www.ximpleware.com
|
From: Chinh H. <ho_...@ho...> - 2006-07-31 20:58:30
|
Sorry for a dummy question.
I've been in C for 3 years. Could you tell me how to compile it, I mean
the C version?

With the C_light version, I've been able to compile and run it.
However, I could not compile the C version under MSVC++ 6.0 or MS2005.
In the C_light version, I saw this code in benchmark_vtdxml.c:
    xml = (UByte *)malloc(sizeof(UByte)*(int)s.st_size);
    k=(int)fread(xml,sizeof(UByte),s.st_size,f);
I believe that the malloc function takes an integer as its input
parameter. When we have a 2.0 gig file, how does malloc handle
2,000,000,000 (2 gig)? The struct s, particularly st_size, breaks down
when it holds this number. In the debug window, it shows -... (a
negative number), which suggests that the value has overflowed.

I don't know about the C# version. I am going to give it a try.

>From: "Jimmy Zhang" <cra...@co...>
>To: "Chinh Ho" <ho_...@ho...>
>CC: <vtd...@li...>
>Subject: Re: [Vtd-xml-users] VTD-XML Query
>Date: Mon, 31 Jul 2006 11:52:48 -0700
>
>Version 1.6, when you turn off namespace support when parsing...
>the max is 1GB when namespace enabled...
>
>also don't forget to CC vtd-xml-user to keep a record
|
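On the malloc question above: the standard `malloc` takes a `size_t`, which is an unsigned type, so it is the `(int)` cast in the quoted line that can turn a size above `INT_MAX` into a negative number before malloc ever sees it. A minimal sketch of validating the size first follows; `size_for_malloc` is a hypothetical helper written for this note, not part of VTD-XML, and it assumes the file size has already been obtained as a 64-bit value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Validate a 64-bit file size before handing it to malloc().
 * Returns 1 and stores the size in *out if it is non-negative and fits
 * in size_t; returns 0 otherwise. Hypothetical helper, not VTD-XML code. */
int size_for_malloc(long long file_size, size_t *out)
{
    assert(out != NULL);
    if (file_size < 0)
        return 0;                                  /* already corrupted */
    if ((unsigned long long)file_size > (unsigned long long)SIZE_MAX)
        return 0;                                  /* too big for this platform */
    *out = (size_t)file_size;
    return 1;
}
```

A caller would pass the resulting `size_t` straight to `malloc` with no `int` cast and still check the returned pointer for `NULL`, since a 32-bit process may be unable to find that much contiguous address space even when the size itself is representable.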
From: Tatu S. <cow...@ya...> - 2006-07-31 21:40:22
|
--- Chinh Ho <ho_...@ho...> wrote:
...
> I don't know about the C# version. I am going to
> give it a try.

In addition to a possible C# version of VTD-XML, Microsoft's .NET has
its own basic streaming XML reader (an implementation of XmlReader)
that might be of use, I think. I'm not a C# guy, but I recall seeing
that mentioned.

-+ Tatu +-
|
From: Chinh H. <ho_...@ho...> - 2006-08-02 17:20:04
|
I used version 1.6. Here are the steps that I did:

1/ Download ximpleware_1.6_c_light and extract it to a folder named
   "ximpleware".
2/ Open MS Visual Studio 2005.
3/ Open a new empty C++ Win32 Console Application project.
4/ Open the "ximpleware" folder. Select all files. Drag and drop them
   onto the Solution Explorer window in MS 2005.
5/ Click build solution.
6/ Copy the 2 gig XML file to the debug folder.
7/ Open benchmark_vtdxml.c. Comment out the int argc and char *argv[]
   in main().
8/ In the line f = fopen(argv[1], "r"), replace argv[1] with the XML
   file name.
9/ Replace argv[1] in the next line with the same XML file name:
   (stat(argv[1], &s))
10/ Put breakpoints at the lines f = fopen("foo.xml", "r");
    and xml = (UByte *)malloc(sizeof(UByte)*(int)s.st_size);
11/ Press F5.
12/ In the Autos window, s.st_size is shown as -858993460.
    The cmd window shows the same size for the file: "size of the
    file is -858993460"
13/ Press F10 twice.
14/ A MS Visual C++ Debug Library dialog appears. It says: "Debug
    Assertion Failed! Program:... File: fread.c Line: 93 Expression:
    (buffer != NULL)"

Please let me know which step(s) I did wrong. Also, how do you turn off
namespace support when parsing?
I could not use ximpleware_1.6_c because of the .l and .y files. I
think these files are for the Unix version, aren't they?

>From: "Jimmy Zhang" <cra...@co...>
>To: "Chinh Ho" <ho_...@ho...>
>CC: <vtd...@li...>
>Subject: Re: [Vtd-xml-users] VTD-XML Query
>Date: Mon, 31 Jul 2006 11:52:48 -0700
>
>Version 1.6, when you turn off namespace support when parsing...
>the max is 1GB when namespace enabled...
>
>also don't forget to CC vtd-xml-user to keep a record
|
From: Jimmy Z. <cra...@co...> - 2006-08-02 19:23:14
|
Can you first try to parse a smaller document like 20MB to see it works ok or not? I suspect that the file size is getting too big so that it overflows the 32-bit integers, causing it to intepret is a negative value... ----- Original Message ----- From: "Chinh Ho" <ho_...@ho...> To: <cra...@co...> Cc: <vtd...@li...> Sent: Wednesday, August 02, 2006 10:19 AM Subject: Re: [Vtd-xml-users] VTD-XML Query >I used version 1.6. Here are the steps that I did: > 1/ Download the ximpleware_1.6_c_light and extract them out to a folder > named "ximpleware". > 2/ Open MS Visual Studio 2005. > 3/ Open new empty C++ Win32 Console Application project. > 4/ Open the "ximpleware" folder. Select all files. Drag and drop them to > the Solution Explorer window in MS 2005. > 5/ Click build solution. > 6/ Copy the 2 Gig xml file to the debug folder. > 7/ Open the benchmark_vtdxml.c . Comment out the int argc and char > *argv[] in main() > 8/ In the line: f = fopen(argv[1], "r"), replace the argv[1] with the xml > file name. > 9/ Replace the argv[1] in the next line with the same xml file name: > (stat(argv[1], &s)) > 10/ Put breakpoints at the line f = fopen("foo.xml", "r"); > and xml = (UByte *)malloc(sizeof(UByte)*(int)s.st_size); > 11/ Press F5. > 12/ On the Autos window, it shows the s.st_size is -858993460. > On the cmd window, it shows the same size of the file : "size of the > file is -858993460" > 13/ Press F10 twice. > 14/ A MS Visual C++ Debug Library appears. It says: "Debug Assertion > Failed! Program:... File: fread.c Line: 93 Expression: (buffer > != NULL) > > Please let me know what step(s) that I did wrong. Also, how do you turn > off the namespace support when parsing. > I could not use the ximpleware_1.6_c because of the .l and .y files. I > think these files are for the Unix version, aren't they? 
|
From: Chinh H. <ho_...@ho...> - 2006-08-02 20:08:23
|
Well, you might read my question below:

>>>>Hi Jimmy,
>>>>You said that VTD-XML currently supports a maximum file size of 2GB.
>>>>Which version of VTD-XML should I use to explore the large XML file?

I entered a bug on the SourceForge website about this, but I did not state it very clearly. The benchmark casts the file size to int before passing it to malloc. An int here is 4 bytes (32 bits), so the largest value it can hold is 2,147,483,647. A 2 Gig file is about 2,000,000,000 bytes, a ten-digit number right around that limit.

The size can overflow when it tries to allocate the memory.

I think the 20 MB file will work OK.

>From: "Jimmy Zhang" <cra...@co...>
>To: "Chinh Ho" <ho_...@ho...>,<vtd...@li...>
>Subject: Re: [Vtd-xml-users] VTD-XML Query
>Date: Wed, 2 Aug 2006 12:22:57 -0700
>
>Can you first try to parse a smaller document like 20MB to see it works ok
>or not?
>
>I suspect that the file size is getting too big so that it overflows the
>32-bit integers, causing it to be interpreted as a negative value...
|
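For the integer-size discussion in the message above, here is a small sketch that checks the figures, assuming a platform where int is 32 bits (true for MSVC and the mainstream gcc/clang targets). The helper names as_signed32, fits_in_int and check_thread_figures are made up for this illustration. One observation, offered as an inference rather than something confirmed in the thread: the reported value -858993460 is the bit pattern 0xCCCCCCCC, which MSVC debug builds use to fill uninitialized stack variables, so it may mean stat() failed and never filled in the struct, rather than a real size that wrapped around.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: reinterprets a 32-bit pattern as a signed
   two's-complement value, using only well-defined arithmetic. */
long long as_signed32(unsigned long bits)
{
    bits &= 0xFFFFFFFFUL;                       /* keep the low 32 bits */
    if (bits & 0x80000000UL)                    /* sign bit set */
        return (long long)bits - 4294967296LL;  /* subtract 2^32 */
    return (long long)bits;
}

/* Hypothetical helper: nonzero if a byte count survives a cast to int. */
int fits_in_int(long long n)
{
    return n >= 0 && n <= INT_MAX;
}

/* Checks the numbers discussed in this thread. */
void check_thread_figures(void)
{
    assert(sizeof(int) * CHAR_BIT == 32);             /* 4 bytes, 32 bits */
    assert(INT_MAX == 2147483647);                    /* a ten-digit number */
    assert(fits_in_int(20LL * 1024 * 1024));          /* 20 MB is fine */
    assert(fits_in_int(2000000000LL));                /* 2 * 10^9 still fits */
    assert(!fits_in_int(2LL * 1024 * 1024 * 1024));   /* 2 GiB = INT_MAX + 1 */
    assert(as_signed32(0xCCCCCCCCUL) == -858993460LL);/* the value in step 12 */
}
```

Calling check_thread_figures() passes silently when the assumptions hold. The practical reading is that a file of exactly 2 GiB or more cannot be sized through an int, while anything comfortably below that, such as the 20 MB test file, is unaffected.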
From: Jimmy Z. <cra...@co...> - 2006-08-02 21:02:47
|
Yeah, I responded to the request. Can you somehow split the XML file into smaller chunks?

----- Original Message -----
From: "Chinh Ho" <ho_...@ho...>
To: <cra...@co...>; <vtd...@li...>
Sent: Wednesday, August 02, 2006 1:08 PM
Subject: Re: [Vtd-xml-users] VTD-XML Query
|