From: Jimmy Z. <cra...@co...> - 2006-08-22 20:14:03
|
----- Original Message -----
From: "Jimmy Zhang" <jz...@xi...>
To: "Walinsky, Frank" <fwa...@ex...>
Cc: <vtd...@li...>
Sent: Tuesday, August 22, 2006 12:57 PM
Subject: Re: push and pop

> It may be a case-specific hack for you...
> AutoPilot internally *remembers* a few variables (string name, depth etc),
> to capture/restore, there may be additional data structure (e.g. a stack)
> needed to store those variables...
>
> but my question is how you really do anything different from using DOM
> you will still have to instantiate nodeIterators for each level, right?
>
> also vtd...@li... seems to be working now
>
> ----- Original Message -----
> From: "Walinsky, Frank" <fwa...@ex...>
> To: "Jimmy Zhang" <jz...@xi...>
> Sent: Tuesday, August 22, 2006 11:46 AM
> Subject: RE: push and pop
>
> What might be needed is a capture/restore of the AutoPilot along with
> the currentoffset
>
> -----Original Message-----
> From: Jimmy Zhang [mailto:jz...@xi...]
> Sent: Tuesday, August 22, 2006 11:19 AM
> To: Walinsky, Frank
> Subject: Re: push and pop
>
> I need to think about this...
>
> ----- Original Message -----
> From: "Walinsky, Frank" <fwa...@ex...>
> To: <jz...@xi...>
> Sent: Tuesday, August 22, 2006 4:13 AM
> Subject: RE: push and pop
>
> I will try another approach using just 1 AutoPilot by trying to rework
> my current code.
>
> -----Original Message-----
> From: jz...@xi... [mailto:jz...@xi...]
> Sent: Monday, August 21, 2006 10:18 PM
> To: Walinsky, Frank
> Subject: Re: push and pop
>
> Can you kinda explain why allocating multiple AutoPilots is show-stopping??
> because autoPilot allocation is fairly light weight? Maybe VTD-XML can
> custom build something for your case...
>
> Walinsky, Frank writes:
>> Jimmy,
>> I did try 3 different AutoPilots after I sent this and that does work
>> just as you thought it would.
>> It also works without doing any push or pop.
>> My situation has a user generated state which could have an unlimited
>> amount of nesting.
>> This state is processed by some generic code, which is not custom
>> generated but used by all user generated states.
>> This works fine with the DOM product I'm using.
>> I was looking to reduce the memory footprint and do a proof of concept
>> with your app for management.
>> If I can't use the one AutoPilot with push and pop, I'm dead in the
>> water with your app right out of the gate.
>> If you could take a closer look, when you have time, and see if it is a
>> bug or just not possible, I would greatly appreciate it.
>> Thx for all your help,
>> Frank
>> ________________________________
>> From: Jimmy Zhang [mailto:jz...@xi...]
>> Sent: Monday, August 21, 2006 4:13 PM
>> To: Walinsky, Frank; vtd...@li...
>> Subject: Re: push and pop
>>
>> Frank, upon a quick look, I feel you might want to instantiate three
>> autoPilot objects, the first one you set the element name "a," the
>> second set the element name "b," and the last one set the element "c." Could
>> Cheers,
>> Jimmy Zhang
>>
>> ----- Original Message -----
>> From: Walinsky, Frank <mailto:fwa...@ex...>
>> To: in...@xi...
>> Sent: Monday, August 21, 2006 12:40 PM
>> Subject: push and pop
>>
>> Jim,
>> I apologize again for using this address but it has become
>> completely frustrating trying to send to the list.
>> I just tried a handful of times and had my message returned. I
>> did get one through this morning so I know it does work
>> but when is anybody's guess.
>>
>> Could you please look at this when you have time?
>>
>> Is it possible and if so, is there any sample code somewhere
>> that does the following:
>> (I ask because I haven't found a way or I'm probably using the
>> api incorrectly)
>>
>> I'm trying to use the same AutoPilot object in a 3 level nesting
>> using a VTDNav "push" before entering the next lower level and
>> the VTDNav "pop" before returning to the previous higher level.
>> Neither "iterate" nor "evalXPath" advances to the next
>> element after the pop.
>>
>> Here's a sample xml I'm using:
>> <?xml version=\"1.0\" encoding=\"UTF-8\"?>
>> <root>
>>   <a>
>>     <b>
>>       <c>this is abc_11</c>
>>       <c>this is abc_12</c>
>>       <c>this is abc_13</c>
>>       <c>this is abc_14</c>
>>     </b>
>>     <b>
>>       <c>this is abc_21</c>
>>       <c>this is abc_22</c>
>>       <c>this is abc_23</c>
>>       <c>this is abc_24</c>
>>     </b>
>>   </a>
>>   <a>
>>     <b>
>>       <c>this is second abc_31</c>
>>     </b>
>>   </a>
>> </root>
>>
>> Here's a snippet of my code.
>>
>> vn.toElement(VTDNav.ROOT);
>>
>> ap.selectElement("a");
>> while (ap.iterate()) {
>>     System.out.println("--- before push() of \"a\"---");
>>     vn.dumpContext();
>>     vn.push();
>>     System.out.println("a = " + vn.getCurrentIndex());
>>
>>     ap.selectElement("b");
>>     while (ap.iterate()) {
>>         System.out.println("--- before push() of \"b\" ---");
>>         vn.dumpContext();
>>         vn.push();
>>         System.out.println("b = " + vn.getCurrentIndex());
>>
>>         ap.selectElement("c");
>>         while (ap.iterate()) {
>>             System.out.println("c = " + vn.getCurrentIndex());
>>         }
>>
>>         vn.pop();
>>         System.out.println("--- after pop() of \"b\" ---");
>>         vn.dumpContext();
>>     }
>>
>>     vn.pop();
>>     System.out.println("--- after pop() of \"a\" ---");
>>     vn.dumpContext();
>> }
|
From: Tatu S. <cow...@ya...> - 2006-08-22 01:05:30
|
--- Jimmy Zhang <cra...@co...> wrote:

> How many different kind of encoding does woodstox
> currently support??

Natively just a couple (UTF-8, ISO-8859-1, UTF-32): the first two for performance reasons, the third because the JDK doesn't support it. Others are handled by using a JDK-constructed Reader. EBCDIC doesn't work yet; I'm not sure what would be the best way (the JDK may have some decoders, but I don't know exactly which ones to use, nor do I have many sample docs).

It's nice to be able to use JDK decoders as a fallback. With NIO, perhaps VTD-XML could use them too?

I have been thinking of writing my own UTF-16 decoder/encoder, for performance reasons and to be able to do better character validation, but haven't had time.

> I am thinking about adding more encoding support
> right now VTD-XML supports
> UTF8 ascii iso-8859 UTF-16BE and UTF-16LE

That's a reasonable starting point I think. UTF-32 is quite easy to support, but I don't know if many use it (there were some test docs though, in the XMLTest test suite). Other iso-8859-x encodings beyond -1 might be easy too: you just have to map byte values 128 - 255 to other parts of the Unicode tables (I think).

-+ Tatu +-
|
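The byte-to-Unicode mapping mentioned for the iso-8859-x family can be sketched as a 256-entry lookup table. This is only an illustration under stated assumptions, not Woodstox or VTD-XML code: the class name `SingleByteDecoder` is hypothetical, the table is filled by asking the JDK's own charset once (in the spirit of using JDK decoders as a fallback), and the charset is assumed to be single-byte. Only ISO-8859-1 is guaranteed to exist on every Java platform.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical table-driven decoder for single-byte encodings such as ISO-8859-x. */
final class SingleByteDecoder {
    private final char[] table = new char[256];

    /** Fills the table by letting the JDK decode every byte value 0..255 once. */
    SingleByteDecoder(Charset charset) {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        String decoded = new String(all, charset);
        // Sanity check only: a single-byte charset yields one char per byte,
        // but passing this check does not prove the charset is single-byte.
        if (decoded.length() != all.length) {
            throw new IllegalArgumentException("not a single-byte charset: " + charset);
        }
        decoded.getChars(0, decoded.length(), table, 0);
    }

    /** Maps one input byte to its Unicode character with a single array lookup. */
    char decode(byte b) {
        return table[b & 0xFF];
    }

    /** Decodes a whole buffer, one table lookup per byte. */
    String decode(byte[] input) {
        char[] out = new char[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = table[input[i] & 0xFF];
        }
        return new String(out);
    }

    public static void main(String[] args) {
        SingleByteDecoder latin1 = new SingleByteDecoder(StandardCharsets.ISO_8859_1);
        // 0xE9 is e-acute in ISO-8859-1; how it prints depends on the console encoding.
        System.out.println(latin1.decode(new byte[] {'c', 'a', 'f', (byte) 0xE9}));
    }
}
```

Supporting another member of the family would then only mean building the table from a different charset, for example `Charset.forName("ISO-8859-2")` where the platform provides it.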
From: Jimmy Z. <cra...@co...> - 2006-08-22 00:02:21
|
How many different kinds of encoding does woodstox currently support??
I am thinking about adding more encoding support; right now VTD-XML supports UTF8, ascii, iso-8859, UTF-16BE and UTF-16LE

----- Original Message -----
From: "Tatu Saloranta" <cow...@ya...>
To: "Jimmy Zhang" <cra...@co...>; <vtd...@li...>
Cc: <n96...@ma...>
Sent: Thursday, August 17, 2006 10:11 PM
Subject: Re: [Vtd-xml-users] Fw: Problems with VDT (fwd)

> --- Jimmy Zhang <cra...@co...> wrote:
>
>> one of the early email from din sush asks about how to split a large file
>> into smaller ones
>>
>> then I thought about it and felt that a better solution (than current
>> VTD-XML or Woodstox) can indeed be built....
>>
>> the basic idea is to record only the offset and length of an element when
>> splitting, so it doesn't need to read the whole thing into memory like
>> VTD-XML did, nor does it need to perform decode/re-encoding and string
>> creation like Pull....
>>
>> after retaining the offset and length, just copy the file segment into
>> separate files....
>>
>> what do you guys think?
>
> For the specific splitting task that would be good.
> Maybe such utility could be written, perhaps being passed an Xpath
> expression defining where to split the file (like, defining root nodes
> of resulting docs?).
> And you could probably use much of VTD-XML code as core of such tool?
>
> That should be very fast & memory efficient solution.
>
> -+ Tatu +-
|
From: Jimmy Z. <cra...@co...> - 2006-08-21 18:18:23
|
Version 1.7 will expand the XPath functions and fix bugs reported by users of version 1.6;
2.0 will introduce VTD+XML indexing features...

----- Original Message -----
From: Walinsky, Frank
To: vtd...@li...
Sent: Monday, August 21, 2006 8:41 AM
Subject: [Vtd-xml-users] ver 1.7

What are the planned new features for version 1.7?
|
From: Walinsky, F. <fwa...@ex...> - 2006-08-21 15:42:50
|
What are the planned new features for version 1.7? |
From: Tatu S. <cow...@ya...> - 2006-08-18 05:11:21
|
--- Jimmy Zhang <cra...@co...> wrote:

> one of the early email from din sush asks about how to split a large file
> into smaller ones
>
> then I thought about it and felt that a better solution (than current
> VTD-XML or Woodstox) can indeed be built....
>
> the basic idea is to record only the offset and length of an element when
> splitting, so it doesn't need to read the whole thing into memory like
> VTD-XML did, nor does it need to perform decode/re-encoding and string
> creation like Pull....
>
> after retaining the offset and length, just copy the file segment into
> separate files....
>
> what do you guys think?

For the specific splitting task that would be good. Maybe such utility could be written, perhaps being passed an Xpath expression defining where to split the file (like, defining root nodes of resulting docs?). And you could probably use much of VTD-XML code as core of such tool?

That should be very fast & memory efficient solution.

-+ Tatu +-
|
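The copy step of the offset-and-length idea can be sketched with plain NIO channels. This is a minimal sketch under stated assumptions, not part of VTD-XML or Woodstox: the class `SegmentCopier` and the offsets used in `main` are hypothetical, and in a real tool the offset and length of each element would come from whatever parser records the element boundaries.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper that copies raw byte ranges of a file without decoding them. */
final class SegmentCopier {
    /** Copies length bytes starting at offset from source into target. */
    static void copySegment(Path source, long offset, long length, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long copied = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (copied < length) {
                long n = in.transferTo(offset + copied, length - copied, out);
                if (n <= 0) {
                    throw new IOException("segment extends past end of " + source);
                }
                copied += n;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("whole", ".xml");
        Path part = Files.createTempFile("part", ".xml");
        Files.write(source, "<root><a>one</a><a>two</a></root>".getBytes(StandardCharsets.US_ASCII));
        // Hypothetical boundaries: the second <a> element starts at byte 16 and is 10 bytes long.
        copySegment(source, 16, 10, part);
        // prints <a>two</a>
        System.out.println(new String(Files.readAllBytes(part), StandardCharsets.US_ASCII));
    }
}
```

A fragment copied byte for byte carries no XML declaration of its own and inherits nothing from its ancestors, so namespace declarations or entity definitions made higher up in the original document would have to be handled separately.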
From: Jimmy Z. <cra...@co...> - 2006-08-17 20:18:43
|
One of the earlier emails, from Din Sush, asked how to split a large file into smaller ones.

Then I thought about it and felt that a better solution (than current VTD-XML or Woodstox) can indeed be built....

The basic idea is to record only the offset and length of an element when splitting, so it doesn't need to read the whole thing into memory like VTD-XML does, nor does it need to perform decoding/re-encoding and string creation like Pull....

After retaining the offset and length, just copy the file segments into separate files....

What do you guys think?

----- Original Message -----
From: "Tatu Saloranta" <cow...@ya...>
To: "Jimmy Zhang" <cra...@co...>; <vtd...@li...>
Sent: Wednesday, August 16, 2006 5:39 PM
Subject: Re: [Vtd-xml-users] Fw: Problems with VDT (fwd)

> The problem looks like someone is trying to use JDK 1.4 with jars
> compiled on 1.5 (or similarly for 1.4 and 1.3 etc).
>
> -+ Tatu +-
>
> --- Jimmy Zhang <cra...@co...> wrote:
>
>> ----- Original Message -----
>> From: Jimmy Zhang
>> To: pau...@ci...
>> Sent: Wednesday, August 16, 2006 11:41 AM
>> Subject: Re: Problems with VDT (fwd)
>>
>> Hi, Can you elaborate on the set up of your environment?
>> Did you point the classpath to the right place??
>>
>> cheers,
>> Jzhang
>>
>> ----- Original Message -----
>> From: sa...@xi...
>> To: jz...@xi...
>> Sent: Wednesday, August 16, 2006 11:39 AM
>> Subject: Problems with VDT (fwd)
>>
>> I've tried using your parser (I'm using what I think is the most recent
>> version of the .zip file, 1.6) and keep getting an exception at compile
>> time:
>>
>> UnsupportedClassVersionError: com/ximpleware/VTDGen (Unsupported
>> major.minor version 49.0)
>>
>> I get the same error when I try to compile the examples. It's
>> apparently looking for Version 48 and one (or more) of the classes in
>> the zip is set to v.49. Is there anything I can do to get around this?
>> Rebuild the source?
>>
>> Thanks,
>>
>> Paul Uzee
>> Cingular Wireless
|
From: JamesLee <n96...@ma...> - 2006-08-17 01:04:45
|
Hi,

I think the vtd java library was not built with the same version of the Java SDK as your environment,
e.g. JDK 1.4 vs JDK 1.5.
So... your java runtime detects the different version.

James.

-----Original message-----
From: vtd...@li...
To: vtd...@li...
Date: Wed, 16 Aug 2006 17:12:37 -0700
Subject: Vtd-xml-users Digest, Vol 3, Issue 7

Today's Topics:
   1. Fw: Problems with VDT (fwd) (Jimmy Zhang)
|
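The numbers in the error come from the class file header: after the 4-byte magic value 0xCAFEBABE come a 2-byte minor and a 2-byte major version, and major 48 corresponds to JDK 1.4 while 49 corresponds to JDK 1.5. A small sketch for checking which release a class was compiled for follows; the class name `ClassVersion` and its methods are made up for illustration and are not part of VTD-XML.

```java
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper that reads the version stamp from a compiled class file. */
final class ClassVersion {
    /** Returns the major version stored in bytes 6..7 of a class file. */
    static int majorVersion(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        buf.getShort();                 // minor version, ignored here
        return buf.getShort() & 0xFFFF; // major version as an unsigned value
    }

    /** Translates a major version (45 and up) into the release that first produced it. */
    static String release(int major) {
        return major >= 49 ? "Java " + (major - 44) : "JDK 1." + (major - 44);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java ClassVersion <path-to-.class-file>");
            return;
        }
        int major = majorVersion(Files.readAllBytes(Paths.get(args[0])));
        System.out.println(args[0] + ": major " + major + " (" + release(major) + ")");
    }
}
```

Run against a class extracted from the distributed zip, for example `com/ximpleware/VTDGen.class`, this would show whether the class needs a newer runtime than the one installed.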
From: Tatu S. <cow...@ya...> - 2006-08-17 00:39:54
|
The problem looks like someone is trying to use JDK 1.4 with jars compiled on 1.5 (or similarly for 1.4 and 1.3 etc).

-+ Tatu +-

--- Jimmy Zhang <cra...@co...> wrote:

> ----- Original Message -----
> From: Jimmy Zhang
> To: pau...@ci...
> Sent: Wednesday, August 16, 2006 11:41 AM
> Subject: Re: Problems with VDT (fwd)
>
> Hi, Can you elaborate on the set up of your environment?
> Did you point the classpath to the right place??
>
> cheers,
> Jzhang
>
> ----- Original Message -----
> From: sa...@xi...
> To: jz...@xi...
> Sent: Wednesday, August 16, 2006 11:39 AM
> Subject: Problems with VDT (fwd)
>
> I've tried using your parser (I'm using what I think is the most recent
> version of the .zip file, 1.6) and keep getting an exception at compile
> time:
>
> UnsupportedClassVersionError: com/ximpleware/VTDGen (Unsupported
> major.minor version 49.0)
>
> I get the same error when I try to compile the examples. It's
> apparently looking for Version 48 and one (or more) of the classes in
> the zip is set to v.49. Is there anything I can do to get around this?
> Rebuild the source?
>
> Thanks,
>
> Paul Uzee
> Cingular Wireless
|
From: Jimmy Z. <cra...@co...> - 2006-08-16 23:19:41
|
----- Original Message -----
From: Jimmy Zhang
To: pau...@ci...
Sent: Wednesday, August 16, 2006 11:41 AM
Subject: Re: Problems with VDT (fwd)

Hi, Can you elaborate on the set up of your environment?
Did you point the classpath to the right place??

cheers,
Jzhang

----- Original Message -----
From: sa...@xi...
To: jz...@xi...
Sent: Wednesday, August 16, 2006 11:39 AM
Subject: Problems with VDT (fwd)

I've tried using your parser (I'm using what I think is the most recent version of the .zip file, 1.6) and keep getting an exception at compile time:

UnsupportedClassVersionError: com/ximpleware/VTDGen (Unsupported major.minor version 49.0)

I get the same error when I try to compile the examples. It's apparently looking for Version 48 and one (or more) of the classes in the zip is set to v.49. Is there anything I can do to get around this? Rebuild the source?

Thanks,

Paul Uzee
Cingular Wireless
|
From: Jimmy Z. <cra...@co...> - 2006-08-04 01:16:33
|
Is there any data on the performance of splitting files using VTD-XML?
It would certainly be interesting to know about....

----- Original Message -----
From: "Din Sush" <di...@ya...>
To: "Tatu Saloranta" <cow...@ya...>; <vtd...@li...>
Sent: Thursday, August 03, 2006 4:36 AM
Subject: Re: [Vtd-xml-users] VTD-XML Query

> I tried woodstox parser, it seems to be working and for a 1 GB file it
> is taking around 11 mins to split the file in multiple 1 MB files.
>
> Thanks for your suggestion!! I was just wondering if I can make it any
> faster, I am using "copyEventFromEventMethod" to write to the file.
>
> Thanks again.
>
> --- Tatu Saloranta <cow...@ya...> wrote:
>
>> --- Din Sush <di...@ya...> wrote:
>>
>> > Well I only need to split the document and don't need to go back to
>> > parsed document, and I don't need DOM like functionality.
>> >
>> > Will VTD-XML be still better in this scenario.
>>
>> I would suggest that if you do have time, you investigate both using
>> VTD-XML, and a Stax implementation (such as http://woodstox.codehaus.org).
>> My feeling is that it all comes down to which one API you feel more
>> comfortable with, or perhaps whether have to use a xml-compliant
>> standard-based solution or not.
>> Both can perform well enough, assuming you are not limited by VTD-XML
>> due to main memory requirements.
>> Stax memory usage is not linear with document length, so there are no
>> practical input size limitations.
>>
>> If you do end up both approaches, it would be very nice to get the
>> performance numbers, since this would be an actual real-world use case,
>> instead of benchmarks. Plus if code is simple enough, perhaps it could
>> become a benchmark for these types of operations?
>>
>> > Secondly as the entire document needs to be loaded in the memory, the
>> > whole idea of splitting is that I am getting "Out of Memory" error
>> > won't I get the same error when I am using VTD-XML, than it kind of
>> > defeats the purpose. Correct me if I am wrong in the interpretation
>> > as I have never used VTD.
>>
>> You are correct here. While limit is much higher than with, say, DOM
>> (2x or perhaps 3x), there is a limit.
>>
>> -+ Tatu +-
|
From: Tatu S. <cow...@ya...> - 2006-08-03 18:19:06
|
--- Din Sush <di...@ya...> wrote:

> I tried woodstox parser, it seems to be working and
> for a 1 GB file it is taking around 11 mins to split
> the file in multiple 1 MB files.

Hmm. That sounds a bit slow for typical disks (with maybe 30 MB/s read speed, and a bit higher write speed). I'd expect it to take roughly a minute or so. Can you share the code? I would profile it (just using 'java -Xrunhprof:cpu=samples', running the code for a minute or so).

> Thanks for your suggestion!! I was just wondering if I
> can make it any faster, I am using
> "copyEventFromEventMethod" to write to the file.

I guess it all depends on the code in question (the file in question might also affect speed a bit, but that shouldn't matter very much). Can you send the code? I could test it against test files I have created.

-+ Tatu +-

> Thanks again.
>
> --- Tatu Saloranta <cow...@ya...> wrote:
>
> > --- Din Sush <di...@ya...> wrote:
> >
> > > Well I only need to split the document and don't need
> > > to go back to parsed document, and I don't need DOM
> > > like functionality.
> > >
> > > Will VTD-XML be still better in this scenario.
> >
> > I would suggest that if you do have time, you
> > investigate both using VTD-XML, and a Stax
> > implementation (such as http://woodstox.codehaus.org).
> > My feeling is that it all comes down to which one API
> > you feel more comfortable with, or perhaps whether
> > have to use a xml-compliant standard-based solution or not.
> > Both can perform well enough, assuming you are not
> > limited by VTD-XML due to main memory requirements.
> > Stax memory usage is not linear with document length,
> > so there are no practical input size limitations.
> >
> > If you do end up both approaches, it would be very
> > nice to get the performance numbers, since this would
> > be an actual real-world use case, instead of
> > benchmarks. Plus if code is simple enough, perhaps it
> > could become a benchmark for these types of operations?
> >
> > > Secondly as the entire document needs to be loaded in
> > > the memory, the whole idea of splitting is that I am
> > > getting "Out of Memory" error won't I get the same
> > > error when I am using VTD-XML, than it kind of defeats
> > > the purpose. Correct me if I am wrong in the
> > > interpretation as I have never used VTD.
> >
> > You are correct here. While limit is much higher than
> > with, say, DOM (2x or perhaps 3x), there is a limit.
> >
> > -+ Tatu +-
 |
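Tatu's estimate can be checked with simple arithmetic: 1 GB at about 30 MB/s is roughly 34 seconds of pure reading, whereas 11 minutes for 1 GB works out to about 1.5 MB/s. A minimal sketch, assuming a plain buffered copy is a fair stand-in for the disk-bound lower limit; the class name `CopyBaselineSketch`, the 64 KB buffer size and the command-line arguments are illustrative choices, not anything from the thread:

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: estimates and measures the I/O-only cost of moving a file.
class CopyBaselineSketch {

    /** Seconds needed to move `bytes` at `mbPerSecond` (1 MB taken as 1024 * 1024 bytes). */
    static double expectedSeconds(long bytes, double mbPerSecond) {
        return bytes / (mbPerSecond * 1024 * 1024);
    }

    /** Copies everything from `in` to `out` and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[64 * 1024]; // assumed buffer size
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    /** Usage: java CopyBaselineSketch <input file> <output file> */
    public static void main(String[] args) throws IOException {
        long start = System.nanoTime();
        long bytes;
        try (InputStream in = new FileInputStream(args[0]);
             OutputStream out = new FileOutputStream(args[1])) {
            bytes = copy(in, out);
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("copied %d bytes in %.1f s%n", bytes, seconds);
        System.out.printf("estimate at 30 MB/s: %.1f s%n", expectedSeconds(bytes, 30.0));
    }
}
```

If the raw copy takes on the order of a minute while the splitter takes eleven, the time is probably going into the XML processing or into unbuffered output rather than the disk, which is what the profiler run suggested in the message should confirm.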
From: Jimmy Z. <cra...@co...> - 2006-08-03 16:17:52
|
How much memory do you have for your machine? Can you increase heap size to 1500m ?? ----- Original Message ----- From: "Din Sush" <di...@ya...> To: "Jimmy Zhang" <cra...@co...>; <vtd...@li...> Sent: Thursday, August 03, 2006 4:32 AM Subject: Re: [Vtd-xml-users] VTD-XML Query >I increased the JVM heap size but even after that I > was getting "out of memory" error of 800 MB data. > > --- Jimmy Zhang <cra...@co...> wrote: > >> For parsing large XML files, make sure you set the >> maximum JVM heap size to >> a bigger enough value >> >> I think the command is "java -server -Xmx600m >> <yourclass>.class." >> >> ----- Original Message ----- >> From: "Din Sush" <di...@ya...> >> To: "Jimmy Zhang" <cra...@co...>; >> <vtd...@li...> >> Sent: Tuesday, August 01, 2006 1:54 AM >> Subject: Re: [Vtd-xml-users] VTD-XML Query >> >> >> > >> > Hi, >> > >> > We are having problems parsing an XML file of size >> > 250MB. >> > >> > I got an "Out of memory error" for >> > >> > // open a file and read the content into a byte >> array >> > >> > File f = new File("./servers.xml"); >> > >> > FileInputStream fis = new FileInputStream(f); >> > >> > On this line >> > >> > >>>> byte[] b = new byte[(int) f.length()]; >> > >> > This code can be found on >> > http://vtd-xml.sourceforge.net/codeSample/cs1.html >> > >> > And it is not related to the file splitting. It >> fails >> > during initialization itself. >> > >> > Any thoughts on that!! >> > >> > >> > --- Jimmy Zhang <cra...@co...> wrote: >> > >> >> Usually the out of memory happens when you parses >> >> the file into a DOM >> >> tree... >> >> >> >> Assuming that you have enough memory to hold the >> >> document in memory, VTD-XML >> >> should >> >> compare very favorably against SAX or Pull in >> terms >> >> of coding effort and >> >> performance... 
>> >> even you don't need to go back parsed document >> and >> >> don't care about DOM like >> >> functionalites >> >> ----- Original Message ----- >> >> From: "Din Sush" <di...@ya...> >> >> To: "Jimmy Zhang" <cra...@co...>; "Tatu >> >> Saloranta" >> >> <cow...@ya...>; >> >> <vtd...@li...> >> >> Sent: Monday, July 31, 2006 8:20 PM >> >> Subject: Re: [Vtd-xml-users] VTD-XML Query >> >> >> >> >> >> > Well I only need to split the document and >> don't >> >> need >> >> > to go back to parsed document, and I don't need >> >> DOM >> >> > like functionality. >> >> > >> >> > Will VTD-XML be still better in this scenario. >> >> > >> >> > Secondly as the entire document needs to be >> loaded >> >> in >> >> > the memory, the whole idea of splitting is that >> I >> >> am >> >> > getting "Out of Memory" error won't I get the >> same >> >> > error when I am using VTD-XML, than it kind of >> >> defeats >> >> > the purpose. Correct me if I am wrong in the >> >> > interpretation as I have never used VTD. >> >> > >> >> > >> >> > >> >> > --- Jimmy Zhang <cra...@co...> wrote: >> >> > >> >> >> Well, the problem with streaming approach is >> that >> >> >> you will need to parse >> >> >> then reserialize, >> >> >> both CPU intensive, with VTD-XML it becomes a >> lot >> >> >> more efficient, but you >> >> >> need to load >> >> >> the document in memory first, so there first >> has >> >> to >> >> >> be enough memory >> >> >> available... but on the >> >> >> other hand, using steaming API like SAX or >> PULL, >> >> you >> >> >> will need to read in >> >> >> the document >> >> >> piecewise anyway, so overall I think VTD-XML >> >> should >> >> >> win quite >> >> >> significantly... >> >> >> >> >> >> My view of VTD-XML is that it is just like >> DOM, >> >> you >> >> >> can jump back and forth >> >> >> as often >> >> >> as you want... yet it parses a lot faster than >> >> >> DOM... 
>> >> >> ----- Original Message ----- >> >> >> From: "Tatu Saloranta" >> <cow...@ya...> >> >> >> To: "Din Sush" <di...@ya...>; >> >> >> <vtd...@li...> >> >> >> Sent: Monday, July 31, 2006 11:53 AM >> >> >> Subject: Re: [Vtd-xml-users] VTD-XML Query >> >> >> >> >> >> >> >> >> > --- Din Sush <di...@ya...> wrote: >> >> >> > >> >> >> >> Here is my requirement >> >> >> >> >> >> >> >> I need to split really big XML files(1 GB >> >> plus) >> >> >> into >> >> >> >> smaller sized files. >> >> >> >> I am in the process of evaluating different >> >> >> >> approaches. >> >> >> >> 1. Use Vtd-XML, parse and split. >> >> >> >> 2. Use Perl XML::Twig split function >> >> >> >> 3. Writing my own parser in perl on top of >> >> >> >> XML::Parser, >> >> >> >> which uses expat. >> >> >> >> 4. Use libxml2. >> >> >> > >> >> >> > To me, this does sound like you would be >> better >> >> >> off >> >> >> > using a streaming approach (SAX, StAX or >> >> XmlPull; >> >> >> or >> >> >> > .net equivalent of the last 2; StAX and >> XmlPull >> >> >> are >> >> >> > Java things). I don't know if there are >> >> perl-basd >> >> >> > streaming equivalents, but I think expat and >> >> >> libxml2 >> >> >> > have streaming SAX interfaces (or similar) >> >> >> > >> >> >> > There doesn't seem to be much need for >> random >> >> >> access, >> >> >> > nor need to keep any portions in memory. >> >> Streaming >> >> >> > approaches have no problem with files of any >> >> size >> >> >> > (certainly no problems with 1 GB), and for >> >> >> splitting I >> >> >> > personally do not think VTD-XML would be >> faster >> >> >> than >> >> >> > the alternatives. This because all the >> content >> >> has >> >> >> to >> >> >> > be accessed -- VTD-XML is fastest when you >> need >> >> to >> >> >> > access as little data as possible. >> >> >> > >> >> >> > -+ Tatu +- >> >> >> > >> >> >> > >> >> >> > >> >> >> __________________________________________________ >> >> >> > Do You Yahoo!? >> >> >> > Tired of spam? Yahoo! 
Mail has the best >> spam >> >> >> protection around >> >> >> > http://mail.yahoo.com >> >> >> > >> > === message truncated === > > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam protection around > http://mail.yahoo.com > |
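The failing line in this thread, `byte[] b = new byte[(int) f.length()]`, needs the whole file to fit in one array and in the heap. A minimal sketch of a pre-flight check, assuming the parser needs about 1.5 times the document size in total; that factor is an assumption made for illustration, not a figure from the thread, and `HeapCheckSketch` and `canLoad` are hypothetical names:

```java
import java.io.File;

// Hypothetical pre-flight check before reading a whole XML file into a byte[].
class HeapCheckSketch {

    // Assumed total memory need relative to file size (document bytes plus parser records).
    static final double ASSUMED_OVERHEAD = 1.5;

    /** True if a file of this length fits in a Java array and, with overhead, in the heap. */
    static boolean canLoad(long fileLength, long maxHeapBytes) {
        // A Java array is indexed by int, so it cannot exceed roughly 2^31 - 1 elements.
        if (fileLength > Integer.MAX_VALUE - 8) {
            return false;
        }
        return fileLength * ASSUMED_OVERHEAD <= maxHeapBytes;
    }

    /** Usage: java -Xmx1500m HeapCheckSketch <xml file> */
    public static void main(String[] args) {
        File f = new File(args[0]);
        long maxHeap = Runtime.getRuntime().maxMemory(); // reflects the -Xmx setting
        System.out.println("file bytes: " + f.length());
        System.out.println("max heap:   " + maxHeap);
        System.out.println("can load:   " + canLoad(f.length(), maxHeap));
    }
}
```

Under that assumption, 800 MB of data does not fit in a 600 MB heap but does fit in the 1500 MB heap suggested in the reply, provided the machine and JVM can actually grant that much.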
From: Din S. <di...@ya...> - 2006-08-03 11:36:59
|
I tried woodstox parser, it seems to be working and for a 1 GB file it is taking around 11 mins to split the file in multiple 1 MB files. Thanks for your suggestion!! I was just wondering if I can make it any faster, I am using "copyEventFromEventMethod" to write to the file. Thanks again. --- Tatu Saloranta <cow...@ya...> wrote: > --- Din Sush <di...@ya...> wrote: > > > Well I only need to split the document and don't > > need > > to go back to parsed document, and I don't need > DOM > > like functionality. > > > > Will VTD-XML be still better in this scenario. > > I would suggest that if you do have time, you > investigate both using VTD-XML, and a Stax > implementation (such as > http://woodstox.codehaus.org). > My feeling is that it all comes down to which one > API > you feel more comfortable with, or perhaps whether > have to use a xml-compliant standard-based solution > or > not. > Both can perform well enough, assuming you are not > limited by VTD-XML due to main memory requirements. > Stax memory usage is not linear with document > length, > so there are no practical input size limitations. > > If you do end up both approaches, it would be very > nice to get the performance numbers, since this > would > be an actual real-world use case, instead of > benchmarks. Plus if code is simple enough, perhaps > it > could become a benchmark for these types of > operations? > > > Secondly as the entire document needs to be loaded > > in > > the memory, the whole idea of splitting is that I > am > > getting "Out of Memory" error won't I get the same > > error when I am using VTD-XML, than it kind of > > defeats > > the purpose. Correct me if I am wrong in the > > interpretation as I have never used VTD. > > You are correct here. While limit is much higher > than > with, say, DOM (2x or perhaps 3x), there is a limit. > > -+ Tatu +- > > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! 
Mail has the best spam > protection around > http://mail.yahoo.com > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com |
From: Tatu S. <cow...@ya...> - 2006-08-02 22:05:05
|
--- Din Sush <di...@ya...> wrote:

> Hi,
> I have explored woodstox also, issue with STAX
> parsers that I am having is I need to build complete xml,
> whereas in vtd or sax i can get the fragment and
> write it to a file.

I'm pretty sure you are confusing SAX and DOM here, since SAX only gives you individual nodes, similar to what StAX does, not subtrees (you can of course reconstruct subtrees from events, but that's not the same thing). So, I assume you mean 'VTD or DOM'.

But I'm not quite sure what would be complicated about building XML using the Event API of StAX (it is a bit more complicated if using the raw cursor API, i.e. XMLStreamReader and XMLStreamWriter -- maybe you have only used that so far?). With the Event API it's just events in and events out. A bit of recursion for copying, and that's pretty much it, for simple merging.

...

> Now I want to put say 10 person records in file 1 and
> so on, with vtd, i will get the fragment and I will
> write that to file, with sax also I can get entire

If a bitwise exact copy does work, yes. This is not necessarily the case if namespaces are used (or if DTD-based entities are used).

> person record, but with SAX, I don't get the complete
> person record. So I have to create a XMLWriter, use,
> functions like writeStartElement, writeAttribute, etc.
> Basically building the entire structure which is
> already there.

Yes. That's streaming. With XMLEventWriter you just add the XMLEvents you get from XMLEventReader, but you do need to pipe them through in a loop. A bit more work, but not a lot (you just need to keep track of pairing start/end tags).

-+ Tatu +-
 |
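The "events in, events out" approach described above can be sketched with the standard `javax.xml.stream` Event API. This is a minimal illustration, not Din's or Tatu's code: it assumes the records are the direct children of the root element, collects each chunk in a string instead of a file to stay self-contained, and `StaxSplitSketch` and `split` are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical splitter: pipes events from one reader into a series of writers.
class StaxSplitSketch {

    /** Splits the root's child elements into documents of at most recordsPerChunk records. */
    static List<String> split(String xml, int recordsPerChunk) throws XMLStreamException {
        XMLEventReader in = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        XMLEventFactory eventFactory = XMLEventFactory.newInstance();

        List<String> chunks = new ArrayList<>();
        StartElement root = null;   // remembered so every chunk gets the same wrapper
        StringWriter buffer = null;
        XMLEventWriter out = null;
        int depth = 0;              // 1 = inside root, 2 = inside a record
        int recordsInChunk = 0;

        while (in.hasNext()) {
            XMLEvent event = in.nextEvent();
            if (event.isStartElement()) {
                depth++;
                if (depth == 1) {
                    root = event.asStartElement();
                    continue;
                }
                if (depth == 2 && out == null) {
                    // First record of a new chunk: open a writer and repeat the root tag.
                    buffer = new StringWriter();
                    out = outputFactory.createXMLEventWriter(buffer);
                    out.add(root);
                }
            }
            // Copy only what lies inside a record; text between records is skipped.
            if (depth >= 2 && out != null) {
                out.add(event);
            }
            if (event.isEndElement()) {
                depth--;
                if (depth == 1 && ++recordsInChunk == recordsPerChunk) {
                    chunks.add(finish(out, buffer, root, eventFactory));
                    out = null;
                    recordsInChunk = 0;
                }
            }
        }
        if (out != null) {
            chunks.add(finish(out, buffer, root, eventFactory)); // partial last chunk
        }
        in.close();
        return chunks;
    }

    /** Writes the closing root tag and returns the finished chunk. */
    private static String finish(XMLEventWriter out, StringWriter buffer, StartElement root,
                                 XMLEventFactory eventFactory) throws XMLStreamException {
        out.add(eventFactory.createEndElement(root.getName(), root.getNamespaces()));
        out.flush();
        out.close();
        return buffer.toString();
    }
}
```

Calling `split` on a document with three `person` records and `recordsPerChunk` set to 2 yields two chunks, each wrapped in its own copy of the root element. For the 1 GB case the reader would wrap a buffered file stream and each chunk would go to its own buffered file writer; the depth counter is the "pairing of start/end tags" mentioned in the message.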
From: Jimmy Z. <cra...@co...> - 2006-08-02 21:02:47
|
yeah, I responded the request, can you somehow split The XML file into smaller chunks?? ----- Original Message ----- From: "Chinh Ho" <ho_...@ho...> To: <cra...@co...>; <vtd...@li...> Sent: Wednesday, August 02, 2006 1:08 PM Subject: Re: [Vtd-xml-users] VTD-XML Query > Well, you might read my question below: >>>> >>>>>Hi Jimmy, >>>>>You said that the VTD-XML currently support maximum file size of 2GB. >>>>>What version of the VTD-XML so that I could try to explore the large >>>>>XML file. >>>>> > > I entered a bug on sourceforge website about this. However, I did not > stated very clear. The malloc function in C takes an integer. The > integer is 4 bits (8 bytes). It's about 8 digits number or 99999999. > > 2 Gig = 2000000000 or ten digits number. > > It creates the overflow when it tries to allocate the memory. > > I think the 20 MB will work OK. > > > > >>From: "Jimmy Zhang" <cra...@co...> >>To: "Chinh Ho" >><ho_...@ho...>,<vtd...@li...> >>Subject: Re: [Vtd-xml-users] VTD-XML Query >>Date: Wed, 2 Aug 2006 12:22:57 -0700 >> >>Can you first try to parse a smaller document like 20MB to see it works ok >>or not? >> >>I suspect that the file size is getting too big so that it overflows the >>32-bit integers, >>causing it to intepret is a negative value... >>----- Original Message ----- From: "Chinh Ho" <ho_...@ho...> >>To: <cra...@co...> >>Cc: <vtd...@li...> >>Sent: Wednesday, August 02, 2006 10:19 AM >>Subject: Re: [Vtd-xml-users] VTD-XML Query >> >> >>>I used version 1.6. Here are the steps that I did: >>>1/ Download the ximpleware_1.6_c_light and extract them out to a folder >>>named "ximpleware". >>>2/ Open MS Visual Studio 2005. >>>3/ Open new empty C++ Win32 Console Application project. >>>4/ Open the "ximpleware" folder. Select all files. Drag and drop them >>>to the Solution Explorer window in MS 2005. >>>5/ Click build solution. >>>6/ Copy the 2 Gig xml file to the debug folder. >>>7/ Open the benchmark_vtdxml.c . 
Comment out the int argc and char >>>*argv[] in main() >>>8/ In the line: f = fopen(argv[1], "r"), replace the argv[1] with the xml >>>file name. >>>9/ Replace the argv[1] in the next line with the same xml file name: >>>(stat(argv[1], &s)) >>>10/ Put breakpoints at the line f = fopen("foo.xml", "r"); >>> and xml = (UByte *)malloc(sizeof(UByte)*(int)s.st_size); >>>11/ Press F5. >>>12/ On the Autos window, it shows the s.st_size is -858993460. >>> On the cmd window, it shows the same size of the file : "size of >>> the file is -858993460" >>>13/ Press F10 twice. >>>14/ A MS Visual C++ Debug Library appears. It says: "Debug Assertion >>>Failed! Program:... File: fread.c Line: 93 Expression: >>>(buffer != NULL) >>> >>>Please let me know what step(s) that I did wrong. Also, how do you turn >>>off the namespace support when parsing. >>>I could not use the ximpleware_1.6_c because of the .l and .y files. I >>>think these files are for the Unix version, aren't they? >>> >>> >>> >>> >>> >>>>From: "Jimmy Zhang" <cra...@co...> >>>>To: "Chinh Ho" <ho_...@ho...> >>>>CC: <vtd...@li...> >>>>Subject: Re: [Vtd-xml-users] VTD-XML Query >>>>Date: Mon, 31 Jul 2006 11:52:48 -0700 >>>> >>>>Version 1.6, when you turn off namespace support when parsing... >>>>the max is 1GB when namespace enabled... >>>> >>>>also don't forget to CC vtd-xml-user to keep a record >>>>----- Original Message ----- From: "Chinh Ho" <ho_...@ho...> >>>>To: <cra...@co...> >>>>Sent: Monday, July 31, 2006 11:50 AM >>>>Subject: Re: [Vtd-xml-users] VTD-XML Query >>>> >>>> >>>>>Hi Jimmy, >>>>>You said that the VTD-XML currently support maximum file size of 2GB. >>>>>What version of the VTD-XML so that I could try to explore the large >>>>>XML file. 
>>>>> >>>>> >>>>> >>>>> >>>>>>From: "Jimmy Zhang" <cra...@co...> >>>>>>To: <vtd...@li...> >>>>>>CC: Din Sush <di...@ya...> >>>>>>Subject: Re: [Vtd-xml-users] VTD-XML Query >>>>>>Date: Mon, 31 Jul 2006 08:59:35 -0700 >>>>>> >>>>>>I think VTD-XML should have a couple of distinct advantages for >>>>>>splitting >>>>>>XML, performance >>>>>>probably being the biggest reason... currently VTD-XML's file size >>>>>>support >>>>>>is 2GB, and you need >>>>>>to have enough memory to hold the document in memory... >>>>>> >>>>>>I haven't tried other approaches, but they seem like SAX based, and >>>>>>may be >>>>>>slower and less flexible >>>>>>(SAX is forward only), >>>>>> >>>>>>Let me know if there are any questions... you are welcome to share >>>>>>your >>>>>>experience with us >>>>>> >>>>>>----- Original Message ----- >>>>>>From: "Din Sush" <di...@ya...> >>>>>>To: <vtd...@li...> >>>>>>Sent: Monday, July 31, 2006 5:23 AM >>>>>>Subject: [Vtd-xml-users] VTD-XML Query >>>>>> >>>>>> >>>>>> > Here is my requirement >>>>>> > >>>>>> > I need to split really big XML files(1 GB plus) into >>>>>> > smaller sized files. >>>>>> > I am in the process of evaluating different >>>>>> > approaches. >>>>>> > 1. Use Vtd-XML, parse and split. >>>>>> > 2. Use Perl XML::Twig split function >>>>>> > 3. Writing my own parser in perl on top of >>>>>> > XML::Parser, >>>>>> > which uses expat. >>>>>> > 4. Use libxml2. >>>>>> > >>>>>> > I am not sure if this is the right place to post this >>>>>> > question, but would like to know the best approach to >>>>>> > get the job done effectively. >>>>>> > >>>>>> > I would like to know the pros/cons and limitations of >>>>>> > my proposed solutions. >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > __________________________________________________ >>>>>> > Do You Yahoo!? >>>>>> > Tired of spam? Yahoo! 
Mail has the best spam protection around >>>>>> > http://mail.yahoo.com >>>>>> > >>>>>> > >>>>>>------------------------------------------------------------------------- >>>>>> > Take Surveys. Earn Cash. Influence the Future of IT >>>>>> > Join SourceForge.net's Techsay panel and you'll get the chance to >>>>>>share >>>>>> > your >>>>>> > opinions on IT & business topics through brief surveys -- and earn >>>>>>cash >>>>>> > >>>>>>http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV >>>>>> > _______________________________________________ >>>>>> > Vtd-xml-users mailing list >>>>>> > Vtd...@li... >>>>>> > https://lists.sourceforge.net/lists/listinfo/vtd-xml-users >>>>>> > >>>>>> >>>>>> >>>>>> >>>>>>------------------------------------------------------------------------- >>>>>>Take Surveys. Earn Cash. Influence the Future of IT >>>>>>Join SourceForge.net's Techsay panel and you'll get the chance to >>>>>>share your >>>>>>opinions on IT & business topics through brief surveys -- and earn >>>>>>cash >>>>>>http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV >>>>>>_______________________________________________ >>>>>>Vtd-xml-users mailing list >>>>>>Vtd...@li... >>>>>>https://lists.sourceforge.net/lists/listinfo/vtd-xml-users >>>>> >>>>> >>>>> >>>> >>>> >>> >>> >>> >> >> > > > |
From: Chinh H. <ho_...@ho...> - 2006-08-02 20:08:23
|
Well, you might read my question below:

>>>>Hi Jimmy,
>>>>You said that the VTD-XML currently support maximum file size of 2GB.
>>>>What version of the VTD-XML so that I could try to explore the large XML
>>>>file.

I entered a bug on the SourceForge website about this. However, I did not state it very clearly. The malloc call in the benchmark casts the file size to an integer. The integer is 4 bytes (32 bits), so the largest signed value it can hold is 2,147,483,647, a ten-digit number. A 2 GB file (2,147,483,648 bytes if it is exactly 2 GiB) no longer fits, so it overflows when it tries to allocate the memory.

I think the 20 MB file will work OK.

>From: "Jimmy Zhang" <cra...@co...>
>To: "Chinh Ho" <ho_...@ho...>,<vtd...@li...>
>Subject: Re: [Vtd-xml-users] VTD-XML Query
>Date: Wed, 2 Aug 2006 12:22:57 -0700
>
>Can you first try to parse a smaller document like 20MB to see it works ok
>or not?
>
>I suspect that the file size is getting too big so that it overflows the
>32-bit integers,
>causing it to intepret is a negative value...
>----- Original Message ----- From: "Chinh Ho" <ho_...@ho...>
>To: <cra...@co...>
>Cc: <vtd...@li...>
>Sent: Wednesday, August 02, 2006 10:19 AM
>Subject: Re: [Vtd-xml-users] VTD-XML Query
>
>
>>I used version 1.6. Here are the steps that I did:
>>1/ Download the ximpleware_1.6_c_light and extract them out to a folder
>>named "ximpleware".
>>2/ Open MS Visual Studio 2005.
>>3/ Open new empty C++ Win32 Console Application project.
>>4/ Open the "ximpleware" folder. Select all files. Drag and drop them to
>>the Solution Explorer window in MS 2005.
>>5/ Click build solution.
>>6/ Copy the 2 Gig xml file to the debug folder.
>>7/ Open the benchmark_vtdxml.c . Comment out the int argc and char
>>*argv[] in main()
>>8/ In the line: f = fopen(argv[1], "r"), replace the argv[1] with the xml
>>file name.
>>9/ Replace the argv[1] in the next line with the same xml file name:
>>(stat(argv[1], &s))
>>10/ Put breakpoints at the line f = fopen("foo.xml", "r");
>> and xml = (UByte *)malloc(sizeof(UByte)*(int)s.st_size);
>>11/ Press F5.
>>12/ On the Autos window, it shows the s.st_size is -858993460. >> On the cmd window, it shows the same size of the file : "size of the >>file is -858993460" >>13/ Press F10 twice. >>14/ A MS Visual C++ Debug Library appears. It says: "Debug Assertion >>Failed! Program:... File: fread.c Line: 93 Expression: (buffer >>!= NULL) >> >>Please let me know what step(s) that I did wrong. Also, how do you turn >>off the namespace support when parsing. >>I could not use the ximpleware_1.6_c because of the .l and .y files. I >>think these files are for the Unix version, aren't they? >> >> >> >> >> >>>From: "Jimmy Zhang" <cra...@co...> >>>To: "Chinh Ho" <ho_...@ho...> >>>CC: <vtd...@li...> >>>Subject: Re: [Vtd-xml-users] VTD-XML Query >>>Date: Mon, 31 Jul 2006 11:52:48 -0700 >>> >>>Version 1.6, when you turn off namespace support when parsing... >>>the max is 1GB when namespace enabled... >>> >>>also don't forget to CC vtd-xml-user to keep a record >>>----- Original Message ----- From: "Chinh Ho" <ho_...@ho...> >>>To: <cra...@co...> >>>Sent: Monday, July 31, 2006 11:50 AM >>>Subject: Re: [Vtd-xml-users] VTD-XML Query >>> >>> >>>>Hi Jimmy, >>>>You said that the VTD-XML currently support maximum file size of 2GB. >>>>What version of the VTD-XML so that I could try to explore the large XML >>>>file. >>>> >>>> >>>> >>>> >>>>>From: "Jimmy Zhang" <cra...@co...> >>>>>To: <vtd...@li...> >>>>>CC: Din Sush <di...@ya...> >>>>>Subject: Re: [Vtd-xml-users] VTD-XML Query >>>>>Date: Mon, 31 Jul 2006 08:59:35 -0700 >>>>> >>>>>I think VTD-XML should have a couple of distinct advantages for >>>>>splitting >>>>>XML, performance >>>>>probably being the biggest reason... currently VTD-XML's file size >>>>>support >>>>>is 2GB, and you need >>>>>to have enough memory to hold the document in memory... >>>>> >>>>>I haven't tried other approaches, but they seem like SAX based, and may >>>>>be >>>>>slower and less flexible >>>>>(SAX is forward only), >>>>> >>>>>Let me know if there are any questions... 
you are welcome to share your >>>>>experience with us >>>>> >>>>>----- Original Message ----- >>>>>From: "Din Sush" <di...@ya...> >>>>>To: <vtd...@li...> >>>>>Sent: Monday, July 31, 2006 5:23 AM >>>>>Subject: [Vtd-xml-users] VTD-XML Query >>>>> >>>>> >>>>> > Here is my requirement >>>>> > >>>>> > I need to split really big XML files(1 GB plus) into >>>>> > smaller sized files. >>>>> > I am in the process of evaluating different >>>>> > approaches. >>>>> > 1. Use Vtd-XML, parse and split. >>>>> > 2. Use Perl XML::Twig split function >>>>> > 3. Writing my own parser in perl on top of >>>>> > XML::Parser, >>>>> > which uses expat. >>>>> > 4. Use libxml2. >>>>> > >>>>> > I am not sure if this is the right place to post this >>>>> > question, but would like to know the best approach to >>>>> > get the job done effectively. >>>>> > >>>>> > I would like to know the pros/cons and limitations of >>>>> > my proposed solutions. >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > __________________________________________________ >>>>> > Do You Yahoo!? >>>>> > Tired of spam? Yahoo! Mail has the best spam protection around >>>>> > http://mail.yahoo.com >>>>> > >>>>> > >>>>>------------------------------------------------------------------------- >>>>> > Take Surveys. Earn Cash. Influence the Future of IT >>>>> > Join SourceForge.net's Techsay panel and you'll get the chance to >>>>>share >>>>> > your >>>>> > opinions on IT & business topics through brief surveys -- and earn >>>>>cash >>>>> > >>>>>http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV >>>>> > _______________________________________________ >>>>> > Vtd-xml-users mailing list >>>>> > Vtd...@li... >>>>> > https://lists.sourceforge.net/lists/listinfo/vtd-xml-users >>>>> > >>>>> >>>>> >>>>> >>>>>------------------------------------------------------------------------- >>>>>Take Surveys. Earn Cash. 
Influence the Future of IT >>>>>Join SourceForge.net's Techsay panel and you'll get the chance to share >>>>>your >>>>>opinions on IT & business topics through brief surveys -- and earn cash >>>>>http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV >>>>>_______________________________________________ >>>>>Vtd-xml-users mailing list >>>>>Vtd...@li... >>>>>https://lists.sourceforge.net/lists/listinfo/vtd-xml-users >>>> >>>> >>>> >>> >>> >> >> >> > > |
From: Jimmy Z. <cra...@co...> - 2006-08-02 19:23:14
|
Can you first try to parse a smaller document, say 20 MB, to see whether it works OK?

I suspect that the file size is so big that it overflows a 32-bit integer, causing it to be interpreted as a negative value...

----- Original Message -----
From: "Chinh Ho" <ho_...@ho...>
To: <cra...@co...>
Cc: <vtd...@li...>
Sent: Wednesday, August 02, 2006 10:19 AM
Subject: Re: [Vtd-xml-users] VTD-XML Query

>I used version 1.6. Here are the steps that I did:
> 1/ Download the ximpleware_1.6_c_light and extract them out to a folder
> named "ximpleware".
> 2/ Open MS Visual Studio 2005.
> 3/ Open new empty C++ Win32 Console Application project.
> 4/ Open the "ximpleware" folder. Select all files. Drag and drop them to
> the Solution Explorer window in MS 2005.
> 5/ Click build solution.
> 6/ Copy the 2 Gig xml file to the debug folder.
> 7/ Open the benchmark_vtdxml.c . Comment out the int argc and char
> *argv[] in main()
> 8/ In the line: f = fopen(argv[1], "r"), replace the argv[1] with the xml
> file name.
> 9/ Replace the argv[1] in the next line with the same xml file name:
> (stat(argv[1], &s))
> 10/ Put breakpoints at the line f = fopen("foo.xml", "r");
> and xml = (UByte *)malloc(sizeof(UByte)*(int)s.st_size);
> 11/ Press F5.
> 12/ On the Autos window, it shows the s.st_size is -858993460.
> On the cmd window, it shows the same size of the file : "size of the
> file is -858993460"
> 13/ Press F10 twice.
> 14/ A MS Visual C++ Debug Library appears. It says: "Debug Assertion
> Failed! Program:... File: fread.c Line: 93 Expression: (buffer
> != NULL)
>
> Please let me know what step(s) that I did wrong. Also, how do you turn
> off the namespace support when parsing.
> I could not use the ximpleware_1.6_c because of the .l and .y files. I
> think these files are for the Unix version, aren't they?
> > > > > >>From: "Jimmy Zhang" <cra...@co...> >>To: "Chinh Ho" <ho_...@ho...> >>CC: <vtd...@li...> >>Subject: Re: [Vtd-xml-users] VTD-XML Query >>Date: Mon, 31 Jul 2006 11:52:48 -0700 >> >>Version 1.6, when you turn off namespace support when parsing... >>the max is 1GB when namespace enabled... >> >>also don't forget to CC vtd-xml-user to keep a record >>----- Original Message ----- From: "Chinh Ho" <ho_...@ho...> >>To: <cra...@co...> >>Sent: Monday, July 31, 2006 11:50 AM >>Subject: Re: [Vtd-xml-users] VTD-XML Query >> >> >>>Hi Jimmy, >>>You said that the VTD-XML currently support maximum file size of 2GB. >>>What version of the VTD-XML so that I could try to explore the large XML >>>file. >>> >>> >>> >>> >>>>From: "Jimmy Zhang" <cra...@co...> >>>>To: <vtd...@li...> >>>>CC: Din Sush <di...@ya...> >>>>Subject: Re: [Vtd-xml-users] VTD-XML Query >>>>Date: Mon, 31 Jul 2006 08:59:35 -0700 >>>> >>>>I think VTD-XML should have a couple of distinct advantages for >>>>splitting >>>>XML, performance >>>>probably being the biggest reason... currently VTD-XML's file size >>>>support >>>>is 2GB, and you need >>>>to have enough memory to hold the document in memory... >>>> >>>>I haven't tried other approaches, but they seem like SAX based, and may >>>>be >>>>slower and less flexible >>>>(SAX is forward only), >>>> >>>>Let me know if there are any questions... you are welcome to share your >>>>experience with us >>>> >>>>----- Original Message ----- >>>>From: "Din Sush" <di...@ya...> >>>>To: <vtd...@li...> >>>>Sent: Monday, July 31, 2006 5:23 AM >>>>Subject: [Vtd-xml-users] VTD-XML Query >>>> >>>> >>>> > Here is my requirement >>>> > >>>> > I need to split really big XML files(1 GB plus) into >>>> > smaller sized files. >>>> > I am in the process of evaluating different >>>> > approaches. >>>> > 1. Use Vtd-XML, parse and split. >>>> > 2. Use Perl XML::Twig split function >>>> > 3. Writing my own parser in perl on top of >>>> > XML::Parser, >>>> > which uses expat. >>>> > 4. 
Use libxml2. >>>> > >>>> > I am not sure if this is the right place to post this >>>> > question, but would like to know the best approach to >>>> > get the job done effectively. >>>> > >>>> > I would like to know the pros/cons and limitations of >>>> > my proposed solutions. >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > __________________________________________________ >>>> > Do You Yahoo!? >>>> > Tired of spam? Yahoo! Mail has the best spam protection around >>>> > http://mail.yahoo.com >>>> > >>>> > >>>>------------------------------------------------------------------------- >>>> > Take Surveys. Earn Cash. Influence the Future of IT >>>> > Join SourceForge.net's Techsay panel and you'll get the chance to >>>>share >>>> > your >>>> > opinions on IT & business topics through brief surveys -- and earn >>>>cash >>>> > >>>>http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV >>>> > _______________________________________________ >>>> > Vtd-xml-users mailing list >>>> > Vtd...@li... >>>> > https://lists.sourceforge.net/lists/listinfo/vtd-xml-users >>>> > >>>> >>>> >>>> >>>>------------------------------------------------------------------------- >>>>Take Surveys. Earn Cash. Influence the Future of IT >>>>Join SourceForge.net's Techsay panel and you'll get the chance to share >>>>your >>>>opinions on IT & business topics through brief surveys -- and earn cash >>>>http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV >>>>_______________________________________________ >>>>Vtd-xml-users mailing list >>>>Vtd...@li... >>>>https://lists.sourceforge.net/lists/listinfo/vtd-xml-users >>> >>> >>> >> >> > > > |
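The overflow suspected in this message can be reproduced with plain integer arithmetic. A minimal sketch in Java, whose `(int)` cast keeps the low 32 bits just as the `(int)s.st_size` cast in the benchmark would on a typical 32-bit `int`; the class and method names are hypothetical:

```java
// Hypothetical demo of what a 32-bit signed cast does to large sizes.
class SizeOverflowSketch {

    /** Keeps only the low 32 bits of a 64-bit size, like a C-style (int) cast. */
    static int truncateToInt(long size) {
        return (int) size;
    }

    public static void main(String[] args) {
        long twoGiB = 2L * 1024 * 1024 * 1024;            // 2,147,483,648 bytes

        System.out.println(Integer.MAX_VALUE);            // 2147483647
        System.out.println(truncateToInt(twoGiB));        // -2147483648
        System.out.println(truncateToInt(0xCCCCCCCCL));   // -858993460
    }
}
```

The last line may be the more telling one: the reported size, -858993460, is exactly the bit pattern 0xCCCCCCCC, which Visual C++ debug builds commonly use to fill uninitialized stack memory. That would suggest, though nothing in the thread confirms it, that the `stat` call failed and never filled in `s.st_size`, rather than a real size wrapping around, so checking the return value of `stat` before using the size seems worthwhile.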
From: Jimmy Z. <cra...@co...> - 2006-08-02 19:14:06
|
A few reasons why "elementByIndex(index)" is not a good thing to do:

1. While an element is identified by its index, an index doesn't always map to an element.
2. VTD-XML's internal node representation is more than just an index value, so supplying an index and expecting VTD-XML's navigator to jump to it is inadequate...

I think it boils down to the fact that VTD-XML is not DOM, so the coding style will be different. This is a subtle difference that requires developers to rethink their XML processing tasks...

----- Original Message -----
From: tony yin
To: Vtd...@li...
Sent: Wednesday, August 02, 2006 1:42 AM
Subject: Re: [Vtd-xml-users] Need more arbitrary random access

So, it's hard to achieve "vn.toElementByIndex(index)", right? I still need features like that!
 |
From: Chinh H. <ho_...@ho...> - 2006-08-02 17:20:04
|
I used version 1.6. Here are the steps that I followed:

1/ Download ximpleware_1.6_c_light and extract it to a folder named "ximpleware".
2/ Open MS Visual Studio 2005.
3/ Open a new empty C++ Win32 Console Application project.
4/ Open the "ximpleware" folder. Select all files. Drag and drop them into the Solution Explorer window in MS 2005.
5/ Click build solution.
6/ Copy the 2 Gig xml file to the debug folder.
7/ Open benchmark_vtdxml.c. Comment out the int argc and char *argv[] in main().
8/ In the line f = fopen(argv[1], "r"), replace argv[1] with the xml file name.
9/ Replace the argv[1] in the next line with the same xml file name: (stat(argv[1], &s))
10/ Put breakpoints at the lines f = fopen("foo.xml", "r"); and xml = (UByte *)malloc(sizeof(UByte)*(int)s.st_size);
11/ Press F5.
12/ In the Autos window, s.st_size is shown as -858993460. The cmd window shows the same value: "size of the file is -858993460"
13/ Press F10 twice.
14/ An MS Visual C++ Debug Library dialog appears. It says: "Debug Assertion Failed! Program:... File: fread.c Line: 93 Expression: (buffer != NULL)"

Please let me know which step(s) I did wrong. Also, how do you turn off namespace support when parsing? I could not use ximpleware_1.6_c because of the .l and .y files. I think these files are for the Unix version, aren't they?

>From: "Jimmy Zhang" <cra...@co...>
>To: "Chinh Ho" <ho_...@ho...>
>CC: <vtd...@li...>
>Subject: Re: [Vtd-xml-users] VTD-XML Query
>Date: Mon, 31 Jul 2006 11:52:48 -0700
>
>Version 1.6, when you turn off namespace support when parsing...
>the max is 1GB when namespace enabled...
>
>also don't forget to CC vtd-xml-user to keep a record
>----- Original Message ----- From: "Chinh Ho" <ho_...@ho...>
>To: <cra...@co...>
>Sent: Monday, July 31, 2006 11:50 AM
>Subject: Re: [Vtd-xml-users] VTD-XML Query
>
>>Hi Jimmy,
>>You said that the VTD-XML currently support maximum file size of 2GB. What
>>version of the VTD-XML so that I could try to explore the large XML file.
>>
>>>From: "Jimmy Zhang" <cra...@co...>
>>>To: <vtd...@li...>
>>>CC: Din Sush <di...@ya...>
>>>Subject: Re: [Vtd-xml-users] VTD-XML Query
>>>Date: Mon, 31 Jul 2006 08:59:35 -0700
>>>
>>>I think VTD-XML should have a couple of distinct advantages for splitting
>>>XML, performance probably being the biggest reason... currently VTD-XML's
>>>file size support is 2GB, and you need to have enough memory to hold the
>>>document in memory...
>>>
>>>I haven't tried other approaches, but they seem like SAX based, and may be
>>>slower and less flexible (SAX is forward only),
>>>
>>>Let me know if there are any questions... you are welcome to share your
>>>experience with us
>>>
>>>----- Original Message -----
>>>From: "Din Sush" <di...@ya...>
>>>To: <vtd...@li...>
>>>Sent: Monday, July 31, 2006 5:23 AM
>>>Subject: [Vtd-xml-users] VTD-XML Query
>>>
>>> > Here is my requirement
>>> >
>>> > I need to split really big XML files(1 GB plus) into smaller sized files.
>>> > I am in the process of evaluating different approaches.
>>> > 1. Use Vtd-XML, parse and split.
>>> > 2. Use Perl XML::Twig split function
>>> > 3. Writing my own parser in perl on top of XML::Parser, which uses expat.
>>> > 4. Use libxml2.
>>> >
>>> > I am not sure if this is the right place to post this question, but
>>> > would like to know the best approach to get the job done effectively.
>>> >
>>> > I would like to know the pros/cons and limitations of my proposed
>>> > solutions.
|
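A note on the number reported in step 12 of the message above. Read as an unsigned 32-bit pattern, -858993460 is 0xCCCCCCCC, the value Visual C++ debug builds use to fill uninitialized stack memory. One plausible reading (an assumption, not something confirmed in this thread) is that the stat() call returned an error and never filled in `s`, for example because the file was not found at the path used. Separately, a byte count of 2 GiB does not fit in a signed 32-bit int, so the `(int)s.st_size` cast in the malloc line could not hold it even if stat() had succeeded. The arithmetic can be checked with a short sketch; it is written in Java only for convenience, and the class and method names are made up for illustration.

```java
class SizeCheck {
    /** Reinterprets the low 32 bits of a value as a signed int, as a C `int` would hold them. */
    static int asSigned32(long bits) {
        return (int) bits;
    }

    /** True when a byte count can be stored in a signed 32-bit int without wrapping. */
    static boolean fitsInSigned32(long byteCount) {
        return byteCount >= 0 && byteCount <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(asSigned32(0xCCCCCCCCL));        // prints -858993460
        System.out.println(fitsInSigned32(2L << 30));       // 2 GiB: prints false
        System.out.println(fitsInSigned32((2L << 30) - 1)); // one byte less: prints true
    }
}
```

If this reading is right, checking the return value of stat() and fopen() before using `s.st_size` would show which call is failing.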
From: tony y. <gao...@gm...> - 2006-08-02 08:42:25
|
So, it's hard to achieve "vn.toElementByIndex(index)", right? I still need features like that! |
From: Din S. <di...@ya...> - 2006-08-02 05:21:28
|
Hi,

I have explored Woodstox also. The issue I am having with StAX parsers is that I need to build the complete XML, whereas in VTD or SAX I can get the fragment and write it to a file.

Example:

    <persons>
      <person>
        <name>name1</name>
        <age>22</age>
        <address>address</address>
      </person>
      <person>
        . .
      </person>
      . .
    </persons>

Now I want to put, say, 10 person records in file 1 and so on. With VTD, I will get the fragment and write it to a file; with SAX I can also get the entire person record; but with StAX, I don't get the complete person record. So I have to create an XMLWriter and use functions like writeStartElement, writeAttribute, etc., basically rebuilding the entire structure which is already there.

Please let me know if there is a way to extract the complete person record.

Thanks.

--- Tatu Saloranta <cow...@ya...> wrote:

> --- Din Sush <di...@ya...> wrote:
>
> > Well I only need to split the document and don't need
> > to go back to parsed document, and I don't need DOM
> > like functionality.
> >
> > Will VTD-XML be still better in this scenario.
>
> I would suggest that if you do have time, you
> investigate both using VTD-XML, and a Stax
> implementation (such as http://woodstox.codehaus.org).
> My feeling is that it all comes down to which one API
> you feel more comfortable with, or perhaps whether
> have to use a xml-compliant standard-based solution or
> not.
> Both can perform well enough, assuming you are not
> limited by VTD-XML due to main memory requirements.
> Stax memory usage is not linear with document length,
> so there are no practical input size limitations.
>
> If you do end up both approaches, it would be very
> nice to get the performance numbers, since this would
> be an actual real-world use case, instead of
> benchmarks. Plus if code is simple enough, perhaps it
> could become a benchmark for these types of
> operations?
>
> > Secondly as the entire document needs to be loaded in
> > the memory, the whole idea of splitting is that I am
> > getting "Out of Memory" error won't I get the same
> > error when I am using VTD-XML, than it kind of defeats
> > the purpose. Correct me if I am wrong in the
> > interpretation as I have never used VTD.
>
> You are correct here. While limit is much higher than
> with, say, DOM (2x or perhaps 3x), there is a limit.
>
> -+ Tatu +-
|
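On the question above about getting a complete person record out of a StAX parser: one option is to copy the events of one record straight from an XMLEventReader to an XMLEventWriter, instead of calling writeStartElement and writeAttribute by hand. The sketch below uses only the javax.xml.stream event API that ships with the JDK. The class name PersonFragments and the method extract are made up for this illustration; nobody in the thread proposed this code, and it has not been benchmarked against VTD-XML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class PersonFragments {
    /** Returns every <recordName> subtree of the input as its own XML string. */
    static List<String> extract(String xml, String recordName) {
        List<String> fragments = new ArrayList<>();
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
            StringWriter buffer = null;
            XMLEventWriter writer = null; // non-null only while inside a record
            int depth = 0;                // element nesting depth within the record
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (writer == null) {
                    boolean recordStart = event.isStartElement()
                            && event.asStartElement().getName().getLocalPart().equals(recordName);
                    if (!recordStart) {
                        continue; // skip everything between records
                    }
                    buffer = new StringWriter();
                    writer = outputFactory.createXMLEventWriter(buffer);
                }
                writer.add(event); // copy the event as-is, no manual rebuilding
                if (event.isStartElement()) {
                    depth++;
                } else if (event.isEndElement() && --depth == 0) {
                    writer.flush();
                    writer.close();
                    fragments.add(buffer.toString());
                    writer = null; // back to "outside a record"
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not split input", e);
        }
        return fragments;
    }

    public static void main(String[] args) {
        String xml = "<persons><person><name>name1</name><age>22</age></person>"
                + "<person><name>name2</name><age>31</age></person></persons>";
        for (String fragment : extract(xml, "person")) {
            System.out.println(fragment);
        }
    }
}
```

For a real split, the StringReader and StringWriter would be replaced with file streams, starting a new output file after every 10 records. The sketch does not carry over namespace declarations made on ancestor elements, so documents that rely on them would need extra handling.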
From: Jimmy Z. <cra...@co...> - 2006-08-01 19:07:03
|
Woodstox is a very well-written pull parser, and generally considered the fastest one.

> ----- Original Message -----
> From: "Tatu Saloranta" <cow...@ya...>
> To: "Din Sush" <di...@ya...>; <vtd...@li...>
> Sent: Tuesday, August 01, 2006 11:34 AM
> Subject: Re: [Vtd-xml-users] VTD-XML Query
>
>> --- Din Sush <di...@ya...> wrote:
>>
>>> Well I only need to split the document and don't need
>>> to go back to parsed document, and I don't need DOM
>>> like functionality.
>>>
>>> Will VTD-XML be still better in this scenario.
>>
>> I would suggest that if you do have time, you
>> investigate both using VTD-XML, and a Stax
>> implementation (such as http://woodstox.codehaus.org).
>> My feeling is that it all comes down to which one API
>> you feel more comfortable with, or perhaps whether
>> have to use a xml-compliant standard-based solution or
>> not.
>> Both can perform well enough, assuming you are not
>> limited by VTD-XML due to main memory requirements.
>> Stax memory usage is not linear with document length,
>> so there are no practical input size limitations.
>>
>> If you do end up both approaches, it would be very
>> nice to get the performance numbers, since this would
>> be an actual real-world use case, instead of
>> benchmarks. Plus if code is simple enough, perhaps it
>> could become a benchmark for these types of
>> operations?
>>
>>> Secondly as the entire document needs to be loaded in
>>> the memory, the whole idea of splitting is that I am
>>> getting "Out of Memory" error won't I get the same
>>> error when I am using VTD-XML, than it kind of defeats
>>> the purpose. Correct me if I am wrong in the
>>> interpretation as I have never used VTD.
>>
>> You are correct here. While limit is much higher than
>> with, say, DOM (2x or perhaps 3x), there is a limit.
>>
>> -+ Tatu +-
|
From: Jimmy Z. <cra...@co...> - 2006-08-01 19:06:31
|
>> > No, what I mean is that while document after parsing
>> > only uses 1.3 - 1.7x memory, during parsing it needs
>> > additional 1x storage for the input. There is no
> ...
>> Actually it doesn't need extra 1x storage, because vtd-xml
>> also reads the byte content into its internal storage as the first
>> step, it is part of 1.3x~1.7x memory consumption...
>
> Ok. JavaDocs did not indicate this -- it's actually
> bit dangerous to reference the same array, as caller
> may go ahead and start modifying or reusing it. I
> assumed a copy was made, as is typically done when one
> has to share raw mutable arrays.
>
> But as long as (java)docs clearly indicate that the
> parser now owns the byte array, that's understandable
> to avoid overhead.

True, callers need to be careful about this.

> ...
>> One of the most subtle point of VTD-XML is that in many cases, you
>> never have to convert character data into java strings... why? because
>> string is almost always an intermediate stage of processing,
>
> Only for some cases of data-oriented XML processing.
> Strings are typical data artifacts for textual info.
> There are of course other ways to represent data, such
> as raw char arrays -- that's what SAX parsers typical
> pass for CHARACTERS segments, not Strings.

I think once people start working with VTD-XML, they will get comfortable using VTD records, i.e. comparing strings against VTD records, converting VTD records to ints and floats, etc.

> But thinking about it now though, it probably was a side
> effect of virtual memory management.
> With 176M file, and main memory of 1 gig, my system may
> actually be using all of its physical memory. If so,
> it is not a property of VTD-XML processing. So it is
> possible that larger data size does not have drastic
> effects on performance. This could be verified with
> a separate test machine, with bit more memory.

I was able to push my machine (1 GB of memory, running Windows XP) to parse files of 400 MB and more.

> -+ Tatu +-
|
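The idea in the message above of working on records instead of Java strings can be sketched without the VTD-XML library. The example below treats a token as an (offset, length) pair into the original byte array and compares or converts it in place. ByteToken, matches, and parseInt are hypothetical names chosen for this sketch, not VTD-XML's API, and the code assumes UTF-8 input with no entity references inside the token.

```java
import java.nio.charset.StandardCharsets;

class ByteToken {
    /** Compares the token at doc[offset, offset + length) with a string, without building a String for the token. */
    static boolean matches(byte[] doc, int offset, int length, String expected) {
        byte[] want = expected.getBytes(StandardCharsets.UTF_8);
        if (offset < 0 || length != want.length || offset + length > doc.length) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (doc[offset + i] != want[i]) {
                return false;
            }
        }
        return true;
    }

    /** Parses a decimal integer directly from the bytes of the token. */
    static int parseInt(byte[] doc, int offset, int length) {
        if (length <= 0) {
            throw new NumberFormatException("empty token");
        }
        int i = offset;
        int end = offset + length;
        boolean negative = doc[i] == '-';
        if (negative) {
            i++;
        }
        if (i == end) {
            throw new NumberFormatException("no digits");
        }
        long value = 0;
        for (; i < end; i++) {
            int digit = doc[i] - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("not a digit at offset " + i);
            }
            value = value * 10 + digit;
            if (value > Integer.MAX_VALUE + 1L) {
                throw new NumberFormatException("out of int range");
            }
        }
        value = negative ? -value : value;
        if (value > Integer.MAX_VALUE) {
            throw new NumberFormatException("out of int range");
        }
        return (int) value;
    }

    public static void main(String[] args) {
        byte[] doc = "<age>22</age>".getBytes(StandardCharsets.UTF_8);
        System.out.println(matches(doc, 1, 3, "age")); // element name at offset 1, length 3: prints true
        System.out.println(parseInt(doc, 5, 2));       // text at offset 5, length 2: prints 22
    }
}
```

Because the tokens point into the caller's array rather than a copy, the aliasing concern raised in the quoted text applies here too: modifying or reusing `doc` after the offsets are recorded silently invalidates them.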
From: Tatu S. <cow...@ya...> - 2006-08-01 18:34:42
|
--- Din Sush <di...@ya...> wrote:

> Well I only need to split the document and don't need
> to go back to parsed document, and I don't need DOM
> like functionality.
>
> Will VTD-XML be still better in this scenario.

I would suggest that, if you do have time, you investigate both VTD-XML and a StAX implementation (such as http://woodstox.codehaus.org). My feeling is that it all comes down to which API you feel more comfortable with, or perhaps whether you have to use an XML-compliant, standards-based solution or not. Both can perform well enough, assuming you are not limited by VTD-XML's main memory requirements. StAX memory usage is not linear with document length, so there are no practical input size limitations.

If you do end up trying both approaches, it would be very nice to get the performance numbers, since this would be an actual real-world use case instead of a benchmark. Plus, if the code is simple enough, perhaps it could become a benchmark for these types of operations?

> Secondly as the entire document needs to be loaded in
> the memory, the whole idea of splitting is that I am
> getting "Out of Memory" error won't I get the same
> error when I am using VTD-XML, than it kind of defeats
> the purpose. Correct me if I am wrong in the
> interpretation as I have never used VTD.

You are correct here. While the limit is much higher than with, say, DOM (2x or perhaps 3x), there is a limit.

-+ Tatu +-
|