From: Rodrigo C. <rn...@gm...> - 2007-03-14 11:19:40
|
Yes, that would be a nice addition :-) It wouldn't solve the exception handling or true random access issues, though, but it would be a nice add-on. I propose "plup()", hehe.

Jimmy Zhang wrote:
> Rodrigo, how about adding a method (tentatively called set()) that basically
> does pop() and push() in one shot, so your example
>
> push();
> // do something
> pop();
> push();
> // do something
> pop();
> push();
> // do something
> pop();
> push();
> // do something
> pop();
>
> becomes
>
> push();
> // do something
> set();
> // do something
> set();
> // do something
> set();
> // do something
> pop();
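Rodrigo's remark that a combined pop/push would not help with exception handling can be illustrated with a small sketch. It assumes a toy navigator whose whole state is an array of integers (he mentions that a context can be exported as an int array); `PathNav`, `exportContext()` and `importContext()` are hypothetical names made up for this example and are not part of VTD-XML or of his proposed API.

```java
import java.util.Arrays;

// Toy navigator: the cursor is the list of child indices from the root.
// Hypothetical class for illustration only; it is not VTDNav.
class PathNav {
    private int[] path = {0};

    // Move one level down to the given child.
    void descend(int childIndex) {
        path = Arrays.copyOf(path, path.length + 1);
        path[path.length - 1] = childIndex;
    }

    // Export the cursor as a plain int array (a defensive copy).
    int[] exportContext() { return path.clone(); }

    // Jump back to a previously exported cursor.
    void importContext(int[] ctx) { path = ctx.clone(); }

    String where() { return Arrays.toString(path); }
}

class ContextDemo {
    // Simulates navigation code that fails halfway through.
    static void riskyWork(PathNav nav) {
        nav.descend(2);
        nav.descend(5);
        throw new IllegalStateException("simulated failure mid-navigation");
    }

    public static void main(String[] args) {
        PathNav nav = new PathNav();
        nav.descend(1);
        int[] saved = nav.exportContext();   // locally held context, no stack involved
        try {
            riskyWork(nav);
        } catch (IllegalStateException e) {
            System.out.println("handler saw: " + e.getMessage());
        } finally {
            nav.importContext(saved);        // cursor is clean again, whatever happened
        }
        System.out.println(nav.where());     // prints "[0, 1]"
    }
}
```

Because the saved context lives in a local variable, the restore in `finally` does not depend on how many push() calls the failing code made before it threw, which is the part a stack-only API makes awkward.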
From: Fernando G. <fer...@gm...> - 2007-03-14 09:18:35
|
Sorry Jimmy, I will send it to you in .doc format. .odt is the OpenOffice document format (the Sun open source office suite).

I have noticed that I haven't sent you the XML file I use for the tests, because it's quite big. I generated it with http://www.xml-benchmark.org/. I will send you a link to download the XML.

About the class loader issues: I'm not absolutely sure, but the Java classes have to be loaded, and they are loaded on demand; that's why I think the first execution is slower (it's also slower using the original VTD-XML).

I understand your recommendation, but I think it would be a good thing to offer the other possibility. I do need it, as I explain in the technical description.

best regards,
Fernando

On 3/14/07, Jimmy Zhang <cra...@co...> wrote:
>
> Interesting, as the main method is actually a GUI benchmark...
> however, I can't seem to open the ODT file.
> Also, I have yet to experience any class loader issues... (maybe other
> folks on this list can comment...)
> Regarding saving VTD into a separate file, there are pros and cons...
> disk seeks on two separate files might slow things down a bit... my
> recommendation is to stick with the VTD+XML format, as it works with
> multiple programming languages and byte endianness...
>
> ----- Original Message -----
> From: Fernando Gonzalez <fer...@gm...>
> To: Jimmy Zhang <cra...@co...>
> Sent: Friday, March 09, 2007 3:28 AM
> Subject: Re: [Vtd-xml-users] Storing parsing info
>
> I am sending you the two proposals mixed in one source folder. I don't
> think that will be a problem, since I am also sending a description of the
> changes. I hope it's clear and can be easily understood.
>
> In org.Prueba you can find a main method.
>
> greetings,
> Fernando
>
> On 3/7/07, Jimmy Zhang <cra...@co...> wrote:
> >
> > Fernando, it is interesting that you have substituted byte[] with
> > IByteBuffer... since there is a level of indirection, the slight
> > slowdown should be expected... I would certainly be interested in your
> > approach to the issue, and feel free to send me the code...
> > Cheers,
> > Jimmy
> >
> > ----- Original Message -----
> > From: Fernando Gonzalez <fer...@gm...>
> > To: Jimmy Zhang <cra...@co...>
> > Sent: Wednesday, March 07, 2007 7:49 AM
> > Subject: Re: [Vtd-xml-users] Storing parsing info
> >
> > Hi Jimmy,
> >
> > While writing the following, I realized that it may be quite complicated
> > to understand, since you don't know exactly what changes I have made.
> > My tests are not thorough either, so maybe the best option is to submit
> > a technical description of the changes, the pros and cons, the code, and
> > so on.
> >
> > I have been testing the XPath performance problem and it seems to be a
> > classloader issue. As you can see in the following log, the slowest XPath
> > evaluation is the first one, no matter how the parsing information is
> > obtained.
> >
> > 391 ms->Load XML
> > 2125 ms->Parse XML
> > 31 ms->Evaluate XPath
> > 0 ms->Evaluate XPath
> > 0 ms->Evaluate XPath
> > 453 ms->Store parse info
> > 0 ms->Clear parse info
> > 313 ms->Read Parse info
> > 0 ms->Evaluate XPath
> > 0 ms->Evaluate XPath
> >
> > I have also been working on something more. I have made some changes to
> > VTD and have succeeded in the following:
> >
> > 1) The byte[] of the XML file is accessed through an interface
> > (IByteBuffer).
> >
> > 2) When I use the UniByteBuffer implementation I get slightly slower
> > results at parsing:
> >
> > 391 ms->Load XML
> > 2109 ms->Parse XML (vs 1890 ms I obtained accessing the byte[] buffer
> > directly)
> > 0,172 ms->Evaluate XPath
> > 0,078 ms->Evaluate XPath
> > 0,094 ms->Evaluate XPath
> > 0,078 ms->Evaluate XPath
> > 0,078 ms->Evaluate XPath
> >
> > 3) When I use an implementation that loads chunks as they are needed, I
> > get much slower results when parsing the file, but I get the same results
> > when evaluating an XPath expression. The advantage of this approach is
> > that there is no need to load the whole XML file into memory, so I have
> > obtained the following results:
> >
> > 25406 ms->Parse XML
> > 406 ms->Store parse info
> > 0,156 ms->Evaluate XPath
> > 0,093 ms->Evaluate XPath
> > 0,078 ms->Evaluate XPath
> > 0,093 ms->Evaluate XPath
> > 0,094 ms->Evaluate XPath
> > 0,078 ms->Evaluate XPath
> >
> > 500 ms->Read Parse info
> > 0,235 ms->Evaluate XPath
> > 0,094 ms->Evaluate XPath
> > 0,078 ms->Evaluate XPath
> > 0,094 ms->Evaluate XPath
> > 0,078 ms->Evaluate XPath
> > 0,094 ms->Evaluate XPath
> >
> > The great thing about these results is that the XML file was 100 MB and
> > I ran the program with the -Xmx64m JVM option (just enough to hold the
> > 30 MB of parsing info and the 16 MB buffer).
> >
> > Well, as I said before, I can send you a technical description of the
> > changes, the pros and cons, and the code.
> >
> > cheers,
> > Fernando
> >
> > On 3/7/07, Fernando Gonzalez <fer...@gm...> wrote:
> > >
> > > Hi Jimmy,
> > >
> > > Thanks for your response.
> > >
> > > I think I'm using version 2.0, since I have tested the
> > > "VTDGen.writeIndex" method. I looked for another solution because I
> > > cannot remove the original XML file, so I would have to store the XML
> > > twice: the original XML file and the file with the XML, VTD and LCs
> > > created by "VTDGen.writeIndex". As I'm dealing with really big XML
> > > files, that's a drawback.
> > >
> > > Yes, you're right, I have added code. Just three or four lines. If
> > > you're interested, I can explain my solution thoroughly. About the
> > > XPath performance, I think that's a classloader issue. I will check
> > > that and report the results.
> > >
> > > greetings,
> > > Fernando
> > >
> > > On 3/6/07, Jimmy Zhang <cra...@co...> wrote:
> > > >
> > > > Hey Fernando, thanks for the email... I am glad VTD-XML is helpful.
> > > > My question: which version are you using?
> > > > If you are currently using 2.0, it contains the indexing feature
> > > > that might accomplish just what is described in your email.
> > > >
> > > > Your solution is to separate the XML from the VTD and LCs, which I
> > > > think means you must have added code to do that...
> > > >
> > > > VTD+XML (as in version 2.0) packages the XML, VTD and LCs into
> > > > a single file... which should also work.
> > > >
> > > > The only suspicious part is that the XPath performance dropped in
> > > > your case... which shouldn't happen.
> > > >
> > > > Buffer reuse is useful if your app instantiates a VTDGen to
> > > > sequentially process many incoming XML documents...
> > > >
> > > > If you deal with only one XML doc... buffer reuse won't make a big
> > > > difference.
> > > >
> > > > I think you might be interested in first investigating the
> > > > persistence feature in 2.0, and there is a directory under code
> > > > examples...
> > > > Cheers,
> > > > Jimmy
> > > >
> > > > ----- Original Message -----
> > > > From: Fernando Gonzalez <fer...@gm...>
> > > > To: vtd...@li...
> > > > Sent: Tuesday, March 06, 2007 1:23 AM
> > > > Subject: [Vtd-xml-users] Storing parsing info
> > > >
> > > > Hello,
> > > >
> > > > First of all, I would like to congratulate you on your project; I
> > > > really think it's great.
> > > >
> > > > Second, I want to use the Java VTD-XML to do a certain task, and I
> > > > have succeeded, but I don't know whether I have done it the right
> > > > way or whether there is a better one. Can you give me some advice?
> > > >
> > > > I want to evaluate some XPath expressions on a lot of files of this
> > > > size and larger, so memory efficiency is critical. The first idea
> > > > that comes to my mind is to have a VTDGen object for each XML file,
> > > > but this solution leads to having all the XMLs loaded in memory in
> > > > the "protected byte[] XMLDoc;" attribute of the VTDGen class. So
> > > > each time I have to evaluate an XPath expression on an XML file, I
> > > > have to read the XML file, parse it, evaluate the XPath and set the
> > > > VTDGen object to null so that the garbage collector frees the memory.
> > > >
> > > > I have obtained these results reading a big XML file (~100Mb):
> > > >
> > > > 360 ms reading file
> > > > 1890 ms parsing file
> > > > 32 ms evaluating a XPath expression
> > > > 93 ms showing results
> > > > total = 2375 milliseconds
> > > >
> > > > Where the second step ("parsing file") means:
> > > > VTDGen vg = new VTDGen();
> > > > vg.setDoc(b);
> > > > vg.parse(true);
> > > >
> > > > To speed up the process I have stored the parsing information in a
> > > > file. After that I can read the XML file and the parsing information
> > > > file, evaluate the XPath expression and close everything again in a
> > > > shorter time:
> > > >
> > > > 344 ms reading the file
> > > > 422 ms reading parsing information
> > > > 125 ms evaluating a XPath expression
> > > > 93 ms showing results
> > > > total = 984
> > > >
> > > > I think the result is good enough, but maybe there's a better
> > > > solution than mine. I have stored the parsing info by serializing
> > > > the whole VTDGen object except the XMLDoc attribute. Then I retrieve
> > > > the object from disk and set the XMLDoc attribute. This way:
> > > >
> > > > ObjectInputStream ois = new ObjectInputStream(new
> > > > FileInputStream(PARSING_INFO));
> > > > vg = (MyVTDGen) ois.readObject();
> > > > ois.close();
> > > > FileInputStream fis2 = new FileInputStream(TEST_XML);
> > > > byte[] b2 = new byte[(int) f.length()];
> > > > fis2.read(b2);
> > > > vg.setXML(b2); //This method only sets the XMLDoc attribute
> > > >
> > > > Is this solution good? Is there a better one? Can "Buffer reuse"
> > > > solve my problem?
> > > >
> > > > best regards,
> > > > Fernando
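The approach described in the original post (serialize everything except the document bytes, then re-attach the bytes after loading) can be sketched with plain Java serialization. This is only an illustration under stated assumptions: `ParseInfo` is a hypothetical stand-in for the modified VTDGen subclass, the `tokenOffsets` field is invented, and no VTD-XML class is used. Note that a single `InputStream.read(byte[])` call, as in the quoted snippet, is not guaranteed to fill the array; the sketch uses `Files.readAllBytes` instead.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for "parsing info plus document bytes".
class ParseInfo implements Serializable {
    private static final long serialVersionUID = 1L;

    final int[] tokenOffsets;   // made-up parse result, written to disk
    transient byte[] xmlDoc;    // transient: skipped by serialization

    ParseInfo(byte[] xmlDoc, int[] tokenOffsets) {
        this.xmlDoc = xmlDoc;
        this.tokenOffsets = tokenOffsets;
    }
}

class StoreParseInfoDemo {
    // Writes only the non-transient fields.
    static void store(ParseInfo info, Path target) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(target))) {
            out.writeObject(info);
        }
    }

    // Reads the parse info back and re-attaches the original XML bytes.
    static ParseInfo load(Path infoFile, Path xmlFile) throws IOException, ClassNotFoundException {
        ParseInfo info;
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(infoFile))) {
            info = (ParseInfo) in.readObject();
        }
        info.xmlDoc = Files.readAllBytes(xmlFile);  // reads the complete file
        return info;
    }

    public static void main(String[] args) throws Exception {
        Path xml = Files.createTempFile("doc", ".xml");
        Path idx = Files.createTempFile("doc", ".idx");
        Files.write(xml, "<a><b/></a>".getBytes(StandardCharsets.UTF_8));

        ParseInfo parsed = new ParseInfo(Files.readAllBytes(xml), new int[] {0, 3});
        store(parsed, idx);                    // the XML bytes are not duplicated in idx
        ParseInfo restored = load(idx, xml);

        System.out.println(restored.tokenOffsets.length + " offsets, "
                + restored.xmlDoc.length + " XML bytes");  // prints "2 offsets, 11 XML bytes"
        Files.delete(xml);
        Files.delete(idx);
    }
}
```

Marking the field `transient` gives the "everything but XMLDoc" behaviour without custom write logic, at the cost of tying the stored file to Java serialization, which is the portability concern raised in the replies.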
From: Jimmy Z. <cra...@co...> - 2007-03-14 07:34:58
|
Rodrigo, how about adding a method (tentatively called set()) that basically does pop() and push() in one shot, so your example

push();
// do something
pop();
push();
// do something
pop();
push();
// do something
pop();
push();
// do something
pop();

becomes

push();
// do something
set();
// do something
set();
// do something
set();
// do something
pop();

----- Original Message -----
From: "Rodrigo Cunha" <rn...@gm...>
To: "Mark Swanson" <ma...@Sc...>; "Jimmy Zhang" <cra...@co...>
Cc: <vtd...@li...>
Sent: Tuesday, March 06, 2007 9:24 AM
Subject: Re: [Vtd-xml-users] Random Access Proposal (take 2)

> Hum, thanks for the hint also, Mark. I normally use java.util collections,
> but I'll give fastutil a look, since it might be better for some of my
> problems.
>
> Jimmy, a context can be imported/exported as an array of integers, this
> being a bit raw perhaps... and leaving the programmer with the job of
> storing that array. I just created SimpleContext as a way of abstracting
> that implementation detail, but since you like performance, perhaps you
> could leave that raw. I still prefer the abstracted version... besides,
> the internal representation might change and the abstracted version would
> still work.
>
> So, yes, go ahead with my API even if you decide to change it later.
>
> Just as a little silly footnote, I found my API is also useful in cases
> where you have:
>
> push();
> // do something
> pop();
> push();
> // do something
> pop();
> push();
> // do something
> pop();
> push();
> // do something
> pop();
>
> This can be abbreviated to:
>
> setCtxFromNav(ctx);
> // do something
> setNavFromCtx(ctx);
> // do something
> setNavFromCtx(ctx);
> // do something
> setNavFromCtx(ctx);
> // do something
> setNavFromCtx(ctx);
>
> This is both faster and more readable.
>
> It's also useful for things like:
>
> xpto = getComplexToFindTransmissionPath(VTDNav n, String s); // might even
> use caching!
> push();
> nav.setNavFromCtx(xpto);
> // Do something
> pop();
>
> and also for exception handling:
>
> setCtxFromNav(xpto);
> try {
>     // do something complex and error-prone,
>     // so I won't use the stack, but a few locally declared contexts instead
>     // oops! crash! go to handler!
> } finally {
>     setNavFromCtx(xpto);
>     // we are now clean
> }
>
> So, it made using VTD much more enjoyable to me, even in rather trivial
> situations.
>
> Mark Swanson wrote:
>> Jimmy Zhang wrote:
>>>
>>>> Just a FYI: I have cases where the key is an Integer, and cases where
>>>> it's a string.
>>>
>>> By Integer, is it a Java class? Or just a primitive data type? Maybe I
>>> can modify Rodrigo's class and put it into CVS so you guys can use it
>>> immediately... however, I can't guarantee that it will be included in
>>> the next release...
>>> Would that work?
>>
>> Oh, I always use native ints and fastutil wherever possible.
>>
>> Just a thought: I use autojar on my code to build a tiny fastutil jar
>> that just has the code I need. You could do the same thing to get
>> excellent native collections instead of writing your own. I see you
>> already wrote your own, but in case you need more... Fastutil uses the
>> LGPL.
>>
>> Cheers.
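The proposed set() can be sketched as "restore the saved cursor but leave it on the stack". The snippet below is a minimal model under that assumption: `ToyNav` is a hypothetical class whose whole cursor is a single int, and only the method names push(), pop() and set() are taken from the message above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy stand-in for a navigator: the whole "cursor" is one int.
// Hypothetical class for illustration; it is not VTDNav.
class ToyNav {
    private int cursor = 0;
    private final Deque<Integer> saved = new ArrayDeque<>();

    void moveTo(int position) { cursor = position; }
    int position() { return cursor; }
    int depth() { return saved.size(); }

    // Save the current cursor on the stack.
    void push() { saved.push(cursor); }

    // Restore the most recently saved cursor and discard it.
    void pop() {
        if (saved.isEmpty()) throw new IllegalStateException("nothing to pop");
        cursor = saved.pop();
    }

    // Proposed shortcut: same effect as pop() followed by push(),
    // i.e. restore the saved cursor but leave it on the stack.
    void set() {
        if (saved.isEmpty()) throw new IllegalStateException("nothing saved");
        cursor = saved.peek();
    }
}

class SetDemo {
    public static void main(String[] args) {
        ToyNav nav = new ToyNav();
        nav.moveTo(10);
        nav.push();       // remember position 10
        nav.moveTo(42);   // "do something"
        nav.set();        // back at 10, still saved
        nav.moveTo(99);   // "do something" again
        nav.pop();        // back at 10, stack now empty
        System.out.println(nav.position() + " " + nav.depth()); // prints "10 0"
    }
}
```

In this model set() is a peek rather than a real pop-then-push, so it avoids touching the stack at all between the first push() and the final pop().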
From: Jimmy Z. <cra...@co...> - 2007-03-14 03:07:33
|
Interesting, as the main method is actually a GUI benchmark... however, I can't seem to open the ODT file.
Also, I have yet to experience any class loader issues... (maybe other folks on this list can comment...)
Regarding saving VTD into a separate file, there are pros and cons... disk seeks on two separate files might slow things down a bit... my recommendation is to stick with the VTD+XML format, as it works with multiple programming languages and byte endianness...

----- Original Message -----
From: Fernando Gonzalez
To: Jimmy Zhang
Sent: Friday, March 09, 2007 3:28 AM
Subject: Re: [Vtd-xml-users] Storing parsing info

I am sending you the two proposals mixed in one source folder. I don't think that will be a problem, since I am also sending a description of the changes. I hope it's clear and can be easily understood.

In org.Prueba you can find a main method.

greetings,
Fernando
From: Jimmy Z. <cra...@co...> - 2007-03-14 02:56:08
|
Thanks... This is a preview of an article for DevX... there is another one, "Index XML documents with VTD-XML", coming up...

----- Original Message -----
From: "Mark Swanson" <ma...@Sc...>
To: "Jimmy Zhang" <cra...@co...>
Cc: <vtd...@li...>
Sent: Tuesday, March 13, 2007 8:22 PM
Subject: Re: [Vtd-xml-users] XPath tutorial

> Jimmy Zhang wrote:
>> Sorry... Here is the URL:
>> http://docs.google.com/Doc?id=dhdt7pxf_12g425c3
>
> Nice article Jimmy.
From: Mark S. <ma...@Sc...> - 2007-03-14 02:22:25
|
Jimmy Zhang wrote:
> Sorry... Here is the URL:
> http://docs.google.com/Doc?id=dhdt7pxf_12g425c3

Nice article Jimmy.

--
http://www.ScheduleWorld.com/
Free Google Calendar synchronization with Outlook, Evolution,
cell phones, BlackBerry, PalmOS, Exchange, Mozilla, Thunderbird,
Pocket PC/Windows Mobile. Also sync tasks, notes and contacts!
WebDAV, vfreebusy, RSS, LDAP, iCalendar, iTIP, iMIP support.
From: Jimmy Z. <cra...@co...> - 2007-03-14 01:59:03
|
Sorry... Here is the URL:
http://docs.google.com/Doc?id=dhdt7pxf_12g425c3

----- Original Message -----
From: "Jimmy Zhang" <cra...@co...>
To: <vtd...@li...>
Sent: Tuesday, March 13, 2007 6:37 PM
Subject: [Vtd-xml-users] XPath tutorial

> Here is an article focusing exclusively on the XPath aspect of VTD-XML...
From: Jimmy Z. <cra...@co...> - 2007-03-14 01:38:22
|
Here is an article focusing exclusively on the XPath aspect of VTD-XML...
From: Jimmy Z. <cra...@co...> - 2007-03-07 19:16:34
|
Fernando,

It is interesting that you have substituted byte[] with IByteBuffer... since there is a level of indirection, the slight slowdown should be expected... I would certainly be interested in your approach to the issue, and feel free to send me the code...

Cheers,
Jimmy

----- Original Message -----
From: Fernando Gonzalez
To: Jimmy Zhang
Sent: Wednesday, March 07, 2007 7:49 AM
Subject: Re: [Vtd-xml-users] Storing parsing info

Hi Jimmy,

Writing the following, I have found that it may be quite complicated to understand, since you don't know exactly what changes I have made. My tests are not thorough either, so maybe the best option is to submit a technical description of the changes, pros and cons, the code, and that kind of thing.

I have been testing the XPath performance problem and it seems like it's a classloader issue. As you can see in the following log, the slowest XPath evaluation is the first, no matter how the parsing information is obtained.

391 ms->Load XML
2125 ms->Parse XML
31 ms->Evaluate XPath
0 ms->Evaluate XPath
0 ms->Evaluate XPath
453 ms->Store parse info
0 ms->Clear parse info
313 ms->Read Parse info
0 ms->Evaluate XPath
0 ms->Evaluate XPath

I have been working on something more. I have made some changes to VTD and I have succeeded in the following.

1) The byte[] of the XML file is accessed through an interface (IByteBuffer).

2) When I use the UniByteBuffer implementation I get slightly slower results at parsing:

391 ms->Load XML
2109 ms->Parse XML (vs 1890 ms I obtained accessing the byte[] buffer directly)
0,172 ms->Evaluate XPath
0,078 ms->Evaluate XPath
0,094 ms->Evaluate XPath
0,078 ms->Evaluate XPath
0,078 ms->Evaluate XPath

3) When I use an implementation that loads chunks as they are needed I get much slower results in parsing the file, but I get the same results evaluating an XPath expression. The advantage of this approach is that there is no need to load the whole XML file in memory, so I have obtained the following results:

25406 ms->Parse XML
406 ms->Store parse info
0,156 ms->Evaluate XPath
0,093 ms->Evaluate XPath
0,078 ms->Evaluate XPath
0,093 ms->Evaluate XPath
0,094 ms->Evaluate XPath
0,078 ms->Evaluate XPath
500 ms->Read Parse info
0,235 ms->Evaluate XPath
0,094 ms->Evaluate XPath
0,078 ms->Evaluate XPath
0,094 ms->Evaluate XPath
0,078 ms->Evaluate XPath
0,094 ms->Evaluate XPath

The great thing in these results is that the XML file was 100 MB and I ran the program with the -Xmx64m JVM option (just enough to store the 30 MB parsing info and the 16 MB buffer).

Well, as I said before, I can send you a technical description of the changes, pros and cons, and the code.

cheers,
Fernando

On 3/7/07, Fernando Gonzalez <fer...@gm...> wrote:

Hi Jimmy,

Thanks for your response.

I think I'm using version 2.0, since I have tested the "VTDGen.writeIndex" method. I looked for another solution because I cannot remove the original XML file, so I would have to store the XML twice: the original XML file and the file with the XML, VTD and LCs created by "VTDGen.writeIndex". As I'm dealing with really big XML files, that's a drawback.

Yes, you're right, I have added code. Just three or four lines. If you're interested I can explain my solution thoroughly. About the XPath performance, I think that's a classloader issue. I will check that and report the results.

greetings,
Fernando

On 3/6/07, Jimmy Zhang <cra...@co...> wrote:

Hey Fernando,

Thanks for the email.. I am glad VTD-XML is helpful.

My question: Which version are you using?

If you are currently using 2.0, it contains the indexing feature that might accomplish just what is described in your email.

Your solution is to separate XML from VTD and LC, which I think you must have added code to do...

VTD+XML (as in version 2.0) is to package XML, VTD and LCs into a single file... which should also work.

The only suspicious part is that the XPath performance dropped for your case... which shouldn't happen.

Buffer reuse is useful if your app instantiates a VTDGen to sequentially process many incoming XML documents... if you deal only with one XML doc... buffer reuse won't make a big difference.

I think you might be interested in first investigating the persistence feature in 2.0, and there is a directory under code examples...

Cheers,
Jimmy

----- Original Message -----
From: Fernando Gonzalez
To: vtd...@li...
Sent: Tuesday, March 06, 2007 1:23 AM
Subject: [Vtd-xml-users] Storing parsing info

Hello,

First of all I would like to congratulate you on your project; I really think it's great.

Second, I want to use the Java VTD-XML to do a certain task and I have succeeded, but I don't know if I have done it in the right way or whether there is a better one. Can you give me some advice?

I want to evaluate some XPath expressions on a lot of files of this size and larger, so memory efficiency is critical. The first idea that comes to my mind is to have a VTDGen object for each XML file, but this solution leads to having all the XMLs loaded in memory in the "protected byte[] XMLDoc;" attribute of the VTDGen class. So each time I have to evaluate an XPath expression on an XML file I have to read the XML file, parse it, evaluate the XPath and set the VTDGen object to null so that the garbage collector frees the memory.

I have obtained these results reading a big XML file (~100 MB):

360 ms reading file
1890 ms parsing file
32 ms evaluating a XPath expression
93 ms showing results
total = 2375 milliseconds

Where the second step ("parsing file") means:

    VTDGen vg = new VTDGen();
    vg.setDoc(b);
    vg.parse(true);

To speed up the process I have stored the parsing information in a file. After that I can read the XML file and the parsing information file, evaluate the XPath expression and close everything again in a shorter time:

344 ms reading the file
422 ms reading parsing information
125 ms evaluating a XPath expression
93 ms showing results
total = 984

I think the result is good enough but maybe there's a better solution than mine. I have stored the parsing info by serializing the whole VTDGen object except the XMLDoc attribute. Then I retrieve the object from disk and I set the XMLDoc attribute. This way:

    ObjectInputStream ois = new ObjectInputStream(new FileInputStream(PARSING_INFO));
    vg = (MyVTDGen) ois.readObject();
    ois.close();
    FileInputStream fis2 = new FileInputStream(TEST_XML);
    byte[] b2 = new byte[(int) f.length()];
    fis2.read(b2);
    vg.setXML(b2); // This method only sets the XMLDoc attribute

Is this solution good? Is there a better one? Can "Buffer reuse" solve my problem?

best regards,
Fernando

_______________________________________________
Vtd-xml-users mailing list
Vtd...@li...
https://lists.sourceforge.net/lists/listinfo/vtd-xml-users |
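For readers following the IByteBuffer discussion: Fernando's code is not shown in this thread, so the following is only a rough sketch of what such an indirection might look like. The method names (`byteAt`, `length`), the `ChunkedByteBuffer` class and the chunk size are assumptions made for illustration; none of this is VTD-XML code.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

// Hypothetical interface: the thread names IByteBuffer but not its methods.
interface IByteBuffer {
    byte byteAt(long index);
    long length();
}

// Whole document held in one array (the "UniByteBuffer" idea from the thread).
final class UniByteBuffer implements IByteBuffer {
    private final byte[] data;
    UniByteBuffer(byte[] data) { this.data = data; }
    public byte byteAt(long index) { return data[(int) index]; }
    public long length() { return data.length; }
}

// Keeps a single chunk of the file in memory and reloads it on a miss.
final class ChunkedByteBuffer implements IByteBuffer, AutoCloseable {
    private final RandomAccessFile file;
    private final byte[] chunk;
    private long chunkStart = -1; // file offset of chunk[0]; -1 means nothing loaded
    private int chunkLen = 0;

    ChunkedByteBuffer(String path, int chunkSize) throws IOException {
        this.file = new RandomAccessFile(path, "r");
        this.chunk = new byte[chunkSize];
    }

    public byte byteAt(long index) {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        if (chunkStart < 0 || index < chunkStart || index >= chunkStart + chunkLen) {
            load(index);
        }
        return chunk[(int) (index - chunkStart)];
    }

    public long length() {
        try {
            return file.length();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Loads the chunk-aligned block of the file that contains the given index.
    private void load(long index) {
        try {
            long start = (index / chunk.length) * chunk.length;
            int want = (int) Math.min(chunk.length, file.length() - start);
            file.seek(start);
            file.readFully(chunk, 0, want);
            chunkStart = start;
            chunkLen = want;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public void close() throws IOException { file.close(); }
}
```

With one resident chunk, sequential parsing reloads only once per chunk, but any access pattern that jumps back and forth pays a disk read on each jump. That would be consistent with the much slower parse Fernando reports, though the thread does not confirm the cause.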
From: Jimmy Z. <cra...@co...> - 2007-03-07 03:51:56
|
Give me a few days to come up with a proposal/draft and I will send you guys the URL...

----- Original Message -----
From: "Rodrigo Cunha" <rn...@gm...>
To: "Mark Swanson" <ma...@Sc...>; "Jimmy Zhang" <cra...@co...>
Cc: <vtd...@li...>
Sent: Tuesday, March 06, 2007 8:24 AM
Subject: Re: [Vtd-xml-users] Random Access Proposal (take 2)

> Hum, thanks for the hint also Mark. I normally use java.util collections,
> but I'll give fastutil a look, since it might be better for some of my
> problems.
>
> Jimmy, a context can be imported/exported as an array of integers, this
> being a bit raw perhaps... and leaving the programmer with the job of
> storing that array. I just created SimpleContext as a way of abstracting
> that implementation detail, but since you like performance, perhaps you
> could leave that raw. I still prefer the abstracted version... besides,
> the internal representation might change and the abstracted version would
> still work.
>
> So, yes, go ahead with my API even if you decide to change it later.
>
> Just as a little silly footnote, I found my API is also useful in cases
> where you have:
>
>     push();
>     // do something
>     pop();
>     push();
>     // do something
>     pop();
>     push();
>     // do something
>     pop();
>     push();
>     // do something
>     pop();
>
> This can be abbreviated to:
>
>     setCtxFromNav(ctx);
>     // do something
>     setNavFromCtx(ctx);
>     // do something
>     setNavFromCtx(ctx);
>     // do something
>     setNavFromCtx(ctx);
>     // do something
>     setNavFromCtx(ctx);
>
> This is both faster and more readable.
>
> It's also useful for things like:
>
>     xpto = getComplexToFindTransmissionPath(VTDNav n, String s); // might even use caching!
>     push();
>     nav.setNavFromCtx(xpto);
>     // Do something
>     pop();
>
> and also for exception handling:
>
>     try {
>         setCtxFromNav(xpto);
>         // do something complex and error-prone,
>         // so I won't use the stack, but a few locally declared contexts instead
>         // oops! crash! go to handler!
>     } finally {
>         setNavFromCtx(xpto);
>         // we are now clean
>     }
>
> So, it made using VTD much more enjoyable to me, even in rather trivial
> situations.
>
> Mark Swanson wrote:
>> Jimmy Zhang wrote:
>>>
>>>> Just a FYI: I have cases where the key is an Integer, and cases where
>>>> it's a string.
>>>
>>> By Integer is it a java class? or just a primitive data type? Maybe I
>>> can modify Rodrigo's class and put it into CVS so you guys can use
>>> immediately... however, I can't guarantee that it will be included in
>>> the next release...
>>> Would that work?
>>
>> Oh, I always use native ints and fastutil wherever possible.
>>
>> Just a thought: I use autojar on my code to build a tiny fastutil jar
>> that just has the code I need. You could do the same thing to get
>> excellent native collections instead of writing your own. I see you
>> already wrote your own, but in case you need more.. Fastutil uses the
>> LGPL.
>>
>> Cheers.
>> |
From: Jimmy Z. <cra...@co...> - 2007-03-06 20:35:07
|
Hey Fernando,

Thanks for the email.. I am glad VTD-XML is helpful.

My question: Which version are you using?

If you are currently using 2.0, it contains the indexing feature that might accomplish just what is described in your email.

Your solution is to separate XML from VTD and LC, which I think you must have added code to do...

VTD+XML (as in version 2.0) is to package XML, VTD and LCs into a single file... which should also work.

The only suspicious part is that the XPath performance dropped for your case... which shouldn't happen.

Buffer reuse is useful if your app instantiates a VTDGen to sequentially process many incoming XML documents... if you deal only with one XML doc... buffer reuse won't make a big difference.

I think you might be interested in first investigating the persistence feature in 2.0, and there is a directory under code examples...

Cheers,
Jimmy

----- Original Message -----
From: Fernando Gonzalez
To: vtd...@li...
Sent: Tuesday, March 06, 2007 1:23 AM
Subject: [Vtd-xml-users] Storing parsing info

Hello,

First of all I would like to congratulate you on your project; I really think it's great.

Second, I want to use the Java VTD-XML to do a certain task and I have succeeded, but I don't know if I have done it in the right way or whether there is a better one. Can you give me some advice?

I want to evaluate some XPath expressions on a lot of files of this size and larger, so memory efficiency is critical. The first idea that comes to my mind is to have a VTDGen object for each XML file, but this solution leads to having all the XMLs loaded in memory in the "protected byte[] XMLDoc;" attribute of the VTDGen class. So each time I have to evaluate an XPath expression on an XML file I have to read the XML file, parse it, evaluate the XPath and set the VTDGen object to null so that the garbage collector frees the memory.

I have obtained these results reading a big XML file (~100 MB):

360 ms reading file
1890 ms parsing file
32 ms evaluating a XPath expression
93 ms showing results
total = 2375 milliseconds

Where the second step ("parsing file") means:

    VTDGen vg = new VTDGen();
    vg.setDoc(b);
    vg.parse(true);

To speed up the process I have stored the parsing information in a file. After that I can read the XML file and the parsing information file, evaluate the XPath expression and close everything again in a shorter time:

344 ms reading the file
422 ms reading parsing information
125 ms evaluating a XPath expression
93 ms showing results
total = 984

I think the result is good enough but maybe there's a better solution than mine. I have stored the parsing info by serializing the whole VTDGen object except the XMLDoc attribute. Then I retrieve the object from disk and I set the XMLDoc attribute. This way:

    ObjectInputStream ois = new ObjectInputStream(new FileInputStream(PARSING_INFO));
    vg = (MyVTDGen) ois.readObject();
    ois.close();
    FileInputStream fis2 = new FileInputStream(TEST_XML);
    byte[] b2 = new byte[(int) f.length()];
    fis2.read(b2);
    vg.setXML(b2); // This method only sets the XMLDoc attribute

Is this solution good? Is there a better one? Can "Buffer reuse" solve my problem?

best regards,
Fernando |
From: Rodrigo C. <rn...@gm...> - 2007-03-06 16:24:45
|
Hum, thanks for the hint also Mark. I normally use java.util collections, but I'll give fastutil a look, since it might be better for some of my problems.

Jimmy, a context can be imported/exported as an array of integers, this being a bit raw perhaps... and leaving the programmer with the job of storing that array. I just created SimpleContext as a way of abstracting that implementation detail, but since you like performance, perhaps you could leave that raw. I still prefer the abstracted version... besides, the internal representation might change and the abstracted version would still work.

So, yes, go ahead with my API even if you decide to change it later.

Just as a little silly footnote, I found my API is also useful in cases where you have:

    push();
    // do something
    pop();
    push();
    // do something
    pop();
    push();
    // do something
    pop();
    push();
    // do something
    pop();

This can be abbreviated to:

    setCtxFromNav(ctx);
    // do something
    setNavFromCtx(ctx);
    // do something
    setNavFromCtx(ctx);
    // do something
    setNavFromCtx(ctx);
    // do something
    setNavFromCtx(ctx);

This is both faster and more readable.

It's also useful for things like:

    xpto = getComplexToFindTransmissionPath(VTDNav n, String s); // might even use caching!
    push();
    nav.setNavFromCtx(xpto);
    // Do something
    pop();

and also for exception handling:

    try {
        setCtxFromNav(xpto);
        // do something complex and error-prone,
        // so I won't use the stack, but a few locally declared contexts instead
        // oops! crash! go to handler!
    } finally {
        setNavFromCtx(xpto);
        // we are now clean
    }

So, it made using VTD much more enjoyable to me, even in rather trivial situations.

Mark Swanson wrote:
> Jimmy Zhang wrote:
>>
>>> Just a FYI: I have cases where the key is an Integer, and cases where
>>> it's a string.
>>
>> By Integer is it a java class? or just a primitive data type? Maybe I
>> can modify Rodrigo's class and put it into CVS so you guys can use
>> immediately... however, I can't guarantee that it will be included in
>> the next release...
>> Would that work?
>
> Oh, I always use native ints and fastutil wherever possible.
>
> Just a thought: I use autojar on my code to build a tiny fastutil jar
> that just has the code I need. You could do the same thing to get
> excellent native collections instead of writing your own. I see you
> already wrote your own, but in case you need more.. Fastutil uses the
> LGPL.
>
> Cheers.
> |
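The exception-handling pattern Rodrigo describes can be tried out in isolation. The sketch below keeps his method names `setCtxFromNav` / `setNavFromCtx` and the class name `SimpleContext`, but everything else is made up for illustration: `ToyNav` is a stand-in whose cursor is just a list of child indices, not VTDNav, and the patch itself is not reproduced here.

```java
import java.util.Arrays;

// Hypothetical stand-in for the SimpleContext mentioned in the thread:
// a snapshot of the cursor, stored as a plain int array.
final class SimpleContext {
    int[] path = new int[0];
}

// Toy navigator, NOT VTDNav: the cursor is just a list of child indices.
final class ToyNav {
    private int[] path = new int[0];

    void toChild(int i) {
        path = Arrays.copyOf(path, path.length + 1);
        path[path.length - 1] = i;
    }

    void toParent() {
        if (path.length == 0) throw new IllegalStateException("already at root");
        path = Arrays.copyOf(path, path.length - 1);
    }

    int depth() { return path.length; }

    // Copy the cursor into the context, and back again.
    void setCtxFromNav(SimpleContext ctx) { ctx.path = path.clone(); }
    void setNavFromCtx(SimpleContext ctx) { path = ctx.path.clone(); }
}

final class ContextDemo {
    // Runs risky navigation and always puts the cursor back where it started.
    static boolean visitSafely(ToyNav nav, Runnable risky) {
        SimpleContext saved = new SimpleContext();
        nav.setCtxFromNav(saved);
        try {
            risky.run();
            return true;
        } catch (RuntimeException e) {
            return false;
        } finally {
            nav.setNavFromCtx(saved); // restored on success and on failure
        }
    }
}
```

The point of the locally declared context is that the restore does not depend on how many push() calls the failing code managed to balance before it threw.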
From: Mark S. <ma...@Sc...> - 2007-03-06 15:44:23
|
Jimmy Zhang wrote: > >> Just a FYI: I have cases where the key is an Integer, and cases where >> it's a string. > > By Integer is it a java class? or just a primitive data type? Maybe I can > modify Rodrigo's class and put it into CVS so you guys can use > immediately... > however, I can't guarantee that it will be included in the next release... > Would that work? Oh, I always use native ints and fastutil wherever possible. Just a thought: I use autojar on my code to build a tiny fastutil jar that just has the code I need. You could do the same thing to get excellent native collections instead of writing your own. I see you already wrote your own, but in case you need more.. Fastutil uses the LGPL. Cheers. -- http://www.ScheduleWorld.com/ Free Google Calendar synchronization with Outlook, Evolution, cell phones, BlackBerry, PalmOS, Exchange, Mozilla, Thunderbird, Pocket PC/Windows Mobile. Also sync tasks, notes and contacts! WebDAV, vfreebusy, RSS, LDAP, iCalendar, iTIP, iMIP support. |
From: Fernando G. <fer...@gm...> - 2007-03-06 09:24:15
|
Hello,

First of all I would like to congratulate you on your project; I really think it's great.

Second, I want to use the Java VTD-XML to do a certain task and I have succeeded, but I don't know if I have done it in the right way or whether there is a better one. Can you give me some advice?

I want to evaluate some XPath expressions on a lot of files of this size and larger, so memory efficiency is critical. The first idea that comes to my mind is to have a VTDGen object for each XML file, but this solution leads to having all the XMLs loaded in memory in the "protected byte[] XMLDoc;" attribute of the VTDGen class. So each time I have to evaluate an XPath expression on an XML file I have to read the XML file, parse it, evaluate the XPath and set the VTDGen object to null so that the garbage collector frees the memory.

I have obtained these results reading a big XML file (~100 MB):

360 ms reading file
1890 ms parsing file
32 ms evaluating a XPath expression
93 ms showing results
total = 2375 milliseconds

Where the second step ("parsing file") means:

    VTDGen vg = new VTDGen();
    vg.setDoc(b);
    vg.parse(true);

To speed up the process I have stored the parsing information in a file. After that I can read the XML file and the parsing information file, evaluate the XPath expression and close everything again in a shorter time:

344 ms reading the file
422 ms reading parsing information
125 ms evaluating a XPath expression
93 ms showing results
total = 984

I think the result is good enough but maybe there's a better solution than mine. I have stored the parsing info by serializing the whole VTDGen object except the XMLDoc attribute. Then I retrieve the object from disk and I set the XMLDoc attribute. This way:

    ObjectInputStream ois = new ObjectInputStream(new FileInputStream(PARSING_INFO));
    vg = (MyVTDGen) ois.readObject();
    ois.close();
    FileInputStream fis2 = new FileInputStream(TEST_XML);
    byte[] b2 = new byte[(int) f.length()];
    fis2.read(b2);
    vg.setXML(b2); // This method only sets the XMLDoc attribute

Is this solution good? Is there a better one? Can "Buffer reuse" solve my problem?

best regards,
Fernando |
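The "serialize everything except XMLDoc, then re-attach it" idea can be sketched with plain Java serialization. This is not Fernando's MyVTDGen, whose source is not in the thread: `ParseInfo`, `ParseInfoStore` and the `tokenOffsets` field are hypothetical names standing in for the parsing information. The sketch also uses `readFully`, because a single `InputStream.read(byte[])` call is allowed to return before the array is full.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical holder, NOT the real VTDGen: just enough state to show the idea.
final class ParseInfo implements Serializable {
    private static final long serialVersionUID = 1L;
    int[] tokenOffsets;      // stands in for the parsing information
    transient byte[] xmlDoc; // transient fields are skipped by Java serialization

    ParseInfo(byte[] xmlDoc, int[] tokenOffsets) {
        this.xmlDoc = xmlDoc;
        this.tokenOffsets = tokenOffsets;
    }
}

final class ParseInfoStore {
    // Writes the parse info only; the XML bytes stay in the original file.
    static void save(ParseInfo info, File target) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(target)))) {
            out.writeObject(info);
        }
    }

    // Reads the parse info back and re-attaches the XML bytes from the original file.
    static ParseInfo load(File infoFile, File xmlFile)
            throws IOException, ClassNotFoundException {
        ParseInfo info;
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(infoFile)))) {
            info = (ParseInfo) in.readObject();
        }
        info.xmlDoc = readAll(xmlFile);
        return info;
    }

    // readFully loops until the array is full or throws EOFException.
    static byte[] readAll(File f) throws IOException {
        byte[] b = new byte[(int) f.length()];
        try (DataInputStream in = new DataInputStream(new FileInputStream(f))) {
            in.readFully(b);
        }
        return b;
    }
}
```

One caveat of this approach in general: the stored info is only valid for the exact bytes it was built from, so a real version would probably want to record the XML file's length or a checksum alongside it.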
From: Jimmy Z. <cra...@co...> - 2007-03-06 04:45:01
|
> Just a FYI: I have cases where the key is an Integer, and cases where
> it's a string.

By Integer, is it a Java class, or just a primitive data type? Maybe I can modify Rodrigo's class and put it into CVS so you guys can use it immediately... however, I can't guarantee that it will be included in the next release... Would that work?

>
>> If instead of keeping a context I can keep a simple integer and then
>> order a VTDNav "hey you, get this integer you told me to keep and go to
>> node you bookmarked" I would say it's ok, if the operation "get to the
>> node" is fast.
>>
>> So, you're suggesting an API that would work like this:
>>
>>     RandomNodeRecorder xpto = new RandomNodeRecorder(navigator);
>>     // xpto is the bookmark keeper organized in a way Jimmy likes :-)
>>     int mark = xpto.keepPos();
>>     /* do some stuff here */
>>     boolean ok = xpto.fetchPos(mark); // back to the bookmarked node
>>     xpto.del(mark); // don't need the mark any longer
>>
>> I still fail to understand why a context shouldn't be kept outside the
>> structures you seem to like :-)
>
> Well, I'd be interested in knowing the time/space trade offs for both.
> For one specific case, I could have an int as the key, and an int as the
> mark/vtd-node. Both ints could be native ints with fastutil. Maybe the CPU
> overhead is much smaller with SimpleContext though... it would be nice to
> see what Jimmy has in mind (the details).
>
>> Memory is cheap, and for example, if I keep a hash of NEs, and each NE
>> occupies a few KB itself, it's irrelevant if I'm going to use a few more
>> bytes for each NE.
>>
>> I'm not suggesting one should keep large structures containing any single
>> node in the document, ok? But the random access to a cached node must be
>> fast. I emphasize: fast random access to cached nodes.
>
> +1
>
>> As far as I understand the SimpleContext structure grows 4 bytes for each
>> depth level, so a deeper node consumes more space, right? So a really
>> deep node, let's say, at level 10, will consume 40 extra bytes, plus the
>> base consumption... that's 48 bytes, quite small, unless the node is
>> small and irrelevant.
>
> It is small, but I'm looking at about 6k indexes to cache per document,
> and as many documents cached as possible. Over-guessing at 100 bytes per
> SimpleContext (total) would mean 600KB of SimpleContext objects per
> document. I understand my use cases deal with larger than normal
> documents, but that just means I have so much more to gain from random
> access.
>
> Cheers.
>
> --
> http://www.ScheduleWorld.com/
> Free Google Calendar synchronization with Outlook, Evolution,
> cell phones, BlackBerry, PalmOS, Exchange, Mozilla, Thunderbird,
> Pocket PC/Windows Mobile. Also sync tasks, notes and contacts!
> WebDAV, vfreebusy, RSS, LDAP, iCalendar, iTIP, iMIP support.
> |
From: Jimmy Z. <cra...@co...> - 2007-03-06 03:57:09
|
I see... in your last email you asked why VTD-XML 1.0 doesn't contain the bookmark class that you described... actually this question was very much part of the discussion early on, around the 1.0 release... The reason is that, if you care to dig into VTD-XML's XPath release, you will find that the bookmark class is not necessary to implement the entire XPath 1.0 spec... that kind of shifted our decision towards not offering this feature... besides, the design of the core VTD-XML API (especially methods within VTDNav) needs to be well thought out, as it is something that affects every VTD-XML user... Let me think a bit more about how to address the issue that you raised.. one way may be to implement this feature in an add-on module...

----- Original Message -----
From: "Rodrigo Cunha" <rn...@gm...>
To: "Jimmy Zhang" <cra...@co...>
Cc: <vtd...@li...>
Sent: Monday, March 05, 2007 3:39 PM
Subject: Re: [Vtd-xml-users] Random Access Proposal (take 2)

> I keep them in a HashMap, for example, or in a TreeMap, etc... rarely on
> a simple list.
>
> The key is generally a string I would need to get in more or less
> convoluted ways from the node during a sequential search. The node
> itself contains a lot more info I only want to retrieve in the future if
> it's needed, or else I would cache the info itself :-D
>
> If instead of keeping a context I can keep a simple integer and then
> order a VTDNav "hey you, get this integer you told me to keep and go to
> node you bookmarked" I would say it's ok, if the operation "get to the
> node" is fast.
>
> So, you're suggesting an API that would work like this:
>
>     RandomNodeRecorder xpto = new RandomNodeRecorder(navigator);
>     // xpto is the bookmark keeper organized in a way Jimmy likes :-)
>     int mark = xpto.keepPos();
>     /* do some stuff here */
>     boolean ok = xpto.fetchPos(mark); // back to the bookmarked node
>     xpto.del(mark); // don't need the mark any longer
>
> I still fail to understand why a context shouldn't be kept outside the
> structures you seem to like :-)
>
> Memory is cheap, and for example, if I keep a hash of NEs, and each NE
> occupies a few KB itself, it's irrelevant if I'm going to use a few more
> bytes for each NE.
>
> I'm not suggesting one should keep large structures containing any
> single node in the document, ok? But the random access to a cached node
> must be fast. I emphasize: fast random access to cached nodes.
>
> As far as I understand the SimpleContext structure grows 4 bytes for
> each depth level, so a deeper node consumes more space, right? So a
> really deep node, let's say, at level 10, will consume 40 extra bytes,
> plus the base consumption... that's 48 bytes, quite small, unless the
> node is small and irrelevant.
>
> --
> Rodrigo
>
> Jimmy Zhang wrote:
>> Rodrigo, in one of the early emails you mentioned that
>> your app sometimes keeps thousands of those nodes (or context),
>> how do you tell them apart? There has got to be some data
>> structure to store that context info, right?
>> What if it is implemented such that
>> 1. those contexts/nodes are stored in a linear buffer, which
>> is more space efficient than allocating individual objects
>> 2. You can address those nodes by using an integer, sort of
>> like a VTD record...
>> would you consider the design outlined above useful
>> to your app?
>>
>> Jimmy
> |
From: Mark S. <ma...@Sc...> - 2007-03-06 03:04:19
|
Rodrigo Cunha wrote:
> I keep them in a HashMap, for example, or in a TreeMap, etc... rarely on
> a simple list.
> The key is generally a string I would need to get in more or less
> convoluted ways from the node during a sequential search. The node
> itself contains a lot more info I only want to retrieve in the future if
> it's needed, or else I would cache the info itself :-D

Just a FYI: I have cases where the key is an Integer, and cases where it's a string.

> If instead of keeping a context I can keep a simple integer and then
> order a VTDNav "hey you, get this integer you told me to keep and go to
> node you bookmarked" I would say it's ok, if the operation "get to the
> node" is fast.
>
> So, you're suggesting an API that would work like this:
>
>     RandomNodeRecorder xpto = new RandomNodeRecorder(navigator);
>     // xpto is the bookmark keeper organized in a way Jimmy likes :-)
>     int mark = xpto.keepPos();
>     /* do some stuff here */
>     boolean ok = xpto.fetchPos(mark); // back to the bookmarked node
>     xpto.del(mark); // don't need the mark any longer
>
> I still fail to understand why a context shouldn't be kept outside the
> structures you seem to like :-)

Well, I'd be interested in knowing the time/space trade-offs for both. For one specific case, I could have an int as the key, and an int as the mark/vtd-node. Both ints could be native ints with fastutil. Maybe the CPU overhead is much smaller with SimpleContext though... it would be nice to see what Jimmy has in mind (the details).

> Memory is cheap, and for example, if I keep a hash of NEs, and each NE
> occupies a few KB itself, it's irrelevant if I'm going to use a few more
> bytes for each NE.
>
> I'm not suggesting one should keep large structures containing any
> single node in the document, ok? But the random access to a cached node
> must be fast. I emphasize: fast random access to cached nodes.

+1

> As far as I understand the SimpleContext structure grows 4 bytes for
> each depth level, so a deeper node consumes more space, right? So a
> really deep node, let's say, at level 10, will consume 40 extra bytes,
> plus the base consumption... that's 48 bytes, quite small, unless the
> node is small and irrelevant.

It is small, but I'm looking at about 6k indexes to cache per document, and as many documents cached as possible. Over-guessing at 100 bytes per SimpleContext (total) would mean 600KB of SimpleContext objects per document. I understand my use cases deal with larger than normal documents, but that just means I have so much more to gain from random access.

Cheers.

--
http://www.ScheduleWorld.com/
Free Google Calendar synchronization with Outlook, Evolution, cell phones, BlackBerry, PalmOS, Exchange, Mozilla, Thunderbird, Pocket PC/Windows Mobile. Also sync tasks, notes and contacts! WebDAV, vfreebusy, RSS, LDAP, iCalendar, iTIP, iMIP support. |
From: Rodrigo C. <rn...@gm...> - 2007-03-05 23:39:32
|
I keep them in a HashMap, for example, or in a TreeMap, etc... rarely on a simple list.

The key is generally a string I would need to get in more or less convoluted ways from the node during a sequential search. The node itself contains a lot more info I only want to retrieve in the future if it's needed, or else I would cache the info itself :-D

If instead of keeping a context I can keep a simple integer and then order a VTDNav "hey you, get this integer you told me to keep and go to node you bookmarked" I would say it's ok, if the operation "get to the node" is fast.

So, you're suggesting an API that would work like this:

    RandomNodeRecorder xpto = new RandomNodeRecorder(navigator);
    // xpto is the bookmark keeper organized in a way Jimmy likes :-)
    int mark = xpto.keepPos();
    /* do some stuff here */
    boolean ok = xpto.fetchPos(mark); // back to the bookmarked node
    xpto.del(mark); // don't need the mark any longer

I still fail to understand why a context shouldn't be kept outside the structures you seem to like :-)

Memory is cheap, and for example, if I keep a hash of NEs, and each NE occupies a few KB itself, it's irrelevant if I'm going to use a few more bytes for each NE.

I'm not suggesting one should keep large structures containing any single node in the document, ok? But the random access to a cached node must be fast. I emphasize: fast random access to cached nodes.

As far as I understand the SimpleContext structure grows 4 bytes for each depth level, so a deeper node consumes more space, right? So a really deep node, let's say, at level 10, will consume 40 extra bytes, plus the base consumption... that's 48 bytes, quite small, unless the node is small and irrelevant.

--
Rodrigo

Jimmy Zhang wrote:
> Rodrigo, in one of the early emails you mentioned that
> your app sometimes keeps thousands of those nodes (or context),
> how do you tell them apart? There has got to be some data
> structure to store that context info, right?
> What if it is implemented such that
> 1. those contexts/nodes are stored in a linear buffer, which
> is more space efficient than allocating individual objects
> 2. You can address those nodes by using an integer, sort of
> like a VTD record...
> would you consider the design outlined above useful
> to your app?
>
> Jimmy |
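To make the "linear buffer addressed by an integer" idea from this exchange concrete, here is a small sketch. It is not VTD-XML code and not what was eventually proposed: `ContextBuffer` and its methods are hypothetical, and it assumes, following Rodrigo's estimate, that a context is one int per depth level.

```java
import java.util.Arrays;

// Hypothetical sketch of a "linear buffer + integer handle" store; not VTD-XML code.
final class ContextBuffer {
    private int[] buf = new int[64];
    private int used = 0;

    // Appends a context (one int per depth level) and returns its handle.
    int keep(int[] context) {
        int need = context.length + 1;
        if (used + need > buf.length) {
            buf = Arrays.copyOf(buf, Math.max(buf.length * 2, used + need));
        }
        int mark = used;
        buf[used++] = context.length; // record header: number of levels
        System.arraycopy(context, 0, buf, used, context.length);
        used += context.length;
        return mark;
    }

    // Returns a copy of the stored context, or null if the mark was deleted.
    // The mark must be a value previously returned by keep(); it is not validated.
    int[] fetch(int mark) {
        int len = buf[mark];
        if (len < 0) return null;
        return Arrays.copyOfRange(buf, mark + 1, mark + 1 + len);
    }

    // Tombstones the record; space is not reclaimed in this sketch.
    void del(int mark) {
        if (buf[mark] >= 0) buf[mark] = -buf[mark] - 1;
    }

    int bytesUsed() { return used * Integer.BYTES; }
}
```

Under these assumptions a depth-10 context costs 11 ints, i.e. 44 bytes, with no per-object header, so 6,000 of them would take 6,000 × 44 = 264,000 bytes. The handle returned by `keep` is a plain int, so it can be used as the value in whatever map the application already keeps, whether keyed by string or by int.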
From: Jimmy Z. <cra...@co...> - 2007-03-05 18:43:36
|
Rodrigo, in one of the early emails you mentioned that your app sometimes keeps thousands of those nodes (or contexts). How do you tell them apart? There has got to be some data structure to store that context info, right?

What if it is implemented such that

1. those contexts/nodes are stored in a linear buffer, which is more space efficient than allocating individual objects
2. you can address those nodes by using an integer, sort of like a VTD record...

would you consider the design outlined above useful to your app?

Jimmy

----- Original Message -----
From: "Rodrigo Cunha" <rn...@gm...>
To: <vtd...@li...>
Sent: Monday, March 05, 2007 3:49 AM
Subject: Re: [Vtd-xml-users] Random Access Proposal (take 2)

> I think we are already getting out of the initial context, so let's do a
> reset :-)
>
> Let's get back to basics: the API, as it exists now, is unusable for
> what I want. And I don't want that much either, just to be able to keep
> nodes in my own data structures in an easy and efficient way to access.
> I actually solved the problem since ximpleware-1.6, I shared the patch
> then, I'm sharing it now again, since it seems the problem is recurrent
> with others also.
>
> I'm not saying this is the correct implementation, I'm saying this is
> the correct API, or close to that. The implementation is not that bad
> either, but perhaps can be improved. As it is it's working just fine,
> and is so obvious I'm still puzzled at why it wasn't part of ximpleware
> since version 1.0.
>
> Remember, the objective is "being able to keep a bunch of nodes in data
> structures in an easy and efficient way to access", with no fuss
> attached. Just something like "give me node x bookmark", "take this
> bookmark and go back there".
> |
From: Rodrigo C. <rn...@gm...> - 2007-03-05 11:49:24
|
I think we are already getting out of the initial context, so let's do a reset :-)

Let's get back to basics: the API, as it exists now, is unusable for what I want. And I don't want that much either, just to be able to keep nodes in my own data structures in an easy and efficient way to access. I actually solved the problem since ximpleware-1.6, I shared the patch then, I'm sharing it now again, since it seems the problem is recurrent with others also.

I'm not saying this is the correct implementation, I'm saying this is the correct API, or close to that. The implementation is not that bad either, but perhaps can be improved. As it is it's working just fine, and is so obvious I'm still puzzled at why it wasn't part of ximpleware since version 1.0.

Remember, the objective is "being able to keep a bunch of nodes in data structures in an easy and efficient way to access", with no fuss attached. Just something like "give me node x bookmark", "take this bookmark and go back there". |
From: Tatu S. <cow...@ya...> - 2007-03-05 05:45:29
|
--- Rodrigo Cunha <rn...@gm...> wrote:
> Yes, all that is correct, and that's why I told you we should have
> choices. The API should provide choices and flexibility. Decent
> programmers will use the right tool, not the wrong tool. Documentation
> should also help whenever possible.

Actually, sometimes choice is good; oftentimes too many (and specifically, irrelevant) choices just confuse, and few developers use them. Although I used to think most things should be configurable, I have started to question that -- unless _measured_ performance impact is significant, it may be best to just choose reasonable defaults.

In the end, the vast majority of users/devs just use whatever tools default to: and those who do not, generally prefer a limited set of actually relevant things to configure.

Or at the very least, make things configurable when they are requested to be configurable, and not try to speculate wildly about what might be useful.

Anyway, just my opinions based on other projects,

-+ Tatu +- |
From: Mark S. <ma...@Sc...> - 2007-03-05 03:45:09
|
Rodrigo Cunha wrote:
> Yes, all that is correct, and that's why I told you we should have
> choices. The API should provide choices and flexibility. Decent
> programmers will use the right tool, not the wrong tool. Documentation
> should also help whenever possible.
>
> BTW Mark, does my API solve your random-access problem?...

It looks like it will. I'm in scramble / crunch time atm with the new Thunderbird/Lightning extension and the best I can do is participate via a few emails. Sorry I can't do more with this fascinating stuff atm.

Cheers.

--
http://www.ScheduleWorld.com/
Free Google Calendar synchronization with Outlook, Evolution, cell phones, BlackBerry, PalmOS, Exchange, Mozilla, Thunderbird, Pocket PC/Windows Mobile. Also sync tasks, notes and contacts! WebDAV, vfreebusy, RSS, LDAP, iCalendar, iTIP, iMIP support. |
From: Rodrigo C. <rn...@gm...> - 2007-03-05 03:00:00
|
Yes, all that is correct, and that's why I told you we should have choices. The API should provide choices and flexibility. Decent programmers will use the right tool, not the wrong tool. Documentation should also help whenever possible.

BTW Mark, does my API solve your random-access problem?...

Jimmy Zhang wrote:
> Totally! DRAM access time is 100~200 cycles; packing everything together
> will make cache misses far less frequent and boost both parsing and
> navigation performance!
>
> ----- Original Message -----
> From: "Mark Swanson" <ma...@Sc...>
> To: "Rodrigo Cunha" <rn...@gm...>
> Cc: "Jimmy Zhang" <cra...@co...>; <vtd...@li...>
> Sent: Sunday, March 04, 2007 10:35 AM
> Subject: Re: [Vtd-xml-users] Random Access Proposal (take 2)
>
>>> Memory usage is not always critical, sometimes performance can be
>>> gained from memory, sometimes that's a false argument. In this case an
>>
>> Just my 2cents: significant performance increases can be gained by
>> structuring your algorithms to keep your memory accesses within a
>> column of memory as much as possible. DRAM cycle time penalties are
>> huge - unless you can keep the data in the CPU's cache.
>> This is one of those tricks that column databases like K and Vertica
>> use to stomp traditional databases with for n-dimensional queries. I
>> actually wrote an n-dimensional column-based database for a company a
>> number of years ago.
>>
>> Cheers. |
From: Jimmy Z. <cra...@co...> - 2007-03-04 19:04:29
|
Totally! DRAM access time is 100~200 cycles; packing everything together will make cache misses far less frequent and boost both parsing and navigation performance!

----- Original Message -----
From: "Mark Swanson" <ma...@Sc...>
To: "Rodrigo Cunha" <rn...@gm...>
Cc: "Jimmy Zhang" <cra...@co...>; <vtd...@li...>
Sent: Sunday, March 04, 2007 10:35 AM
Subject: Re: [Vtd-xml-users] Random Access Proposal (take 2)

>> Memory usage is not always critical, sometimes performance can be gained
>> from memory, sometimes that's a false argument. In this case an
>
> Just my 2cents: significant performance increases can be gained by
> structuring your algorithms to keep your memory accesses within a column
> of memory as much as possible. DRAM cycle time penalties are huge - unless
> you can keep the data in the CPU's cache.
> This is one of those tricks that column databases like K and Vertica use
> to stomp traditional databases with for n-dimensional queries. I actually
> wrote an n-dimensional column-based database for a company a number of
> years ago.
>
> Cheers. |
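As a rough illustration of the "packing everything together" point, the sketch below stores each record as one `long` in a single contiguous `long[]` instead of one heap object per record. The class name `PackedTokens`, its methods, and the bit layout (length in the high 32 bits, offset in the low 32 bits) are assumptions made for this example; it is not meant to reproduce the actual VTD record layout, and it makes no measured performance claim.

```java
// Illustrative only: two ints packed into one long per record, all records
// in one contiguous array. The bit layout here is an assumption for the
// example, not the real VTD record format.
final class PackedTokens {
    private final long[] records;
    private int size;

    PackedTokens(int capacity) {
        records = new long[capacity];
    }

    // Appends a record; throws ArrayIndexOutOfBoundsException when full.
    void add(int offset, int length) {
        // High 32 bits hold the length, low 32 bits hold the offset.
        records[size++] = ((long) length << 32) | (offset & 0xFFFFFFFFL);
    }

    int offsetAt(int i) {
        return (int) records[i];            // low 32 bits
    }

    int lengthAt(int i) {
        return (int) (records[i] >>> 32);   // high 32 bits
    }

    int size() {
        return size;
    }

    // A sequential scan walks one primitive array front to back,
    // with no per-record object to dereference.
    long totalLength() {
        long sum = 0;
        for (int i = 0; i < size; i++) {
            sum += lengthAt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        PackedTokens tokens = new PackedTokens(4);
        tokens.add(10, 5);
        tokens.add(100, 7);
        System.out.println(tokens.offsetAt(1) + " " + tokens.totalLength()); // prints 100 12
    }
}
```

Whether this actually beats an array of small objects depends on the JVM, the data size and the access pattern, so it would need to be measured before relying on it.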
From: Mark S. <ma...@Sc...> - 2007-03-04 18:35:47
|
> Memory usage is not always critical, sometimes performance can be
> gained from memory, sometimes that's a false argument. In this case an

Just my 2cents: significant performance increases can be gained by structuring your algorithms to keep your memory accesses within a column of memory as much as possible. DRAM cycle time penalties are huge - unless you can keep the data in the CPU's cache.

This is one of those tricks that column databases like K and Vertica use to stomp traditional databases with for n-dimensional queries. I actually wrote an n-dimensional column-based database for a company a number of years ago.

Cheers.

--
http://www.ScheduleWorld.com/
Free Google Calendar synchronization with Outlook, Evolution, cell phones, BlackBerry, PalmOS, Exchange, Mozilla, Thunderbird, Pocket PC/Windows Mobile. Also sync tasks, notes and contacts! WebDAV, vfreebusy, RSS, LDAP, iCalendar, iTIP, iMIP support. |