From: Jimmy Z. <cra...@co...> - 2007-03-03 21:00:41
VTD-XML certainly still has a lot of growing up to do...

One old observation from the early days of VTD-XML: memory usage has strong
performance implications as well. DOM's excessive memory usage directly
contributes to its slow performance, and VTD-XML's memory strategy is largely
responsible for its parsing performance. So VTD's mentality has always been to
reduce memory usage whenever possible; that will, one way or another, lead to
better performance.

As to setPosition(int i)'s performance, did your implementation directly
manipulate the Location Cache tables? I will take a stab at it to see how fast
it can get.

Concerning the equality comparison of two node objects, I think the only
condition to check is the currentIndex value of the cursor location, assuming
the same VTDNav instance. hashCode computation and equality can basically use
that value.

----- Original Message -----
From: "Rodrigo Cunha" <rn...@gm...>
To: "Jimmy Zhang" <cra...@co...>
Cc: <vtd...@li...>
Sent: Friday, March 02, 2007 4:46 PM
Subject: Re: [Vtd-xml-users] Random Access Proposal (take 2)

> Hi Jimmy!
>
> Well, the objective is to hold objects in Java data structures for fast
> access, so I opted for a simple class, with minimal computation
> requirements. You can see even the equals() and hashCode() are
> optimized. It holds a single node because that's the basic building
> block. All else should be done with container classes, I think: let's
> not reinvent the wheel, and pretend to do it better.
>
> All this brings VTD a much needed DOM-like functionality. I can now use
> VTD as I used DOM before. In fact that's what I've been doing since a
> few months ago, with my own patched ximpleware-1.6, and I'm now sharing
> this functionality in a more polished way with you all.
>
> The code is basically the same as for push and pop, that's right.
> The wheel was there, worked just fine, I took it and reused it in my own way
> :-) I gave a simple example because all real code I've produced using
> this heavily is both confidential and proprietary... so I can't really
> show that, but believe me, the speed improvements are huge sometimes.
>
> Memory is cheap and unlimited for most realistic scenarios where this
> library might be used; processor time is limited. I for one have several
> GB available, and try to optimize for speed. Sometimes I even disable GC
> and periodically kill VMs, since that has considerable overall speed
> gains in some code I developed. Still, a SimpleContext can be reused, and
> that's good advice if you care about both space and GC.
>
> I actually made that method you mentioned to recover the position from a
> single integer. It was kinda slow, to say the least, even with rather
> optimized search. That was my first try: been there, done that :-) I
> dumped the code somewhere as it was useless in practice.
>
> I think SimpleContext, as it is implemented in the last mail I sent, is
> the best option for random access as I see it, or at least is strongly
> pointing in the right direction.
>
> Quite frankly I think all 3 options are excellent, and all should be part
> of ximpleware-2.1. They are all excellent and different tools, each with
> a set of critical advantages over the others:
>
> - pop/push
> - NodeRecorder
> - SimpleContext
>
> Concerning VTDNav.setPosition(int nodeNumber), I think it's useless,
> since it's way too slow for heavy usage. But it eventually should be
> there, just in case, perhaps with some performance-wise warnings.
>
> I hope this helps improve your wonderful tool, Jimmy. VTD can really
> deliver, if you can make it a bit more open and flexible. You can in
> fact aspire to kill DOM in the future. Perhaps in the future an entire
> DOM API could be emulated and implemented on top of VTD, dynamically
> creating the required objects.
> For now this is only a dream, of course, but who knows?
>
> Just my 2 euro-cents :-)
>
> Jimmy Zhang wrote:
>> Rodrigo, I went over your emails on this thread again and came up with a
>> few questions...
>>
>> 1. In one of the emails, you attached a class capable of storing a
>> single node position. I am wondering: why store just one? Why not more?
>>
>> 2. In the code below, vn.setCtxFromNav and vn.setNavFromCtx seem to me
>> equivalent to push() and pop():
>>
>> myContext = new SimpleContext(null); // Or another size you like, but null works just fine
>> while(ap.iterate()){
>>     vn.setCtxFromNav(myContext);
>>     // do something messy
>>     vn.setNavFromCtx(myContext);
>> }
>>
>> As to NodeRecorder, it is designed to be instantiated once and hold
>> many nodes, not to be instantiated 10 times for 10 nodes...
>>
>> As to why NodeRecorder doesn't use the node format used by push/pop and by
>> setCtx in your patch: the main motivation is to conserve memory. push, pop
>> and setCtx all use the fully expanded node representation, which can be
>> quite big for a complex document.
>>
>> NodeRecorder's internal format is more compact, but the representation is
>> variable in length...
>>
>> The most compact node representation is to just use the index value,
>> which is always 32 bits per node. VTDNav can add a method that "recovers"
>> the node position from a single index value of the node...
>>
>> All in all, the above three options are typical trade-offs between
>> memory and computation:
>>
>> * push/pop use the fully expanded node representation, which is
>> constant in length and doesn't require any extra computation
>>
>> * NodeRecorder's internal node representation is compacted a bit, but
>> is variable in length, and therefore can be accessed only sequentially
>>
>> * Using a single integer to represent a node is the most compact, but
>> requires some CPU cycles to "recover" the node position...
>>
>> I would like to know what you think of those options... and will take
>> the discussion forward from that point on...
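
For reference, here is a minimal sketch of the equality idea from the top of this message: two node handles taken from the same VTDNav instance compare equal when their cursor index matches, and hashCode reuses that same index. The NodeHandle class and its fields are hypothetical names made up for illustration; they are not part of VTD-XML or of Rodrigo's patch. The navigator is held as a plain Object so the sketch compiles without the library.

```java
import java.util.HashSet;
import java.util.Set;

public class NodeHandleSketch {

    /** Hypothetical node handle: identifies a node by its navigator and cursor index. */
    public static final class NodeHandle {
        private final Object nav;   // stand-in for the owning VTDNav instance (assumption)
        private final int index;    // stand-in for the cursor's currentIndex value (assumption)

        public NodeHandle(Object nav, int index) {
            this.nav = nav;
            this.index = index;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof NodeHandle)) return false;
            NodeHandle other = (NodeHandle) o;
            // Same navigator instance (identity comparison) and same cursor index.
            return nav == other.nav && index == other.index;
        }

        @Override
        public int hashCode() {
            // The index alone is a valid hash: equal handles always share it.
            return index;
        }
    }

    public static void main(String[] args) {
        Object nav = new Object();
        Set<NodeHandle> seen = new HashSet<>();
        seen.add(new NodeHandle(nav, 7));
        seen.add(new NodeHandle(nav, 7));   // duplicate of the first handle
        seen.add(new NodeHandle(nav, 12));
        System.out.println(seen.size());
    }
}
```

Handles from different navigators with the same index hash to the same bucket but do not compare equal, which stays within the equals/hashCode contract and keeps each handle at a reference plus one int.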