From: Rodrigo C. <rn...@gm...> - 2007-03-03 00:46:57
Hi Jimmy!

Well, the objective is to hold objects in Java data structures for fast access, so I opted for a simple class with minimal computation requirements. You can see that even equals() and hashCode() are optimized. It holds a single node because that's the basic building block. All else should be done with container classes, I think: let's not reinvent the wheel and pretend to do it better.

All this brings VTD much-needed DOM-like functionality. I can now use VTD as I used DOM before. In fact that's what I've been doing for a few months now, with my own patched ximpleware-1.6, and I'm now sharing this functionality in a more polished way with you all.

The code is basically the same as for push and pop, that's right. The wheel was there and worked just fine; I took it and reused it in my own way :-)

I gave a simple example because all the real code I've produced that uses this heavily is both confidential and proprietary, so I can't really show it, but believe me, the speed improvements are sometimes huge. Memory is cheap and unlimited for most realistic scenarios where this library might be used; processor time is limited. I for one have several GB available, and try to optimize for speed. Sometimes I even disable GC and periodically kill VMs, since that gives considerable overall speed gains in some code I developed. Still, a SimpleContext can be reused, and that's good advice if you care about both space and GC.

I actually wrote the method you mentioned, to recover the position from a single integer. It was kinda slow, to say the least, even with a rather optimized search. That was my first try: been there, done that :-) I dumped the code somewhere as it was useless in practice.

I think SimpleContext, as implemented in the last mail I sent, is the best option for random access as I see it, or at least points strongly in the right direction. Quite frankly I think all 3 options are excellent, and all should be part of ximpleware-2.1.
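
To make the "simple class with cheap equals() and hashCode()" idea concrete, here is a minimal sketch of such a holder used as a key in a standard container. It is not the SimpleContext class from the patch: the name NodeKey, the int[] layout (depth first, then one index per level) and the cached hash are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a saved cursor position; NOT the SimpleContext from the patch.
final class NodeKey {
    // Assumed layout: context[0] = depth, followed by one index per level.
    private final int[] context;
    // Computed once in the constructor; safe because the key is immutable.
    private final int hash;

    NodeKey(int[] context) {
        this.context = context.clone();
        this.hash = Arrays.hashCode(this.context);
    }

    int depth() {
        return context[0];
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof NodeKey)) return false;
        NodeKey other = (NodeKey) o;
        // Comparing the cached hashes first rejects most unequal keys cheaply.
        return hash == other.hash && Arrays.equals(context, other.context);
    }

    @Override
    public int hashCode() {
        return hash;
    }
}

final class NodeKeyDemo {
    public static void main(String[] args) {
        // The container does the bookkeeping; the key only identifies one node.
        Map<NodeKey, String> cache = new HashMap<>();
        cache.put(new NodeKey(new int[] {2, 0, 5, 17}), "price element");

        // A position saved later with the same values finds the same entry.
        System.out.println(cache.get(new NodeKey(new int[] {2, 0, 5, 17})));
    }
}
```

Holding one node per object keeps the class trivial, and anything that needs many nodes can lean on HashMap, ArrayList and friends.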
They are all excellent and different tools, each with a set of critical advantages over the others:

- pop/push
- NodeRecorder
- SimpleContext

Concerning VTDNav.setPosition(int nodeNumber), I think it's useless, since it's way too slow for heavy usage. But it should eventually be there, just in case, perhaps with some performance warnings.

I hope this helps improve your wonderful tool, Jimmy. VTD can really deliver, if you can make it a bit more open and flexible. You can in fact aspire to kill DOM in the future. Perhaps one day an entire DOM API could be emulated and implemented on top of VTD, dynamically creating the required objects. For now this is only a dream, of course, but who knows?

Just my 2 euro-cents :-)

Jimmy Zhang wrote:
> Rodrigo, I went over your emails on this thread again and came up with a
> few questions...
>
> 1. In one of the emails, you attached a class capable of storing a
> single node position.
> I am wondering why store just one? Why not more?
>
> 2. In the code below, vn.setCtxFromNav and vn.setNavFromCtx seem to me
> equivalent to push() and pop():
>
> myContext = new SimpleContext(null); // Or another size you like, but null works just fine
> while(ap.iterate()){
>     vn.setCtxFromNav(myContext);
>     // do something messy
>     vn.setNavFromCtx(myContext);
> }
>
> As to NodeRecorder, it is designed to be instantiated once and hold
> many nodes, not to be instantiated 10 times for 10 nodes...
>
> As to why NodeRecorder doesn't use the node format used by push, pop and
> setCtx in your patch: the main motivation is to conserve memory. Push, pop
> and setCtx all use the fully expanded node representation, which can be
> quite big for a complex document.
>
> NodeRecorder's internal format is more compact, but the
> representation is variable in length...
>
> The most compact node representation is to just use the index value,
> which is always 32 bits per node...
> VTDNav can add a method that "recovers" the node position from a
> single index value of the node...
>
> All in all, the above three options are typical trade-offs between
> memory and computation:
>
> * Push/pop use the fully expanded node representation, which is
> constant in length and doesn't require any extra computation
>
> * NodeRecorder's internal node representation is compacted a bit, but
> is variable in length, and therefore can be accessed only sequentially
>
> * Using a single integer to represent a node is the most compact, but
> requires some CPU cycles to "recover" the node position...
>
> I would like to know what you think of those options... and will take
> the discussion forward from that point on...
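
The memory-versus-computation trade-off between the first and last options above can be sketched with a toy model. This is not VTD's record layout, nor the recovery method discussed in this thread: ToyDoc, its depth array and recoverPath are hypothetical names, and the only assumption is that each token in document order carries its nesting depth. The variable-length NodeRecorder format is not modeled.

```java
import java.util.Arrays;

// Toy document: one entry per token in document order, holding only its depth.
final class ToyDoc {
    private final int[] depth; // depth[i] = nesting depth of token i, root = 0

    ToyDoc(int[] depth) {
        this.depth = depth.clone();
    }

    // Rebuilds the fully expanded position (one ancestor index per level)
    // from a single token index. The cost is a backward scan, which is the
    // CPU work the compact single-integer form has to pay on every restore.
    int[] recoverPath(int index) {
        int d = depth[index];
        int[] path = new int[d + 1];
        path[d] = index;
        int i = index;
        for (int want = d - 1; want >= 0; want--) {
            // In document order, the parent is the nearest earlier token
            // that sits exactly one level up.
            while (depth[i] != want) {
                i--;
            }
            path[want] = i;
        }
        return path;
    }
}

final class ToyDocDemo {
    public static void main(String[] args) {
        ToyDoc doc = new ToyDoc(new int[] {0, 1, 2, 2, 1, 2, 3});

        // Compact form: a single int per saved node.
        int saved = 6;

        // Expanded form: depth + 1 ints, but restoring it is a plain array copy.
        int[] expanded = doc.recoverPath(saved);
        System.out.println(Arrays.toString(expanded)); // prints [0, 4, 5, 6]
    }
}
```

Saving the expanded array costs more memory per node but makes a restore a simple copy, whereas saving only the index is as small as it gets and pays for the scan each time the position is needed.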