Re: [Vtd-xml-users] Random Access Proposal (take 2)

VTD-XML certainly still has a lot of growing up left to do.
One old observation from the early days of VTD-XML:
memory usage has strong performance implications as well.
DOM's excessive memory usage directly contributes to its slow
performance, and VTD-XML's memory strategy is largely responsible
for its parsing performance. So VTD's mentality has always been to
reduce memory usage whenever possible; that will, one way
or another, lead to better performance.

As for setPosition(int i)'s performance, did your implementation directly
manipulate the Location Cache tables? I will take a stab at it to see
how fast it can get.
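
To make the cost concrete, here is a minimal sketch of the kind of lookup a
setPosition(int) would need at each level. It assumes, purely for
illustration, that one level of the location cache can be modeled as a sorted
int[] of element token indices. LevelLookup and floorPosition are hypothetical
names, not part of VTD-XML, and the real tables may be laid out differently.

```java
// Hypothetical helper; a simplified model, not the VTD-XML internals.
final class LevelLookup {
    private LevelLookup() {}

    /**
     * Binary search for the position of the largest entry that is
     * less than or equal to target. Returns -1 if every entry is larger
     * (or the array is empty). The array must be sorted ascending.
     */
    static int floorPosition(int[] sortedIndices, int target) {
        int lo = 0;
        int hi = sortedIndices.length - 1;
        int result = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1; // overflow-safe midpoint
            if (sortedIndices[mid] <= target) {
                result = mid;          // candidate; look for a later one
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return result;
    }
}
```

Each level costs O(log n) under this model, and the search has to be repeated
for every level on the way down to the node, which is where the extra CPU
cycles go compared with restoring a fully expanded context.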

Concerning the equality comparison of two node objects, I think that
the only condition to check is the currentIndex value of the
cursor location, assuming the same VTDNav instance.
The hashCode computation and equality can basically use that value.
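
Roughly what I have in mind, as a sketch only: NodeHandle is a hypothetical
class (it is neither SimpleContext nor anything in the current API), the
owner field stands in for the VTDNav instance, and index stands in for the
cursor's current index value.

```java
// Hypothetical handle for a cursor position; not part of the VTD-XML API.
final class NodeHandle {
    private final Object owner; // stands in for the VTDNav instance
    private final int index;    // stands in for the cursor's current index

    NodeHandle(Object owner, int index) {
        this.owner = owner;
        this.index = index;
    }

    int index() {
        return index;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof NodeHandle)) {
            return false;
        }
        NodeHandle other = (NodeHandle) o;
        // Same navigator (by identity) and same index means same node.
        return owner == other.owner && index == other.index;
    }

    @Override
    public int hashCode() {
        // The index alone is a valid hash: equal handles always share it.
        return index;
    }
}
```

With equals() and hashCode() defined this way, such handles can be used as
keys in a HashMap or stored in a HashSet without any extra computation.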

----- Original Message ----- 
From: "Rodrigo Cunha" <rn...@gm...>
To: "Jimmy Zhang" <cra...@co...>
Cc: <vtd...@li...>
Sent: Friday, March 02, 2007 4:46 PM
Subject: Re: [Vtd-xml-users] Random Access Proposal (take 2)

> Hi Jimmy!
> 
> Well, the objective is to hold objects in Java data structures for fast 
> access, so I opted for a simple class with minimal computation 
> requirements. You can see that even equals() and hashCode() are 
> optimized. It holds a single node because that's the basic building 
> block. All else should be done with container classes, I think: let's 
> not reinvent the wheel and pretend to do it better.
> 
> All this brings VTD a much needed DOM-like functionality. I can now use 
> VTD as I used DOM before. In fact that's what I've been doing since a 
> few months ago, with my own patched ximpleware-1.6, and I'm now sharing 
> this functionality in a more polished way with you all.
> 
> The code is basically the same as for push and pop, that's right. The 
> wheel was there, worked just fine, I took it and reused it in my own way 
> :-) I gave a simple example because all real code I've produced using 
> this heavily is both confidential and proprietary... so I can't really 
> show that, but believe me, the speed improvements are huge sometimes.
> 
> Memory is cheap and effectively unlimited for most realistic scenarios 
> where this library might be used, while processor time is limited. I for 
> one have several GB available, and try to optimize for speed. Sometimes I 
> even disable GC and periodically kill VMs, since that gives considerable 
> overall speed gains in some code I developed. Still, a SimpleContext can 
> be reused, and that's good advice if you care about both space and GC.
> 
> I actually made that method you mentioned to recover the position from a 
> single integer. It was kinda slow, to say the least, even with rather 
> optimized search. That was my first try: been there, done that :-) I 
> dumped the code somewhere as it was useless in practice.
> 
> I think SimpleContext, as it is implemented in the last mail I sent, is 
> the best option for random access as I see it, or at least is strongly 
> pointing in the right direction.
> 
> Quite frankly I think all 3 options are excellent, and all should be part 
> of ximpleware-2.1. They are different tools, each with a set of critical 
> advantages over the others:
> 
> - pop/push
> - NodeRecorder
> - SimpleContext
> 
> Concerning the VTDNav.setPosition(int nodeNumber) I think it's useless, 
> since it's way too slow for heavy usage. But it eventually should be 
> there, just in case, perhaps with some performance-wise warnings.
> 
> I hope this helps improve your wonderful tool, Jimmy. VTD can really 
> deliver, if you can make it a bit more open and flexible. You can in 
> fact aspire to kill DOM in the future. Perhaps one day an entire 
> DOM API could be emulated and implemented on top of VTD, dynamically 
> creating the required objects. For now this is only a dream, of course, 
> but who knows?
> 
> Just my 2 euro-cents :-)
> 
> Jimmy Zhang wrote:
>> Rodrigo, I went over your emails on this thread again and came up with a 
>> few questions...
>>
>> 1. In one of the emails, you attached a class capable of storing a 
>> single node position..
>> I am wondering why store just one? why not more?
>>
>> 2. In the code below, vn.setCtxFromNav and vn.setNavFromCtx seem to me
>> equivalent to push() and pop()
>>
>> // Or another size you like, but null works just fine
>> myContext = new SimpleContext(null);
>> while(ap.iterate()){
>>    vn.setCtxFromNav(myContext);
>>    // do something messy
>>    vn.setNavFromCtx(myContext);
>> }
>>
>>
>> As to NodeRecorder, it is designed to be instantiated once and hold 
>> many nodes...
>> not to be instantiated 10 times for 10 nodes...
>>
>> As to why NodeRecorder doesn't use the node format used by push, pop, and 
>> setCtx in your patch: the main motivation is to conserve memory. Push, pop, 
>> and setCtx all use a fully expanded node representation, which can be quite 
>> big for a complex document.
>>
>> NodeRecorder's internal format is more compact, but the 
>> representation is variable in length...
>>
>> The most compact node representation is to just use the index value 
>> which is always
>> 32-bit per node... VTDNav can add a method that "recovers" the node 
>> position from a
>> single index value of the node...
>>
>> All in all, the above three options are typical trade-offs between 
>> memory and computation
>>
>> * Push and pop use the fully expanded node representation, which is 
>> constant in length and doesn't require any extra computation
>>
>> * Node recorder's internal node representation is compacted a bit, but 
>> is variable in length,
>> and therefore can be accessed only sequentially
>>
>> * Using a single integer to represent a node is the most compact, but 
>> requires some CPU cycles to "recover" the node position...
>>
>> I would like to know what you think of those options, and will take 
>> the discussion forward from that point on...