Re: [Vtd-xml-users] Random Access Proposal (take 2)

Rodrigo Cunha wrote:
> I keep them in a HashMap, for example, or in a TreeMap, etc... rarely on 
> a simple list.

> The key is generally a string I would need to get in more or less 
> convoluted ways from the node during a sequential search. The node 
> itself contains a lot more info I only want to retrieve in the future if 
> it's needed, or else I would cache the info itself  :-D

Just an FYI: I have cases where the key is an Integer, and cases where 
it's a string.

> If instead of keeping a context I can keep a simple integer and then 
> tell a VTDNav "hey you, take this integer you told me to keep and go to 
> the node you bookmarked", I would say it's OK, as long as the "go to the 
> node" operation is fast.
> 
> So, you're suggesting an API that would work like this:
> 
> RandomNodeRecorder xpto = new RandomNodeRecorder(navigator);
> // xpto is the bookmark keeper organized in a way Jimmy likes :-)
> int mark = xpto.keepPos();
> /* do some stuff here */
> boolean found = xpto.fetchPos(mark); // back to the bookmarked node
> xpto.del(mark); // don't need the mark any longer
> 
> I still fail to understand why a context shouldn't be kept outside the 
> structures you seem to like :-)

Well, I'd be interested in knowing the time/space trade-offs for both.
For one specific case, I could have an int as the key, and an int as the 
mark/vtd-node. Both ints could be native ints with fastutil. Maybe the 
CPU overhead is much smaller with SimpleContext though... it would be 
nice to see what Jimmy has in mind (the details).
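
To make the int-to-int case concrete, here is a minimal sketch of what I
mean by native ints. It is a hand-rolled stand-in for a primitive map such
as fastutil's Int2IntOpenHashMap, so that it compiles without any
dependency. The class name IntMarkMap, the reserved EMPTY key and the -1
"not found" value are my own assumptions, and the int "mark" is simply
whatever handle the recorder would hand back; none of this is VTD-XML API.

```java
import java.util.Arrays;

/** Hypothetical primitive int -> int map: key -> bookmark, with no boxing. */
final class IntMarkMap {
    // Assumption: this value is never used as a real key.
    private static final int EMPTY = Integer.MIN_VALUE;

    private int[] keys;
    private int[] marks;
    private int size;

    IntMarkMap(int expected) {
        int cap = 16;
        while (cap < expected * 2) cap <<= 1; // power of two, load factor <= 0.5
        keys = new int[cap];
        marks = new int[cap];
        Arrays.fill(keys, EMPTY);
    }

    /** Stores or overwrites the mark for a key. */
    void put(int key, int mark) {
        if (key == EMPTY) throw new IllegalArgumentException("reserved key");
        if ((size + 1) * 2 > keys.length) grow();
        int i = slot(key);
        if (keys[i] == EMPTY) { keys[i] = key; size++; }
        marks[i] = mark;
    }

    /** Returns the mark for key, or -1 if absent (assumes marks are non-negative). */
    int get(int key) {
        int i = slot(key);
        return keys[i] == EMPTY ? -1 : marks[i];
    }

    int size() { return size; }

    /** Linear probing: the slot holding key, or the first empty slot after its hash. */
    private int slot(int key) {
        int mask = keys.length - 1;
        int h = key * 0x9E3779B9;          // cheap multiplicative scramble
        int i = (h ^ (h >>> 16)) & mask;
        while (keys[i] != EMPTY && keys[i] != key) i = (i + 1) & mask;
        return i;
    }

    /** Doubles both arrays and reinserts every live entry. */
    private void grow() {
        int[] oldKeys = keys, oldMarks = marks;
        keys = new int[oldKeys.length * 2];
        marks = new int[keys.length];
        Arrays.fill(keys, EMPTY);
        size = 0;
        for (int j = 0; j < oldKeys.length; j++)
            if (oldKeys[j] != EMPTY) put(oldKeys[j], oldMarks[j]);
    }
}
```

Each entry costs two array slots of 4 bytes, plus whatever slack the load
factor leaves, and a lookup is a multiply and a short probe. Whether that
beats keeping a SimpleContext per entry depends on how fast "go to mark"
turns out to be, which is exactly the detail I'd like to see.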

> Memory is cheap, and for example, if I keep a hash of NEs, and each NE 
> occupies a few KB itself, it's irrelevant if I'm going to use a few more 
> bytes for each NE.
> 
> I'm not suggesting one should keep large structures containing every 
> single node in the document, ok? But the random access to a cached node 
> must be fast. I emphasize: fast random access to cached nodes.

+1

> As far as I understand the SimpleContext structure grows 4 bytes for 
> each depth level, so a deeper node consumes more space, right? So a 
> really deep node, let's say, at level 10, will consume 40 extra bytes, 
> plus the base consumption... that's 48 bytes, quite small, unless the 
> node is small and irrelevant.

It is small, but I'm looking at about 6k indexes to cache per document, 
and as many documents cached as possible. Over-guessing at 100 bytes per 
SimpleContext (total) would mean 600KB of SimpleContext objects per 
document. I understand my use cases deal with larger than normal 
documents, but that just means I have so much more to gain from random 
access.
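
For what it's worth, the numbers in this thread can be put side by side
with a back-of-envelope sketch. The 8-byte base plus 4 bytes per depth
level is the estimate quoted above, the 100 bytes is my own over-guess, and
the 8 bytes per entry assumes one int key plus one int mark. It ignores JVM
object headers and hash table slack, and the class and method names are
made up for the illustration.

```java
final class CacheFootprint {
    /** Estimate quoted in this thread: 8 bytes base plus 4 bytes per depth level. */
    static long contextBytes(int depth) {
        return 8L + 4L * depth;
    }

    /** Total bytes for a number of cached entries at a given per-entry cost. */
    static long perDocument(int entries, long bytesPerEntry) {
        return entries * bytesPerEntry;
    }

    public static void main(String[] args) {
        int entries = 6_000; // roughly the number of indexes I cache per document
        System.out.println(perDocument(entries, contextBytes(10))); // 288000
        System.out.println(perDocument(entries, 100));              // 600000
        System.out.println(perDocument(entries, 8));                // 48000
    }
}
```

So even at the optimistic 48 bytes per context it is about 288KB per
document, against roughly 48KB for plain int pairs before any table
overhead.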

Cheers.
