Re: [Vtd-xml-users] design question/best practice


--- Paul Tomsic <pt...@ya...> wrote:

> well, by caching i mean in-memory.
> guess that's my question.  is it better to parse
> once,
> then store the xml as a stringBuffer or something?

No, it's probably never a good idea to use StringBuffer
(or StringBuilder) -- it does not hold a parsed
representation, just a kind of serialization. There
are generally 3 steps in parsing:

(a) Reading the data (from file, network)
(b) Decoding (byte->char)
(c) Tokenization (char[] -> events, or in case of
vtd-xml, int offsets + type)
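
To make the three steps concrete, here is a minimal sketch using only the JDK.
It is not how vtd-xml works internally: StAX (javax.xml.stream) stands in for
the tokenizer in step (c), the class and method names are made up for
illustration, and the input is assumed to be UTF-8 (a real parser would detect
the encoding itself).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class; names are illustrative only.
final class ParseSteps {

    // (a) Reading: pull the raw bytes from a file.
    static byte[] read(Path file) {
        try {
            return Files.readAllBytes(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // (b) Decoding: byte -> char. Assumes the document is UTF-8.
    static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // (c) Tokenization: chars -> events. StAX is a stand-in here;
    // this just counts the start-element events it reports.
    static int countStartElements(String chars) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(chars));
            int count = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
            reader.close();
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("malformed XML", e);
        }
    }
}
```

Caching the result of one step only saves that step and the ones before it;
everything after it still runs on every access.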

If you want to cache things in-memory, the most
compact representation would be a raw byte array. That'd
get rid of (a). If you use StringBuffer (or char[]
etc), you also get rid of (b), but most likely double
memory usage. A byte[] can contain utf-8 encoded
content, which is generally quite compact -- Strings
(buffers, builders) consist of 16-bit characters. VTD
would still have to tokenize (and read through) the
whole content, and since that content would be about
twice the size of the byte representation, it would
possibly be slower than reading from a byte array (or
even disk, depending on i/o speed).
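
A rough way to see the size difference, as a sketch only: it counts payload
bytes and ignores object headers and JVM optimizations (newer JVMs, Java 9 and
later, can store Latin-1-only strings more compactly). The class and method
names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; compares payload sizes of the two in-memory forms.
final class CachedSizeEstimate {

    // Bytes needed to hold the document as UTF-8 in a byte[].
    static int utf8Bytes(String xml) {
        return xml.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed to hold the same document as 16-bit chars
    // (char[], or the API-level view of a StringBuffer).
    static int utf16Bytes(String xml) {
        return xml.length() * Character.BYTES;
    }

    public static void main(String[] args) {
        String xml = "<order id=\"42\"><item>widget</item></order>";
        System.out.println("byte[] (utf-8): " + utf8Bytes(xml) + " bytes");
        System.out.println("char[] (16-bit): " + utf16Bytes(xml) + " bytes");
    }
}
```

For mostly-ASCII XML the char form comes out at twice the byte form; the gap
narrows for content with many non-ASCII characters.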

However, if you can afford to cache these docs in
memory, you can probably afford to just cache the
resulting VTD document object.
After all, it adds only about 50% overhead, but it's
fully parsed and all, and thus efficient to access.
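
A minimal sketch of such a cache, using only the JDK. The class name and the
loader function are hypothetical; with vtd-xml the loader would presumably run
VTDGen over the document bytes and return the resulting VTDNav, which is not
shown here since it needs the library on the classpath.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical cache of already-parsed documents, keyed by some document id.
// D is whatever the parser hands back (the parsed form, not the raw text).
final class ParsedDocCache<D> {

    private final ConcurrentMap<String, D> cache = new ConcurrentHashMap<>();
    private final Function<String, D> loader;

    // loader: reads and parses the document for a given key (assumed to be
    // supplied by the caller; it only runs when the key is not cached yet).
    ParsedDocCache(Function<String, D> loader) {
        this.loader = loader;
    }

    // Returns the cached parsed document, parsing it on first access only.
    D get(String key) {
        return cache.computeIfAbsent(key, loader);
    }

    int size() {
        return cache.size();
    }
}
```

This does no eviction, so it only fits the case described above, where all
the documents can stay in memory. Whether one parsed object can be shared
between threads depends on the parser and is not addressed here.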

Finally, for these kinds of tasks, an actual native
XML database might make the most sense. They store xml
content efficiently in structured (not serialized)
form. There are a few open source ones available (like
eXist).

-+ Tatu +-
