From: Tatu S. <cow...@ya...> - 2006-10-23 18:14:49
--- Paul Tomsic <pt...@ya...> wrote:

> well, by caching i mean in-memory.
> guess that's my question. is it better to parse
> once,
> then store the xml as a stringBuffer or something?

No, it's probably never a good idea to use StringBuffer (or builder) -- it will not hold a parsed representation, just a kind of serialization.

There are generally 3 steps in parsing:

(a) Reading the data (from file, network)
(b) Decoding (byte -> char)
(c) Tokenization (char[] -> events, or in the case of vtd-xml, int offsets + type)

If you want to cache things in memory, the most compact representation would be a raw byte array. That'd get rid of (a). If you use StringBuffer (or char[] etc.), you get rid of (b), but most likely double the memory usage. A byte[] can contain UTF-8 encoded content, which is generally quite compact -- Strings (buffers, builders) consist of 16-bit characters. VTD would still have to tokenize (and read through) the whole content, and since the char representation is about twice the size of the byte representation, this would possibly be slower than reading from a byte array (or even from disk, depending on I/O speed).

However, if you can afford to cache these docs in memory, you can probably afford to just cache the resulting VTD document object. After all, it adds only about 50% overhead, but it's fully parsed and thus efficient to access.

Finally, for these kinds of tasks, an actual native XML database might make the most sense. They store XML content efficiently in structured (not serialized) form. There are a few open source ones available (like eXist).

-+ Tatu +-
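
To make the size argument concrete (a UTF-8 byte[] versus 16-bit chars), here is a minimal Java sketch. The class and method names (CacheSizeSketch, utf8Bytes, utf16Bytes) are made up for illustration and are not part of vtd-xml. It counts only the character payload and ignores object headers, buffer slack and any JVM-level string compaction, so treat the numbers as a rough estimate.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of vtd-xml: compares the payload size of
// the two in-memory cache forms discussed above.
class CacheSizeSketch {

    // Payload size when the document is cached as raw UTF-8 bytes.
    // Caching this form skips step (a), reading, on reuse.
    static int utf8Bytes(String xml) {
        return xml.getBytes(StandardCharsets.UTF_8).length;
    }

    // Payload size when the document is cached as char[] / StringBuffer:
    // two bytes per UTF-16 code unit. Caching this form also skips
    // step (b), decoding, at the cost of more memory.
    static int utf16Bytes(String xml) {
        return xml.length() * 2;
    }

    public static void main(String[] args) {
        String xml = "<doc><item id=\"1\">hello</item></doc>";
        System.out.println("UTF-8 payload bytes:  " + utf8Bytes(xml));
        System.out.println("UTF-16 payload bytes: " + utf16Bytes(xml));
    }
}
```

For mostly-ASCII XML the char form comes out at about twice the byte form, which is the "double memory usage" mentioned above. The gap narrows for text dominated by characters that need multiple bytes in UTF-8.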