From: Jimmy Z. <cra...@co...> - 2006-11-01 18:18:29
Tatu, when detecting XML encodings, which encoding names are currently
supported by Woodstox? Examples below:

encoding="cp-1250"
encoding="cp1250"
encoding="windows-1250"
encoding="win-1250"

It seems there can be many different spellings of an encoding value...

Jimmy

----- Original Message -----
From: "Tatu Saloranta" <cow...@ya...>
To: <vtd...@li...>
Sent: Monday, October 23, 2006 10:14 AM
Subject: Re: [Vtd-xml-users] design question/best practice

> --- Paul Tomsic <pt...@ya...> wrote:
>
>> well, by caching i mean in-memory.
>> guess that's my question. is it better to parse once,
>> then store the xml as a stringBuffer or something?
>
> No, it's probably never a good idea to use StringBuffer (or builder): it
> will not hold a parsed representation, just a kind of serialization.
> There are generally 3 steps in parsing:
>
> (a) Reading the data (from file, network)
> (b) Decoding (byte -> char)
> (c) Tokenization (char[] -> events, or in case of vtd-xml, int offsets +
>     type)
>
> If you want to cache things in-memory, the most compact representation
> would be a raw byte array. That'd get rid of (a). If you use StringBuffer
> (or char[] etc), you get rid of (b), but most likely double memory usage.
> A byte[] can contain utf-8 encoded contents, which are generally quite
> compact, whereas Strings (buffers, builders) consist of 16-bit characters.
> VTD would still have to tokenize (and read through) the whole contents,
> and since the character form is about twice the size of the byte
> representation, that would possibly be slower than reading from a byte
> array (or even disk, depending on i/o speed).
>
> However, if you can afford to cache these docs in memory, you can probably
> afford to just cache the resulting VTD document object.
> After all, it adds only about 50% overhead; but it's fully parsed and all,
> and thus efficient to access.
>
> Finally, for these kinds of tasks, an actual native XML database might
> make most sense.
> They store XML content efficiently in structured (not serialized) form.
> There are a few open source ones available (like eXist).
>
> -+ Tatu +-
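
To make the byte[] versus 16-bit character size point from the quoted reply concrete, here is a minimal Java sketch. The class name CacheFootprint, the helper methods utf8Bytes and utf16Bytes, and the sample document are all made up for illustration; none of them come from VTD-XML or Woodstox. It counts payload bytes only and ignores object headers and spare buffer capacity, so treat the numbers as rough.

```java
import java.nio.charset.StandardCharsets;

public class CacheFootprint {

    // Payload size if the document is cached as UTF-8 in a byte[].
    static int utf8Bytes(String xml) {
        return xml.getBytes(StandardCharsets.UTF_8).length;
    }

    // Payload size if the same text is cached as 16-bit chars
    // (what a char[] holds: 2 bytes per UTF-16 code unit).
    static int utf16Bytes(String xml) {
        return xml.length() * Character.BYTES;
    }

    public static void main(String[] args) {
        // Hypothetical sample document, mostly ASCII like typical XML markup.
        String xml = "<order id=\"42\"><item>widget</item></order>";
        System.out.println("UTF-8 byte[] payload: " + utf8Bytes(xml) + " bytes");
        System.out.println("16-bit char payload:  " + utf16Bytes(xml) + " bytes");
    }
}
```

For mostly-ASCII markup the character form comes out at about twice the byte form, which matches the "double memory usage" estimate above. Text with many non-ASCII characters narrows the gap, and newer JVMs may store Latin-1-only strings more compactly than this count suggests.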