From: Tatu S. <cow...@ya...> - 2006-11-01 20:41:57
--- Jimmy Zhang <cra...@co...> wrote:
> Tatu, when detecting XML encoding types, which ones are
> currently supported by Woodstox? Examples below:
>
> encoding="cp-1250"
> encoding="cp1250"
> encoding="windows-1250"
> encoding="win-1250"
>
> It seems there can be many different types of
> encoding values...

Yes. Woodstox offers two levels of encoding support. The first is "native" support, where Woodstox uses its own decoder implementations; the second is a fallback that uses the decoders provided by the JDK. The set of natively supported encodings is quite small: just ISO-Latin1 (ISO-8859-1), ASCII (7-bit), UTF-8, UTF-32 and UTF-16 (the last with only partial native support).

But I do think the JDK supports the ones you listed; about the only encoding I know of that will not work is IBM's EBCDIC, and I haven't received requests to support any others. So I assume that either people don't use other encodings, or the JDK supports them.

One page that has been useful for figuring out how encodings are supposed to work is this one:

http://www.iana.org/assignments/character-sets

which is linked from the XML specification itself.

Hope this helps!

-+ Tatu +-

> Jimmy
> ----- Original Message -----
> From: "Tatu Saloranta" <cow...@ya...>
> To: <vtd...@li...>
> Sent: Monday, October 23, 2006 10:14 AM
> Subject: Re: [Vtd-xml-users] design question/best practice
>
> > --- Paul Tomsic <pt...@ya...> wrote:
> >
> >> Well, by caching I mean in-memory.
> >> I guess that's my question: is it better to parse once,
> >> then store the XML as a StringBuffer or something?
> >
> > No, it's probably never a good idea to use StringBuffer
> > (or StringBuilder): it will not have a parsed
> > representation, just a kind of serialization.
> > There are generally three steps in parsing:
> >
> > (a) Reading the data (from file, network)
> > (b) Decoding (byte -> char)
> > (c) Tokenization (char[] -> events, or in the case of
> >     VTD-XML, int offsets + type)
> >
> > If you want to cache things in memory, the most
> > compact representation would be a raw byte array.
> > That would get rid of (a). If you use StringBuffer
> > (or char[] etc.), you also get rid of (b), but most
> > likely double memory usage. A byte[] can hold UTF-8
> > encoded contents, which are generally quite compact,
> > whereas Strings (buffers, builders) consist of 16-bit
> > characters. VTD would still have to tokenize (and read
> > through) the whole contents, and since they are about
> > twice the size of the byte representation, that could
> > well be slower than reading from a byte array (or even
> > from disk, depending on I/O speed).
> >
> > However, if you can afford to cache these docs in
> > memory, you can probably afford to just cache the
> > resulting VTD document object. After all, it adds
> > only about 50% overhead on top of the raw document,
> > and it is fully parsed and thus efficient to access.
> >
> > Finally, for these kinds of tasks, an actual native
> > XML database might make the most sense. These store
> > XML content efficiently in structured (not serialized)
> > form. There are a few open source ones available
> > (like eXist).
> >
> > -+ Tatu +-
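The two-level scheme described in the reply (a small set of native decoders, with JDK decoders as the fallback) can be sketched roughly as follows. This is not Woodstox's actual code: the class `EncodingLookup`, its `resolve` method and the result strings are hypothetical, and canonicalizing names through the JDK's own alias table (`Charset.isSupported` / `Charset.forName`) is an assumption made for brevity; a real parser may normalize names itself, following the IANA registry.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.Set;

// Hypothetical sketch of two-level encoding support: a few "native" decoders,
// with JDK-provided decoders as the fallback. Not Woodstox's actual code.
public class EncodingLookup {
    // Canonical JDK names of the encodings listed as natively supported.
    private static final Set<String> NATIVE =
        Set.of("ISO-8859-1", "US-ASCII", "UTF-8", "UTF-16", "UTF-32");

    /** Reports whether a declared encoding would be handled natively, by the JDK, or not at all. */
    public static String resolve(String declared) {
        String name = declared.trim();
        try {
            if (!Charset.isSupported(name)) {
                return "unsupported:" + declared;
            }
        } catch (IllegalCharsetNameException e) {
            // Name contains characters the JDK does not allow in charset names.
            return "unsupported:" + declared;
        }
        // Assumption: use the JDK alias table to canonicalize names
        // (e.g. "latin1" -> "ISO-8859-1"); the lookup is case-insensitive.
        String canonical = Charset.forName(name).name();
        return (NATIVE.contains(canonical) ? "native:" : "jdk:") + canonical;
    }

    public static void main(String[] args) {
        String[] declared = {"cp-1250", "cp1250", "windows-1250", "win-1250", "utf-8"};
        for (String enc : declared) {
            System.out.println(enc + " -> " + resolve(enc));
        }
    }
}
```

Running `main` shows how the spellings from the question fare on the JDK it runs on; which aliases resolve depends on the charsets that particular JDK ships.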
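The caching trade-off in the quoted reply (cache raw UTF-8 bytes to skip step (a), but still pay for decoding (b) and tokenization (c) on every access) can be sketched with the JDK's StAX API, which Woodstox also implements. The `RawByteCache` class and its methods are hypothetical illustrations, not VTD-XML code. The size helpers count payload bytes only and ignore object headers, so treat them as rough estimates; also note that since JDK 9, compact strings can store Latin-1-only `String`/`StringBuilder` contents at one byte per character, so the 16-bit figure applies to `char[]` and to older JDKs.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: cache documents as raw UTF-8 bytes (skipping step (a)
// on reuse) and re-run decoding (b) and tokenization (c) on each access.
public class RawByteCache {
    private final Map<String, byte[]> cache = new HashMap<>();
    private final XMLInputFactory factory = XMLInputFactory.newInstance();

    public void put(String key, byte[] utf8Doc) {
        cache.put(key, utf8Doc);
    }

    // Counts start tags; whichever StAX implementation is on the classpath does the parsing.
    public int countElements(String key) throws XMLStreamException {
        byte[] doc = cache.get(key);
        if (doc == null) {
            throw new IllegalArgumentException("not cached: " + key);
        }
        XMLStreamReader reader = factory.createXMLStreamReader(new ByteArrayInputStream(doc));
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    // Payload bytes when cached as UTF-8 (array/object headers ignored).
    public static int utf8Size(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Payload bytes when held as a char[] (16 bits per char).
    public static int charArraySize(String text) {
        return text.length() * Character.BYTES;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<order id=\"42\"><item>widget</item><item>gadget</item></order>";
        RawByteCache cache = new RawByteCache();
        cache.put("order-42", xml.getBytes(StandardCharsets.UTF_8));
        System.out.println("elements: " + cache.countElements("order-42"));
        System.out.println("UTF-8 bytes: " + utf8Size(xml) + ", char[] bytes: " + charArraySize(xml));
    }
}
```

Each call to `countElements` re-decodes and re-tokenizes the cached bytes; that repeated work is what caching an already-parsed object avoids. UTF-8 is also not always smaller: a CJK character takes three bytes in UTF-8 but two in a `char[]`.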