[Xmlppm-users] bug with accented characters in xmlppm
From: Vincent R. <vi...@st...> - 2003-01-17 08:59:09
Hello,

I've just given xmlppm a try and ran into a small problem.

My sample file is a DocBook/XML file (the French version of Jules Verne's "De la terre à la lune"). After trying bzip2, gzip and xmlppm in turn, here are the resulting file sizes:

-rw-r--r-- 1 root root 414211 Jan 16 18:02 yo.xml
-rw-r--r-- 1 root root  97412 Jan 16 18:03 yo.xml.bz2
-rw-r--r-- 1 root root 132325 Jan 16 18:03 yo.xml.gz
-rw-r--r-- 1 root root  91940 Jan 16 18:03 yo.xml.xmlppm

So far so good: xmlppm achieved the best compression ratio (its output is 5.6% smaller than bzip2's, really not bad at all!).

Now comes the bad part: when I decompress the file, all the HTML entities are mangled. For example, the French accented letters (encoded in my XML file as '&eacute;', '&egrave;', etc.) are not decoded correctly. If the accents are encoded directly in ISO-8859-1, I get the same result.

NB: I've attached a small XML sample (actually the first chapter of the book) that also triggers this problem; I deliberately mixed both encodings for the accents in it.

I'm somewhat frustrated, because your tool shows great promise, but the fact that it mangles accents makes it unusable for me for now.

Best regards,
-- 
Vincent RENARDIAS
Technical Director
StrongHoldNET / http://www.strongholdnet.com
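For reference, the compression figures in this report can be recomputed from the `ls -l` sizes. The following is a minimal Python sketch. The helper names `ratio` and `savings_pct` are hypothetical illustrations and not part of xmlppm. Run as-is, the last line reports xmlppm's output as 5.6% smaller than bzip2's, which matches the figure quoted in the post.

```python
# File sizes in bytes, copied from the `ls -l` listing in the report.
sizes = {
    "yo.xml": 414211,
    "yo.xml.bz2": 97412,
    "yo.xml.gz": 132325,
    "yo.xml.xmlppm": 91940,
}


def ratio(original: int, compressed: int) -> float:
    """Compression ratio: original size divided by compressed size."""
    return original / compressed


def savings_pct(reference: int, candidate: int) -> float:
    """How much smaller `candidate` is than `reference`, as a percentage of `reference`."""
    return 100.0 * (reference - candidate) / reference


original = sizes["yo.xml"]
for name in ("yo.xml.gz", "yo.xml.bz2", "yo.xml.xmlppm"):
    print(f"{name:14} {ratio(original, sizes[name]):.2f}:1")

gain = savings_pct(sizes["yo.xml.bz2"], sizes["yo.xml.xmlppm"])
print(f"xmlppm vs bzip2: {gain:.1f}% smaller")
```

Note that these ratios only measure size. They say nothing about whether decompression reproduces the input byte for byte, which is exactly what fails here.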