OK, I've taken a look.
The problem is that expat, the XML parser xmlppm uses, converts
everything to UTF-8 before it passes it on to xmlppm. However, xmlppm
was saving the old encoding declaration and restoring it in the decoded
file, while leaving the actual character data in UTF-8.
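To illustrate (just a minimal sketch against expat's character-data
callback; the tiny document and the hex dump are made up for the
example), the callback sees UTF-8 no matter what the declaration says:

    #include <stdio.h>
    #include <expat.h>

    /* expat hands character data to the callback already converted to
       UTF-8, regardless of the encoding named in the XML declaration. */
    static void XMLCALL charData(void *userData, const XML_Char *s, int len)
    {
        (void)userData;
        for (int i = 0; i < len; i++)
            printf("%02x ", (unsigned char)s[i]);
        printf("\n");
    }

    int main(void)
    {
        /* "à" is the single byte 0xE0 in ISO-8859-1 ... */
        const char doc[] =
            "<?xml version='1.0' encoding='ISO-8859-1'?><p>\xe0</p>";
        XML_Parser p = XML_ParserCreate(NULL);
        XML_SetCharacterDataHandler(p, charData);
        XML_Parse(p, doc, sizeof doc - 1, 1);  /* ... but this prints "c3 a0",
                                                  its UTF-8 form */
        XML_ParserFree(p);
        return 0;
    }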
Changing the encoding declaration in the decoded file (manually) to
UTF-8 resulted in the output file being rendered in Netscape/Mozilla the
same as the input. Any other XML processor should then be able to deal
with the output of xmlppm in a reasonable way, but I'm sure that for
some applications (such as if you're editing the XML text by hand) you'd
prefer to have the output use the encoding of your choice rather than
mine. Unfortunately, I know next to nothing about Unicode and character
encodings in general, so getting it right might take a while.
I do know that there's an alternative form of expat based on wide
characters and C localization/internationalization, but the compression
algorithm xmlppm uses depends rather heavily on characters being 8 bits,
and I don't know how much work it would be to change that. And just
naively compressing UTF-32 by serializing the 4 bytes of each wide
character would really hurt compression, because it would separate every
"real" character with three bogus bytes (which would almost always be
the same). (This also implies that the less a language is like English,
the worse xmlppm will compress it, since its UTF-8 representation will
have many of these bogus extra bytes too.)
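To make the byte counts concrete (just an illustration; the byte values
for "à" are the standard ones in each encoding):

    #include <stdio.h>

    int main(void)
    {
        /* The same character "à" in three encodings. */
        const unsigned char latin1[] = { 0xE0 };                    /* 1 byte  */
        const unsigned char utf8[]   = { 0xC3, 0xA0 };              /* 2 bytes */
        const unsigned char utf32[]  = { 0x00, 0x00, 0x00, 0xE0 };  /* 4 bytes,
                                                    three of them always zero */

        printf("latin-1: %zu byte(s)\n", sizeof latin1);
        printf("utf-8:   %zu byte(s)\n", sizeof utf8);
        printf("utf-32:  %zu byte(s)\n", sizeof utf32);
        return 0;
    }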
Until I have a better idea, xmlppm will just change the encoding
declaration to UTF-8 so it's at least consistent.
Ideally, there's some library out there that I can use to postprocess
the decoded XML file from UTF-8 back to the encoding declared in the
input (or any other encoding the user asks for). Perhaps there are
already tools out there that do just that, in which case you could use
those for the time being.
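For instance, POSIX iconv(3) seems like it could handle the conversion
step; here is a rough sketch (minimal error handling, and a real
postprocessor would also have to stream the whole file and rewrite the
encoding declaration):

    #include <stdio.h>
    #include <string.h>
    #include <iconv.h>

    int main(void)
    {
        /* Convert decoded UTF-8 text back to ISO-8859-1 with iconv(3). */
        char in[]  = "terre \xc3\xa0 la lune";  /* "à" as the UTF-8 pair c3 a0 */
        char out[64];
        char *inp = in, *outp = out;
        size_t inleft = strlen(in), outleft = sizeof out - 1;

        iconv_t cd = iconv_open("ISO-8859-1", "UTF-8");
        if (cd == (iconv_t)-1) { perror("iconv_open"); return 1; }

        if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1) {
            perror("iconv");
            iconv_close(cd);
            return 1;
        }
        *outp = '\0';
        iconv_close(cd);

        printf("%s\n", out);   /* "à" is back to the single byte 0xE0 */
        return 0;
    }

The iconv command-line tool does the same thing from the shell
(iconv -f UTF-8 -t ISO-8859-1), so that might already cover the "tools
that do just that" part, apart from fixing up the declaration.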
Hope this helps, and let me know if it doesn't.
--James
Vincent Renardias wrote:
>Hello,
>
>I've just given xmlppm a try and ran into a little problem.
>My sample file is a docbook/XML file (French version of Jules Verne's
>"De la terre à la lune").
>
>After trying bzip2, gzip, and xmlppm in turn, here are the final file
>sizes.
>
>-rw-r--r-- 1 root root 414211 Jan 16 18:02 yo.xml
>-rw-r--r-- 1 root root 97412 Jan 16 18:03 yo.xml.bz2
>-rw-r--r-- 1 root root 132325 Jan 16 18:03 yo.xml.gz
>-rw-r--r-- 1 root root 91940 Jan 16 18:03 yo.xml.xmlppm
>
>So far so good: xmlppm achieved the highest compression ratio (5.6%
>better than bzip2, really not bad at all!).
>
>Now comes the bad part: when I uncompress the file, all the HTML
>entities are messed up. For example, the French accented letters (coded
>in my HTML file as '&eacute;', '&egrave;', etc.) are not decoded correctly.
>If the accents are iso-8859-1 encoded instead, I get the same result.
>
>NB: I've attached a small XML sample (the first chapter of the book,
>actually) that also triggers this problem; I've deliberately mixed both
>encodings for the accents.
>
>I'm somewhat frustrated, because your tool shows great promise, but
>the fact that it messes up accents makes it unusable for me right now.
>
>	Regards,