We finally got wvWare working under RedHat 8. Our problem is that the HTML output contains strange characters in place of the original ones (mainly Spanish accented characters), and when Apache (Apache2) serves the page, IE displays them wrongly. We've tried changing the charset to iso-8859 and utf8, without success.
Any kind of help would be greatly appreciated.
Thanks in advance
All right, 'unicode' charset works OK, but we want to write the output to a file instead of returning it to php.
If we try 'wvWare -c unicode > mypage.html', the output looks OK, but mypage.html is a BINARY file! How can we change this behaviour? We could use wvHtml, which generates a text HTML file, but then no charset modifiers are allowed.
Francis James Franklin
I don't understand. Is wvHtml not generating valid UTF-8? Or is it, but IE can't render it anyway?
Does the --charset= option to wvHtml work OK? (Though even if it does, is there a wrong charset declaration inside the output file?)
It seems to generate valid UTF-8, but IE doesn't render it correctly until I change the 'View-->Coding' option to UTF-8 (shouldn't it do that automatically?).
On the other hand, the --charset option for wvHtml is not in the manual (only for wvWare). I've tried it, and got a binary file as well. The strangest thing is that the "binary characters" appear only in the content, not in the tags, as in this example (viewed with 'vi'):
<p style="text-indent: 0.00mm; text-align: left; line-height: 4.166667mm; color: Black; background-color: White; ">
^@P^@r^@o^@c^@e^@d^@i^@m^@i^@e^@n^@t^@o^@ ^@t^@^@c^@n^@i^@c^@o^@ ^@d^@e^@ ^@s^@e^@g^@u^@r^@i^@d^@a^@d^@ ^@q^@u^@e^@ ^@d^@e^@f^@i^@n^@e^@ ^@l^@a^@ ^@e^@x^@p^@e^@d^@i^@c^@i^@^@n^@ ^@d^@e^@ ^@c^@e^@r^@t^@i^@f^@i^@c^@a^@d^@o^@s
When IE renders this, it seems to ignore the binary characters and the page displays correctly (except for the charset problem), but I have problems when parsing the file to make changes to it.
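The ^@ marks in the vi listing are NUL bytes, and one NUL before every letter is the pattern a 16-bit big-endian encoding would produce. The following is only a sketch to illustrate that: it assumes the 'unicode' charset behaves like Python's "utf-16-be" codec, which the thread does not confirm, and the sample string is simply taken from the listing above.

```python
# Sketch: compare how the same text looks in three encodings.
# Assumption: wvWare's 'unicode' charset resembles UTF-16 big-endian.

def encode_sample(text: str, encoding: str) -> bytes:
    """Encode text with the given codec and return the raw bytes."""
    return text.encode(encoding)

sample = "Procedimiento técnico"

utf16_be = encode_sample(sample, "utf-16-be")
latin1 = encode_sample(sample, "iso-8859-1")
utf8 = encode_sample(sample, "utf-8")

# Every character becomes two bytes; for these characters the first is NUL,
# which vi displays as ^@.
print(utf16_be[:8])
print("NUL bytes:", utf16_be.count(b"\x00"))
print("sizes:", len(latin1), len(utf8), len(utf16_be))
```

Text tools that expect one byte per ASCII character see the NUL bytes and treat the file as binary, which would explain both the 'BINARY file' report and the parsing problems.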
Francis James Franklin
Yes, because of the way wvWare creates HTML (by reading an XML config file) it doesn't alter the encoding of the <p style="... etc. mark-up. (Hmm.) So you should choose encodings which extend ASCII.
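As a rough illustration of what "extends ASCII" means here, the sketch below checks whether a codec encodes every 7-bit ASCII character as the same single byte, so that unconverted mark-up stays consistent with the converted text. The function name extends_ascii is made up for this example, and the codec names are Python's, not wvWare's.

```python
# Sketch: test whether an encoding leaves plain ASCII bytes unchanged.

def extends_ascii(encoding: str) -> bool:
    """Return True if all 128 ASCII characters encode to their own byte value."""
    ascii_bytes = bytes(range(128))
    try:
        return ascii_bytes.decode("ascii").encode(encoding) == ascii_bytes
    except (UnicodeEncodeError, LookupError):
        # Unknown codec, or a codec that cannot represent some ASCII character.
        return False

for name in ("utf-8", "iso-8859-1", "utf-16-be", "utf-16"):
    print(name, extends_ascii(name))
```

Under this check UTF-8 and ISO-8859-1 pass, while the 16-bit encodings do not, which matches the mixed output shown earlier.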
The other problem is likely to be the document charset encoding declaration in the HTML output - I doubt this changes, if it is even there at all. Does the output have anything like:
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
near the top?
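One way to check for that declaration without opening the file in an editor is to search the first few kilobytes for it. This is a minimal sketch, not part of wvWare: the regular expression is deliberately simple, and the name declared_charset and the 4096-byte limit are arbitrary choices for the example.

```python
import re

# Matches a charset=... value inside a <meta ...> tag, case-insensitively.
META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE
)

def declared_charset(html_bytes: bytes):
    """Return the lowercased charset named in a meta tag near the top, or None."""
    match = META_CHARSET.search(html_bytes[:4096])
    return match.group(1).decode("ascii").lower() if match else None

line = b'<META http-equiv="Content-Type" content="text/html; charset=UTF-8">'
print(declared_charset(line))
print(declared_charset(b"<html><head></head><body></body></html>"))
```

To use it on a real file, read the file in binary mode (open(path, "rb")) and pass the bytes in; the declared value can then be compared with the charset that was requested on the command line.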
Yes, the output has that line, and your suggestion has helped me:
After a couple of tests I made it work with iso-8859-1 encoding ('wvHtml --charset=iso-8859-1 myWord.doc myPage.html'), getting an iso-8859-1 text output file, and I've reached these conclusions:
1. The binary output is only produced using the --charset=unicode modifier.
2. As you said, --charset=UTF-8 seems to produce a UTF-8 text file, but IE doesn't recognize it correctly and Spanish characters are displayed wrongly.
We'll go on with the iso encoding. Thanks for your advice.
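The two conclusions can be checked on any output file with a few lines of script. The sketch below is a heuristic only: classify_output and its labels are invented for this example, and an "8-bit" result merely means the bytes are not valid UTF-8, which is consistent with iso-8859-1 but does not prove it.

```python
# Sketch: rough classification of an output file's bytes.

def classify_output(data: bytes) -> str:
    """Label data as 'nul-bytes', 'ascii', 'utf-8' or '8-bit' (heuristic)."""
    if b"\x00" in data:
        # NUL bytes are what make text tools report the file as binary.
        return "nul-bytes"
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "8-bit"

word = "técnico"
for codec in ("utf-16-be", "utf-8", "iso-8859-1"):
    print(codec, "->", classify_output(word.encode(codec)))
```

For a real file, pass in the bytes read with open(path, "rb").read().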