
HTML output for .doc files

  • quique

    quique - 2003-11-19

    We finally made wvWare work under Red Hat 8. Our problem is that the HTML output contains some strange characters instead of the original ones (mainly Spanish-specific characters), and when Apache (Apache2) serves the page, IE shows them wrongly. We've tried changing the charset to iso-8859 and utf8, without success.
    Any kind of help would be appreciated.
    Thanks in advance

    • quique

      quique - 2003-11-25

      All right, the 'unicode' charset works OK, but we want to write the output to a file instead of returning it to PHP.

      If we run 'wvWare -c unicode > mypage.html', the output looks OK, but mypage.html is a BINARY file! How can we change this behaviour? We can use wvHtml, which generates a plain-text HTML file, but then no charset options are allowed.

      Thanks anyway

      • Francis James Franklin

        I don't understand. Is wvHtml not generating valid UTF-8? Or is it, but IE can't render it anyway?

        Does the --charset= option to wvHtml work OK? (Though even if it does, is there a wrong charset declaration inside the output file?)
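        The first question can be answered by trying to decode the output file's bytes as UTF-8. A minimal Python sketch; `is_valid_utf8` is a hypothetical helper, and in practice you would pass it the file contents (for example `open("mypage.html", "rb").read()`):

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 without errors."""
    try:
        data.decode("utf-8")  # "strict" error handling is the default
    except UnicodeDecodeError:
        return False
    return True


# "técnico" encoded two ways: UTF-8 stores é as two bytes, ISO-8859-1 as one.
print(is_valid_utf8("técnico".encode("utf-8")))       # True
print(is_valid_utf8("técnico".encode("iso-8859-1")))  # False (lone 0xE9 byte)
```

        Passing this check only shows that the bytes are well-formed UTF-8; it says nothing about which encoding the browser will choose when displaying them.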

        • quique

          quique - 2003-11-26

          It seems to generate valid UTF-8, but IE doesn't render it correctly until I change the 'View --> Encoding' option to UTF-8 (shouldn't it do that automatically?).
          On the other hand, the --charset option for wvHtml is not in the manual (only wvWare's is). I've tried it and got a binary file as well. The strangest thing is that the "binary characters" are only present in the content, not in the tags, as in this example (opened with vi):

          <p style="text-indent: 0.00mm; text-align: left; line-height: 4.166667mm; color: Black; background-color: White; ">
          ^@P^@r^@o^@c^@e^@d^@i^@m^@i^@e^@n^@t^@o^@ ^@t^@^@c^@n^@i^@c^@o^@ ^@d^@e^@ ^@s^@e^@g^@u^@r^@i^@d^@a^@d^@ ^@q^@u^@e^@ ^@d^@e^@f^@i^@n^@e^@ ^@l^@a^@ ^@e^@x^@p^@e^@d^@i^@c^@i^@^@n^@ ^@d^@e^@ ^@c^@e^@r^@t^@i^@f^@i^@c^@a^@d^@o^@s

          When IE renders this, it seems to ignore the binary characters and the page displays correctly (except for the charset problem), but I run into problems when parsing the file to make some changes to it.
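          In vi, `^@` is how a 0x00 (NUL) byte is displayed, and files containing NUL bytes are what tools report as binary. A NUL before every letter looks like the text content was written as UTF-16 big-endian while the tags stayed ASCII; that is an inference from the dump above, not something confirmed in this thread. Assuming every content character falls in U+0000 to U+00FF (which covers Spanish), a Python sketch can drop each NUL and decode the remaining bytes as ISO-8859-1. `collapse_utf16be_runs` is a hypothetical helper:

```python
import re

# "Proc" in UTF-16BE is b'\x00P\x00r\x00o\x00c', i.e. ^@P^@r^@o^@c in vi.


def collapse_utf16be_runs(data: bytes) -> str:
    """Turn ASCII mark-up mixed with UTF-16BE text into a normal string.

    Assumes each content character is in U+0000..U+00FF, so it is stored as a
    0x00 byte followed by its ISO-8859-1 byte. ASCII tags contain no 0x00
    bytes and pass through unchanged.
    """
    latin1 = re.sub(rb"\x00(.)", rb"\1", data, flags=re.DOTALL)
    return latin1.decode("iso-8859-1")


sample = b'<p style="color: Black; ">' + "técnico".encode("utf-16-be") + b"</p>"
print(collapse_utf16be_runs(sample))  # <p style="color: Black; ">técnico</p>
```

          Characters above U+00FF (curly quotes, for instance) are not stored with a leading 0x00 byte, so this shortcut would leave them garbled; generating the file in an ASCII-compatible charset avoids the problem altogether.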

          • Francis James Franklin

            Yes, because of the way wvWare creates HTML (by reading an XML config file), it doesn't alter the encoding of the <p style="... etc. markup. (Hmm.) So you should choose encodings that extend ASCII.
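            Because the markup is left as plain ASCII bytes, the file is only consistent if the chosen output encoding maps every ASCII character to the same single byte. A short Python check of that property; `extends_ascii` is a hypothetical name, not part of wvWare:

```python
def extends_ascii(encoding: str) -> bool:
    """True if every ASCII character encodes to its usual single byte."""
    ascii_chars = "".join(chr(i) for i in range(128))
    return ascii_chars.encode(encoding) == ascii_chars.encode("ascii")


for enc in ("iso-8859-1", "utf-8", "utf-16-be"):
    print(enc, extends_ascii(enc))
# iso-8859-1 True
# utf-8 True
# utf-16-be False
```

            Both ISO-8859-1 and UTF-8 pass, so ASCII tags mixed with text in either encoding still form one coherent file; UTF-16 does not.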

            The other problem is likely to be the charset declaration in the HTML output; I doubt it changes with the chosen encoding, if it is even there at all. Does the output have anything like:

            <META http-equiv="Content-Type" content="text/html; charset=UTF-8">

            near the top?
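            To check this without reading the file by hand, a Python sketch can search the start of the output for the declaration. The regular expression assumes the charset sits in a `content="...; charset=..."` attribute as in the line above and is not a full HTML parser; `declared_charset` is a hypothetical helper:

```python
import re
from typing import Optional

# Matches e.g. <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
META_CHARSET = re.compile(
    rb"""<meta[^>]+content\s*=\s*["'][^"']*charset\s*=\s*([A-Za-z0-9._-]+)""",
    re.IGNORECASE,
)


def declared_charset(data: bytes, head_bytes: int = 1024) -> Optional[str]:
    """Return the charset declared in a META tag near the top, or None."""
    match = META_CHARSET.search(data[:head_bytes])  # only scan the start of the file
    return match.group(1).decode("ascii").lower() if match else None


page = b'<html><head><META http-equiv="Content-Type" content="text/html; charset=UTF-8"></head>'
print(declared_charset(page))  # utf-8
```

            If the declared charset differs from the encoding actually used for the text, the browser has no reliable way to pick the right one.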

    • quique

      quique - 2003-11-26

      Yes, the output has that line, and your suggestion helped me.
      After a couple of tests I made it work with ISO-8859-1 encoding ('wvHtml --charset=iso-8859-1 myWord.doc myPage.html'), which gives an ISO-8859-1 text output file, and I've reached these conclusions:

      1. The binary output is only produced with the --charset=unicode option.
      2. As you said, --charset=UTF-8 seems to produce a UTF-8 text file, but IE doesn't recognize it correctly and Spanish characters are displayed wrongly.
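      The symptom in point 2 is consistent with the browser reading UTF-8 bytes under a single-byte encoding. The Python sketch below assumes that fallback is ISO-8859-1 (what IE actually picks depends on its settings and on what the server sends). It also shows one way to convert an existing UTF-8 file to ISO-8859-1, the encoding settled on here; `utf8_to_latin1` is a hypothetical helper, and the file's META declaration would still need to name iso-8859-1:

```python
text = "expedición técnico"
utf8_bytes = text.encode("utf-8")

# A browser that falls back to ISO-8859-1 reads each UTF-8 byte as one character,
# so every two-byte sequence (é = C3 A9, ó = C3 B3) turns into two symbols.
print(utf8_bytes.decode("iso-8859-1"))  # expediciÃ³n tÃ©cnico


def utf8_to_latin1(data: bytes) -> bytes:
    """Re-encode UTF-8 HTML as ISO-8859-1.

    Characters with no ISO-8859-1 byte become numeric references such as &#8220;.
    """
    return data.decode("utf-8").encode("iso-8859-1", errors="xmlcharrefreplace")


print(utf8_to_latin1("técnico".encode("utf-8")))  # b't\xe9cnico'
```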

      We'll go on with the ISO encoding. Thanks for your advice.

