Re: [Wvware-users] .doc -> .tex, .dvi, .ps conversions fail

On Mon, Oct 30, 2000 at 01:34:24PM -0500, Dom Lachowicz wrote:
> X-Originating-IP: [158.130.22.61]
> From: "Dom Lachowicz" <ci...@ho...>
> To: pd...@ho..., wvw...@li...
> Cc: mar...@hu...
> Subject: Re: [Wvware-users] .doc -> .tex, .dvi, .ps conversions fail
> Date: Mon, 30 Oct 2000 13:34:24 EST
> X-OriginalArrivalTime: 30 Oct 2000 18:34:24.0799 (UTC) FILETIME=[071E22F0:01C042A0]
> 
> This is a known bug that I have to fix. DVI and PS exhibit this problem 
> because they are passed to a filter/converter from the LaTeX output. Martin, 
> LaTeX is your specialty. Any ideas?
> 
> Dom
> 
> 
> >From: Peter Denisevich <pd...@ho...>
> >To: wvw...@li...
> >Subject: [Wvware-users] .doc -> .tex, .dvi, .ps conversions fail
> >Date: Mon, 30 Oct 2000 10:28:17 -0800
> >
> >I must be doing something stupid, but ...
> >
> >Using wv061 rpm or wv062 tarball and
> >trying to convert a word2000 or word95 document to ps or LaTex or dvi
> >all give me the same result:
> >each letter of the text is preceded by the [I assume] unicode number.
> >For example, for the word document reading:
> >"I am an idiot."
> >
> >$ wvLatex idiot95.doc idiot95.tex
> >wvError: (./wvConfig.c:3357) junk after document element at line 1
> >  wvError: (./wvConfig.c:3357) junk after document element at line 1
> >
> >
> >and gives the following tex file:
> ><snip>
> >\setlength{\rightskip}{0.00mm}
> >\raggedright
> >[4900]I[2000] [6100]a[6d00]m[2000] [6100]a[6e00]n[2000]
> >[6900]i[6400]d[6900]i[6f00]o[7400]t[2e00].[d00]
> >\vspace{0.00mm}
> ><snip>
> >
> >the mangled text line passes thru Latex and dvips unchanged to give
> >postscript which contains "[4900]I[2000][6100]..."
> >
> >
> >This is all on a RedHat 7.0 system

                    ^^^^^^^^^^

Can you say "byte order"? Can you say "iconv"? 

The problem and a possible solution are at line 1626 or thereabouts 
(in my local file, which differs slightly from CVS):

        /* Debugging aid: */
        if (char16 >= 0x80) printf("[%x]", char16);
        return(0);
        }

In the HTML part, the corresponding printf has been commented out.

Apparently the U16 char16 has a reversed byte order on RH7.0.

What to do? Is there a way to "know" at this point in the code what
the byte order is? One possibility is to change it to

        if ((char16 >= 0x80) && (char16&0x00ff) ) printf("[%x]", char16);

or words to that effect (i.e. printf only if char16 is bigger than
0x80, but on condition that the lower byte is nonzero too).
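
To make the two ideas above concrete, here is a minimal sketch, not
code from wv: a run-time probe for the host byte order, a 16-bit byte
swap, and the proposed print condition pulled out into a function. The
names host_is_little_endian, swap16 and debug_print_wanted are
hypothetical, and uint16_t stands in for wv's U16.

```c
#include <stdint.h>

/* Hypothetical helper, not part of wv: returns 1 on a little-endian
 * host and 0 on a big-endian one, by inspecting the first byte of a
 * known 16-bit value. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 0x0001;
    return *(const unsigned char *)&probe == 0x01;
}

/* Hypothetical helper: exchange the two bytes of a 16-bit value,
 * e.g. 0x4900 becomes 0x0049 ('I'). */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* The condition proposed above: print the debugging code only when
 * the value is at least 0x80 AND its low byte is nonzero. */
static int debug_print_wanted(uint16_t char16)
{
    return (char16 >= 0x80) && (char16 & 0x00ff);
}
```

With the values from Peter's output, debug_print_wanted(0x4900) is 0,
so the "[4900]" in front of the "I" would no longer be printed. Note
that this only hides the symptom: the value is still 0x4900 rather
than 0x0049, and a correctly ordered character whose low byte happens
to be zero (0x0100, say) would be silenced as well.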

There is another thing that greatly worries me. The LaTeX and HTML
conversion routines contain hundreds of tests on char16. Apparently
these don't work under RH7.0, and all that beautiful special handling
of codes just drops through.
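
A small sketch of why byte-swapped values would defeat those tests.
The function latex_for below is a hypothetical stand-in for the kind
of special-case test meant here, not the actual wv code, and the two
mappings in it are chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one of the special-case tests on char16:
 * returns a LaTeX sequence for a known code, or NULL when the code
 * "drops through" unhandled. */
static const char *latex_for(uint16_t char16)
{
    switch (char16) {
    case 0x00e9: return "\\'e";   /* e with acute accent */
    case 0x00fc: return "\\\"u";  /* u with diaeresis */
    default:     return NULL;     /* no special handling matched */
    }
}
```

Called with 0x00e9 this returns the accent sequence; called with the
byte-swapped 0xe900 it returns NULL, because no case matches. If
char16 really does arrive reversed, every comparison of this kind
would miss in the same way.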

What should we do? Just specify wvware for RH version < 7.0? I am
tempted ;-) Heaven knows what else is hidden behind this
problem. Unicode/iconv gurus, HELP!

...

> >Thanks
> >Peter.

...

Martin
-- 
Martin Vermeer  mar...@hu...
Helsinki University of Technology 
Department of Surveying
P.O. Box 1200, FIN-02015 HUT, Finland
:wq