Re: [pdftohtml] Trouble with Unicode Cyrillic...

Hi Vencislav,
To be honest, I've never tried Cyrillic support with pdftohtml. I think,
however, that it should be possible to make it work without any code
changes.
First of all, did you try specifying UTF-8 as the output encoding with
the -enc option? It just might work. If that doesn't work, try
downloading the Cyrillic language package for xpdf, which you can get here:
http://www.foolabs.com/xpdf/download.html
Unfortunately, I don't yet completely understand how encoding handling
works in xpdf.
Also, if you could send me a sample file, I'd be able to experiment with
it myself.
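
On the table format question quoted below: as far as I can tell from
xpdf's UnicodeMap.h, each entry reads as { start, end, code, nBytes }.
Unicode values from start to end (inclusive) map to consecutive output
codes beginning at code, and each output character is written as nBytes
bytes. The sketch below is only my reading of that layout, not the
actual xpdf code. The struct fields are recalled from memory, and the
table cyrillicSketchRanges and the helper mapWithRanges are made up for
illustration. Windows-1251 is used as the target purely as an example.

```cpp
typedef unsigned int Unicode;

// Assumed layout of one table entry (recalled from xpdf's UnicodeMap.h).
struct UnicodeMapRange {
  Unicode start, end;   // first and last Unicode value in the range (inclusive)
  unsigned int code;    // output code that 'start' maps to
  unsigned int nBytes;  // number of bytes written per output character
};

// Hypothetical table, sorted by 'start': printable ASCII passes through,
// and the basic Cyrillic letters map to their Windows-1251 codes.
static UnicodeMapRange cyrillicSketchRanges[] = {
  { 0x0020, 0x007e, 0x20, 1 },  // ASCII space .. '~'
  { 0x0401, 0x0401, 0xa8, 1 },  // U+0401 (capital IO)
  { 0x0410, 0x044f, 0xc0, 1 },  // U+0410 .. U+044F -> 0xC0 .. 0xFF
  { 0x0451, 0x0451, 0xb8, 1 },  // U+0451 (small io)
};

// Hypothetical helper: look up 'u' in 'ranges' and write its output bytes
// to 'buf' (most significant byte first). Returns the number of bytes
// written, or 0 if 'u' is not covered or 'buf' is too small.
int mapWithRanges(Unicode u, const UnicodeMapRange *ranges, int nRanges,
                  char *buf, int bufSize) {
  for (int i = 0; i < nRanges; ++i) {
    if (u >= ranges[i].start && u <= ranges[i].end) {
      int n = (int)ranges[i].nBytes;
      if (n > bufSize) {
        return 0;
      }
      // Offset within the range is added to the range's first output code.
      unsigned int code = ranges[i].code + (u - ranges[i].start);
      for (int j = n - 1; j >= 0; --j) {
        buf[j] = (char)(code & 0xff);
        code >>= 8;
      }
      return n;
    }
  }
  return 0;
}
```

With this table, calling mapWithRanges(0x0411, cyrillicSketchRanges, 4,
buf, 4) would write the single byte 0xC1, since U+0411 sits one position
past the start of the third range. If the real table works this way,
adding Cyrillic would mean adding a new ranges array rather than editing
the Latin1 one, but I have not verified that against the xpdf sources.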

> Hello,
> I tried compiling the PdfToHtml source files in Microsoft Visual C++,
> successfully. I ran pdftohtml.exe and converted, for example, small.pdf
> to HTML files.
> But in the version currently under development (0.34) I do not fully
> understand how the Unicode tables are maintained. I have trouble with
> Cyrillic fonts and the Cyrillic character set.
> There is a header file, UnicodeMapTables.h, included in the project.
> I have one question: what is this format?
> static UnicodeMapRange latin1UnicodeMapRanges[] = {
>   { 0x000a, 0x000a, 0x5a, 1 },
>   { 0x000c, 0x000d, 0x5a, 1 },
> and so on.
> Can I change this array to support Cyrillic Unicode instead of Latin1
> Unicode? What do the values in the first, second, third, and fourth
> columns mean?
> Venci Dimov