Re: [pdftohtml] Trouble with Unicode Cyrillic...
From: Derek B. N. <de...@fo...> - 2002-07-16 00:29:15
>> You definitely don't want to modify the tables in the source code --
>> it's much easier to add external encoding files. As Mikhail said, you
>> could try using the UTF-8 encoding, or try downloading the Cyrillic
>> support package, and then selecting the KOI8-R encoding.
>
> provided that Bulgarians use KOI8-R :) I doubt that, because Bulgarian
> most likely has a couple of extra characters. E.g. Ukrainian has a
> special encoding, KOI8-U, etc.

OK, in that case you could either use UTF-8, or create a new encoding by
constructing a .unicodeMap file, maybe starting with the KOI8-R file if
it's close.

> Unless I broke something, it should work exactly like pdftotext.

I believe you also need to specify the encoding in the HTML header
someplace.

- Derek
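Instead of hand-editing the KOI8-R file, a .unicodeMap can also be generated from an existing single-byte codec table. Below is a minimal sketch that uses Python's built-in codecs for that. It assumes the xpdf-style .unicodeMap format, where each line is a hex Unicode code point (or a start/end range) followed by the hex output byte; the function name `unicode_map_lines` is hypothetical. Windows-1251 (Python codec `cp1251`) is one single-byte encoding that covers Bulgarian. In xpdf-derived tools a map file is registered with a `unicodeMap <name> <path>` line in the xpdfrc config; whether this pdftohtml version picks it up the same way is an assumption.

```python
"""Generate an xpdf-style .unicodeMap file from one of Python's single-byte codecs.

Assumed file format (xpdf-style), one mapping per line, all values in hex:
    <unicode> <byte>                 one code point -> one byte
    <start> <end> <first-byte>       consecutive code points -> consecutive bytes
"""


def unicode_map_lines(codec_name):
    """Return .unicodeMap lines for bytes 0x20-0xFF of a single-byte codec."""
    pairs = []
    for byte in range(0x20, 0x100):
        try:
            char = bytes([byte]).decode(codec_name)
        except UnicodeDecodeError:
            continue  # byte is unassigned in this encoding (e.g. 0x98 in cp1251)
        pairs.append((ord(char), byte))
    # Sort by code point; the lookup is assumed to expect ascending ranges.
    pairs.sort()

    lines = []
    i = 0
    while i < len(pairs):
        # Grow a run while code point and byte both increase by exactly one.
        j = i
        while (j + 1 < len(pairs)
               and pairs[j + 1][0] == pairs[j][0] + 1
               and pairs[j + 1][1] == pairs[j][1] + 1):
            j += 1
        start_cp, start_byte = pairs[i]
        if j > i:
            lines.append(f"{start_cp:04x} {pairs[j][0]:04x} {start_byte:02x}")
        else:
            lines.append(f"{start_cp:04x} {start_byte:02x}")
        i = j + 1
    return lines
```

For example, `open("CP1251.unicodeMap", "w").write("\n".join(unicode_map_lines("cp1251")) + "\n")` writes a map for Windows-1251; collapsing runs into ranges keeps the ASCII part down to a single line.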
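On the HTML header point: the standard way to declare the page encoding in HTML of that era is a `<meta http-equiv="Content-Type" content="text/html; charset=...">` tag inside `<head>`, and the charset has to match whatever encoding the text was written in (KOI8-R, UTF-8, etc.). Whether pdftohtml emits such a tag itself is not confirmed here, so this is only a sketch of post-processing the output; the function name `add_charset_meta` is hypothetical.

```python
import re


def add_charset_meta(html, charset):
    """Declare `charset` right after <head>, unless a charset is already declared."""
    if re.search(r"<meta[^>]+charset=", html, re.IGNORECASE):
        return html  # page already declares an encoding; leave it alone
    meta = f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">'
    # Match <head> or <head ...>, but not <header>.
    return re.sub(r"<head(?:\s[^>]*)?>",
                  lambda m: m.group(0) + "\n" + meta,
                  html, count=1, flags=re.IGNORECASE)
```

If the page has no `<head>` element, the function returns it unchanged rather than guessing where the tag should go.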