Problem with diacritic chars in DjVu Text

  • Marcin Werla

    Marcin Werla - 2005-04-22


    I have another problem: when I extract text from DjVu files, diacritic (country-specific) characters come out wrong. It looks like these characters are encoded in two bytes (UTF-8?), but during extraction the two bytes are treated as two separate characters. Is it possible to define the charset/text encoding that will be used during text extraction?


    • Dr Bill C Riemers

      The DjVu specification requires the text layer to be encoded in UTF-8. Unfortunately, this specification is not always followed. In those cases the text is returned as one character per byte. If you know which locale the text was written for, you can convert it with the Java API by copying it to a ByteStream and reading it back in the respective encoding.
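
The round trip described above could look roughly like the sketch below. It uses only standard JDK streams rather than the DjVu library's own ByteStream class, whose exact methods are not shown in this thread. The class name `TextRedecoder` and the method `redecode` are hypothetical names made up for the illustration, and the sketch assumes the extracted string really holds one character per original byte (every char value at most 0xFF).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any DjVu API.
final class TextRedecoder {

    // Turns a "one char per byte" string back into bytes, then decodes
    // those bytes with the charset the text was actually written in.
    static String redecode(String raw, Charset actual) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c > 0xFF) {
                // The assumption does not hold: this char cannot be a single byte.
                throw new IllegalArgumentException(
                        "character at index " + i + " does not fit in one byte");
            }
            bytes.write(c); // writes the low 8 bits, i.e. the original byte
        }

        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes.toByteArray()), actual)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                out.append(buf, 0, n);
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // UTF-8 bytes C5 82 C3 B3 64 C5 BA, each wrongly surfaced as its own char.
        String garbled = "\u00C5\u0082\u00C3\u00B3d\u00C5\u00BA";
        System.out.println(redecode(garbled, StandardCharsets.UTF_8));
    }
}
```

For a file whose text layer was stored in a legacy single-byte encoding instead of UTF-8, the same call should work with a different charset, for example `Charset.forName("ISO-8859-2")` for Polish text, provided the running JDK ships that charset.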


