Content Extraction: Umlaut

  • Christoph

    Christoph - 2008-12-11

    I tried to extract content from PDF and Word documents. Unfortunately, the German umlauts were not extracted.

    Example Word Doc:
    ... Zugehörigkeit ...

    Content Extraction:

    • Christoph

      Christoph - 2008-12-11

      I used and my system uses en_US.ISO-8859-15.

    • Christoph

      Christoph - 2008-12-11

      Same on de_DE.UTF-8.

      • Antoni Mylka

        Antoni Mylka - 2008-12-11

        Please create a small file that exhibits this problem, open a bug on the tracker, and attach the file to the bug report.
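
A quick check that might accompany such a report, given that both locales mentioned above showed the same result: the sketch below tests whether the sample word from the thread can be represented in each encoding at all. It is only an illustration in Python, not the extraction tool's own code; the helper names `survives` and `drop_unencodable` are hypothetical, and the ASCII case is an assumption about what a lossy conversion could look like, not a diagnosis of the actual bug.

```python
import locale

# Sample word taken from the example document in this thread.
SAMPLE = "Zugehörigkeit"


def survives(text, encoding):
    """Return True if text round-trips through the encoding unchanged."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except (UnicodeError, LookupError):
        # UnicodeError: a character cannot be represented.
        # LookupError: the encoding name is unknown to Python.
        return False


def drop_unencodable(text, encoding):
    """Simulate a lossy conversion that silently discards characters."""
    return text.encode(encoding, errors="ignore").decode(encoding)


if __name__ == "__main__":
    print("preferred encoding:", locale.getpreferredencoding(False))
    for enc in ("iso-8859-15", "utf-8", "ascii"):
        print(enc, "keeps the umlaut:", survives(SAMPLE, enc))
    print("lossy ASCII result:", drop_unencodable(SAMPLE, "ascii"))
```

Both ISO-8859-15 and UTF-8 can represent "ö", so if the umlauts disappear under both locales, the locale setting alone is unlikely to explain the loss, and a small sample file is the more useful thing to attach.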

