Content Extraction: Umlaut

Christoph
2008-12-11
2013-05-13
  • Christoph

    Christoph - 2008-12-11

    I tried to extract content from PDF and Word documents. Unfortunately, German umlauts were not extracted.

    Example Word Doc:
    ... Zugehörigkeit ...

    Content Extraction:
    Zugeh
    rigkeit

     
    • Christoph

      Christoph - 2008-12-11

      I used filecrawler.sh, and my system locale is en_US.ISO-8859-15.

       
    • Christoph

      Christoph - 2008-12-11

      Same on de_DE.UTF-8.

       
      • Antoni Mylka

        Antoni Mylka - 2008-12-11

        Please create a small file that exhibits this problem, open a bug on the tracker, and attach the file to the bug report.
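
The thread does not establish a cause. As a rough way to check whether the reported symptom is at least consistent with a charset mismatch, the sketch below encodes the example word in one charset and decodes it in another, silently dropping bytes that cannot be decoded. The helper name `lossy_roundtrip` and the charset pairs are assumptions chosen for illustration; they are not part of filecrawler.sh or the extraction library, and the sketch does not reproduce the line break seen in the reported output.

```python
# Hypothetical diagnostic sketch; not taken from filecrawler.sh or the extractor.
# It only illustrates how a charset mismatch can make an umlaut disappear.

WORD = "Zugehörigkeit"  # example word from the original report


def lossy_roundtrip(text: str, written_as: str, read_as: str) -> str:
    """Encode text in one charset, decode it in another, dropping undecodable bytes."""
    return text.encode(written_as).decode(read_as, errors="ignore")


if __name__ == "__main__":
    # Charset pairs are illustrative assumptions, loosely based on the
    # locales mentioned in the thread (ISO-8859-15 and UTF-8).
    pairs = [
        ("utf-8", "ascii"),
        ("iso-8859-15", "ascii"),
        ("iso-8859-15", "utf-8"),
        ("utf-8", "utf-8"),  # matching charsets: nothing should be lost
    ]
    for written_as, read_as in pairs:
        result = lossy_roundtrip(WORD, written_as, read_as)
        print(f"{written_as} -> {read_as}: {result}")
```

If the umlaut vanishes only for mismatched pairs, as it does here, a small test document together with the locale and the JVM's default charset would be useful details to attach to the bug report.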

         
