[PDFBox-user] java.io.IOException: Unknown encoding for ...
From: Dmitry G. <DGo...@at...> - 2008-03-20 17:27:37
Hi,

I've encountered these errors when dealing with Japanese PDF documents (stack trace below). I was wondering if any work is underway to enable PDFBox to deal with encodings such as 90ms-RKSJ-H and GBK-EUC-H.

From looking at the latest EncodingManager code, the currently supported set only seems to include:

  COSName.MAC_ROMAN_ENCODING
  COSName.PDF_DOC_ENCODING
  COSName.STANDARD_ENCODING
  COSName.WIN_ANSI_ENCODING

I'd appreciate any info.

Thanks
- Dmitry

The stack trace:

java.io.IOException: Unknown encoding for '90ms-RKSJ-H'
    at org.pdfbox.encoding.EncodingManager.getEncoding(EncodingManager.java:82)
    at org.pdfbox.pdmodel.font.PDFont.getEncoding(PDFont.java:586)
    at org.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:459)
    at org.pdfbox.util.PDFStreamEngine.showString(PDFStreamEngine.java:343)
    at org.pdfbox.util.operator.ShowText.process(ShowText.java:64)
    at org.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:497)
    at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:218)
    at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:177)
    at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:339)
    at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:263)
    at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:219)
    at org.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:152)
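For illustration, here is a rough sketch of the kind of lookup that could bridge the gap. It is not PDFBox code: the class CMapCharsets and its methods charsetFor and decode are hypothetical names. It assumes that, for text extraction, the bytes shown under the predefined CMap 90ms-RKSJ-H can be approximated by decoding them with Java's windows-31j charset (Microsoft code page 932), and GBK-EUC-H with GBK (code page 936). Strictly speaking a CMap maps character codes to CIDs rather than to Unicode, so this is only an approximation, and both charsets are optional in a JVM, hence the isSupported check.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of PDFBox: maps predefined CMap names
// to Java charsets that cover the same code page.
final class CMapCharsets {
    // Only the two names from the report; the table is an assumption
    // and would need more entries for other CJK CMaps.
    private static final Map<String, String> CMAP_TO_CHARSET = new HashMap<>();
    static {
        CMAP_TO_CHARSET.put("90ms-RKSJ-H", "windows-31j"); // Microsoft code page 932
        CMAP_TO_CHARSET.put("GBK-EUC-H", "GBK");           // Microsoft code page 936
    }

    // Returns the charset for a CMap name, or empty if the name is not
    // in the table or the JVM does not ship that charset.
    static Optional<Charset> charsetFor(String cmapName) {
        String name = CMAP_TO_CHARSET.get(cmapName);
        if (name == null || !Charset.isSupported(name)) {
            return Optional.empty();
        }
        return Optional.of(Charset.forName(name));
    }

    // Decodes raw string bytes; mirrors the error message from the trace
    // when no charset is known.
    static String decode(String cmapName, byte[] bytes) throws IOException {
        Charset cs = charsetFor(cmapName).orElseThrow(
            () -> new IOException("Unknown encoding for '" + cmapName + "'"));
        return new String(bytes, cs);
    }

    public static void main(String[] args) throws IOException {
        // The two characters "日本" encoded in Shift-JIS.
        byte[] sjis = {(byte) 0x93, (byte) 0xFA, (byte) 0x96, (byte) 0x7B};
        System.out.println(decode("90ms-RKSJ-H", sjis));
    }
}
```

Falling back to the JVM's charsets keeps the sketch short, but it only works for CMaps whose codes coincide with a standard code page; a general solution would have to parse the CMap resources themselves.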