Error while running TextHighlightSample.java with the attached PDF:
Exception in thread "main" java.lang.NumberFormatException: For input string: "0D280D4D0D26"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:495)
at org.pdfclown.documents.contents.fonts.CMapParser.parseUnicode(CMapParser.java:204)
at org.pdfclown.documents.contents.fonts.CMapParser.parse(CMapParser.java:112)
at org.pdfclown.documents.contents.fonts.Font.load(Font.java:734)
at org.pdfclown.documents.contents.fonts.Font.<init>(Font.java:351)
at org.pdfclown.documents.contents.fonts.CompositeFont.<init>(CompositeFont.java:123)
at org.pdfclown.documents.contents.fonts.Type2Font.<init>(Type2Font.java:57)
at org.pdfclown.documents.contents.fonts.Font.wrap(Font.java:261)
at org.pdfclown.documents.contents.FontResources.wrap(FontResources.java:72)
at org.pdfclown.documents.contents.FontResources.wrap(FontResources.java:40)
at org.pdfclown.documents.contents.ResourceItems.get(ResourceItems.java:119)
at org.pdfclown.documents.contents.objects.SetFont.getResource(SetFont.java:119)
at org.pdfclown.documents.contents.objects.SetFont.getFont(SetFont.java:83)
at org.pdfclown.documents.contents.objects.SetFont.scan(SetFont.java:97)
at org.pdfclown.documents.contents.ContentScanner.moveNext(ContentScanner.java:1330)
at org.pdfclown.documents.contents.ContentScanner$TextWrapper.extract(ContentScanner.java:811)
at org.pdfclown.documents.contents.ContentScanner$TextWrapper.<init>(ContentScanner.java:777)
at org.pdfclown.documents.contents.ContentScanner$TextWrapper.<init>(ContentScanner.java:765)
at org.pdfclown.documents.contents.ContentScanner$GraphicsObjectWrapper.get(ContentScanner.java:690)
at org.pdfclown.documents.contents.ContentScanner$GraphicsObjectWrapper.access$500(ContentScanner.java:679)
at org.pdfclown.documents.contents.ContentScanner.getCurrentWrapper(ContentScanner.java:1154)
at org.pdfclown.tools.TextExtractor.extract(TextExtractor.java:632)
at org.pdfclown.tools.TextExtractor.extract(TextExtractor.java:647)
at org.pdfclown.tools.TextExtractor.extract(TextExtractor.java:296)
at org.pdfclown.samples.cli.TextHighlightSample.run(TextHighlightSample.java:59)
at org.pdfclown.samples.cli.TextHighlightSample.main(TextHighlightSample.java:33)
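The sample file contains a composite font (JKANVA+Liya) whose character-to-Unicode map (ToUnicode CMap) maps some character codes to multi-character Unicode sequences, such as the Malayalam combination U+0D28 U+0D4D U+0D26, which appears in the exception above as the input string "0D280D4D0D26".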
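PDF Clown (0.1.2) currently supports only single-character Unicode mappings, so CMapParser.parseUnicode fails when it tries to parse the whole hex string as a single integer. To be fixed.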
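For illustration only, here is a minimal sketch of how a multi-character ToUnicode value could be decoded. It is not PDF Clown's code or its eventual fix: the class name `ToUnicodeHexDecoder` and the method `decode` are hypothetical, and the sketch assumes the hex string is a sequence of UTF-16BE code units of 4 hex digits each. The 12-digit string from the exception is too large for `Integer.parseInt(..., 16)` to fit into an `int`, which is why parsing it in one go throws `NumberFormatException`; decoding it 4 digits at a time avoids that.

```java
// Hypothetical helper, not part of PDF Clown.
final class ToUnicodeHexDecoder {
    private ToUnicodeHexDecoder() {
    }

    /**
     * Decodes a ToUnicode hex string into a Java String.
     * Assumption: the input is a sequence of UTF-16BE code units,
     * each written as exactly 4 hex digits (e.g. "0D280D4D0D26").
     */
    static String decode(String hex) {
        if (hex.isEmpty() || hex.length() % 4 != 0) {
            throw new IllegalArgumentException(
                "Expected a non-empty multiple of 4 hex digits: " + hex);
        }
        StringBuilder text = new StringBuilder(hex.length() / 4);
        for (int i = 0; i < hex.length(); i += 4) {
            // Each 4-digit group fits in a char (one UTF-16 code unit).
            int codeUnit = Integer.parseInt(hex.substring(i, i + 4), 16);
            text.append((char) codeUnit);
        }
        return text.toString();
    }
}
```

With this sketch, `ToUnicodeHexDecoder.decode("0D280D4D0D26")` yields a three-character string (U+0D28, U+0D4D, U+0D26), and single-character values such as `"0041"` still decode to one character. Surrogate pairs need no special handling because each surrogate is itself a 4-digit code unit.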
Last edit: Stefano Chizzolini 2015-04-28