#3 Some useful functions for jPod: TIFF, JPEG, lossless, unicode TTF writing

Milestone: Unstable (example)
Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2014-08-16
Created: 2014-06-05
Creator: Antti S. Lankila
Private: No

I spent a day switching away from pdfbox to jPod and adapted this library for my use case. I'm dumping this code over the fence without much support and I'm hoping that it is useful.

Provided are:

  • Multipage TIFF reader where CCITT streams are embedded into PDImages without decoding the TIFF other than at the tag level. This allows using jPod to implement something like tiff2pdf.

  • JPEG reader where JPEG is embedded into PDImage without decoding it (other than looking up its width/height).

  • Lossless image support from any BufferedImage, e.g. for PNG. I cut all corners and just made it an RGB image with a full 8-bit alpha channel rather than trying to optimize it. I was pressed for time and I mostly deal with 32-bit ARGB images.

  • Very crude but nevertheless functioning Unicode-encoded TTF loader that permits writing international text with jPod. I can't guarantee it's bug-free, and it obviously leaves much to be desired. Its most important problem is the missing ToUnicode cmap.
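To illustrate what "looking up its width/height" without decoding can mean for the JPEG case, here is a minimal sketch that walks the JPEG marker segments until it reaches a start-of-frame segment. It is not the attached code; the class and method names (JpegSize, readSize) are made up for this example, and it assumes the whole file is already in memory.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: reads JPEG dimensions from the marker segments only. */
final class JpegSize {

    /** Returns {width, height} taken from the first start-of-frame (SOF) segment. */
    static int[] readSize(byte[] jpeg) {
        ByteBuffer buf = ByteBuffer.wrap(jpeg); // big-endian by default, like JPEG
        if ((buf.getShort() & 0xFFFF) != 0xFFD8) {
            throw new IllegalArgumentException("Missing SOI marker, not a JPEG");
        }
        while (buf.remaining() >= 4) {
            if ((buf.get() & 0xFF) != 0xFF) {
                throw new IllegalArgumentException("Expected a marker");
            }
            int marker = buf.get() & 0xFF;
            while (marker == 0xFF) {
                marker = buf.get() & 0xFF; // skip fill bytes
            }
            // TEM and RSTn markers carry no length field.
            if (marker == 0x01 || (marker >= 0xD0 && marker <= 0xD7)) {
                continue;
            }
            // EOI or start of scan: no frame header will follow.
            if (marker == 0xD9 || marker == 0xDA) {
                break;
            }
            int length = buf.getShort() & 0xFFFF; // includes the 2 length bytes
            if (isStartOfFrame(marker)) {
                buf.get(); // sample precision, not needed here
                int height = buf.getShort() & 0xFFFF;
                int width = buf.getShort() & 0xFFFF;
                return new int[] {width, height};
            }
            if (length < 2 || length - 2 > buf.remaining()) {
                throw new IllegalArgumentException("Truncated segment");
            }
            buf.position(buf.position() + length - 2); // skip the payload
        }
        throw new IllegalArgumentException("No start-of-frame segment found");
    }

    /** SOF0..SOF15, excluding DHT (0xC4), JPG (0xC8) and DAC (0xCC). */
    static boolean isStartOfFrame(int marker) {
        return marker >= 0xC0 && marker <= 0xCF
                && marker != 0xC4 && marker != 0xC8 && marker != 0xCC;
    }
}
```

The compressed bytes themselves are never touched, so the original file can be embedded as the image stream unchanged.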

While doing this work, I was impressed by the overall cleanliness and orderliness of the code.

However, a few details of the Unicode text handling should be changed. Firstly, it's not correct to prefer char[] as the Unicode string representation, because code points above U+FFFF (65535) are stored as surrogate pairs, so one char is not always one character. The appropriate approach is to read each code point with String's codePointAt(n) and then increment n by Character.charCount(codePoint). I think the optimal representation is a String instance. There is a sample of this approach in the anonymous inline Encoding class.
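A minimal sketch of that iteration pattern, using only standard JDK calls. The class name CodePoints is made up for this example and is not part of jPod or the attachment.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper showing surrogate-safe iteration over a String. */
final class CodePoints {

    /** Returns the Unicode code points of the text, in order. */
    static List<Integer> of(String text) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            result.add(codePoint);
            // Advance by 1 char for BMP code points, 2 for surrogate pairs.
            i += Character.charCount(codePoint);
        }
        return result;
    }
}
```

For a string such as "a\uD83D\uDE00" this yields two code points, whereas indexing a char[] would see three separate values.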

Secondly, I had to use reflection to force my encoder into the cachedEncoding field of PDFontType0. Calling setEncoding() resulted in a CMapEncoding being used rather than my own minimal encoder. I suspect that this issue has been fixed in library release 5.6.0, but for some reason that version is not available in the Maven repository.
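For readers unfamiliar with the workaround, this is roughly what setting a private field through reflection looks like. FontLike below is a hypothetical stand-in, not the real PDFontType0; only the field name cachedEncoding comes from the description above, and the helper names are made up.

```java
import java.lang.reflect.Field;

/** Sketch of overwriting a private field via reflection. */
final class ReflectionSketch {

    /** Hypothetical stand-in for a library class with a private cache field. */
    static class FontLike {
        private Object cachedEncoding = "default";

        Object getEncoding() {
            return cachedEncoding;
        }
    }

    /** Sets the named field on target, searching superclasses as well. */
    static void forceField(Object target, String fieldName, Object value) {
        try {
            Field field = findField(target.getClass(), fieldName);
            field.setAccessible(true); // bypass the private modifier
            field.set(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot set " + fieldName, e);
        }
    }

    private static Field findField(Class<?> type, String name)
            throws NoSuchFieldException {
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            try {
                return c.getDeclaredField(name);
            } catch (NoSuchFieldException ignored) {
                // keep looking in the superclass
            }
        }
        throw new NoSuchFieldException(name);
    }
}
```

This kind of workaround depends on a private field name, so it can break silently whenever the library renames or removes that field.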

1 Attachments

Discussion

  • I was able to add the cmap generation too.

    Something like this:
    StringBuilder cmapContents = new StringBuilder();
    cmapContents.append("/CIDInit /ProcSet findresource begin\n");
    cmapContents.append("12 dict begin\n");
    cmapContents.append("begincmap\n");
    cmapContents.append("/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def\n");
    cmapContents.append(String.format("/CMapName /%s def\n", cmapName));
    cmapContents.append("/CMapType 2 def\n");
    cmapContents.append("1 begincodespacerange\n");
    cmapContents.append("<0000> <ffff>\n");
    cmapContents.append("endcodespacerange\n");

    /* Reverse the map from unicode -> glyph to glyph -> unicode. */
    Map<Object, List<Map.Entry<?, ?>>> map = unicodeMap.entrySet().stream().collect(Collectors.groupingBy(x -> x.getValue()));
    cmapContents.append(map.size() + " beginbfchar\n");
    for (Map.Entry<Object, List<Map.Entry<?, ?>>> entry : map.entrySet()) {
        int glyphId = (Integer) entry.getKey();
        cmapContents.append(String.format("<%04x> <", glyphId));
        /* What to do about multiple mappings? We could crash or something... */
        int codePoint = (Integer) entry.getValue().get(0).getKey();
        byte[] value = new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_16BE);
        for (byte b : value) {
            cmapContents.append(String.format("%02x", b & 0xff));
        }
        cmapContents.append(">\n");
    }
    cmapContents.append("endbfchar\n");
    cmapContents.append("endcmap\n");
    cmapContents.append("CMapName currentdict /CMap defineresource pop\n");
    cmapContents.append("end\n");
    cmapContents.append("end\n");
    COSStream cmapObject = COSStream.create(null);
    cmapObject.setDecodedBytes(cmapContents.toString().getBytes(StandardCharsets.UTF_8));
    pdFont0.setToUnicode((CMap) InternalCMap.META.createFromCos(cmapObject));
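One caveat with the snippet above: the Adobe CMap format is usually described as allowing at most 100 entries per beginbfchar block, so a font with more glyphs would need several blocks. The following is a sketch of that chunking under this assumption. The class BfCharBlocks, its methods, and the already-reversed glyph -> code point map are made up for this example and use no jPod calls.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: writes bfchar entries in blocks of at most 100. */
final class BfCharBlocks {

    static final int MAX_ENTRIES = 100; // assumed per-block limit

    /** Formats a glyph id -> code point map as one or more bfchar blocks. */
    static String format(Map<Integer, Integer> glyphToCodePoint) {
        StringBuilder out = new StringBuilder();
        int remaining = glyphToCodePoint.size();
        int inBlock = 0;
        // TreeMap gives a stable, sorted order of glyph ids.
        for (Map.Entry<Integer, Integer> e
                : new TreeMap<>(glyphToCodePoint).entrySet()) {
            if (inBlock == 0) {
                out.append(Math.min(remaining, MAX_ENTRIES))
                   .append(" beginbfchar\n");
            }
            out.append(String.format("<%04x> <%s>\n",
                    e.getKey(), utf16Hex(e.getValue())));
            inBlock++;
            remaining--;
            if (inBlock == MAX_ENTRIES || remaining == 0) {
                out.append("endbfchar\n");
                inBlock = 0;
            }
        }
        return out.toString();
    }

    /** Hex digits of the code point encoded as UTF-16BE. */
    static String utf16Hex(int codePoint) {
        byte[] bytes = new String(Character.toChars(codePoint))
                .getBytes(StandardCharsets.UTF_16BE);
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }
}
```

The returned string would take the place of the single beginbfchar/endbfchar section in the CMap text; the header and footer lines stay as they are.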

     
  • mtraut
    2014-06-10

    Thank you for your support.

    I hope we can come back to your submission with our next release. To enable us to do so, please mark your code/posting clearly with the lesser BSD license agreement.

    Currently we are very busy, so please forgive any delay.

    The Maven repository is not supported by us. We currently only provide code as a file download on this site. The Maven submission is managed by a third-party supporter.

    Regards, Michael

     
    • mtraut mtraut@users.sf.net wrote on 10.6.2014 at 10:24:

      > Thank you for your support.
      >
      > I hope we can come back to your submission with our next release. To enable us to do so, please mark your code/posting clearly with the lesser BSD license agreement.

      OK, I can do this. I've made some tweaks and correctness fixes that need to be included, mainly two: the TIFF PDImage must have setBitsPerComponent(1) called, or the image will be rejected by Acrobat Reader; and the font must have getFlags().setSymbolic(false) called because of the way these flags work.

      > The Maven repository is not supported by us. We currently only provide code as a file download on this site. The Maven submission is managed by a third-party supporter.

      Yeah, I guess I'll have to try to figure out who this is and contact them.

      Antti
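Some background on "the way these flags work", as a hedged sketch: in the PDF font descriptor, the Flags integer has a Symbolic bit (bit 3, value 4) and a Nonsymbolic bit (bit 6, value 32), and a font is expected to set one or the other. The code below only shows the bit arithmetic. Whether jPod's setSymbolic(false) also sets the Nonsymbolic bit is not stated in this thread, and the class FontFlags is made up for this example.

```java
/** Hypothetical helper for the Symbolic/Nonsymbolic font descriptor bits. */
final class FontFlags {

    static final int SYMBOLIC = 1 << 2;    // bit 3 in the PDF numbering
    static final int NONSYMBOLIC = 1 << 5; // bit 6 in the PDF numbering

    /** Clears Symbolic and sets Nonsymbolic, leaving other bits untouched. */
    static int markNonsymbolic(int flags) {
        return (flags & ~SYMBOLIC) | NONSYMBOLIC;
    }

    static boolean isSymbolic(int flags) {
        return (flags & SYMBOLIC) != 0;
    }
}
```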

       
  • OK, this is the new version. I added the 3-clause BSD license (which is what I assume you mean by "lesser") as a javadoc for the class.

     
  • ... and excuse the name. This was supposed to be called jpodimprovements but I was just studying recent changes in pdfbox when I prepared the file.