#20 PdfReal use of float data type causes loss of precision

closed-fixed
None
5
2011-11-01
2011-09-15
Keith Martin
No

Because the float data type has limited precision, PdfReal values are sometimes altered when a PDF file is read, modified, and then saved again.

This can cause the modified PDF file to show forms in the wrong location when BBox and Matrix values are used.
The following BBox and Matrix values have caused me problems:
<<
/Type /XObject
/Subtype /Form
/FormType 1
/Matrix [0.00168 0 0 0.001188 0 0]
/Name /Form4
/Filter /FlateDecode
/Length 65 0 R
/BBox [0 0 595.320007 841.919983]
/Resources 58 0 R
>>
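To illustrate the drift, here is a minimal Java sketch that parses the values above once as single precision and once as double precision. The class and method names (`RealPrecisionDemo`, `viaFloat`, `viaDouble`) are hypothetical and are not part of the library; the sketch only assumes that a real token is parsed with the standard `Float.parseFloat` or `Double.parseDouble`. Incidentally, the BBox values above look like 595.32 and 841.92 after they have already passed through a single-precision float.

```java
import java.util.Locale;

// Hypothetical demo class; it does not reproduce the library's PdfReal code.
class RealPrecisionDemo {
    // Parse a PDF real token as a 32-bit float, then widen it for comparison.
    static double viaFloat(String token) {
        return (double) Float.parseFloat(token);
    }

    // Parse the same token as a 64-bit double.
    static double viaDouble(String token) {
        return Double.parseDouble(token);
    }

    public static void main(String[] args) {
        String[] tokens = {"0.00168", "0.001188", "595.320007", "841.919983"};
        for (String token : tokens) {
            double f = viaFloat(token);
            double d = viaDouble(token);
            // The drift column shows how far the float value is from the double value.
            System.out.printf(Locale.ROOT, "%-12s float=%.12f double=%.12f drift=%.3e%n",
                    token, f, d, Math.abs(f - d));
        }
    }
}
```

The single-precision values carry only about 7 significant decimal digits, so any value written with more digits than that cannot survive a read/write cycle unchanged.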

I have attached a patch file that contains my fix for this issue.

Discussion

  • PDF Reference 1.7, Appendix C (Implementation Limits) declares that

    "To represent real numbers, Acrobat 6 uses IEEE single-precision floating-point numbers, as described in the IEEE Standard for Binary Floating-Point Arithmetic (see the Bibliography). Previous versions used 32-bit fixed-point numbers (16 bits on either side of the radix point), which have greater precision but a much smaller range than IEEE floating-point numbers. (Acrobat 6 still converts floating-point numbers to fixed point for some components, such as screen display and fonts.)"

    Given the above citation, using a decimal type would be overkill; IMO, a good compromise (which in any case exceeds the 32-bit single-precision floating-point numbers used by Acrobat) is double-precision floating-point numbers.
    On serialization, since the number of decimal places may depend on user requirements, I would prefer to define a decimal-place parameter, or even a callback function, so that users get the output they actually expect.

    This implementation will be included in version 0.1.1, which is due in a few weeks.

     
    • assigned_to: nobody --> stechio
     
    • status: open --> closed-fixed
     
  • Fixed on trunk (revision 48) of the SVN repo [1]; it will be part of the 0.1.1 release.
    Real numbers are now stored as double-precision floating-point numbers; their serialization is controlled by a format parameter (see File.getConfiguration().get/setRealFormat()).

    [1] https://sourceforge.net/scm/?type=svn&group_id=176158
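
As a rough illustration of serialization controlled by a format parameter, the following Java sketch builds a `java.text.DecimalFormat` with a configurable maximum number of decimal places. It is not the library's implementation: `RealFormatSketch`, `realFormat` and `serialize` are hypothetical names, and how the actual get/setRealFormat() methods represent the format is not shown in this ticket.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical sketch; names and behavior are assumptions, not the library's API.
class RealFormatSketch {
    // Build a format that emits at most maxDecimals fractional digits,
    // always with '.' as the decimal separator and no grouping.
    static DecimalFormat realFormat(int maxDecimals) {
        StringBuilder pattern = new StringBuilder("0");
        if (maxDecimals > 0) {
            pattern.append('.');
            for (int i = 0; i < maxDecimals; i++) {
                pattern.append('#'); // optional digit: trailing zeros are dropped
            }
        }
        DecimalFormat format = new DecimalFormat(
                pattern.toString(), DecimalFormatSymbols.getInstance(Locale.ROOT));
        format.setGroupingUsed(false);
        return format;
    }

    // Serialize a real number held as a double using the given format.
    static String serialize(double value, DecimalFormat format) {
        return format.format(value);
    }

    public static void main(String[] args) {
        DecimalFormat sixPlaces = realFormat(6);
        System.out.println(serialize(0.001188, sixPlaces));
        System.out.println(serialize(595.320007, sixPlaces));
        System.out.println(serialize(841.919983, realFormat(2)));
    }
}
```

Fixing the locale matters here, because a locale-dependent decimal separator such as a comma would produce real numbers that are invalid in a PDF file.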