"'cm' not allowed" warning when parsing a PDF

  • WitkOO

    We encountered the "'cm' not allowed" problem while parsing a PDF. The PDF contains a "cm" operation inside a text block (after BT, i.e. begin text). Could you please help us by explaining the reason for the following check?

    protected void render_cm(CSOperation operation) {
        // "cm" is accepted at page level; if there's an "open" path, simply keep it
        if (frame.graphicsObjectState != PageLevel
                && frame.graphicsObjectState != PathObject) {
            // any other state (e.g. inside BT/ET) is rejected before the matrix is applied
            throw new CSWarning("'cm' not allowed");
        }
        // operands of "a b c d e f cm"
        float a = ((COSNumber) operation.getOperand(0)).floatValue();
        float b = ((COSNumber) operation.getOperand(1)).floatValue();
        float c = ((COSNumber) operation.getOperand(2)).floatValue();
        float d = ((COSNumber) operation.getOperand(3)).floatValue();
        float e = ((COSNumber) operation.getOperand(4)).floatValue();
        float f = ((COSNumber) operation.getOperand(5)).floatValue();
        // apply the "cm" matrix to the current transformation
        device.transform(a, b, c, d, e, f);
    }

    If we comment the check out, the file is processed more or less correctly; if it is there, the Y position of all glyphs is the same (in this file, the cm is what starts each new line). We haven't encountered this way of starting new lines in other PDFs.
    We would like to reach a conclusion on whether this is an invalid PDF, a jPod problem, or a jPod feature.
    Thank you for assistance.

  • mtraut

    As always, in case of doubt, it is a feature :-/

    PDF Reference 1.7, section 4.1, states that special graphics state operators (such as cm) are not valid in text objects. That said, any viewer (and, as we see, any client) is free to do what it thinks best with such garbage.


    A content stream whose operations violate these rules for describing graphics objects can produce unpredictable behavior, even though it may display and print correctly. Applications that attempt to extract graphics objects for editing or other purposes depend on the objects’ being well formed. The rules for graphics objects are also important for the proper interpretation of marked content...
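To make the rule concrete, here is a minimal sketch (not jPod code) of the state tracking behind a check like render_cm above. It walks a list of operator names with the operands already stripped, tracks a simplified graphics-object state, and reports every cm that the check would reject. The class and method names are hypothetical, only a handful of operators are modeled, and, like the jPod snippet, it also tolerates cm inside an open path.

```java
import java.util.ArrayList;
import java.util.List;

public class CmPlacementCheck {

    // Simplified graphics-object states, named after the ones in the jPod snippet (assumption).
    enum State { PAGE_LEVEL, TEXT_OBJECT, PATH_OBJECT }

    /** Returns the indices of every "cm" that appears where render_cm would reject it. */
    static List<Integer> findMisplacedCm(List<String> operators) {
        List<Integer> misplaced = new ArrayList<>();
        State state = State.PAGE_LEVEL;
        for (int i = 0; i < operators.size(); i++) {
            switch (operators.get(i)) {
                case "BT":
                    state = State.TEXT_OBJECT;
                    break;
                case "ET":
                    state = State.PAGE_LEVEL;
                    break;
                case "m": case "re":
                    // path construction at page level opens a path object
                    if (state == State.PAGE_LEVEL) state = State.PATH_OBJECT;
                    break;
                case "S": case "f": case "n":
                    // path painting closes the path object
                    if (state == State.PATH_OBJECT) state = State.PAGE_LEVEL;
                    break;
                case "cm":
                    // same rule as render_cm: accepted at page level or in an open path
                    if (state != State.PAGE_LEVEL && state != State.PATH_OBJECT) {
                        misplaced.add(i);
                    }
                    break;
                default:
                    break; // other operators do not change the state in this sketch
            }
        }
        return misplaced;
    }

    public static void main(String[] args) {
        // Roughly the pattern from the question: "cm" between Tj calls to start a new line
        List<String> ops = List.of("BT", "Tf", "Td", "Tj", "cm", "Tj", "ET", "cm");
        System.out.println(findMisplacedCm(ops)); // prints [4]
    }
}
```

The second cm (index 7) is not reported because it follows ET and is therefore back at page level.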

    The correct way to position text in text objects is to use the respective T<x> operators, e.g. Tm.

    So you're free to interpret "cm" here e.g. as Tm if you want to...
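What "interpret cm as Tm" amounts to can be checked with a little matrix arithmetic. In the PDF imaging model (row-vector convention), a glyph's text-space origin reaches device space through Tm × CTM (the font-size, horizontal-scaling and rise matrix that precedes Tm is left out here), and "a b c d e f cm" replaces the CTM with M × CTM. Because Tm × (M × CTM) = (Tm × M) × CTM, folding M into the text matrix puts the following glyphs in the same place. This sketch uses made-up example values and a hypothetical Matrix class; it also shows that dropping the cm, which is effectively what happens when render_cm throws before device.transform is reached, leaves the y coordinate unchanged, matching the "same Y position" symptom from the question.

```java
public class CmVersusTm {

    /** PDF-style affine matrix [a b c d e f], i.e. [[a b 0] [c d 0] [e f 1]] for row vectors. */
    static final class Matrix {
        final double a, b, c, d, e, f;

        Matrix(double a, double b, double c, double d, double e, double f) {
            this.a = a; this.b = b; this.c = c; this.d = d; this.e = e; this.f = f;
        }

        /** Returns this x o: "this" is applied first, then "o" (PDF concatenation order). */
        Matrix times(Matrix o) {
            return new Matrix(
                a * o.a + b * o.c,
                a * o.b + b * o.d,
                c * o.a + d * o.c,
                c * o.b + d * o.d,
                e * o.a + f * o.c + o.e,
                e * o.b + f * o.d + o.f);
        }

        /** Device-space position of the text-space origin (0, 0). */
        String origin() {
            return "(" + e + ", " + f + ")";
        }
    }

    public static void main(String[] args) {
        Matrix ctm = new Matrix(2, 0, 0, 2, 0, 0);    // page-level scaling, for illustration
        Matrix tm  = new Matrix(1, 0, 0, 1, 72, 700); // text matrix after "72 700 Td" in a fresh BT
        Matrix cm  = new Matrix(1, 0, 0, 1, 0, -14);  // "1 0 0 1 0 -14 cm" inside BT/ET

        Matrix viaCtm  = tm.times(cm.times(ctm)); // conforming behavior: CTM' = M x CTM
        Matrix viaTm   = tm.times(cm).times(ctm); // reinterpretation: Tm' = Tm x M
        Matrix ignored = tm.times(ctm);           // cm dropped entirely

        System.out.println(viaCtm.origin());  // (144.0, 1372.0)
        System.out.println(viaTm.origin());   // (144.0, 1372.0)
        System.out.println(ignored.origin()); // (144.0, 1400.0)
    }
}
```

The equivalence is limited: it holds only until the next Tm operator replaces the text matrix, Td/TD/T* would need the text line matrix adjusted the same way, and a real cm keeps affecting the CTM after ET until the enclosing Q restores the graphics state.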

    I see that it would be better to provide a hook method here to handle spec deviations; this would give you the chance to simply ignore them. Let's get this on the to-do list.
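A hook like the one proposed could look roughly like the following self-contained sketch. None of these names exist in jPod: Interpreter, onSpecDeviation and SpecWarning are hypothetical stand-ins, and a string replaces the real operands. The default hook keeps the current strict behavior (throw), while a subclass decides per operator and state whether to apply or skip the operation.

```java
import java.util.ArrayList;
import java.util.List;

public class DeviationHookSketch {

    enum State { PAGE_LEVEL, TEXT_OBJECT, PATH_OBJECT }

    /** Hypothetical stand-in for the warning the interpreter raises on a spec violation. */
    static class SpecWarning extends RuntimeException {
        SpecWarning(String message) { super(message); }
    }

    /** Toy interpreter: records which "cm" operations are actually applied. */
    static class Interpreter {
        State state = State.PAGE_LEVEL;
        final List<String> applied = new ArrayList<>();

        void renderCm(String operands) {
            if (state != State.PAGE_LEVEL && state != State.PATH_OBJECT) {
                // Instead of throwing unconditionally, let the hook decide.
                if (!onSpecDeviation("cm", state)) {
                    return; // hook chose to skip the operation
                }
            }
            applied.add(operands); // stands in for device.transform(a, b, c, d, e, f)
        }

        /**
         * Hook for spec deviations. Default is strict: throw, as the current code does.
         * Return true to execute the operation anyway, false to silently skip it.
         */
        protected boolean onSpecDeviation(String operator, State where) {
            throw new SpecWarning("'" + operator + "' not allowed");
        }
    }

    /** Applies "cm" inside text objects as-is and skips any other deviation. */
    static class LenientInterpreter extends Interpreter {
        @Override
        protected boolean onSpecDeviation(String operator, State where) {
            return operator.equals("cm") && where == State.TEXT_OBJECT;
        }
    }

    public static void main(String[] args) {
        Interpreter lenient = new LenientInterpreter();
        lenient.state = State.TEXT_OBJECT;
        lenient.renderCm("1 0 0 1 0 -14");
        System.out.println(lenient.applied); // [1 0 0 1 0 -14]

        Interpreter strict = new Interpreter();
        strict.state = State.TEXT_OBJECT;
        try {
            strict.renderCm("1 0 0 1 0 -14");
        } catch (SpecWarning w) {
            System.out.println(w.getMessage()); // 'cm' not allowed
        }
    }
}
```

Returning a boolean leaves the decision with the client without forcing it to re-implement the operator; a richer variant could pass the operation itself so the hook could, for example, reinterpret cm as a text-matrix change.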

    Last edit: mtraut 2013-10-04