
#32 Fixed Page rotation

open
nobody
None
5
2008-05-29
2008-05-29
No

Hi all,

Daniel asked me for my patch for the rotation issue described in https://sourceforge.net/forum/message.php?msg_id=4992032

Please note that I didn't apply the newest patches to the PDFStreamEngine and PageDrawer classes.

There are four more classes, probably also affected, that call the page.findRotation method. I didn't change them because I haven't had to use them (so far):

org.pdfbox.util.operator.pagedrawer.Invoke
org.pdfbox.util.TextPositionComparator
org.pdfbox.examples.pdmodel.PrintURLs
org.pdfbox.examples.util.PrintImageLocations

I've attached a PDF in DIN A4 landscape. The text is misplaced whenever I try to print or display it with PDFBox (using the PDFBox PDFReader and convertToImage within my application). Acrobat Reader has no problems with my documents.
After my patch everything works fine. It is perhaps open to discussion whether the convertToImage method should rotate the image or whether the user has to do it. The PDFPagePanel doesn't do it (yet).

Andreas
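Andreas leaves open whether convertToImage should rotate the image or leave that to the caller. For the caller-side option, here is a minimal Java2D sketch. It assumes the image was rendered from the unrotated page and that the rotation value (for example the one returned by page.findRotation()) is a multiple of 90 degrees, applied clockwise as the PDF /Rotate entry specifies. The class and method names are hypothetical and not part of PDFBox.

```java
import java.awt.Graphics2D;
import java.awt.geom.AffineTransform;
import java.awt.image.BufferedImage;

/** Hypothetical caller-side helper; not part of PDFBox. */
final class PageImageRotation {

    private PageImageRotation() {
    }

    /** Maps any multiple of 90 (including negative values) to 0, 90, 180 or 270. */
    static int normalize(int rotation) {
        if (rotation % 90 != 0) {
            throw new IllegalArgumentException("Rotation must be a multiple of 90: " + rotation);
        }
        return ((rotation % 360) + 360) % 360;
    }

    /** Rotates an image rendered from the unrotated page clockwise by the page rotation. */
    static BufferedImage rotateClockwise(BufferedImage src, int rotation) {
        int r = normalize(rotation);
        if (r == 0) {
            return src; // nothing to do
        }
        int w = src.getWidth();
        int h = src.getHeight();
        boolean quarterTurn = (r == 90 || r == 270); // width and height swap
        BufferedImage dst = new BufferedImage(
                quarterTurn ? h : w, quarterTurn ? w : h, BufferedImage.TYPE_INT_ARGB);

        // AffineTransform applies the most recently concatenated operation first:
        // each point is rotated by whole quadrants (positive = clockwise in
        // y-down image space) and then shifted back into the visible area.
        AffineTransform at = new AffineTransform();
        if (r == 90) {
            at.translate(h, 0);
        } else if (r == 180) {
            at.translate(w, h);
        } else {
            at.translate(0, w);
        }
        at.quadrantRotate(r / 90);

        Graphics2D g = dst.createGraphics();
        try {
            g.drawImage(src, at, null);
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```

Keeping the rotation in the caller leaves convertToImage unchanged; doing it inside convertToImage would instead save every caller from repeating this step, which is the trade-off raised above.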

Discussion

  • Andreas Lehmkühler

    rotation_patch incl. testpdf

     
  • Daniel Wilson

    Daniel Wilson - 2008-05-29

    Logged In: YES
    user_id=1737686
    Originator: NO

    I've just tried your sample PDF w/ the latest code, before applying your patch. It doesn't work.

    I'll work on incorporating your change for a full regression test in the next hour or so.

     
  • Andreas Lehmkühler

    Logged In: YES
    user_id=2069622
    Originator: YES

    Hi Daniel,

    I've just applied my patch to the newest sources you sent me earlier today. I think it works. During testing I found another problem concerning graphics within landscape documents. I solved it by patching the class org.pdfbox.util.operator.pagedrawer.Invoke in the same way I patched the others. For consistency, I've also patched the new methods in org.pdfbox.pdfviewer.PageDrawer.

    For me everything works fine, including the 4PP PDF.

    I've attached the patched files and another test PDF with an embedded graphic.

    Andreas
    File Added: pdfbox_rotation_patch_2.zip

     
  • Andreas Lehmkühler

    rotation-patch 2 incl. new testpdf

     
  • Daniel Wilson

    Daniel Wilson - 2008-05-29

    Logged In: YES
    user_id=1737686
    Originator: NO

    Your code works w/ the 4PP test ... and with the other rendering stuff I've tried so far.

    However ... the text extraction test fails with it. I can't figure that one out ... ideas?

     
  • Andreas Lehmkühler

    Logged In: YES
    user_id=2069622
    Originator: YES

    Can you give me some more details? I've never done any text extraction with PDFBox. Could you provide the code for the test program, or is it part of PDFBox, so that I can find it in CVS?

    Either way, it will have to wait until tomorrow.

     
  • Daniel Wilson

    Daniel Wilson - 2008-05-29

    Logged In: YES
    user_id=1737686
    Originator: NO

    If you've got the whole project set up, try
    ant testextract

    I'll see if I can narrow it down some.

     
  • Daniel Wilson

    Daniel Wilson - 2008-05-29

    Logged In: YES
    user_id=1737686
    Originator: NO

    The extraction problem seems to have to do w/ the changes to PDFStreamEngine.

    If I revert that file, extraction succeeds. Unfortunately ... with that reverted but your other changes in place, image rendering hangs.

    Will work on it more ... probably tomorrow.

     
  • Daniel Wilson

    Daniel Wilson - 2008-05-29

    Logged In: YES
    user_id=1737686
    Originator: NO

    Correction ... it doesn't hang ... it's just slow on the first PDF to render ... maybe just due to the first one I'm sending it.

    Will look more tomorrow.

     
  • Andreas Lehmkühler

    Logged In: YES
    user_id=2069622
    Originator: YES

    I've found one bug. While deleting the if-clauses for the rotation, I also deleted line 394, which is still needed.

    I've attached the corrected file.

    File Added: PDFStreamEngine.java

     
  • Andreas Lehmkühler

    Corrected PDFStreamEngine

     
  • Andreas Lehmkühler

    Logged In: YES
    user_id=2069622
    Originator: YES

    I forgot to mention that I can't run the test suite. When I tried to check out the whole project, I realized that I'm behind a firewall here in my office, so my CVS client doesn't work. I'll have to do it from home. :-(

    I've only tested one file: 601501018.pdf

    There are additional blanks, and they disappear after adding the missing line. But starting at page 21, where the document orientation changes from portrait to landscape, there are additional CR or LF characters. Hmmmm??

     
  • Andreas Lehmkühler

    Logged In: YES
    user_id=2069622
    Originator: YES

    I've continued testing, and I think the problem starts somewhere in org.pdfbox.util.PDFTextStripper.showCharacter(..). Apparently it handles the coordinates for rotated pages differently from the showCharacter() implementation in org.pdfbox.pdfviewer.PageDrawer.
    But for the moment I don't understand what's happening in the TextStripper; perhaps I'll find out later.
    I hope this hint helps ...
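Andreas's hint is that PDFTextStripper.showCharacter(..) and PageDrawer's showCharacter() appear to map coordinates on rotated pages differently. One way to compare the two code paths is against a single reference mapping, sketched here under stated assumptions: /Rotate turns the displayed page clockwise (the PDF convention), user space has its origin at the bottom-left with y pointing up, and display space has its origin at the top-left with y pointing down. RotatedPageCoordinates and toDisplay are hypothetical names, not PDFBox API.

```java
import java.awt.geom.Point2D;

/** Hypothetical reference mapping for rotated pages; not part of PDFBox. */
final class RotatedPageCoordinates {

    private RotatedPageCoordinates() {
    }

    /**
     * Maps a user-space point on an unrotated page of the given width and height
     * to display space after applying the clockwise page rotation.
     */
    static Point2D.Double toDisplay(double x, double y, double width, double height, int rotation) {
        int r = ((rotation % 360) + 360) % 360; // e.g. -90 -> 270, 450 -> 90
        switch (r) {
            case 0:   // only flip the y axis
                return new Point2D.Double(x, height - y);
            case 90:  // displayed page is height wide and width tall
                return new Point2D.Double(y, x);
            case 180:
                return new Point2D.Double(width - x, y);
            case 270:
                return new Point2D.Double(height - y, width - x);
            default:
                throw new IllegalArgumentException("Rotation must be a multiple of 90: " + rotation);
        }
    }
}
```

For example, on a portrait MediaBox of 595 x 842 points with /Rotate 90, the user-space point (50, 800) near the top-left of the unrotated page maps to (800, 50), near the top-right of the displayed landscape page. If two code paths disagree on this mapping, the same glyph ends up at different positions, which is the kind of mismatch suspected here.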

     
  • Daniel Wilson

    Daniel Wilson - 2008-05-30

    Logged In: YES
    user_id=1737686
    Originator: NO

    I've put a couple more hours into this, and I don't know the answer.

    I do know the text extraction is the more mature side of this library.

    For the moment, I'll be skipping over your changes to PDFStreamEngine.

    Thanks for the other changes!

     
  • Andreas Lehmkühler

    Logged In: YES
    user_id=2069622
    Originator: YES

    Hi Daniel,

    I think I've solved the problem. The text-position handling has to be adjusted within the method PDFTextStripper.flushText(). Of course, my earlier changes to the class PDFStreamEngine are still needed. While debugging, I found a bug in the class TextPositionComparator (line 82), which I fixed by removing the rotation if-clauses. When you compare two TextPositions, there is no need to look at the rotation: both are on the same page, so the comparison is independent of the rotation.

    Furthermore, my PDFTextStripper patch seems to fix some minor problems that are described in https://sourceforge.net/forum/message.php?msg_id=4976730.

    I've tested the following cases:
    Garcia2003b__Correlative_exploration_of_EEG_Signals.pdf works 100%.
    test_rotate_270.txt doesn't match 100%, but my patch corrected a bug in lines 251-257, 278/279, 502/503 and 574/575; the other differences are special-character issues. I guess you have to correct the input first.

    I've attached my changes based on the newest versions of both classes.
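Andreas's argument for removing the rotation if-clauses from TextPositionComparator can be illustrated with a stripped-down comparator. This is a sketch under assumptions, not the PDFBox class: SimpleTextPosition and ReadingOrderComparator are hypothetical types, and both positions are assumed to be already expressed in the same display coordinates of one page. It also omits the tolerance a real comparator would need to treat glyphs with slightly different baselines as one line.

```java
import java.util.Comparator;

/** Hypothetical, simplified stand-in for a text position; not PDFBox's TextPosition. */
final class SimpleTextPosition {
    final String text;
    final float x; // left edge in the page's display coordinates
    final float y; // baseline in display coordinates, growing downward

    SimpleTextPosition(String text, float x, float y) {
        this.text = text;
        this.x = x;
        this.y = y;
    }
}

/**
 * Orders positions top-to-bottom, then left-to-right. It never looks at the page
 * rotation: both positions come from the same page, so once they are in that
 * page's display coordinates the rotation has already been applied.
 */
final class ReadingOrderComparator implements Comparator<SimpleTextPosition> {
    @Override
    public int compare(SimpleTextPosition a, SimpleTextPosition b) {
        int byLine = Float.compare(a.y, b.y); // earlier line first
        return byLine != 0 ? byLine : Float.compare(a.x, b.x); // then left to right
    }
}
```

The rotation only has to be handled once, when positions are mapped into display space; a comparator that also branched on the rotation would risk applying it a second time.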

     
  • Andreas Lehmkühler

    Bugfix for PDFTextStripper

     
  • Andreas Lehmkühler

    Logged In: YES
    user_id=2069622
    Originator: YES

    File Added: pdfbox_rotation_patch_3.zip

     
