#10 wrong coordinate transformation
toshy kava

For some PDFs, like the one attached, TextExtractor does not read coordinates correctly, mostly the Y coordinate: text lines don't appear in the same order as in the original PDF.

For some PDFs, Y can even be negative (I couldn't attach one; the file is too big).

Thank you.


  • toshy kava - 2011-01-27

    This is the way Wikipedia exports its articles.

  • toshy kava - 2011-01-27

    Forgot to say: I used AdvancedTextExtractionSample.

  • Iain Roberts - 2011-01-31

    I also hit this issue. I'm not sure it's a correct fix, but I found that it occurs when adjusted text contains a SetTextLead: the lead appears to be added instead of subtracted. I changed line 68 of SetTextLead.cs to the following, and now it works:

    {state.Lead = -1*Value;}

    P.S. Thanks for a nice framework :o)
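    To illustrate the sign issue described above, here is a minimal C# sketch of how the text-leading operators move the baseline according to the PDF specification: `TL` sets the leading Tl, `TD` also sets it (to -ty), and `T*` is equivalent to `0 -Tl Td`, so a positive leading moves the next line down. The `TextLineState` and `LeadingDemo` classes, their members and the `addLeadBug` flag are hypothetical and are not PDF Clown's API; the flag only simulates the reported behaviour of adding the lead instead of subtracting it. The sketch tracks just the translation part of the text line matrix, which assumes the content uses no `Tm` operator.

```csharp
using System.Collections.Generic;

// Hypothetical, simplified model of the PDF text state; NOT PDF Clown's API.
// Only the translation (X, Y) of the text line matrix Tlm is tracked, which is
// exact while the content uses Td, TD, TL and T* only (no Tm operator).
public sealed class TextLineState
{
    public double Leading { get; private set; } // Tl, the text leading (0 by default)
    public double X { get; private set; }
    public double Y { get; private set; }

    // When true, simulates the reported bug: T* adds the leading instead of subtracting it.
    private readonly bool addLeadBug;

    public TextLineState(bool addLeadBug) { this.addLeadBug = addLeadBug; }

    // "leading TL": sets the text leading.
    public void TL(double leading) { Leading = leading; }

    // "tx ty Td": moves to the start of the next line, offset by (tx, ty).
    public void Td(double tx, double ty) { X += tx; Y += ty; }

    // "tx ty TD": same as "-ty TL tx ty Td", so it also sets the leading.
    public void TD(double tx, double ty) { TL(-ty); Td(tx, ty); }

    // "T*": same as "0 -Tl Td"; a positive leading moves the baseline DOWN.
    public void TStar() { Td(0, addLeadBug ? Leading : -Leading); }
}

public static class LeadingDemo
{
    // Baselines of the three lines in "72 720 Td 14 TL (a) Tj T* (b) Tj T* (c) Tj".
    public static List<double> Baselines(bool addLeadBug)
    {
        var state = new TextLineState(addLeadBug);
        var baselines = new List<double>();
        state.Td(72, 720);
        state.TL(14);
        baselines.Add(state.Y); // first line
        state.TStar();
        baselines.Add(state.Y); // second line
        state.TStar();
        baselines.Add(state.Y); // third line
        return baselines;
    }
}
```

    With the correct sign, `LeadingDemo.Baselines(false)` returns 720, 706, 692, so the lines run top to bottom. With the simulated bug, `Baselines(true)` returns 720, 734, 748, so an extractor that orders lines by descending Y would list them in reverse, which would be consistent with the symptom reported in this ticket.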

  • Stefano Chizzolini

    This issue has been fixed by version

  • Stefano Chizzolini

    • status: open --> closed-out-of-date
    • assigned_to: Stefano Chizzolini
    • Group: -->
    • Priority: 5 --> 1
