#10 wrong coordinate transformation

Milestone: 0.1.2.1
Status: closed-out-of-date
Labels: None
Priority: 1
Updated: 2015-03-12
Created: 2011-01-27
Creator: toshy kava
Private: No

For some PDFs, like the one attached, TextExtractor is not reading coordinates correctly. This mostly affects the Y coordinate, i.e. text lines don't appear in the same order as in the original PDF.

For some PDFs, Y can even be negative. (I couldn't attach that file; it is too big.)

Thank you.

Discussion

  • toshy kava

    toshy kava - 2011-01-27

    This is the way Wikipedia exports its articles.

     
  • toshy kava

    toshy kava - 2011-01-27

    Forgot to say, I used AdvancedTextExtractionSample.

     
  • Iain Roberts

    Iain Roberts - 2011-01-31

    I also hit this issue. I'm not sure if it's a correct fix, but I found that it is caused when you have an Adjusted Text with a SetTextLead inside it. It appears that the lead is being added instead of subtracted. I changed SetTextLead.cs:68 to the following and it now works.

    {state.Lead = -1*Value;}

    P.S. Thanks for a nice framework :o)
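
For anyone reading later, here is a minimal sketch of why the sign matters. In the PDF specification the `TL` operator stores the leading as a positive number, and `T*` (move to next line) is defined as equivalent to `0 -Tl Td`, so each new line must decrease Y. The class and method names below (`TextStateSketch`, `LeadSketch`, `LineYs`) are hypothetical and only model that rule; this is not PDF Clown's actual code, and the thread does not say how the fix in 0.1.2.1 was implemented.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified text state (NOT PDF Clown's real class).
public sealed class TextStateSketch
{
    // Tl: text leading, stored as a positive number as the TL operator supplies it.
    public double Lead { get; private set; }

    // Y coordinate of the start of the current line, in text space.
    public double LineY { get; private set; }

    public TextStateSketch(double startY) { LineY = startY; }

    // TL operator: store the leading unchanged.
    public void SetTextLead(double value) { Lead = value; }

    // Td operator, vertical component only: offset the line start by ty.
    public void TranslateTextRelative(double ty) { LineY += ty; }

    // T* operator: equivalent to "0 -Tl Td", so the lead is subtracted.
    public void TranslateTextToNextLine() { TranslateTextRelative(-Lead); }
}

public static class LeadSketch
{
    // Returns the Y of each line start for lineCount consecutive lines.
    public static List<double> LineYs(double startY, double lead, int lineCount)
    {
        var state = new TextStateSketch(startY);
        state.SetTextLead(lead);

        var ys = new List<double>();
        for (int i = 0; i < lineCount; i++)
        {
            ys.Add(state.LineY);
            state.TranslateTextToNextLine();
        }
        return ys;
    }

    public static void Main()
    {
        // Start at Y = 700 with a leading of 12: Y goes down line by line.
        Console.WriteLine(string.Join(", ", LineYs(700, 12, 3)));
    }
}
```

With a start of 700 and a leading of 12 the line starts come out as 700, 688, 676. If the lead were added instead, Y would grow with each line and the extracted lines would sort in reverse, which matches the symptom reported above. In this sketch the negation sits in the `T*` step rather than in `SetTextLead`, so the stored lead stays positive for anything else that reads it; negating inside `SetTextLead`, as in the workaround above, gives the same line positions for `T*` but changes the stored value.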

     
  • Stefano Chizzolini

    This issue has been fixed by version 0.1.2.1.

     
  • Stefano Chizzolini

    • status: open --> closed-out-of-date
    • assigned_to: Stefano Chizzolini
    • Group: --> 0.1.2.1
    • Priority: 5 --> 1
     
