
Getting Content Stream?

Palani
2009-03-11
2013-05-28
  • Palani
    2009-03-11

    Hi,

    I am developing a PDF touch-up module that loads existing PDF documents using jPod, and it works in most cases. However, I have some scanned PDF documents (with OCR) in which the document has a text layer (OCR generates a PDF file containing only text) with the scanned image overlaid on top of it. In this case, when I get the PDF content stream using the "getContentStream" method of PDPage, I do not get any of the COSString operands. How do I access these objects? Could anyone give me some pointers?

    Thanks,

    Palani
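
For background on the symptom described above: text is painted by the show-text operators Tj, TJ, ' and ", and their string operands are what a parser exposes as string objects (COSString). The following is a minimal, library-free Java sketch, not jPod's API; the class `ShowTextScanner` and its methods are hypothetical. It scans one already-decoded content stream and collects those operands. It assumes the stream bytes were decoded one byte per char (e.g. ISO-8859-1) and that there are no inline images, and it returns raw character codes: mapping them to Unicode would need the font's encoding or /ToUnicode map, which it ignores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Sketch only (hypothetical helper, not jPod API): collects the string operands of
 * Tj, TJ, ' and " from one decoded content stream. No inline images (BI/ID/EI) are
 * assumed, and no font encoding or /ToUnicode map is applied.
 */
final class ShowTextScanner {

    private static final Set<String> SHOW_OPS = Set.of("Tj", "TJ", "'", "\"");

    static List<String> shownStrings(String content) {
        List<String> shown = new ArrayList<>();
        List<String> operands = new ArrayList<>(); // strings seen since the last operator
        int i = 0, n = content.length();
        while (i < n) {
            char c = content.charAt(i);
            if (c == '(') {                                      // literal string
                StringBuilder sb = new StringBuilder();
                i = readLiteral(content, i, sb);
                operands.add(sb.toString());
            } else if (c == '<' && i + 1 < n && content.charAt(i + 1) == '<') {
                i += 2;                                          // dictionary start, e.g. BDC properties
            } else if (c == '<') {                               // hex string
                StringBuilder sb = new StringBuilder();
                i = readHex(content, i, sb);
                operands.add(sb.toString());
            } else if (c == '%') {                               // comment: skip to end of line
                while (i < n && content.charAt(i) != '\n' && content.charAt(i) != '\r') i++;
            } else if (c == '/') {                               // name operand such as /F1
                i = skipRegular(content, i + 1);
            } else if (isDelimiter(c) || Character.isWhitespace(c)) {
                i++;                                             // array brackets, '>' and whitespace
            } else {
                int end = skipRegular(content, i);
                String token = content.substring(i, end);
                i = end;
                boolean isNumber = Character.isDigit(c) || c == '+' || c == '-' || c == '.';
                if (!isNumber) {                                 // any other bare token ends an operation
                    if (SHOW_OPS.contains(token)) shown.addAll(operands);
                    operands.clear();
                }
            }
        }
        return shown;
    }

    // Reads "( ... )" starting at s[i]; handles nesting and backslash escapes. Returns index after ')'.
    private static int readLiteral(String s, int i, StringBuilder out) {
        int depth = 0, n = s.length();
        while (i < n) {
            char c = s.charAt(i++);
            if (c == '\\' && i < n) {
                char e = s.charAt(i++);
                switch (e) {
                    case 'n': out.append('\n'); break;
                    case 'r': out.append('\r'); break;
                    case 't': out.append('\t'); break;
                    case 'b': out.append('\b'); break;
                    case 'f': out.append('\f'); break;
                    case '\r': case '\n': break;                 // line continuation (simplified)
                    default:
                        if (e >= '0' && e <= '7') {              // octal escape, up to three digits
                            int v = e - '0';
                            for (int k = 0; k < 2 && i < n && s.charAt(i) >= '0' && s.charAt(i) <= '7'; k++) {
                                v = v * 8 + (s.charAt(i++) - '0');
                            }
                            out.append((char) (v & 0xFF));
                        } else {
                            out.append(e);                       // \( \) \\ ; unknown escapes keep the char
                        }
                }
            } else if (c == '(') {
                if (depth++ > 0) out.append(c);                  // the outermost '(' is only a delimiter
            } else if (c == ')') {
                if (--depth == 0) return i;
                out.append(c);
            } else {
                out.append(c);
            }
        }
        return i;                                                // unterminated string: stop at end of input
    }

    // Reads "<48656C6C6F>" starting at s[i]; each hex pair becomes one char code.
    private static int readHex(String s, int i, StringBuilder out) {
        StringBuilder digits = new StringBuilder();
        for (i++; i < s.length() && s.charAt(i) != '>'; i++) {
            if (Character.digit(s.charAt(i), 16) >= 0) digits.append(s.charAt(i));
        }
        if (digits.length() % 2 == 1) digits.append('0');        // odd digit count: final digit padded with 0
        for (int k = 0; k < digits.length(); k += 2) {
            out.append((char) Integer.parseInt(digits.substring(k, k + 2), 16));
        }
        return i + 1;                                            // skip the closing '>'
    }

    private static boolean isDelimiter(char c) { return "()<>[]{}/%".indexOf(c) >= 0; }

    private static int skipRegular(String s, int i) {
        while (i < s.length() && !isDelimiter(s.charAt(i)) && !Character.isWhitespace(s.charAt(i))) i++;
        return i;
    }
}
```

Run against a page stream that only draws an image and then invokes an XObject (for example `q 612 0 0 792 0 0 cm /Im0 Do Q /Xi0 Do`), a scanner like this finds no strings at all, which is consistent with the symptom: whatever text exists must be in some other stream.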

     
    • Elfi Heck
      2009-03-11

      Maybe the generator program puts the text in an annotation or a form. Hard to say without actually looking at the document.

       
    • Palani
      2009-03-12

      Hi,
      Thanks for your quick comments. I wanted to attach the test document I am using, but since I don't know how to attach documents here, I have sent it to you by e-mail instead. Please accept my apologies for this. Please advise how to resolve this issue, or let me know if there are any other approaches to solve my problem. BTW, I am trying to build a touch-up tool to correct typos.

      Thanks,

      Sridharan

       
    • Elfi Heck
      2009-03-12

      Yes, the text is in a form.
      The "Contents" of page 1 consist of two streams: one draws the image (/Im0) and the other draws the form (/Xi0).
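
Because the page streams only invoke the form with `/Xi0 Do`, a scan of the page contents never sees the form's text operators; they live in the form XObject's own content stream. (Per the PDF specification, a /Contents array behaves as if its streams were concatenated in order, so the two page streams can be joined.) Below is a rough plain-Java sketch of following `Do` into forms, under stated assumptions; it is not jPod code, and `FormExpander` is a hypothetical name. Streams are assumed to be already decoded strings with whitespace-separated tokens (so a string operand that itself contains "/X Do" would fool it), the page's /XObject resources are modelled as a `Map` from name to form content, nested forms are looked up in that same map (real forms may carry their own /Resources), and each form's /Matrix is ignored. Image XObjects such as /Im0 have no entry in the map and are left untouched.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Sketch: replace "/Name Do" for known forms with "q <form content> Q" (hypothetical helper). */
final class FormExpander {

    static String expand(String content, Map<String, String> forms) {
        StringBuilder out = new StringBuilder();
        expandInto(content, forms, new HashSet<>(), out);
        return out.toString().trim();
    }

    private static void expandInto(String content, Map<String, String> forms,
                                   Set<String> active, StringBuilder out) {
        String[] tokens = content.trim().split("\\s+"); // assumes whitespace-separated tokens
        for (int k = 0; k < tokens.length; k++) {
            String tok = tokens[k];
            boolean invokesXObject = tok.startsWith("/")
                    && k + 1 < tokens.length && tokens[k + 1].equals("Do");
            if (invokesXObject) {
                String name = tok.substring(1);
                String form = forms.get(name);
                // Expand only known forms, and never one already being expanded (cycle guard)
                if (form != null && active.add(name)) {
                    out.append("q ");                   // a form runs in its own graphics state
                    expandInto(form, forms, active, out);
                    out.append("Q ");
                    active.remove(name);
                    k++;                                // the "Do" token has been consumed
                    continue;
                }
            }
            out.append(tok).append(' ');
        }
    }
}
```

With the page streams joined and a form map containing `Xi0`, the expanded stream ends with the form's `BT ... Tj ET` block, so a string scanner applied to the result would now see the OCR text.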

      You can use the "COS Browser" tool in CABAReT Stage (which uses the jPod library and is available as a free evaluation download) to get a hierarchical view of the PDF structure. Navigate to "Root->Pages->Kids->0->Contents" to see the page contents and to "Root->Pages->Kids->0->Resources->XObject->Xi0->Contents" to see the form contents. You can get CABAReT Stage from http://www.cabaret-solutions.com.
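
The browser paths above amount to a chain of dictionary lookups and array indexes. As a hypothetical illustration only (plain Java `Map`s and `List`s standing in for PDF dictionaries and arrays, with indirect references assumed to be already resolved; this is not the jPod object model, and `CosPath` is an invented name), resolving such a path could look like this:

```java
import java.util.List;
import java.util.Map;

/** Sketch: follow a path like "Root->Pages->Kids->0->Resources->XObject->Xi0" through nested Maps/Lists. */
final class CosPath {

    static Object resolve(Object root, String path) {
        Object current = root;
        for (String step : path.split("->")) {
            if (current instanceof Map) {
                current = ((Map<?, ?>) current).get(step);    // dictionary key, e.g. "Kids"
            } else if (current instanceof List) {
                List<?> array = (List<?>) current;
                int index;
                try {
                    index = Integer.parseInt(step);           // array index, e.g. "0"
                } catch (NumberFormatException e) {
                    return null;                              // a non-numeric step cannot index an array
                }
                current = (index >= 0 && index < array.size()) ? array.get(index) : null;
            } else {
                return null;                                  // path continues past a leaf value
            }
            if (current == null) return null;                 // missing key or index
        }
        return current;
    }
}
```

Note that "Kids->0" reaches the first page only when the page tree is flat, as it apparently is in this document; a deeper tree would need more Kids levels.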

       
    • Palani
      2009-03-15

      Thank you for taking the time to analyze my test document. Is it possible to access the text content stream using jPod?

      Thanks again,

      Palani