OCR for first page in multi-page PDF file

Brought to you by: nguyenq

Forum: Open Discussion

Creator: Alfonso Vizcaino

Created: 2021-12-27

Updated: 2021-12-30

Alfonso Vizcaino - 2021-12-27

Hello

When using PDF files with multiple pages, is there a way to specify which page I want to run OCR on?

Thanks

Last edit: Alfonso Vizcaino 2021-12-27

Quan Nguyen - 2021-12-30

No. The program will convert the input PDF to a multi-page TIFF image.

What you can do is process the PDF before the OCR step: use a tool such as PDFBox to extract the page you want, convert that page to an image, and send the image to the Tesseract engine.
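The pre-processing idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the program: instead of PDFBox it swaps in two command-line tools so that it needs no extra Java libraries, namely Poppler's `pdftoppm` to render a single page and the `tesseract` executable to recognize it, and it assumes both are installed and on the `PATH`. The class `SinglePageOcr` and its method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper: OCR one page of a PDF via external tools (assumed installed). */
final class SinglePageOcr {

    /** Builds the pdftoppm command that renders only `page` (1-based) to <outputRoot>.png. */
    static List<String> renderCommand(Path pdf, int page, int dpi, Path outputRoot) {
        if (page < 1) {
            throw new IllegalArgumentException("page numbers start at 1");
        }
        String p = Integer.toString(page);
        // -f/-l select the first and last page; -singlefile drops the page-number suffix.
        return List.of("pdftoppm", "-f", p, "-l", p, "-r", Integer.toString(dpi),
                "-png", "-singlefile", pdf.toString(), outputRoot.toString());
    }

    /** Builds the tesseract command; the output base "stdout" prints the text to standard output. */
    static List<String> ocrCommand(Path image, String language) {
        return List.of("tesseract", image.toString(), "stdout", "-l", language);
    }

    /** Renders the requested page at 300 DPI and returns the recognized text. */
    static String ocrPage(Path pdf, int page, String language)
            throws IOException, InterruptedException {
        // The temporary directory is left in place; clean it up as needed.
        Path dir = Files.createTempDirectory("single-page-ocr");
        run(renderCommand(pdf, page, 300, dir.resolve("page")));
        return run(ocrCommand(dir.resolve("page.png"), language));
    }

    /** Runs a command, returns its standard output, and fails on a non-zero exit code. */
    private static String run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        String output = new String(process.getInputStream().readAllBytes(),
                StandardCharsets.UTF_8);
        if (process.waitFor() != 0) {
            throw new IOException("Command failed: " + String.join(" ", command));
        }
        return output;
    }
}
```

For the first page of an English document, a call such as `SinglePageOcr.ocrPage(Path.of("scan.pdf"), 1, "eng")` would be the entry point. The same structure should carry over to PDFBox by replacing the `pdftoppm` step with a page render inside the JVM.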
