How to convert prescanned PNGs into PDF or TXT?
I just cannot find a way to do this.
Any ideas, please?
You did not say whether you want to do OCR too, but try opening NAPS2 and dragging and dropping the images into it, or use the IMPORT menu button.
Sorry - my brain sometimes thinks everyone knows what I am thinking.
Yes, OCR as well.
Import or drag and drop files or a directory.
I am not sure whether each PNG file would then have its TXT file saved separately into the same directory as the PNG.
Why do you need a TXT file?
Check the documentation here: https://www.naps2.com/doc/ocr
I have thousands of PNG files taken from family history websites. I would like to be able to just copy the text and then paste it into my database.
For some reason, OCRing just a sample of 10 PNGs does not seem to do anything, and nothing tells me whether anything has happened, or even whether it has finished. All I need is for the OCR output to be written to TXT files.
I did, however, manage to combine all the PNGs into one PDF and also to generate individual PDFs. I do not like copying text out of a PDF, as the pages tend to be very table-oriented, which makes copying difficult.
If you're comfortable using a command line, you can just use tesseract directly for this use case rather than trying to do it through NAPS2.
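As a rough illustration of that suggestion, here is a minimal Python sketch that loops over a folder of PNGs and calls the tesseract command-line tool once per image, so that each PNG gets a TXT file with the same name next to it. It assumes tesseract is installed and on the PATH with English language data; the function names and the `scans` folder mentioned below are made up for the example and are not part of NAPS2 or tesseract.

```python
import subprocess
from pathlib import Path


def tesseract_command(png_path: Path, lang: str = "eng") -> list[str]:
    """Build the command that OCRs one image into a .txt file next to it."""
    # Tesseract appends ".txt" to the output base itself, so drop ".png" here.
    output_base = png_path.with_suffix("")
    return ["tesseract", str(png_path), str(output_base), "-l", lang]


def ocr_folder(folder: Path, lang: str = "eng") -> int:
    """OCR every PNG in `folder`, printing progress; returns the number processed."""
    png_files = sorted(folder.glob("*.png"))
    for index, png_path in enumerate(png_files, start=1):
        # Progress line, so it is clear something is happening and when it ends.
        print(f"[{index}/{len(png_files)}] {png_path.name}")
        # check=True stops the run with an error if tesseract fails on a file.
        subprocess.run(tesseract_command(png_path, lang), check=True)
    return len(png_files)
```

Calling `ocr_folder(Path("scans"))` (with `scans` replaced by your own folder) should leave `page1.txt` beside `page1.png`, and so on. Table-heavy pages may still come out with jumbled columns, so it is worth checking a few results before running it on thousands of files.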
When OCR is enabled and a PDF with imported images is saved, the resulting PDF is searchable and text can be copied out of it, but for that you have to right-click in Adobe Reader and choose the select tool. With that you can select the text within the PDF and copy it to the clipboard.
Alternatively, PDFGear can save an imported PDF as TXT.
Last edit: SelfMan 2026-03-26