#218 Update to recent version of tesseract

Milestone: 4.X

Status: open

Owner: nobody

Labels: None

Updated: 2024-06-06

Created: 2023-01-24

Creator: Zdenko

Private: No

Can you please update to the recent leptonica [1] (1.83) and tesseract (5.3.0) [2]? There are a lot of fixes, improvements, and speedups.
Also, please consider a minimalist leptonica build (only the libraries that are really needed); see e.g. the first part of [3].

[1] https://github.com/DanBloomberg/leptonica/releases/tag/1.83.0
[2] https://github.com/tesseract-ocr/tesseract/releases/tag/5.3.0
[3] https://bucket401.blogspot.com/2021/03/building-tesserocr-on-ms-windows-64bit.html

Discussion

Gabriel Lambert - 2023-02-10

I second this

John Smith - 2023-03-20

It would be so great.

John Smith - 2023-03-24

So, Capture2Text version 4.6.3 ships leptonica 1.74.4 and tesseract 4.00, in the form of "pvt.cppan.demo.danbloomberg.leptonica-1.74.4.dll" and "tesseract400.dll".
Will it work if we replace those DLLs with ones built by following these instructions? https://bucket401.blogspot.com/2021/03/building-tesserocr-on-ms-windows-64bit.html

Zdenko - 2023-03-24

No, replacing the DLLs will not work. You have to recompile Capture2Text against tesseract (and its dependencies).
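
As a rough illustration of the version gap involved, here is a minimal Python sketch that compares the bundled versions quoted above with the requested ones. The helper names (parse_version, same_major) are hypothetical, and treating a major-version bump as a sign of an incompatible interface is only a rule of thumb assumed here, not something either project guarantees. A rebuild is needed in either case.

```python
def parse_version(text):
    """Turn a dotted version such as '1.74.4' or '4.00' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def same_major(bundled, requested):
    """Return True when both versions share the same major number."""
    return parse_version(bundled)[0] == parse_version(requested)[0]


# Versions quoted in this ticket (bundled with Capture2Text 4.6.3 vs. requested).
bundled = {"leptonica": "1.74.4", "tesseract": "4.00"}
requested = {"leptonica": "1.83.0", "tesseract": "5.3.0"}

for name in bundled:
    if same_major(bundled[name], requested[name]):
        # Assumption: same major number, but a swap is still not safe without a rebuild.
        verdict = "same major version, rebuild still needed"
    else:
        # Assumption: a major bump usually means the interface changed.
        verdict = "major version changed, rebuild needed"
    print(f"{name}: {bundled[name]} -> {requested[name]}: {verdict}")
```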

Setsumi - 2024-06-06

I updated tesseract here: https://github.com/setsumi/Capture2TextPlus#capture2textplus
I haven't noticed any improvements. The trained data is still the same.

Zdenko - 2024-06-06

Capture2Text uses a tesseract 4.0 build from 2017 (7 years ago at this point!). Check how many commits have been made since then at https://github.com/tesseract-ocr/tesseract/commits/main/. Are none of them relevant to you, e.g. the speed improvements?

The official traineddata has not changed, but you can do custom fine-tuning for your case.
