An application to automatically extract text from comic books.
...It is based on the following three major algorithms:
- Binarization of color images (Niblack and other methods)
- Connected components
- K-Means clustering
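As a rough illustration of the first two steps listed above, here is a minimal pure-Python sketch of Niblack thresholding followed by connected-component extraction. It is not the application's actual code: the function names, the window size, the value k = -0.2, and the choice of 4-connectivity are all assumptions made for the example, and it expects a grayscale image given as a list of rows.

```python
from collections import deque
from math import sqrt


def niblack_binarize(gray, window=3, k=-0.2):
    """Return a 0/1 mask; 1 marks pixels darker than the local Niblack
    threshold T = mean + k * std, computed over a window x window area.
    (window and k are assumed values, not taken from the project.)"""
    h, w = len(gray), len(gray[0])
    r = window // 2
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Collect the neighbourhood, clipped at the image border.
            vals = [gray[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))]
            mean = sum(vals) / len(vals)
            std = sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
            mask[y][x] = 1 if gray[y][x] < mean + k * std else 0
    return mask


def connected_components(mask):
    """Return bounding boxes (x0, y0, x1, y1) of 4-connected foreground
    regions, in row-major order of their first pixel."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(x, y)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy),
                               (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < w and 0 <= ny < h
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes
```

The resulting bounding boxes are the kind of per-component features (position, width, height) that a clustering step such as K-Means could then group into candidate text regions; how the application actually does this is not stated here.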
Tesseract (an OCR engine released under the Apache License 2.0) is used to perform optical character recognition on the extracted text.
A subsequent version of the application will integrate with translation software to provide automated translation of comic book text and re-insertion of the translated text.
This project aims to create a single, easy-to-use GUI wrapper for Ghostscript and Tesseract that converts scanned PDF documents to plain text or HTML.
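The kind of pipeline such a wrapper would drive can be sketched as follows. This is only an assumed outline, not the project's implementation: the helper names, the 300 dpi default, the page file pattern, and the use of Tesseract's hOCR output for the HTML case are illustrative choices. It requires the gs and tesseract executables to be on the PATH when actually run.

```python
import subprocess
from pathlib import Path


def ghostscript_command(pdf_path, out_dir, dpi=300):
    """Build the Ghostscript call that rasterizes each PDF page to a PNG.
    (dpi and the page-%03d.png pattern are assumed defaults.)"""
    pattern = str(Path(out_dir) / "page-%03d.png")
    return ["gs", "-dNOPAUSE", "-dBATCH", "-dSAFER", "-sDEVICE=png16m",
            f"-r{dpi}", f"-sOutputFile={pattern}", str(pdf_path)]


def tesseract_command(image_path, output_base, html=False, lang="eng"):
    """Build the Tesseract call for one page image. Tesseract appends the
    extension itself: .txt by default, .hocr (HTML-based) with 'hocr'."""
    cmd = ["tesseract", str(image_path), str(output_base), "-l", lang]
    if html:
        cmd.append("hocr")
    return cmd


def convert_scanned_pdf(pdf_path, out_dir, html=False):
    """Rasterize the PDF, then OCR every page; returns the output bases."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(ghostscript_command(pdf_path, out_dir), check=True)
    bases = []
    for page in sorted(out_dir.glob("page-*.png")):
        base = page.with_suffix("")  # e.g. out/page-001
        subprocess.run(tesseract_command(page, base, html=html), check=True)
        bases.append(base)
    return bases
```

A call such as convert_scanned_pdf("scan.pdf", "out", html=True) would leave one .hocr file per page in the out directory; a GUI would mainly add file pickers and progress reporting around these two commands.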