An application that automatically extracts text from comic books.
...The text extraction combines statistical and graphical processing operations. It is based on the following three major algorithms:
- Binarization of color images (Niblack and other methods)
- Connected components
- K-Means clustering
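
As a rough illustration of how these three building blocks could fit together, here is a minimal pure-Python sketch. It is not the application's actual code: the function names, the window size, the Niblack coefficient `k = -0.2`, and the idea of clustering component heights to separate letter-sized blobs from larger artwork are all assumptions made for this example. The image is assumed to be already converted to grayscale and stored as a list of rows.

```python
from collections import deque
from math import sqrt


def niblack_binarize(gray, window=3, k=-0.2):
    """Return a 0/1 mask: 1 where a pixel is darker than its local Niblack threshold."""
    h, w = len(gray), len(gray[0])
    r = window // 2
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Neighbourhood around (x, y), clipped at the image border.
            patch = [gray[j][i]
                     for j in range(max(0, y - r), min(h, y + r + 1))
                     for i in range(max(0, x - r), min(w, x + r + 1))]
            m = sum(patch) / len(patch)
            s = sqrt(sum((p - m) ** 2 for p in patch) / len(patch))
            # Niblack threshold: local mean plus k times local standard deviation.
            mask[y][x] = 1 if gray[y][x] < m + k * s else 0
    return mask


def connected_components(mask):
    """Find 4-connected foreground regions; return bounding boxes (x0, y0, x1, y1)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(x, y)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


def kmeans_1d(values, k=2, iterations=20):
    """Plain Lloyd's k-means on scalars; returns one cluster index per value."""
    lo, hi = min(values), max(values)
    # Deterministic initialisation: spread the centres evenly over the value range.
    centres = [lo + (hi - lo) * i / max(k - 1, 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centres[c] = sum(members) / len(members)
    return labels
```

A possible use would be to binarize a page, collect the bounding boxes of its connected components, and then run `kmeans_1d` on the box heights so that components of similar, small height (likely characters) fall into one cluster. The naive per-pixel window is only suitable for small images; a real implementation would typically use integral images or an image-processing library.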
Tesseract, an open-source OCR engine distributed under the Apache License 2.0, is used to perform Optical Character Recognition (OCR) on the extracted text regions.
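
How the application invokes Tesseract is not described here. One plausible approach, shown purely as a sketch, is to call the `tesseract` command-line tool on a cropped image of a text region. The helper names, the file name, and the choice of page segmentation mode 6 (treat the image as a single uniform block of text) are assumptions for this example; the `--psm` spelling applies to Tesseract 4 and later.

```python
import subprocess


def build_tesseract_command(image_path, lang="eng", psm=6):
    """Build the argument list for the tesseract CLI (hypothetical helper)."""
    # Using "stdout" as the output base makes tesseract print the recognised text.
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def run_ocr(image_path, lang="eng", psm=6):
    """Run tesseract on one image and return the recognised text."""
    result = subprocess.run(
        build_tesseract_command(image_path, lang, psm),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

Calling `run_ocr("bubble.png")` requires the `tesseract` binary and the relevant language data to be installed; `bubble.png` is a placeholder name for a cropped speech-bubble image.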
A subsequent version of the application will integrate with translation software to provide automated translation of comic book text and re-insertion of the translated text.