cbrTekStraktor
An application to automatically extract text from comic books by a combination of statistical and graphical processing operations. It is based on the following 3 major algorithms:
- Binarization of color images (Niblack and other methods)
- Connected components
- K-Means clustering
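
To make the first two steps concrete, here is a minimal sketch in Python. It is not the application's own code. It assumes the image has already been converted to a grayscale grid of 0-255 values, uses the textbook Niblack threshold (local mean plus `k` times local standard deviation), and labels connected components with a 4-neighbour breadth-first search. The function names, the window size and the value `k = -0.2` are illustrative assumptions. K-Means is left out because this README does not say which features are clustered.

```python
from collections import deque
from math import sqrt


def niblack_binarize(gray, window=3, k=-0.2):
    """Return a 0/1 grid: 1 where a pixel is darker than its local threshold.

    Assumption: textbook Niblack rule, threshold = mean + k * std over a
    square window centred on the pixel (clipped at the image border).
    """
    h, w = len(gray), len(gray[0])
    r = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [gray[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))]
            mean = sum(vals) / len(vals)
            std = sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
            # Strict comparison keeps perfectly flat regions as background.
            out[y][x] = 1 if gray[y][x] < mean + k * std else 0
    return out


def connected_components(binary):
    """Return bounding boxes (min_x, min_y, max_x, max_y) of 4-connected blobs."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search over foreground pixels reachable from (x, y).
            queue = deque([(x, y)])
            seen[y][x] = True
            min_x = max_x = x
            min_y = max_y = y
            while queue:
                cx, cy = queue.popleft()
                min_x, max_x = min(min_x, cx), max(max_x, cx)
                min_y, max_y = min(min_y, cy), max(max_y, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((min_x, min_y, max_x, max_y))
    return boxes
```

In a pipeline of this kind, the bounding boxes would typically be the input to a later grouping step, for example to separate letter-sized components from panel borders and artwork.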
Tesseract, an open-source OCR engine released under the Apache License, is used to perform Optical Character Recognition (OCR) on the extracted text.
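
As an illustration of the OCR step, the sketch below calls the `tesseract` command-line tool from Python on a cropped text image. This is an assumption about one possible setup, not a description of how the application itself invokes Tesseract. The helper names and the file name `bubble.png` are hypothetical, and the `--psm` option assumes Tesseract 4 or later.

```python
import subprocess


def build_tesseract_command(image_path, language="eng", page_seg_mode=6):
    """Build the argument list for the tesseract CLI (hypothetical helper).

    "stdout" as the output base makes tesseract print the recognized text;
    --psm 6 treats the image as a single uniform block of text.
    """
    return ["tesseract", image_path, "stdout",
            "-l", language, "--psm", str(page_seg_mode)]


def run_ocr(image_path, language="eng"):
    """Run tesseract on one image and return the recognized text.

    Requires the tesseract executable to be installed and on the PATH.
    """
    result = subprocess.run(build_tesseract_command(image_path, language),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

A call such as `run_ocr("bubble.png")` would return the recognized text for that image, provided Tesseract and the requested language data are installed.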
A subsequent version of the application will integrate with translation software in order to provide automated translation of comic book texts and re-insertion of the translated texts.