<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Recent changes to OCR Engines</title><link>https://sourceforge.net/p/oculix/wiki/OCR%2520Engines/</link><description>Recent changes to OCR Engines</description><atom:link href="https://sourceforge.net/p/oculix/wiki/OCR%20Engines/feed" rel="self"/><language>en</language><lastBuildDate>Sun, 12 Apr 2026 00:55:56 -0000</lastBuildDate><atom:link href="https://sourceforge.net/p/oculix/wiki/OCR%20Engines/feed" rel="self" type="application/rss+xml"/><item><title>OCR Engines modified by Julien Mer</title><link>https://sourceforge.net/p/oculix/wiki/OCR%2520Engines/</link><description>&lt;div class="markdown_content"&gt;&lt;h1 id="h-ocr-engines"&gt;OCR Engines&lt;/h1&gt;
&lt;p&gt;&lt;img alt="New" rel="nofollow" src="https://img.shields.io/badge/type-new%20feature-brightgreen?style=for-the-badge"/&gt;&lt;br/&gt;
&lt;img alt="PaddleOCR" rel="nofollow" src="https://img.shields.io/badge/PaddleOCR-primary-blue?style=for-the-badge"/&gt;&lt;br/&gt;
&lt;img alt="Tesseract" rel="nofollow" src="https://img.shields.io/badge/Tesseract-fallback-grey?style=for-the-badge"/&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;OculiX introduces a pluggable OCR architecture with PaddleOCR as the primary engine and Tesseract as the fallback.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr/&gt;
&lt;h2 id="h-architecture"&gt;Architecture&lt;/h2&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;OCREngine (interface)
  ├── PaddleOCREngine   → HTTP client, zero external deps
  │     └── PaddleOCRClient  → JSON parsing, connection handling (629 lines)
  └── TesseractEngine   → Tess4J wrapper (247 lines)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Auto-detection: PaddleOCR is tried first. If it is unavailable, OculiX falls back to Tesseract.&lt;/p&gt;
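&lt;p&gt;The selection rule can be sketched as follows. This is a minimal illustration, not the OculiX source: &lt;code&gt;OcrEngineSketch&lt;/code&gt;, &lt;code&gt;isAvailable()&lt;/code&gt;, &lt;code&gt;EngineSelector&lt;/code&gt; and &lt;code&gt;FixedEngine&lt;/code&gt; are hypothetical names, and the real &lt;code&gt;OCREngine&lt;/code&gt; interface may expose different methods.&lt;/p&gt;

```java
// Hypothetical sketch of "try the primary engine, else fall back".
// None of these names are taken from the OculiX code base.
interface OcrEngineSketch {
    String name();

    // Assumed probe: true when the engine can currently serve requests.
    boolean isAvailable();
}

final class EngineSelector {
    // Returns the first available engine in priority order, or null if none is usable.
    static OcrEngineSketch select(OcrEngineSketch... enginesInPriorityOrder) {
        for (OcrEngineSketch engine : enginesInPriorityOrder) {
            if (engine.isAvailable()) {
                return engine;
            }
        }
        return null;
    }
}

// Stand-in engine with a fixed availability flag, used only to exercise the selector.
final class FixedEngine implements OcrEngineSketch {
    private final String name;
    private final boolean available;

    FixedEngine(String name, boolean available) {
        this.name = name;
        this.available = available;
    }

    public String name() {
        return name;
    }

    public boolean isAvailable() {
        return available;
    }
}
```

&lt;p&gt;With this sketch, &lt;code&gt;EngineSelector.select(paddle, tesseract)&lt;/code&gt; returns the Tesseract stand-in whenever the PaddleOCR stand-in reports that it is unavailable. Passing the engines in priority order keeps the fallback rule in one place.&lt;/p&gt;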
&lt;h2 id="h-paddleocr-integration"&gt;PaddleOCR Integration&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;PaddleOCREngine&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Engine adapter (629 lines)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;PaddleOCRClient&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Zero-dependency HTTP client with manual JSON parsing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Protocol&lt;/td&gt;
&lt;td&gt;HTTP REST API to a PaddleOCR server&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
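&lt;p&gt;This page does not document the response format of the PaddleOCR server, so the following is only a rough sketch of what parsing JSON by hand, without a library, might look like. &lt;code&gt;ManualJsonSketch&lt;/code&gt; and &lt;code&gt;stringField&lt;/code&gt; are hypothetical names, and the sketch handles only string values with simple backslash escapes.&lt;/p&gt;

```java
// Hypothetical sketch of dependency-free JSON field extraction.
// It is not the PaddleOCRClient implementation and assumes the value is a JSON string.
final class ManualJsonSketch {
    // Returns the string value that follows the first occurrence of "key", or null if absent.
    static String stringField(String json, String key) {
        int keyPos = json.indexOf("\"" + key + "\"");
        if (keyPos == -1) {
            return null;
        }
        // Skip past the quoted key, then locate the colon and the opening quote of the value.
        int colon = json.indexOf(':', keyPos + key.length() + 2);
        if (colon == -1) {
            return null;
        }
        int open = json.indexOf('"', colon + 1);
        if (open == -1) {
            return null;
        }
        StringBuilder value = new StringBuilder();
        for (int i = open + 1; i != json.length(); i++) {
            char c = json.charAt(i);
            if (c == '\\') {
                if (i + 1 == json.length()) {
                    return null; // dangling escape at end of input
                }
                i++;
                char escaped = json.charAt(i);
                // Only \n is translated; other escaped characters are copied literally.
                value.append(escaped == 'n' ? '\n' : escaped);
            } else if (c == '"') {
                return value.toString(); // closing quote reached
            } else {
                value.append(c);
            }
        }
        return null; // unterminated string
    }
}
```

&lt;p&gt;For example, with a made-up payload containing a &lt;code&gt;text&lt;/code&gt; field, &lt;code&gt;ManualJsonSketch.stringField(json, "text")&lt;/code&gt; returns the recognized string. The field name is an assumption chosen for illustration.&lt;/p&gt;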
&lt;h2 id="h-utilities"&gt;Utilities&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Class&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;AmountVariantGenerator&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Generates tolerance variants for monetary formats (e.g. &lt;code&gt;1,234.56&lt;/code&gt; ↔ &lt;code&gt;1234.56&lt;/code&gt; ↔ &lt;code&gt;1 234,56&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;TextNormalizer&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Accent stripping, case-insensitive comparison, whitespace normalization&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
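&lt;p&gt;Both utilities can be approximated with the Java standard library. The sketch below is an assumption about how such helpers might work, not the OculiX code: &lt;code&gt;TextNormalizerSketch&lt;/code&gt;, &lt;code&gt;AmountVariantSketch&lt;/code&gt; and their methods are hypothetical names, and the three amount formats are simply the ones listed in the table above.&lt;/p&gt;

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical sketch; class and method names are assumptions, not the OculiX API.
final class TextNormalizerSketch {
    // Strips accents, lower-cases, and collapses runs of whitespace to one space.
    static String normalize(String input) {
        // NFD splits accented letters into a base letter plus combining marks.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        String withoutAccents = decomposed.replaceAll("\\p{M}+", "");
        return withoutAccents.toLowerCase(Locale.ROOT).trim().replaceAll("\\s+", " ");
    }
}

final class AmountVariantSketch {
    // Renders one amount in the three formats listed in the table above.
    static String[] variants(BigDecimal amount) {
        return new String[] {
            format(amount, ',', '.', true),   // comma grouping, dot decimal
            format(amount, ',', '.', false),  // no grouping, dot decimal
            format(amount, ' ', ',', true)    // space grouping, comma decimal
        };
    }

    private static String format(BigDecimal amount, char grouping, char decimal, boolean useGrouping) {
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
        symbols.setGroupingSeparator(grouping);
        symbols.setDecimalSeparator(decimal);
        DecimalFormat formatter = new DecimalFormat("#,##0.00", symbols);
        formatter.setGroupingUsed(useGrouping);
        return formatter.format(amount);
    }
}
```

&lt;p&gt;A caller could compare the normalized OCR output against each variant in turn, so that a difference in separators alone does not cause a mismatch.&lt;/p&gt;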
&lt;h2 id="h-key-classes"&gt;Key Classes&lt;/h2&gt;
&lt;p&gt;All in &lt;code&gt;com.sikulix.ocr.*&lt;/code&gt;:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;File&lt;/th&gt;
&lt;th&gt;Lines&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;OCREngine.java&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Interface for pluggable engines&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;PaddleOCREngine.java&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;629&lt;/td&gt;
&lt;td&gt;PaddleOCR implementation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;PaddleOCRClient.java&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;629&lt;/td&gt;
&lt;td&gt;Zero-dependency HTTP client&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;TesseractEngine.java&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;247&lt;/td&gt;
&lt;td&gt;Tesseract fallback&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;AmountVariantGenerator.java&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;93&lt;/td&gt;
&lt;td&gt;Monetary format tolerance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;TextNormalizer.java&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;77&lt;/td&gt;
&lt;td&gt;Text normalization&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Julien Mer</dc:creator><pubDate>Sun, 12 Apr 2026 00:55:56 -0000</pubDate><guid>https://sourceforge.netfb1946f023fa3c958b8e084eaf0f5d3265ebb9bd</guid></item></channel></rss>