The generated OCR output documents look good, so the word count of 1 is really misleading. We have conditional logic after the createDocumentsWithResults() call that relies on the size of the Words list in the OCRResult.
We've encountered a bug when calling createDocumentsWithResults() from Tesseract/tess4j 4.5.5. The TIFF passed to the method has 32 pages and ~3100 words, yet the result produced by the Java call contains only the result for the last page scanned. The OCRResult, in Java, contains only an empty string with this bounding box: [ [Confidence: 95.000000 Bounding box: 313 434 938 822]], which is the same result as when scanning the last page of the TIFF file. Can the Tess4j team investigate this bug?
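To show why the single empty word matters downstream, here is a minimal sketch of the kind of check that could follow the call. It is an illustration only, not our actual code and not part of tess4j: `WordCountCheck`, `countNonBlankWords`, and `looksEmpty` are hypothetical names, and a plain `List<String>` of word texts stands in for the Words list of the OCRResult so the snippet runs without tess4j on the classpath.

```java
import java.util.List;

public class WordCountCheck {

    // Counts the words whose text is neither null nor blank.
    // Assumption: each list entry is the text of one recognized word.
    static long countNonBlankWords(List<String> wordTexts) {
        return wordTexts.stream()
                .filter(text -> text != null && !text.isBlank())
                .count();
    }

    // Hypothetical downstream decision: treat the scan as empty
    // when no usable words came back.
    static boolean looksEmpty(List<String> wordTexts) {
        return countNonBlankWords(wordTexts) == 0;
    }

    public static void main(String[] args) {
        // Mirrors what we observe for the 32-page TIFF:
        // a Words list of size 1 whose only entry is an empty string.
        List<String> observed = List.of("");
        System.out.println("words=" + observed.size()
                + ", looksEmpty=" + looksEmpty(observed));
    }
}
```

With the result described above, a check like this classifies a 32-page, ~3100-word document as empty, even though the generated output documents are fine.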