VietOCR / News: Recent posts

VietOCR v4.4 Release

A Java GUI frontend for Tesseract OCR engine. The release includes the following improvements by John Helour:

  • Additional image filters
  • Expand support to include Regex text replacements from DangAmbigs.txt file
  • Hyphen replacements
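
As a rough illustration of what regex-based and hyphen replacements can look like as OCR post-processing, here is a minimal Java sketch. It is not VietOCR's actual implementation, and it does not parse DangAmbigs.txt, whose syntax is not described in this post. The class name `RegexReplacements`, the in-code rule map, and both sample rules are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper: applies regex replacement rules to OCR output in order.
final class RegexReplacements {

    // Each map entry is (regex pattern -> replacement); insertion order is preserved.
    static String apply(String text, Map<String, String> rules) {
        String result = text;
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            result = Pattern.compile(rule.getKey())
                            .matcher(result)
                            .replaceAll(rule.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> rules = new LinkedHashMap<>();
        // Assumed example of a common OCR confusion: "rn" misread for "m".
        rules.put("\\brnore\\b", "more");
        // Assumed hyphen rule: rejoin a word split by a hyphen at a line break.
        rules.put("(\\p{L})-\\n(\\p{L})", "$1$2");

        System.out.println(apply("rnore exam-\nples", rules)); // prints "more examples"
    }
}
```

Applying the rules in insertion order keeps the result predictable when one rule's output could match a later rule's pattern.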

http://vietocr.sf.net

Posted by Quan Nguyen 2017-01-14

jTessBoxEditorFX v1.0 Release

jTessBoxEditorFX is a box editor and trainer for Tesseract OCR, providing editing of box data of both Tesseract 2.0x and 3.0x formats and full automation of Tesseract training. It can read images of common image formats, including multi-page TIFF. The JavaFX-based program was developed to address the existing issue of rendering complex scripts in the Swing-based jTessBoxEditor program. It requires Java Runtime Environment 8u40 or later.

Posted by Quan Nguyen 2017-01-07

jTessBoxEditor v1.7.1 Release

jTessBoxEditor is a box editor and trainer for Tesseract OCR, providing editing of box data of both Tesseract 2.0x and 3.0x formats and full automation of Tesseract training. It can read images of common image formats, including multi-page TIFF. The program requires Java Runtime Environment 7.0 or later.

This release includes the following improvement:
- Update Tesseract training executable 3.05dev (2016-11-11)

Posted by Quan Nguyen 2017-01-07

jTessBoxEditor v1.7 Release

jTessBoxEditor is a box editor and trainer for Tesseract OCR, providing editing of box data of both Tesseract 2.0x and 3.0x formats and full automation of Tesseract training. It can read images of common image formats, including multi-page TIFF. The program requires Java Runtime Environment 7.0 or later.

This release includes the following improvements:

  • Update Tesseract training executable 3.05dev (2016-08-31)
  • Generated images are now compressed to reduce file sizes
  • Additional parameters for text2image command
  • Use BreakIterator for character boundary analysis
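
For readers unfamiliar with `java.text.BreakIterator`, the sketch below shows how its character instance splits text into user-perceived characters, so that a base letter and its combining marks stay together. It only illustrates the standard Java API. The class name `CharacterBoundaries` is made up for this example, and how jTessBoxEditor applies the iterator internally is not shown in this post.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper illustrating character boundary analysis.
final class CharacterBoundaries {

    // Splits text into user-perceived characters (base character plus combining marks).
    static List<String> split(String text) {
        List<String> units = new ArrayList<>();
        BreakIterator it = BreakIterator.getCharacterInstance(Locale.ROOT);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            units.add(text.substring(start, end));
        }
        return units;
    }

    public static void main(String[] args) {
        // "e" + combining acute accent (U+0301) + "a" is three chars but two characters.
        System.out.println(split("e\u0301a").size()); // prints 2
    }
}
```

Iterating over `char` values would instead separate the accent from its base letter, which is the kind of problem boundary analysis avoids for complex scripts.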
Posted by Quan Nguyen 2016-09-13

jTessBoxEditor v1.6 Release

jTessBoxEditor is a box editor and trainer for Tesseract OCR, providing editing of box data of both Tesseract 2.0x and 3.0x formats and full automation of Tesseract training. It can read images of common image formats, including multi-page TIFF. The program requires Java Runtime Environment 7.0 or later.

This release includes the following improvements:

Posted by Quan Nguyen 2016-06-04

VietOCR.NET v4.2 Release

A .NET GUI frontend for Tesseract OCR 3.04 engine. The release includes the following improvements:

  • Implement remove lines & crop image function
  • Display segmented regions
  • Update Tesseract.NET library
  • Update GhostScript to 9.19

http://vietocr.sf.net

Posted by Quan Nguyen 2016-05-30

VietOCR v4.3 Release

A Java GUI frontend for Tesseract OCR engine. The release includes the following improvements:

  • Implement remove lines & crop image function
  • Update Tess4J to 3.2.1
  • Update various dependency versions
  • Convert WIA scanned image BMP to PNG

http://vietocr.sf.net

Posted by Quan Nguyen 2016-05-30

VietOCR v4.2 Release

A Java GUI frontend for Tesseract OCR engine. The release includes the following improvements:

  • Upgrade to Tesseract 3.04.01 (4ef68a0)
  • Upgrade to Tess4J 3.1
  • Update various dependency versions

http://vietocr.sf.net

Posted by Quan Nguyen 2016-04-02

jTessBoxEditor v1.5 Release

jTessBoxEditor is a box editor and trainer for Tesseract OCR, providing editing of box data of both Tesseract 2.0x and 3.0x formats and full automation of Tesseract training. It can read images of common image formats, including multi-page TIFF. The program requires Java Runtime Environment 7.0 or later.

This release includes the following improvements:

  • Train only images with box files
  • Create or update font_properties file

Posted by Quan Nguyen 2016-03-09

VietOCR.NET v4.1 Release

A .NET GUI frontend for Tesseract OCR 3.04 engine. The release includes the following improvement:

  • Update Tesseract.NET to 3.0.2.0

http://vietocr.sf.net

Posted by Quan Nguyen 2016-02-20

VietOCR.NET v4.0 Release

A .NET GUI frontend for Tesseract OCR 3.04 engine. The release includes the following improvements:

  • Upgrade Tesseract.NET to 3.0.1.0 (Tesseract 3.04)
  • Upgrade to .NET 4.0

http://vietocr.sf.net

Posted by Quan Nguyen 2016-01-30

VietOCR v4.1 Release

A Java GUI frontend for Tesseract OCR engine. The release includes the following improvements:

  • Upgrade to Tesseract 3.04 (953523b)
  • Upgrade to Tess4J 3.0
  • Image zoom with mousewheel and Ctrl key
  • Display segmented regions
  • Update JNA to 4.2.1
  • Update JACOB to 1.8
  • Update translations

http://vietocr.sf.net

Posted by Quan Nguyen 2016-01-18

VietOCR.NET v3.7 Release

A .NET GUI frontend for Tesseract OCR 3.02 engine. The release includes the following improvements:

  • Update Tesseract.NET to 2.4.1.0
  • Update GhostScript to 9.18
  • Update translations
  • Fix a hang issue with download of multiple language data packs
  • Image zoom with mousewheel and Ctrl key

http://vietocr.sf.net

Posted by Quan Nguyen 2015-12-13

jTessBoxEditor v1.4 Release

jTessBoxEditor is a box editor and trainer for Tesseract OCR, providing editing of box data of both Tesseract 2.0x and 3.0x formats and full automation of Tesseract training. It can read images of common image formats, including multi-page TIFF. The program requires Java Runtime Environment 7.0 or later.

This release incorporates improvements by A2K in using hotkeys for box movement control in Box View, and adjustable box scaling and margins of Box View.

Posted by Quan Nguyen 2015-05-02

VietOCR v4.0 Release

A Java GUI frontend for Tesseract OCR engine. The release includes the following improvements:

  • Upgrade to Tesseract 3.03 RC (r1127)
  • Upgrade Tess4J to v2.0
  • Add support for searchable PDF output in bulk/batch mode

http://vietocr.sf.net

Posted by Quan Nguyen 2015-03-31

VietOCR v3.6 & VietOCR.NET v3.6 Releases

A Java/.NET GUI frontend for Tesseract OCR engine. The releases include the following improvements:

  • Add Split TIFF function
  • Add thumbnail bar for ease of page navigation
  • Display useful info in statusbar
  • Update links to OpenOffice dictionaries
  • Add support for reading specific configs files for setting control parameters

.NET:
- Update NHunspell to 1.2.5359
- Update Tesseract.NET to 2.2.0.0

Posted by Quan Nguyen 2015-03-06

jTessBoxEditor v1.3 Release

jTessBoxEditor is a box editor and trainer for Tesseract OCR, providing editing of box data of both Tesseract 2.0x and 3.0x formats and full automation of Tesseract training. It can read images of common image formats, including multi-page TIFF. The program requires Java Runtime Environment 7.0 or later.

This release implements a function to validate generated traineddata.

http://vietocr.sourceforge.net/training.html

Posted by Quan Nguyen 2015-01-08

jTessBoxEditor v1.2.1 Release

jTessBoxEditor is a box editor and trainer for Tesseract OCR, providing editing of box data of both Tesseract 2.0x and 3.0x formats and full automation of Tesseract training. It can read images of common image formats, including multi-page TIFF. The program requires Java Runtime Environment 7.0 or later.

This release fixes a regression bug caused by RTL training by applying unicharset's Unicode character directionality fix only when RTL is selected.

Posted by Quan Nguyen 2014-11-21

jTessBoxEditor v1.2 Release

jTessBoxEditor is a box editor and trainer for Tesseract OCR, providing editing of box data of both Tesseract 2.0x and 3.0x formats and full automation of Tesseract training. It can read images of common image formats, including multi-page TIFF. The program requires Java Runtime Environment 7.0 or later.

This release includes the following improvements:

  • Break up the training process to allow flexible, incremental training
  • Incorporate logging

Posted by Quan Nguyen 2014-11-07

jTessBoxEditor v1.1 Release

jTessBoxEditor is a box editor and trainer for Tesseract OCR, providing editing of box data of both Tesseract 2.0x and 3.0x formats and full automation of Tesseract training. It can read images of common image formats, including multi-page TIFF. The program requires Java Runtime Environment 7.0 or later.

This release includes the following improvements:

  • Add training support for Right-to-Left (RTL) text
  • Add horizontal box split using modifier keys
  • Add split multi-page TIFF function

Posted by Quan Nguyen 2014-11-07

VietOCR & VietOCR.NET v3.5

A Java/.NET GUI frontend for Tesseract OCR engine. The releases include the following improvements:

  • Upgrade to Tesseract 3.02.03 (r866)
  • Enhance Bulk ops with subdirectory support
  • Incorporate image filters to enhance images for OCR
  • Implement Auto Crop and Undo functions
  • Additional translations
  • Update Tess4J library; JNA to v4.0; JACOB to v1.17 (Java only)

http://vietocr.sf.net

Posted by Quan Nguyen 2014-01-25

jTessBoxEditor v1.0 Release

jTessBoxEditor is a box editor and trainer for Tesseract OCR, providing editing of box data of both Tesseract 2.0x and 3.0x formats and full automation of Tesseract training. It can read images of common image formats, including multi-page TIFF. The program requires Java Runtime Environment 6.0 or later.

This release includes the following improvements:

  • Integrate support for full automation of Tesseract training
  • Bundle Tesseract Windows training executables (r866), English data, and config files
  • Fix an issue with generated TIFF missing metadata
  • Optionally add noise to generated image
  • Bug fixes and improvements

Posted by Quan Nguyen 2013-11-16

jTessBoxEditor v0.9 Release

jTessBoxEditor is a box editor for Tesseract OCR data, providing editing of box data of both Tesseract 2.0x and 3.0x formats. It can read images of common image formats, including multi-page TIFF. The program requires Java Runtime Environment 6.0 or later.

This release includes the following improvements:

  • Enhance Generate TIFF/Box functionality to combine prepending symbols in addition to appending ones
  • Fix a bug that failed to persist changes to table in edit mode
  • Find function now supports partial matches
  • Fix a problem where the table did not scroll along when the row header had focus

Posted by Quan Nguyen 2013-04-30

jTessBoxEditor v0.8 Release

jTessBoxEditor is a box editor for Tesseract OCR data, providing editing of box data of both Tesseract 2.0x and 3.0x formats. It can read images of common image formats, including multi-page TIFF. The program requires Java Runtime Environment 6.0 or later.

This release includes the following improvements:

  • Add row number header
  • Char cell now editable
  • Convert Unicode escape sequences where possible
  • Find box now displays Unicode characters and allows search using Unicode escape sequences
  • Improve Generate TIFF/Box functionality:
    -- automatically combine boxes that have the same coordinates or completely enclose one another
    -- automatically combine boxes that are combining symbols, specified in an external file, with the main, base character
    -- retain last-modified exp number in Generate TIFF/Box window
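
To make the Unicode escape sequence items above concrete, here is a small Java sketch that converts `\uXXXX` sequences in a string to the characters they denote. It is an illustration under assumptions, not the editor's own code: the class name `UnicodeEscapes` is hypothetical, and only four-hex-digit escapes are handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: converts \\uXXXX escape sequences to actual characters.
final class UnicodeEscapes {

    // Matches a literal backslash, 'u', and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String unescape(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement guards against '$' or '\' being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The Java literal "Vi\\u1EC7t" holds the 9 characters V i \ u 1 E C 7 t.
        System.out.println(unescape("Vi\\u1EC7t")); // prints "Việt"
    }
}
```

Text that contains no well-formed escape is returned unchanged, which matches the "where possible" wording of the release note.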
Posted by Quan Nguyen 2013-04-17

VietOCR v3.4.2 & VietOCR.NET v3.4 Releases

A Java/.NET GUI frontend for Tesseract OCR engine. The releases include the following improvements:

Java:

  • Update Tesseract 3.02 to r820
  • Add hocr support for Bulk & Batch and command-line operations
  • Update links to dictionary files
  • Update JNA to v3.5.1

.NET:

  • Upgrade to Tesseract 3.02 .NET wrapper (r820)
  • Add hocr support for Bulk & Batch and command-line operations
  • Update links to dictionary files

Posted by Quan Nguyen 2013-01-08