
Does jTessBoxEditor-2.0-Beta support training for Tesseract 4.0 LSTM?

2017-04-12
  • Ahmad Moawad

    Ahmad Moawad - 2017-04-12

    Hello all,

    I would like to know whether this version supports training for the new Tesseract 4.0 LSTM engine.

    If not, can anyone help with the following?

    This is the relevant part of https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00

    My question relates to the image part, not to generating training data from text.

    The overall training process is similar to training 3.04. Conceptually the same:

    Prepare training text.
    Render text to image + box file. (Or create hand-made box files for existing image data.)
    Make unicharset file.
    Optionally make dictionary data.
    Run tesseract to process image + box file to make training data set.
    Run training on training data set.
    Combine data files.
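    For comparison, the steps above applied to existing images with hand-made box files might look roughly like the following under Tesseract 4.0. This is a hedged sketch, not taken from the wiki page: it assumes the `lstm.train` config and the `lstmtraining` / `combine_tessdata -e` options of the 4.0 releases (the 2017 alpha the wiki described may differ), fine-tuning from an existing `ara.traineddata`, and hypothetical paths (`tessdata/`, `output/`). It skips the unicharset and dictionary steps, which is only reasonable when fine-tuning without adding new characters. The function only prints the commands (a dry run) so they can be reviewed first.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) a Tesseract 4.0 LSTM fine-tuning
# sequence for existing .tif images with matching hand-made .box files.
# Assumptions: tools from a 4.0 release, a starting model at
# tessdata/<lang>.traineddata, and an output/ directory (hypothetical paths).

lstm_plan() {
  lang=$1; shift          # language code, e.g. ara
  tessdata=tessdata       # hypothetical location of the starting model
  out=output              # hypothetical directory for checkpoints

  # 1. Image + box file -> one .lstmf training file per image.
  for base in "$@"; do
    echo "tesseract $base.tif $base lstm.train"
  done

  # 2. List the .lstmf files that lstmtraining should read.
  echo "ls *.lstmf > $lang.training_files.txt"

  # 3. Extract the LSTM network from the existing traineddata.
  echo "combine_tessdata -e $tessdata/$lang.traineddata $lang.lstm"

  # 4. Fine-tune the network on the new training files.
  echo "lstmtraining --continue_from $lang.lstm --traineddata $tessdata/$lang.traineddata --model_output $out/$lang --train_listfile $lang.training_files.txt --max_iterations 400"

  # 5. Convert the last checkpoint into a usable traineddata file.
  echo "lstmtraining --stop_training --continue_from ${out}/${lang}_checkpoint --traineddata $tessdata/$lang.traineddata --model_output $out/$lang.traineddata"
}

lstm_plan ara ara.arial.exp4
```

    Unlike the 3.x sequence, there is no mftraining/cntraining/shapeclustering stage here: the box files are only used to produce the .lstmf files that lstmtraining consumes.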

    Do the above steps correspond to the following commands?

    tesseract ara.arial.exp4.tif ara.arial.exp4 nobatch box.train
    unicharset_extractor ara.arial.exp4.box
    # font_properties fields: fontname italic bold fixed serif fraktur
    echo "arial 0 0 0 0 0" > font_properties
    shapeclustering -F font_properties -U unicharset ara.arial.exp4.tr
    mftraining -F font_properties -U unicharset -O ara.unicharset ara.arial.exp4.tr
    cntraining ara.arial.exp4.tr
    
    # prefix the generated files with the language code
    mv inttemp ara.inttemp
    mv normproto ara.normproto
    mv pffmtable ara.pffmtable
    mv shapetable ara.shapetable
    combine_tessdata ara.
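    Each `font_properties` line has the form `<fontname> <italic> <bold> <fixed> <serif> <fraktur>`, with every flag 0 or 1; the third flag marks a fixed-pitch font, so for Arial all five flags would normally be 0. A small hypothetical helper (`font_properties_line` is not a Tesseract tool) that builds and sanity-checks such a line could look like this:

```shell
#!/bin/sh
# Hypothetical helper (not part of Tesseract): build one font_properties line.
# Field order expected by Tesseract 3.x: fontname italic bold fixed serif fraktur.
font_properties_line() {
  if [ "$#" -ne 6 ]; then
    echo "usage: font_properties_line NAME ITALIC BOLD FIXED SERIF FRAKTUR" >&2
    return 1
  fi
  name=$1; shift
  for flag in "$@"; do
    case $flag in
      0|1) ;;                                   # valid flag value
      *) echo "flag must be 0 or 1: $flag" >&2; return 1 ;;
    esac
  done
  echo "$name $*"                               # flags joined by single spaces
}

# Arial: upright, regular weight, proportional, sans-serif -> all flags 0.
font_properties_line arial 0 0 0 0 0 > font_properties
```

    Rejecting a wrong field count or a flag other than 0/1 catches the most common mistakes before mftraining reads the file.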
    

    Should I use these steps or not?

    Thanks!

     

    Last edit: Ahmad Moawad 2017-04-12
