Thanks a lot for your advice, Farkas! I will look into the pre-segmentation tools.
Hi Farkas, I am testing alignment with both your brilliant LF Aligner and hunalign on my 3 zh-en datasets. For LF Aligner, I feed in the raw UTF-8 txt files, with no preprocessing and no sentence segmentation. For hunalign, I feed in UTF-8 texts with Chinese segmentation and with Chinese and English tokenization; I also used my own zh-en dictionary with over 40k entries and your zh-en dictionary generated from LF Aligner, and added the realign option. The precision of LF Aligner vs. hunalign in my tests is as follows:...