I think we need to work more on this.
python ../src/utf82kh.py test.txt
output: test-k.txt
python ../src/kh2utf8.py ../build/test-k.txt > test-k-u.txt
output: test-k-u.txt
Ideally, test-k-u.txt should be identical to test.txt.
We need to test this round trip with many samples.
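The round-trip check above could be automated with a small comparison helper. This is only a sketch, assuming Python 3 and that both files are UTF-8 text; the names `first_difference` and `compare_files` are hypothetical and not part of the existing scripts.

```python
import io
from itertools import zip_longest


def first_difference(original, roundtrip):
    """Return None if the texts match, else (line_number, original_line, roundtrip_line).

    A line that exists in only one of the texts is reported as None on the other side.
    """
    # keepends=True so that differences in line endings are detected as well
    orig_lines = original.splitlines(keepends=True)
    rt_lines = roundtrip.splitlines(keepends=True)
    for number, (a, b) in enumerate(zip_longest(orig_lines, rt_lines), start=1):
        if a != b:
            return number, a, b
    return None


def compare_files(original_path, roundtrip_path):
    """Compare two files that are assumed to be UTF-8 encoded (hypothetical helper)."""
    # newline="" disables newline translation so the raw line endings are compared
    with io.open(original_path, encoding="utf-8", newline="") as f:
        original = f.read()
    with io.open(roundtrip_path, encoding="utf-8", newline="") as f:
        roundtrip = f.read()
    return first_difference(original, roundtrip)
```

Calling `compare_files("test.txt", "test-k-u.txt")` after running the two commands above would return None when the round trip is lossless, or the first differing line otherwise. Reporting the line number should make it easier to find which characters are lost when running against a large corpus file.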
Assigned to ashu:
Ashu, you can use the attached sample file to check whether anything is lost in the encoding round trip. You can also find a large Hindi text in the corpus at trunk/coupus/hindi.txt.
Sample Hindi file
Forgot to assign.