I have a problem: when I run ./word_align.pl test.transcriptions test.hyp (files attached), it reports a 100% error rate, which is not what I get when I compare the files manually.
The text is in Russian, and both test.hyp and test.transcription are encoded as UTF-8 Unicode text.
What is wrong?
It seems that word_align.pl reads the Cyrillic text in test.hyp as: ÑÐΜÐ¼Ñ Ð»ÐΜÑ Ð¼ÐΜÐ½Ñ Ð½Ð¸ пÑли ни ÑÑÑк нÐΜ Ð¼Ð¾Ð³Ð»Ð¸ иÑÑÑÐΜбиÑÑ
even though the text displays correctly when I open the file.
Thank you for your consideration,
Otherend
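To cross-check the 100% figure independently of word_align.pl, the word error rate can be computed directly as word-level edit distance divided by the number of reference words, with both files read explicitly as UTF-8. This is a minimal sketch, not the algorithm word_align.pl uses internally; it assumes one utterance per line in the same order in both files and does not strip any utterance IDs or markers the Sphinx files may contain. The function names `word_errors` and `wer` are hypothetical.

```python
import sys


def word_errors(ref_words, hyp_words):
    """Minimum substitutions + deletions + insertions turning ref into hyp
    (Levenshtein distance over words, computed row by row)."""
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        cur = [i] + [0] * len(hyp_words)
        for j, h in enumerate(hyp_words, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1]


def wer(ref_lines, hyp_lines):
    """Word error rate over paired lines: total errors / total reference words."""
    errors = total = 0
    for ref, hyp in zip(ref_lines, hyp_lines):
        ref_words, hyp_words = ref.split(), hyp.split()
        errors += word_errors(ref_words, hyp_words)
        total += len(ref_words)
    return errors / total if total else 0.0


if __name__ == "__main__" and len(sys.argv) == 3:
    # Assumed usage: python wer_check.py test.transcription test.hyp
    # Both files are decoded as UTF-8, so Cyrillic words compare as text.
    with open(sys.argv[1], encoding="utf-8") as f_ref, \
         open(sys.argv[2], encoding="utf-8") as f_hyp:
        rate = wer(f_ref.read().splitlines(), f_hyp.read().splitlines())
        print(f"WER: {rate:.2%}")
```

If this reports a low error rate on the same files, the difference points at how the files are being read or formatted for word_align.pl rather than at the recognizer output itself.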
It must be related to your Perl version or other details of your environment; it works fine here.
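The garbled string in the question has a recognizable shape. Russian letters are two bytes in UTF-8, and the first byte is always 0xD0 or 0xD1, which display as Ð and Ñ when the bytes are decoded as Latin-1 (ISO-8859-1). That pattern suggests the UTF-8 bytes were misread as Latin-1 somewhere along the way. The minimal Python sketch below only reproduces the symptom; it makes no claim about how word_align.pl itself handles encodings, and the helper names `latin1_view` and `repair` are hypothetical.

```python
def latin1_view(text):
    """Show how UTF-8 text looks if its bytes are decoded as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled):
    """Undo the Latin-1 misreading (works only if no bytes were lost)."""
    return garbled.encode("latin-1").decode("utf-8")


seen = latin1_view("лет")
print(repr(seen))    # 'Ð»ÐµÑ\x82'
print(repair(seen))  # лет
```

The trailing `\x82` is a non-printing control character in Latin-1, which is why some letters seem to vanish from the garbled text in the question.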