Valeria BUCCI - 2021-01-18

Hi,

I have successfully trained a model with CMUSphinx on Windows, using audio files whose transcriptions contain no accented characters.
When I add audio files and transcriptions with accented characters such as "é" or "ç" (I'm French) to the training set, I get the following error in the log file during Phase 6 (Checking that all the words in the transcript are in the dictionary):
"WARNING: This word: barrières was in the transcript file, but is not in the dictionary ( tu prends la whisky trente on ferme les barrières et tu me rappelles arrivé au whisky ). Do cases match?
WARNING: This word: arrivé was in the transcript file, but is not in the dictionary ( tu prends la whisky trente on ferme les barrières et tu me rappelles arrivé au whisky ). Do cases match?"

I checked the dictionary, and both accented words (barrières and arrivé) are in it.
Do you have any idea what the problem is?
Is there a particular parameter to set to handle accents?
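
One check that might help narrow this down, sketched below in Python. It assumes (this is only a guess, nothing in the log confirms it) that the words look identical on screen but differ at the byte level, either because the transcription and dictionary files are saved in different encodings or because the accents are stored in different Unicode forms. The function names are hypothetical helpers, not part of CMUSphinx.

```python
import unicodedata


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def dictionary_words(dict_lines):
    """Return the set of headwords (first field of each non-empty line)."""
    return {line.split()[0] for line in dict_lines if line.strip()}


def classify_missing(transcript_words, dict_words):
    """Sort transcript words that are not found verbatim into two lists:
    words that match a dictionary entry only after NFC normalization,
    and words that are missing even after normalization."""
    nfc_dict = {unicodedata.normalize("NFC", w) for w in dict_words}
    normalization_only, missing = [], []
    for word in transcript_words:
        if word in dict_words:
            continue  # exact match, nothing to report
        if unicodedata.normalize("NFC", word) in nfc_dict:
            normalization_only.append(word)
        else:
            missing.append(word)
    return normalization_only, missing
```

Running `is_valid_utf8` on the raw bytes of each file shows whether they share an encoding, and `classify_missing` shows whether a word like "arrivé" differs from the dictionary entry only in how the accent is encoded.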

I have attached a zip file containing the /etc directory (with the dictionary and transcriptions) and the log file.
I hope you can help me.
Thanks
Valeria

 

Last edit: Valeria BUCCI 2021-01-18