CMU Sphinx / Forums / Help: Problems with adapting acoustic model for French language

Hello everyone,

Because my recognition accuracy was poor, I am trying to adapt the acoustic model for the French language so that pocketsphinx can recognize phonemes such as: syllables (like "de", "re", "se", etc.), consonants (like "m", "f", "g", etc.), double consonants (like "kl", "ks", "gr", etc.) and vowels (like "a", "o", "e", etc.). However, during the adaptation I get some warnings when I run bw.exe. The warnings are the following:

utt>     5                   test_du  208INFO: cmn.c(183): CMN: 48.33  0.34 10.24 20.89  7.41  2.60 -0.49  5.13 -2.59 -0.92 -2.69 -0.62 -6.68
    0WARN: "mk_phone_list.c", line 178: Unable to lookup word '<s>du' in the dictionary
WARN: "next_utt_states.c", line 83: Unable to produce phonetic transcription for the utterance '<s>du du</s>'
     0WARN: "main.c", line 826: Skipped utterance '<s>du du</s>'
 utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

utt>     6                    test_e  162INFO: cmn.c(183): CMN: 44.01 -2.43 -8.11  1.41 -3.36  5.76 -5.79 -1.00 -1.14  4.63 -2.73 -5.22  4.06
    0WARN: "mk_phone_list.c", line 178: Unable to lookup word '<s>Þ' in the dictionary
WARN: "next_utt_states.c", line 83: Unable to produce phonetic transcription for the utterance '<s>Þ Þ</s>'
     0WARN: "main.c", line 826: Skipped utterance '<s>Þ Þ</s>'
 utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

utt>     7                   test_e_  174INFO: cmn.c(183): CMN: 47.90 -3.54  7.92 17.79 -18.05 -4.92 -4.06 -0.31 -10.55 -2.35 -2.37  0.61  3.48
    0WARN: "mk_phone_list.c", line 178: Unable to lookup word '<s>Ú' in the dictionary
WARN: "next_utt_states.c", line 83: Unable to produce phonetic transcription for the utterance '<s>Ú Ú</s>'
     0WARN: "main.c", line 826: Skipped utterance '<s>Ú Ú</s>'
 utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

utt>     8                   test_el  328INFO: cmn.c(183): CMN: 51.35 -0.83  0.37  9.30  0.25  0.88 -10.30 -1.43 -3.58  3.93 -0.76  4.42  8.67
    0WARN: "mk_phone_list.c", line 178: Unable to lookup word '<s>Þl' in the dictionary
WARN: "next_utt_states.c", line 83: Unable to produce phonetic transcription for the utterance '<s>Þl Þl</s>'
     0WARN: "main.c", line 826: Skipped utterance '<s>Þl Þl</s>'
 utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

utt>     9                   test_eu  188INFO: cmn.c(183): CMN: 41.86  6.55 -0.19  5.00 -2.79  7.14 -8.16 -7.59 -10.68  3.26  3.84 -7.00 -5.58
    0WARN: "mk_phone_list.c", line 178: Unable to lookup word '<s>eu' in the dictionary
WARN: "next_utt_states.c", line 83: Unable to produce phonetic transcription for the utterance '<s>eu eu</s>'
     0WARN: "main.c", line 826: Skipped utterance '<s>eu eu</s>'
 utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

utt>    10                   test_il  387INFO: cmn.c(183): CMN: 46.31 -4.04  3.11  8.30  3.47  7.88 -10.23 -2.11 -0.99 -5.34 -4.02 -0.71 -2.58
    0WARN: "mk_phone_list.c", line 178: Unable to lookup word '<s>il' in the dictionary
WARN: "next_utt_states.c", line 83: Unable to produce phonetic transcription for the utterance '<s>il il</s>'
     0WARN: "main.c", line 826: Skipped utterance '<s>il il</s>'
 utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

utt>    11                   test_in  231INFO: cmn.c(183): CMN: 45.34 -3.35 -7.22 -2.88 -5.67  7.61 -10.35 -0.49  0.53 -3.12 -5.31 -6.71 -6.53
    0WARN: "mk_phone_list.c", line 178: Unable to lookup word '<s>in' in the dictionary
WARN: "next_utt_states.c", line 83: Unable to produce phonetic transcription for the utterance '<s>in in</s>'
     0WARN: "main.c", line 826: Skipped utterance '<s>in in</s>'
 utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

utt>    12                   test_je  264INFO: cmn.c(183): CMN: 51.30 -8.26 -0.21 11.89  5.76 -1.52  0.70 -3.63  1.64  1.86 -3.09 -2.61  1.78
    0WARN: "mk_phone_list.c", line 178: Unable to lookup word '<s>je' in the dictionary
WARN: "next_utt_states.c", line 83: Unable to produce phonetic transcription for the utterance '<s>je je</s>'
     0WARN: "main.c", line 826: Skipped utterance '<s>je je</s>'
 utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

Also, you can find attached the transcription reference file and the wav files that I want to use for the adaptation.

Could any of you please let me know how to deal with these issues?

Thank you very much in advance!

Leutrim

Last edit: Nickolay V. Shmyrev 2016-04-26

test.transcription

wav.zip

Nickolay V. Shmyrev - 2016-04-26

You need to have spaces after <s> and before </s>.
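
A minimal sketch of the fix above, assuming the transcription file uses the usual SphinxTrain layout of `<s> words </s> (utterance_id)`. The function name and the sample line are illustrative and are not taken from the attached file. It pads the `<s>` and `</s>` markers with spaces so that they become separate tokens instead of being glued to the neighbouring word (as in `<s>du`).

```python
import re


def normalize_transcript_line(line: str) -> str:
    """Make <s> and </s> stand-alone, space-delimited tokens."""
    # Surround every sentence marker with spaces ...
    padded = re.sub(r"(</?s>)", r" \1 ", line)
    # ... then collapse the resulting runs of whitespace.
    return " ".join(padded.split())


# Hypothetical input line with the markers glued to the word.
print(normalize_transcript_line("<s>du</s> (test_du)"))
```

Running the function on a line that is already correct leaves it unchanged, so it should be safe to apply to a whole transcription file.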


Leo - 2016-04-27

Hello Nickolay,

Thank you very much for your answer.
It worked for most of the letters I want to recognize. However, for some of them, such as "pl", "ye", etc., I still get the same warnings. Should I just add them somewhere?

I also tested the adapted model with the letters that no longer produce the warnings mentioned above, and the accuracy has improved. However, when I test them, the output still contains some letters that were not pronounced at all. Could the reason be that the person in the audio file starts to speak later than 0.2 seconds into the recording? (As I read on the CMUSphinx website, the amount of silence at the beginning of the utterance should not exceed 0.2 seconds, and in my case the speech starts after the first second.) Or, how can I avoid the appearance of these letters that are not pronounced at all?

Thank you very much in advance!

Best regards,
Leutrim

Last edit: Leo 2016-04-27

- Nickolay V. Shmyrev - 2016-04-27
  
  > But, for some of them I still get the same warnings, such as: "pl", "ye", etc. Should I just add them somewhere?
  
  You need to add them to the dictionary file.
  
  > Could the reason be that the person in the audio file starts to speak later than 0.2 seconds into the recording (as I read on the CMUSphinx website, the amount of silence at the beginning of the utterance should not exceed 0.2 seconds, and in my case the speech starts after the first second)?
  
  It is better to follow the tutorial's recommendation precisely; however, that is not the cause of the extra insertions.
  
  > Or, how can I avoid the appearance of these letters that are not pronounced at all?
  
  You can play with the language weight and the phone insertion probability (-pip) to reduce extra insertions. However, it is not possible to eliminate them altogether. Some insertions are OK.
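
To judge whether a change to the language weight or to -pip actually helps, it may be useful to count insertions separately from the other error types. The following is a rough sketch that aligns a reference token list with a recognized token list by minimum edit distance. It does not call pocketsphinx; the function name and the sample tokens are made up for illustration.

```python
def count_errors(reference, hypothesis):
    """Return (substitutions, deletions, insertions) of one minimum-edit alignment."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = (total, subs, dels, ins) for reference[:i] vs hypothesis[:j]
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)   # everything deleted
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)   # everything inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, d, k = cost[i - 1][j - 1]
            if reference[i - 1] == hypothesis[j - 1]:
                diag = (t, s, d, k)            # match, no cost
            else:
                diag = (t + 1, s + 1, d, k)    # substitution
            t, s, d, k = cost[i - 1][j]
            dele = (t + 1, s, d + 1, k)        # reference token missing
            t, s, d, k = cost[i][j - 1]
            ins = (t + 1, s, d, k + 1)         # extra recognized token
            # Tuples compare by total edits first, so the total stays minimal.
            cost[i][j] = min(diag, dele, ins)
    return cost[n][m][1:]


# Hypothetical case: "du" was spoken, but "du e" was recognized.
print(count_errors(["du"], ["du", "e"]))  # (0, 0, 1): one insertion
```

Summing the third value over a test set for each parameter setting gives a simple way to compare settings.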
  
  - Leo - 2016-04-27
    
    Hello Nickolay,
    
    Thank you very much for your answer. It worked, now I do not have missing letters.
    
    Regarding the extra insertions: my application is supposed to be used by people who have problems with speaking (suffering from aphasia, a brain disorder). They have to learn to pronounce correctly the phonemes (the letters I mentioned above) that I give them, and I should give them feedback on whether the pronunciation is correct or not.
    Since the output even contains letters that were not pronounced at all, it may be hard to give the right feedback to the users. I have an idea for solving this, but it is a simple one and might not be a good one: *each phoneme that is supposed to be recognized would have a representative (chosen from the tests performed, since we know what should be recognized), and if the representative matches the recognized result, the pronunciation is considered correct*.
    
    Since this is also scientific research, is it possible to state that phoneme recognition still needs improvement, and to argue why these extra letters appear? Or could there be another solution to this problem?
    Thank you very much in advance!
    Leutrim
    
    Last edit: Leo 2016-04-28
    
    - Nickolay V. Shmyrev - 2016-04-28
      
      For scientific research, you started experiments too early, without a prior art search and theoretical research. I would first build a theory of how you are going to do the scoring and only then start experiments.
      
      There are many ways to verify the correctness of pronunciation, but in general you can't proceed with a model trained only on correct pronunciation. You need an alternative model of bad pronunciation in order to make an optimal decision about pronunciation quality.
      
      - Leo - 2016-04-28
        
        Hello Nickolay,
        
        First, I would like to thank you very much for your reply and for the support.
        
        Before choosing Pocketsphinx, I did some research on which speech recognition packages applicable to mobile devices are available nowadays, and which of them would best fit my problem. I concluded that Pocketsphinx would be the most appropriate one.
        
        Regarding the alternative model for bad pronunciation, I would just like to make sure I understand. I should design an alternative model by adapting the default acoustic model with badly pronounced audio samples, so that we would have two models: one trained on correct pronunciations and the other trained on bad pronunciations. Is that right? Then, how would we use these models? Couldn't I just put a threshold between what is good and bad, and thus use just one model? I am not sure if I am right; I am a bit confused about this alternative model.
        
        Thank you very much!
        Leutrim
        
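
One possible reading of the two-model suggestion in this sub-thread is a log-likelihood ratio test: score the same audio with both models and threshold the difference rather than a single raw score, since a raw score also depends on the speaker and the recording. The sketch below only shows that decision rule. It assumes the two log-likelihoods and the frame count have already been obtained from the recognizer; all names and numbers are hypothetical, and the threshold would have to be tuned on labelled data.

```python
def pronunciation_llr(logp_good: float, logp_bad: float, n_frames: int) -> float:
    """Per-frame log-likelihood ratio: good-pronunciation model vs. bad one."""
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    # Dividing by the frame count makes utterances of different length comparable.
    return (logp_good - logp_bad) / n_frames


def is_acceptable(logp_good: float, logp_bad: float, n_frames: int,
                  threshold: float = 0.0) -> bool:
    """Accept the pronunciation when the ratio reaches the (assumed) threshold."""
    return pronunciation_llr(logp_good, logp_bad, n_frames) >= threshold


# Hypothetical scores for one 210-frame utterance.
print(pronunciation_llr(-4200.0, -4410.0, 210))  # 1.0
print(is_acceptable(-4200.0, -4410.0, 210))      # True
```

With a single model there is still a threshold, but it is applied to a score that has nothing to be compared against, which is why the second model is suggested.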

Elpimous - 2016-04-27

Hi Leutrim,
I'm French too, and a friend and I both work on speech recognition for the qbo robot.
I made a small Tkinter Python program to automate recording the texts, speaking...
Both text files are created automatically,
the adaptation is run,
and words missing from the dictionary are displayed; the program then waits for the changes before looping.
You can find it on my GitHub, elpimous (tools section: PAMA).

If we can help each other to get a WER < 15%...
Vincent
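
The missing-word check described above can be sketched roughly as follows. This is not the PAMA code. It assumes a dictionary with one `word phone phone ...` entry per line and transcription lines of the form `<s> words </s> (utterance_id)`; the function names, phone symbols and sample entries are invented for illustration.

```python
import re


def load_dictionary_words(dict_lines):
    """Collect the words (first column) of a pronunciation dictionary."""
    words = set()
    for line in dict_lines:
        parts = line.split()
        if parts:
            # "word(2)" marks an alternate pronunciation of "word".
            words.add(re.sub(r"\(\d+\)$", "", parts[0]))
    return words


def missing_words(transcript_lines, dictionary_words):
    """Return the sorted transcript words that have no dictionary entry."""
    missing = set()
    for line in transcript_lines:
        # Drop the trailing "(utterance_id)" before splitting into words.
        text = re.sub(r"\([^()]*\)\s*$", "", line)
        for token in text.split():
            if token not in ("<s>", "</s>") and token not in dictionary_words:
                missing.add(token)
    return sorted(missing)


# Hypothetical dictionary and transcription content.
dictionary = load_dictionary_words(["du d y", "je zh eu"])
transcript = ["<s> du </s> (test_du)",
              "<s> pl </s> (test_pl)",
              "<s> ye </s> (test_ye)"]
print(missing_words(transcript, dictionary))  # ['pl', 'ye']
```

Each word it reports needs a dictionary entry before the adaptation is run again.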

- Leo - 2016-04-27
  
  Hello Elpimous,
  
  I solved the problem with the missing letters by just adding them to the dictionary file.
  Anyway, thank you very much for writing and trying to help me.
  
  Best,
  Leutrim
  