
Training a digits dataset robust to noise

ramsestom
2016-06-21
  • ramsestom

    ramsestom - 2016-06-21

    Hello everyone

    I am new to Sphinx and to speech recognition in general, but I need to create a small model for digit recognition (in English) that is robust to background noise. I currently have thousands of audio files (from different speakers, with different levels of noise) along with their text transcriptions, but I don't really know how I should format them to use them for model training.
    Each audio file contains the sound from 5 to 12 digits. Do I need to split them into individual digit audio files (one digit/audio file) or is it unnecessary?
    Also, do you think it is better to perform an adaptation of the default Sphinx English model to improve its accuracy at recognising my digits with noise (I already tested the default English model without any adaptation, but the performance was poor due to the background noise), or should I train a completely new model?
    And if I train a new model, are there different methods/types of models that can be used? If so, which one is preferable for my case (a small vocabulary of only 10 digits, but with noise)?
    Finally, could someone explain to me in detail the different steps I should follow to train my model? I have looked at the tutorial, but I am unsure what my Phonetic dictionary, Phoneset file, Language model and List of fillers files should contain...
    Thanks

     
    • Nickolay V. Shmyrev

      Each audio file contains the sound from 5 to 12 digits. Do I need to split them into individual digit audio files (one digit/audio file) or is it unnecessary?

      Data preparation is covered in our tutorial at http://cmusphinx.sourceforge.net/wiki/tutorialam . You do not need to split the files; each recording keeps its full digit string on one transcription line.
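Since each recording can hold a whole digit string, the usual SphinxTrain layout keeps one line per recording in a `.fileids` file and a matching line in a `.transcription` file. The following is a minimal Python sketch of producing those lines, assuming that the spoken digits of each recording are known as a string such as "50721", that file IDs are paths relative to the wav directory without the extension, and that the database is hypothetically named `digits` (so the outputs would go to `etc/digits_train.fileids` and `etc/digits_train.transcription`). The uppercase word spellings are also an assumption; they only need to match the dictionary.

```python
"""Sketch: write SphinxTrain-style .fileids and .transcription lines for
recordings that each contain a whole digit string (no splitting needed)."""
import posixpath

# Spelling of each digit (assumed); must match the words in the dictionary.
DIGIT_WORDS = {
    "0": "ZERO", "1": "ONE", "2": "TWO", "3": "THREE", "4": "FOUR",
    "5": "FIVE", "6": "SIX", "7": "SEVEN", "8": "EIGHT", "9": "NINE",
}


def digits_to_words(digits):
    """Convert e.g. "507" or "5 0 7" to ["FIVE", "ZERO", "SEVEN"].

    Raises KeyError on any non-digit character, which flags a bad transcript.
    """
    return [DIGIT_WORDS[d] for d in digits if not d.isspace()]


def make_lines(recordings):
    """Build matching .fileids and .transcription lines.

    recordings: iterable of (fileid, digit_string) pairs, where fileid is the
    audio path relative to the wav directory, without the extension.
    """
    fileids, transcripts = [], []
    for fileid, digits in recordings:
        words = " ".join(digits_to_words(digits))
        fileids.append(fileid)
        # The ID in parentheses is the file name without its directory.
        transcripts.append(f"<s> {words} </s> ({posixpath.basename(fileid)})")
    return fileids, transcripts


if __name__ == "__main__":
    recs = [("speaker_1/rec_0001", "50721"),
            ("speaker_2/rec_0002", "9 8 0 0 3")]
    ids, trans = make_lines(recs)
    print("\n".join(ids))    # contents for the hypothetical etc/digits_train.fileids
    print("\n".join(trans))  # contents for the hypothetical etc/digits_train.transcription
```

Keeping both lists in the same order matters, because SphinxTrain pairs the n-th fileid with the n-th transcription line.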

      Also, do you think it is better to perform an adaptation of the default Sphinx English model to improve its accuracy at recognising my digits with noise (I already tested the default English model without any adaptation, but the performance was poor due to the background noise), or should I train a completely new model?

      If you have thousands of files it is better to train a new model.

      Finally, could someone explain to me in detail the different steps I should follow to train my model? I have looked at the tutorial, but I am unsure what my Phonetic dictionary, Phoneset file, Language model and List of fillers files should contain.

      You can check here:

      https://github.com/cmusphinx/sphinxtrain/tree/master/templates/tidigits/etc
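The tidigits template linked above is the reference for these files. As a rough illustration of what each one holds for a ten-word vocabulary, here is a Python sketch under several assumptions: the pronunciations are CMUdict-style ARPAbet with stress markers removed (the template may use different phone names), the database name `digits` and the output file names are hypothetical, and the language model is written as a JSGF grammar that accepts 5 to 12 digits, which is one option among several.

```python
"""Sketch: build the dictionary, phone list, filler list and a JSGF grammar
for a ten-digit English vocabulary. The pronunciations are an assumption."""

# Assumed CMUdict-style ARPAbet pronunciations, stress markers removed.
PRONUNCIATIONS = {
    "ZERO": "Z IH R OW",
    "ONE": "W AH N",
    "TWO": "T UW",
    "THREE": "TH R IY",
    "FOUR": "F AO R",
    "FIVE": "F AY V",
    "SIX": "S IH K S",
    "SEVEN": "S EH V AH N",
    "EIGHT": "EY T",
    "NINE": "N AY N",
}

# Sentence markers and silence all map to the SIL phone.
FILLERS = {"<s>": "SIL", "</s>": "SIL", "<sil>": "SIL"}

MIN_DIGITS, MAX_DIGITS = 5, 12  # each recording holds 5 to 12 digits


def dictionary_lines():
    """One 'WORD PHONE PHONE ...' line per word, sorted by word."""
    return [f"{word} {phones}" for word, phones in sorted(PRONUNCIATIONS.items())]


def phone_lines():
    """Every phone used in the dictionary, plus the filler phone(s)."""
    phones = {ph for pron in PRONUNCIATIONS.values() for ph in pron.split()}
    phones.update(FILLERS.values())
    return sorted(phones)


def filler_lines():
    """One 'FILLER PHONE' line per filler word."""
    return [f"{word} {phone}" for word, phone in FILLERS.items()]


def jsgf_grammar():
    """Accept any sequence of MIN_DIGITS to MAX_DIGITS digit words."""
    required = ["<digit>"] * MIN_DIGITS
    optional = ["[<digit>]"] * (MAX_DIGITS - MIN_DIGITS)
    return "\n".join([
        "#JSGF V1.0;",
        "grammar digits;",
        "<digit> = " + " | ".join(PRONUNCIATIONS) + ";",
        "public <number> = " + " ".join(required + optional) + ";",
    ])


if __name__ == "__main__":
    # Hypothetical file names; print instead of writing to disk.
    for name, lines in [("digits.dic", dictionary_lines()),
                        ("digits.phone", phone_lines()),
                        ("digits.filler", filler_lines())]:
        print(f"== {name} ==")
        print("\n".join(lines))
    print("== digits.gram ==")
    print(jsgf_grammar())
```

The grammar restricts decoding to digit strings of the expected length; a small statistical language model over the same ten words would be the other common choice. Whichever is used, the word spellings should stay consistent across the dictionary, transcriptions and language model.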

       
  • ramsestom

    ramsestom - 2016-06-21

    OK thanks for the reply.
    I have a question regarding the phonetic dictionary. Is it possible to specify different pronunciations for the same word? For example, the digit "two" can have this phonetic transcription:
    T_two OO_two
    but it can also be:
    T_two UH_two
    So is it possible to specify that both pronunciations are acceptable, and if so, how should I format my .dict file to include both? (In the tutorial there is always only one phonetic transcription per word.)

     
    • Nickolay V. Shmyrev

      It is possible to specify alternative pronunciations, but for digits there is no point in doing so.
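For vocabularies where variants do matter, CMUSphinx dictionaries follow the CMUdict convention of listing each alternative on its own line, with a numbered suffix on the word, e.g. `TWO(2)`. The following Python sketch reads such a file and groups the variants under their base word. The phone names `T_two`, `OO_two` and `UH_two` are taken from the question above and are not a standard phone set.

```python
"""Sketch: read a CMUSphinx-style dictionary in which alternative
pronunciations are written as extra entries named WORD(2), WORD(3), ..."""
import re
from collections import defaultdict

# Matches a word with an optional "(n)" variant marker at the end.
VARIANT = re.compile(r"^(?P<word>.+?)(?:\((?P<n>\d+)\))?$")


def parse_dict(text):
    """Return {base_word: [[phone, ...], ...]} with variants in file order."""
    prons = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        m = VARIANT.match(fields[0])
        prons[m.group("word")].append(fields[1:])
    return dict(prons)


# Both pronunciations of "two" from the question, as dictionary lines.
EXAMPLE = """\
TWO T_two OO_two
TWO(2) T_two UH_two
"""

if __name__ == "__main__":
    for word, variants in parse_dict(EXAMPLE).items():
        for phones in variants:
            print(word, " ".join(phones))
```

The decoder treats `TWO` and `TWO(2)` as the same word, so the transcriptions and language model only ever use the base spelling `TWO`.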

       
