Training a very small acoustic model

2009-06-23
2012-09-22
  • Mike Medved

    Mike Medved - 2009-06-23

    Hi -

    I'd like to train an acoustic model for a system with a very small number of words, say 20 or fewer. WSJ1 (the one in trunk of pocketsphinx) works great with my little dictionary and language model, but what I'd like is an incredibly fast implementation, and the size of the acoustic model should certainly play into that.

    How much audio do I need to record to get a good acoustic model? Also, should I do a word model instead of a phone model? If I understand this, the dictionary would simply repeat the word, and instead of phones in the .phone file, you would put the same word list.

    Thanks!
    M

     
    • Mike Medved

      Mike Medved - 2009-06-25

      OK, let me be explicit. What you are saying is that if I want to have the recognizer work for ONE GUY well, I could have that one guy read the following sentence a whole bunch of times (10? 50?), while recording:

      WAKEUP EVA HOME PAGE PAGE FORWARD PAGE BACK NEXT STEP PREVIOUS STEP I NEED HELP CLOSE HELP HOME PAGE GO TO SLEEP

      These are the words/commands I care about (and only these). Can I just have them read this sentence over and over? Do I need to break it down (i.e. in one audio file would just be one command, like "go to sleep"), or can the whole sentence be in each file?

      M

       
      • Nickolay V. Shmyrev

        > I could have that one guy read the following sentence a whole bunch of times (10? 50?)

        100

        > Can I just have them read this sentence over and over? Do I need to break it down

        You need to break them up. It's better to read them one at a time if you want to recognize them as separate commands; don't join them into sentences.
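
        A minimal sketch of how the recording prompts and training lists for that could be laid out, one command per file. The split of the sentence into commands, the speaker name, the utterance IDs and the file names are all assumptions made for illustration; the `<s> TEXT </s> (utterance_id)` transcription layout and the one-ID-per-line fileids layout are what SphinxTrain is generally understood to expect, so check them against your own setup.

```python
from pathlib import Path

# Assumed split of the sentence into separate commands (not confirmed in the thread).
COMMANDS = [
    "WAKEUP EVA", "HOME PAGE", "PAGE FORWARD", "PAGE BACK", "NEXT STEP",
    "PREVIOUS STEP", "I NEED HELP", "CLOSE HELP", "GO TO SLEEP",
]
REPETITIONS = 100  # the repetition count suggested above


def build_prompts(commands, repetitions, speaker="speaker1"):
    """Return (utterance_id, text) pairs, one command per recording."""
    prompts = []
    for rep in range(1, repetitions + 1):
        for idx, text in enumerate(commands, start=1):
            # Hypothetical naming scheme: speaker, command index, repetition.
            utt_id = f"{speaker}_cmd{idx:02d}_rep{rep:03d}"
            prompts.append((utt_id, text))
    return prompts


def write_training_lists(prompts, out_dir, prefix="commands"):
    """Write a fileids list and a matching transcription file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    fileids = out / f"{prefix}_train.fileids"
    transcription = out / f"{prefix}_train.transcription"
    # One utterance ID per line; audio files are expected to use the same names.
    fileids.write_text("".join(f"{utt_id}\n" for utt_id, _ in prompts))
    # One line per utterance, in the same order as the fileids list.
    transcription.write_text(
        "".join(f"<s> {text} </s> ({utt_id})\n" for utt_id, text in prompts)
    )
    return fileids, transcription
```

        Calling `write_training_lists(build_prompts(COMMANDS, REPETITIONS), "etc")` would then produce 900 prompts for one speaker, each to be recorded as its own audio file.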

         
    • Nickolay V. Shmyrev

      > How much audio do I need to record to get a good acoustic model?

      Several hours total from around 400 speakers, say a minute per speaker.
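
      As a quick check of that budget, here is a small sketch of the arithmetic. The function name is made up for illustration; the numbers are the ones quoted above.

```python
def total_hours(speakers, minutes_per_speaker):
    """Total amount of recorded audio, in hours."""
    return speakers * minutes_per_speaker / 60


# 400 speakers at one minute each is 400 minutes of audio.
print(round(total_hours(400, 1), 1))  # 6.7
```

      So a minute each from 400 speakers comes to a little under seven hours, which matches "several hours total".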

      > Also, should I do a word model instead of phone model?

      Yes

      > If I understand this, the dictionary would simply repeat the word, and instead of phones in the .phone file, you would put the same word list.

      No, this question is covered in the docs. Check the tidigits dictionary for an example.

       
      • Mike Medved

        Mike Medved - 2009-06-24

        > Several hours total from around 400 speakers say a minute per speaker

        What if I am not making a general system - I don't need any Joe to be able to walk up and speak and the system to recognize it. Rather, I'll have say 10 people that I want the system to be REALLY GOOD at recognizing (or even just 1 to start). How would that affect your above statement?

        Does it matter what you record them saying? In other words, if I have 15 words, can I just record someone saying those 15 words a bunch of times?

         
        • Nickolay V. Shmyrev

          > Rather, I'll have say 10 people that I want the system to be REALLY GOOD at recognizing (or even just 1 to start). How would that affect your above statement?

          It wouldn't. The amount of audio is comparable: you need a lot of samples from each speaker you want to recognize.

          > Does it matter what you record them saying? In other words, if I have 15 words, can I just record someone saying those 15 words a bunch of times?

          No, you need each speaker to be represented in the training data.

           
    • Mike Medved

      Mike Medved - 2009-06-25

      Also, could you point me to directions in the documentation on training a word model? I can't seem to find them...

       
      • Nickolay V. Shmyrev

        SphinxTrain has a template folder with a template for tidigits. Just use a dictionary like this one and a small number of senones:

        eight EY_eight T_eight
        five F_five AY_five V_five
        four F_four OW_four R_four
        nine N_nine AY_nine N_nine_2
        oh OW_oh
        one W_one AX_one N_one
        seven S_seven EH_seven V_seven E_seven N_seven
        six S_six I_six K_six S_six_2
        three TH_three R_three II_three
        two T_two OO_two
        zero Z_zero II_zero R_zero OW_zero
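
        A dictionary in this style can be generated from an ordinary phone dictionary. The sketch below is only an illustration of the naming pattern visible above (phone, underscore, word, plus a numeric suffix when a phone repeats within a word); the function names are hypothetical, and adding `SIL` to the phone list is an assumption about what the trainer expects.

```python
from collections import Counter


def word_dependent_entry(word, phones):
    """Turn a phone sequence into word-specific units, e.g. N_nine AY_nine N_nine_2."""
    seen = Counter()
    units = []
    for phone in phones:
        seen[phone] += 1
        unit = f"{phone}_{word}"
        if seen[phone] > 1:
            # A repeated phone within the same word gets a numeric suffix.
            unit += f"_{seen[phone]}"
        units.append(unit)
    return units


def build_word_dictionary(phone_dict):
    """Return (word -> units mapping, sorted phone list) for a small vocabulary."""
    entries = {word: word_dependent_entry(word, phones)
               for word, phones in phone_dict.items()}
    # Assumption: the phone list also needs a silence unit named SIL.
    phone_list = sorted({u for units in entries.values() for u in units} | {"SIL"})
    return entries, phone_list


if __name__ == "__main__":
    # Example input using pronunciations from the dictionary above.
    example = {"nine": ["N", "AY", "N"], "oh": ["OW"], "six": ["S", "I", "K", "S"]}
    entries, phone_list = build_word_dictionary(example)
    for word, units in entries.items():
        print(word, " ".join(units))
    print("\n".join(phone_list))
```

        Because every unit belongs to exactly one word, the phone list grows with the vocabulary, which is why this only makes sense for a handful of words.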

         
