tuning parameters for Sphinx4

Emma Kuo
2008-02-28
2012-09-22
  • Emma Kuo

    Emma Kuo - 2008-02-28

    When speaking strings of digits or alphabetic characters, users sometimes pause in the middle of the sequence. For example, when speaking a North American phone number, users will often pause after the first three digits and prior to the last four digits. I am having trouble with Sphinx inserting arbitrary numbers after recognizing a partial string when users pause in the middle of speaking a string of digits.

    VoiceXML has some parameters for dealing with this type of problem, including the following:

    (a) incomplete timeout (the required length of silence following user speech after which the recognizer returns an incomplete match of a grammar)
    (b) sensitivity (how readily the recognizer treats quiet input, as opposed to background noise, as speech)
    (c) confidence level (the threshold applied to the ASR-generated confidence score before a result is accepted)

    Are there equivalent parameters for Sphinx? Do you have recommendations for solving my problem of recognizing invalid strings when the user pauses?

     
    • Nickolay V. Shmyrev

      Hm, it depends on the language model you are using. Can't you just insert a filler there?

      As for confidence, sensitivity and so on: there are silpenalty and the word insertion penalty, and the frontend can be tuned for the amount of silence too, but such changes aren't as easy as in commercial recognizers.

      Can you provide a small test set for this problem? We can probably try to tune the recognition rate. Are you testing on TIDIGITS?
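      To illustrate why a word insertion penalty matters for the spurious-digit problem, here is a minimal conceptual sketch in plain Java. It is not Sphinx4 code: the class name, the method, and all scores are made up for illustration, and the scoring is simplified to an acoustic log score plus one log(word insertion probability) term per word, with no language weight.

```java
/** Conceptual sketch only; names and numbers are hypothetical, not Sphinx4 API. */
final class WordInsertionSketch {

    /**
     * Simplified log-domain hypothesis score: the summed acoustic log score
     * plus one log(wordInsertionProbability) term for every word.
     */
    static double pathScore(double acousticLogScore, int wordCount,
                            double wordInsertionProbability) {
        return acousticLogScore + wordCount * Math.log(wordInsertionProbability);
    }

    public static void main(String[] args) {
        // Made-up scores: the 8-word hypothesis matches the audio a little
        // better because it explains a pause as an extra digit.
        double sevenWordAcoustic = -1000.0;
        double eightWordAcoustic = -950.0;

        for (double wip : new double[] {1e-10, 1e-40}) {
            double seven = pathScore(sevenWordAcoustic, 7, wip);
            double eight = pathScore(eightWordAcoustic, 8, wip);
            String winner = seven > eight ? "7 words" : "8 words";
            System.out.println("wip=" + wip + " -> preferred hypothesis: " + winner);
        }
    }
}
```

      The smaller the word insertion probability, the more each extra word costs, so a hypothesis with an inserted digit has to earn a larger acoustic gain to win.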

       
    • Emma Kuo

      Emma Kuo - 2008-04-08

      I don't understand how to insert filler. Can you advise me on how I would do this (or point me to any relevant literature)?

       
      • Nickolay V. Shmyrev

        Just give me the recording and I'll show you. I don't think the sphinx4 configuration is described anywhere in the literature except the javadoc files:

        http://cmusphinx.sourceforge.net/sphinx4/javadoc/index.html

         
        • Emma Kuo

          Emma Kuo - 2008-04-18

          I have put together a small test set. What is the best way to send you the files?

           
          • Nickolay V. Shmyrev

            Upload it to mediafire.com and give me a link.

             
    • Nickolay V. Shmyrev

      You just need to set a wider beam to get good accuracy. Take the wavfile demo and change it to use the WSJ model. Set the relative beam width to 1e-120 and the word insertion probability to 1e-40. Everything will then be recognized correctly. Also use a simple grammar with all the words in a loop.
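      As a rough illustration of what a "wider" relative beam means, the sketch below prunes a list of hypothesis log scores against the best one. It is a conceptual example in plain Java, not Sphinx4's search code; the class and method names are hypothetical, the scores are invented, and 1e-80 is only an arbitrary narrower value chosen for comparison with the 1e-120 suggested above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Conceptual sketch only; names and numbers are hypothetical, not Sphinx4 API. */
final class RelativeBeamSketch {

    /**
     * Keeps every hypothesis whose log score is within log(relativeBeamWidth)
     * of the best log score. A smaller relativeBeamWidth gives a wider beam.
     */
    static List<Double> prune(List<Double> logScores, double relativeBeamWidth) {
        double best = Double.NEGATIVE_INFINITY;
        for (double score : logScores) {
            best = Math.max(best, score);
        }
        double threshold = best + Math.log(relativeBeamWidth);

        List<Double> kept = new ArrayList<>();
        for (double score : logScores) {
            if (score >= threshold) {
                kept.add(score);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Invented natural-log scores for four competing hypotheses.
        List<Double> scores = Arrays.asList(-100.0, -250.0, -350.0, -500.0);
        System.out.println("beam 1e-80  keeps " + prune(scores, 1e-80).size());
        System.out.println("beam 1e-120 keeps " + prune(scores, 1e-120).size());
    }
}
```

      A wider beam keeps more low-scoring hypotheses alive, which costs time but reduces the chance that the correct digit sequence is pruned early.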

       
      • Emma Kuo

        Emma Kuo - 2008-04-22

        Thank you for all your help. I will try that and see how it works.

         
        • Nickolay V. Shmyrev

          Check the complete example here:

          http://www.mediafire.com/upload_complete.php?id=iegimlwbsla

          Of course, good recognition performance alone is not enough to build a stable system.

          As for confidence, it doesn't work with JSGF. You can use a trigram language model instead, as in the confidence demo. The timeout can also be handled once you insert the speech marker and the non-speech data filter with the corresponding object properties.
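          The timeout idea can be sketched independently of Sphinx4: an utterance is only declared finished once the trailing silence exceeds a configured duration, so a short pause in the middle of a phone number does not end it. The code below is a minimal plain-Java illustration under that assumption. The class, the method, and the `endSilenceMs` parameter are hypothetical names, and the per-frame speech/non-speech flags are assumed to come from some upstream classifier.

```java
/** Conceptual sketch only; names are hypothetical, not Sphinx4 API. */
final class SilenceTimeoutSketch {

    /**
     * Returns the index of the frame at which the utterance is declared
     * finished, or -1 if the required trailing silence never occurs.
     * Silence before the first speech frame is ignored.
     */
    static int endOfUtterance(boolean[] isSpeech, int frameMs, int endSilenceMs) {
        boolean heardSpeech = false;
        int silenceMs = 0;
        for (int i = 0; i < isSpeech.length; i++) {
            if (isSpeech[i]) {
                heardSpeech = true;
                silenceMs = 0; // any speech resets the silence counter
            } else if (heardSpeech) {
                silenceMs += frameMs;
                if (silenceMs >= endSilenceMs) {
                    return i;
                }
            }
        }
        return -1;
    }

    /** Builds flags: 3 speech frames, a 3-frame pause, 4 speech frames, 10 silent frames. */
    static boolean[] pausedUtterance() {
        boolean[] frames = new boolean[20];
        for (int i = 0; i < 3; i++) frames[i] = true;
        for (int i = 6; i < 10; i++) frames[i] = true;
        return frames;
    }

    public static void main(String[] args) {
        boolean[] frames = pausedUtterance();
        // With 100 ms frames the mid-utterance pause lasts 300 ms.
        System.out.println("timeout 200 ms ends at frame " + endOfUtterance(frames, 100, 200));
        System.out.println("timeout 500 ms ends at frame " + endOfUtterance(frames, 100, 500));
    }
}
```

          With a timeout shorter than the pause, the utterance is cut in the middle of the digit string; with a longer one, both halves stay in a single utterance.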

           
