CMU Sphinx / Forums / Help: tuning parameters for Sphinx4

Emma Kuo - 2008-02-28

When speaking strings of digits or alphabetic characters, users sometimes pause in the middle of the sequence. For example, when speaking a North American phone number, users will often pause after the first three digits and prior to the last four digits. I am having trouble with Sphinx inserting arbitrary numbers after recognizing a partial string when users pause in the middle of speaking a string of digits.

VoiceXML has some parameters for dealing with this type of problem, including the following:

(a) incomplete timeout (the length of silence following user speech after which the recognizer returns an incomplete match of a grammar);
(b) sensitivity (how sensitive the recognizer is to quiet input versus background noise);
(c) confidence level (the minimum ASR-generated confidence score at which a result is accepted).

Are there equivalent parameters for Sphinx? Do you have recommendations for solving my problem of recognizing invalid strings when the user pauses?

- Nickolay V. Shmyrev - 2008-02-29
  
  Hm, it depends on the language model you are using. Can't you just insert a filler there?
  
  As for confidence, sensitivity and so on: there is a silence penalty and a word insertion penalty, and the frontend can also be tuned for the amount of silence, but such changes aren't as easy as in commercial recognizers.
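To see why lowering the word insertion probability suppresses digits inserted during a pause, here is a toy scoring model. It is an assumption, not Sphinx4's exact scoring: each hypothesis scores its acoustic log-likelihood, plus the language weight times the grammar log probability, plus one ln(wordInsertionProbability) per word. The class, method names and all scores are hypothetical and use natural logs; Sphinx4 works on its own log scale.

```java
import java.util.Arrays;
import java.util.List;

public class InsertionPenaltyDemo {

    /** A recognition hypothesis with hypothetical natural-log scores. */
    static final class Hypothesis {
        final String text;
        final double acousticLogScore;
        final double lmLogProb;
        final int wordCount;

        Hypothesis(String text, double acousticLogScore, double lmLogProb, int wordCount) {
            this.text = text;
            this.acousticLogScore = acousticLogScore;
            this.lmLogProb = lmLogProb;
            this.wordCount = wordCount;
        }
    }

    /** Simplified path score: acoustic + languageWeight * LM + wordCount * ln(wip). */
    static double totalScore(Hypothesis h, double languageWeight, double wordInsertionProbability) {
        return h.acousticLogScore
                + languageWeight * h.lmLogProb
                + h.wordCount * Math.log(wordInsertionProbability);
    }

    /** Returns the highest-scoring hypothesis under the given weights. */
    static Hypothesis best(List<Hypothesis> hyps, double languageWeight, double wip) {
        Hypothesis best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Hypothesis h : hyps) {
            double s = totalScore(h, languageWeight, wip);
            if (s > bestScore) {
                bestScore = s;
                best = h;
            }
        }
        return best;
    }

    /** Two made-up hypotheses for a digit string with a pause after the first three digits. */
    static List<Hypothesis> sampleHypotheses() {
        // Uniform loop grammar over 11 digit words ("zero", "oh", "one" .. "nine")
        double perWord = Math.log(1.0 / 11);
        return Arrays.asList(
                new Hypothesis("five oh three one two three four", -1000.0, 7 * perWord, 7),
                // The spurious "eight" absorbs the pause noise and gets a better acoustic score
                new Hypothesis("five oh three eight one two three four", -990.0, 8 * perWord, 8));
    }

    public static void main(String[] args) {
        List<Hypothesis> hyps = sampleHypotheses();
        System.out.println("no penalty (wip = 1):   " + best(hyps, 1.0, 1.0).text);
        System.out.println("strong penalty (1e-40): " + best(hyps, 1.0, 1e-40).text);
    }
}
```

With no penalty the inserted "eight" wins on its better acoustic score; charging ln(1e-40) (about -92) per word makes the shorter, correct string win instead.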
  
  Can you provide a small test set for this problem? Then we can probably try to tune the recognition rate. Are you testing on TIDIGITS?
  
- Emma Kuo - 2008-04-08
  
  I don't understand how to insert filler. Can you advise me on how I would do this (or point me to any relevant literature)?
  
  - Nickolay V. Shmyrev - 2008-04-09
    
    Just give me the recording and I'll show you. I don't think Sphinx4 configuration is described anywhere in the literature except in the javadoc files:
    
    http://cmusphinx.sourceforge.net/sphinx4/javadoc/index.html
    
    - Emma Kuo - 2008-04-18
      
      I have put together a small test set. What is the best way to send you the files?
      
      - Nickolay V. Shmyrev - 2008-04-19
        
        Upload it to mediafire.com and post a link.
        
        
        - Emma Kuo - 2008-04-19
          
          http://web.cecs.pdx.edu/~ekuo/tenninetest.tar.gz
        
- Nickolay V. Shmyrev - 2008-04-21
  
  You just need to set a wider beam to get good accuracy. Take the WavFile demo, change it to use the WSJ model, set the relative beam width to 1e-120 and the word insertion probability to 1e-40, and use a simple grammar with all the words in a loop. Everything will then be recognized correctly.
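A rough sketch of what a wider relative beam does, assuming the usual definition of relative beam pruning: in each frame, keep every hypothesis whose score is within a factor relativeBeamWidth of the best one. The demo config files expose this as the relativeBeamWidth property; the class below, its method and the scores are hypothetical natural logs, and it leaves out the absolute beam (a cap on the number of active hypotheses) that Sphinx4 also applies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RelativeBeamDemo {

    /**
     * Relative beam pruning: keep every hypothesis whose log score lies within
     * ln(relativeBeamWidth) of the best score in the current frame.
     */
    static List<Double> prune(List<Double> logScores, double relativeBeamWidth) {
        double best = Double.NEGATIVE_INFINITY;
        for (double s : logScores) {
            best = Math.max(best, s);
        }
        // ln(1e-120) is about -276.3, so a smaller width means a wider beam
        double threshold = best + Math.log(relativeBeamWidth);
        List<Double> kept = new ArrayList<>();
        for (double s : logScores) {
            if (s >= threshold) {
                kept.add(s);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical scores of the active hypotheses in one frame
        List<Double> scores = Arrays.asList(-10.0, -150.0, -250.0, -400.0);
        System.out.println("1e-80:  " + prune(scores, 1e-80));
        System.out.println("1e-120: " + prune(scores, 1e-120));
    }
}
```

Here a narrower width such as 1e-80 drops the hypothesis at -250, while 1e-120 keeps it alive, which is how a wider beam lets the correct path survive a locally poor stretch such as a pause.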
  
  - Emma Kuo - 2008-04-22
    
    Thank you for all your help. I will try that and see how it works.
    
    - Nickolay V. Shmyrev - 2008-04-22
      
      Here is the correct link:
      
      http://www.mediafire.com/?zjtdrlnjto0
      
    - Nickolay V. Shmyrev - 2008-04-22
      
      Check the complete example here:
      
      http://www.mediafire.com/upload_complete.php?id=iegimlwbsla
      
      Of course, good recognition accuracy alone isn't enough to build a stable system.
      
      As for confidence, it doesn't work with JSGF grammars. You can use a trigram language model instead, as in the confidence demo. The timeout can also be handled once you insert the speech marker and the non-speech data filter into the frontend and set their corresponding properties.
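The timeout question is about endpointing. The sketch below is not how Sphinx4's speech marker works internally; it is a generic, hypothetical trailing-silence timeout in the spirit of VoiceXML's incomplete timeout, assuming 10 ms frames that some classifier has already labeled as speech or non-speech. The class and method names are made up for illustration.

```java
public class TrailingSilenceEndpointer {

    /**
     * Returns the index of the frame at which the utterance is declared finished:
     * the frame where the run of consecutive non-speech frames after speech first
     * reaches timeoutFrames. Returns -1 if the input ends before that happens.
     */
    static int endOfUtterance(boolean[] isSpeech, int timeoutFrames) {
        boolean heardSpeech = false;
        int silenceRun = 0;
        for (int i = 0; i < isSpeech.length; i++) {
            if (isSpeech[i]) {
                heardSpeech = true;
                silenceRun = 0; // a pause shorter than the timeout is simply absorbed
            } else if (heardSpeech) {
                silenceRun++;
                if (silenceRun >= timeoutFrames) {
                    return i;
                }
            }
        }
        return -1;
    }

    /** Builds a frame mask from alternating run lengths, starting with non-speech. */
    static boolean[] alternatingRuns(int... runLengths) {
        int total = 0;
        for (int n : runLengths) {
            total += n;
        }
        boolean[] mask = new boolean[total];
        int pos = 0;
        boolean speech = false;
        for (int n : runLengths) {
            for (int k = 0; k < n; k++) {
                mask[pos++] = speech;
            }
            speech = !speech;
        }
        return mask;
    }

    public static void main(String[] args) {
        // 50 ms lead-in, 300 ms of digits, 400 ms pause, 400 ms of digits, 800 ms silence
        boolean[] frames = alternatingRuns(5, 30, 40, 40, 80);
        System.out.println("300 ms timeout ends at frame " + endOfUtterance(frames, 30));
        System.out.println("500 ms timeout ends at frame " + endOfUtterance(frames, 50));
    }
}
```

With these numbers the 300 ms timeout fires at frame 64, in the middle of the pause after the first three digits, while the 500 ms timeout waits and fires at frame 164, in the trailing silence.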
      