Sphinx4--Problem with accuracy @8khz

Help
Zac Wolfe
2004-06-07
2012-09-22
  • Zac Wolfe

    Zac Wolfe - 2004-06-07

    I'm trying to get Sphinx4 to recognize digits using the Asterisk PBX system.  I'm sending signed 16-bit little-endian PCM audio at 8kHz to Sphinx, but the accuracy is pretty bad.  Any hints on how to configure Sphinx to handle the relatively poor fidelity of my VoIP app?

     
    • Philip Kwok

      Philip Kwok - 2004-06-07

      Zac,

      In addition to configuring S4 to read 8kHz, also try configuring it to read little-endian data. I suspect that you're using the StreamDataSource, which expects big-endian by default. So change your config file to use:

      <component name="streamDataSource" type="edu.cmu.sphinx.frontend.util.StreamDataSource">
          <property name="sampleRate" value="8000"/>
          <property name="bigEndianData" value="false"/>
      </component>

      Hope this helps. Note that it might be possible to convert your 8kHz data to 16kHz using the facilities in Java Sound (look at javax.sound.sampled.AudioSystem.getAudioInputStream() methods). We have this built into the edu.cmu.sphinx.frontend.util.Microphone class, but not the StreamDataSource class. Something we should add in the future.

      philip
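
For readers hitting the same byte-order problem, here is a minimal sketch of why the bigEndianData setting matters. It uses only the standard java.nio classes, not Sphinx-4 itself; the class name EndianDemo and the method decode are hypothetical names chosen for illustration. The same two bytes decode to very different sample values depending on the assumed byte order, which is roughly what a recognizer sees when the setting does not match the audio.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper class; not part of Sphinx-4.
class EndianDemo {
    /** Decodes signed 16-bit PCM bytes into samples using the given byte order. */
    static short[] decode(byte[] pcm, ByteOrder order) {
        short[] samples = new short[pcm.length / 2];
        ByteBuffer.wrap(pcm).order(order).asShortBuffer().get(samples);
        return samples;
    }

    public static void main(String[] args) {
        // The sample value 1000 (0x03E8) stored little-endian: low byte first.
        byte[] pcm = {(byte) 0xE8, (byte) 0x03};
        System.out.println(decode(pcm, ByteOrder.LITTLE_ENDIAN)[0]); // prints 1000
        System.out.println(decode(pcm, ByteOrder.BIG_ENDIAN)[0]);    // prints -6141
    }
}
```

Audio decoded with the wrong byte order still has the right length and timing, so it can look plausible to the pipeline while sounding like noise to the acoustic model.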

       
    • Zac Wolfe

      Zac Wolfe - 2004-06-07

      Thanks Philip, I'll try converting to 16kHz and see if that helps.  I had already suspected, as you did, that StreamDataSource expects big-endian by default, and I did change that in my config file.  I actually ended up modifying the Microphone front end code to suit my needs instead of the StreamDataSource, and that piece now seems to be in place and working OK.

      If we can solve this accuracy issue I know there are lots of people in the Asterisk community who would love to be able to use a Sphinx tie-in (for marketing, please press or say "one").

      Zac

       
    • Philip Kwok

      Philip Kwok - 2004-06-07

      Zac,

      If you still don't see an increase in accuracy after converting to 16kHz, feel free to tar-gz up a few of your 8kHz test files (the ones that have problems) and send them to me to try out, if you would like to.

      philip

       
    • Zac Wolfe

      Zac Wolfe - 2004-06-07

      Unfortunately it seems there's no way to convert "up" to 16kHz using the AudioSystem--you can only downsample, which makes sense I guess. 

      I think I'll take you up on your offer to try out some of my files.  Can I send them to your ppk96 address?

      I appreciate it.
      Zac
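
Whether Java Sound can resample between two rates depends on the format conversion providers installed in a given JVM, so it may be worth checking rather than assuming. The sketch below asks AudioSystem.isConversionSupported for both directions. The class name ResampleCheck and its helper methods are hypothetical, and the answer it prints may differ between Java versions and platforms.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;

// Hypothetical helper class; not part of Sphinx-4.
class ResampleCheck {
    /** Signed 16-bit mono little-endian PCM at the given sample rate. */
    static AudioFormat pcm16Mono(float sampleRate) {
        return new AudioFormat(sampleRate, 16, 1, true, false);
    }

    /** Asks the installed Java Sound providers whether they can convert between the two rates. */
    static boolean canConvert(float fromRate, float toRate) {
        // Note the argument order: target format first, then source format.
        return AudioSystem.isConversionSupported(pcm16Mono(toRate), pcm16Mono(fromRate));
    }

    public static void main(String[] args) {
        System.out.println("8 kHz -> 16 kHz supported: " + canConvert(8000f, 16000f));
        System.out.println("16 kHz -> 8 kHz supported: " + canConvert(16000f, 8000f));
    }
}
```

Even where the conversion is supported, upsampling cannot restore the frequencies above 4 kHz that an 8kHz recording never captured.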

       
    • Zac Wolfe

      Zac Wolfe - 2004-06-08

      The files I had trouble with seem to work fine in batch mode so there's something about my custom StreamDataSource that's the issue.

       
    • Philip Kwok

      Philip Kwok - 2004-06-09

      Hi Zac,

      I tried feeding 8kHz audio data to Sphinx-4 via the live demo. It simply doesn't recognize very well. Trying to upsample it to 16kHz probably won't give you very good results either. The problem is that our acoustic model data is 16kHz, so 8kHz audio data just won't work for now. We're looking into the possibility of training some 8kHz models. So please hold this off for now. We'll let you know what happens.

      philip

       
    • Zac Wolfe

      Zac Wolfe - 2004-06-09

      Thanks Philip,

      Do you know why there would be a discrepancy between the live vs. batch accuracy?  Sphinx seems to handle the 8kHz data fine in batch mode but has trouble in live situations.

      Zac

       
    • Philip Kwok

      Philip Kwok - 2004-06-09

      Hi Zac,

      I think the discrepancy you're seeing is due to an endian conversion problem. I actually replied to the same question you posted on the other thread. You'll find the fix at:

      http://sourceforge.net/forum/forum.php?thread_id=1089720&forum_id=5471

      On the other hand, since the acoustic models are trained on 16kHz data and your data is 8kHz, even if you fix the endian issue it's bound to not work very well (meaning you won't get accuracy in the high 90s, which it should). So I hope you won't give up on Sphinx-4 based on this :-) If you actually feed it 16kHz data, decoding digits works very well (accuracy in the high 90s). We are looking into training some 8kHz models, since you're not the first person who wants to be able to handle audio data from the telephone line. Please bear with us for the moment, and we'll let you know as soon as we can.

      philip

       
    • David Sledge

      David Sledge - 2004-07-19

      I am trying to do the same thing using a Cisco router. Is there any more news on the 8kHz models yet?

       
      • Willie Walker

        Willie Walker - 2004-07-19

        The good news is that I'm now up to speed using SphinxTrain and have been able to create some 16kHz models based on the WSJ corpus.  Many thanks to Bhiksha Raj for getting me started and helping me work through the learning curve.

        I tested the resulting models against our WSJ5K regression test and it seems as though things worked fine.  Of course, this doesn't mean I'm a SphinxTrain expert, but I'm at least able to use it to create models for Sphinx-4.  :-)

        I'm now working on converting the WSJ training data from 16kHz to 8kHz and will spin up some training sessions on my poor little Linux box at home.  If I get something working, I'll figure out a way to get the resulting models into the open source.

        Will

         
        • Zac Wolfe

          Zac Wolfe - 2004-07-26

          That would be great!  I haven't heard anything lately from the Sphinx4 developers on the 8kHz models they promised, so if you have any success with this, please share!  My email is zacw@comcast.net.

          Thanks in advance,
          Zac

           
    • Willie Walker

      Willie Walker - 2004-08-03

      Yesterday, my little Linux box stopped whirring and out popped some 8kHz models trained from the clean channel of the WSJ0 training data.  I did some testing today, and they seem to give OK results for 8kHz data.

      They're now in the CVS repository under the sphinx4 module.  If you do an update followed by an "ant clean all," you should end up with a jar file containing the new model:  lib/WSJ_8gau_13dCep_8kHz_31mel_200Hz_3500Hz.jar

      Since this model was trained by merely downsampling the 16kHz data down to 8kHz, it doesn't include any telephony channel characteristics.  So, I'm not sure how well it will work with your VoIP app.  But, give it a shot and let me know how it works.  If the model gives you better accuracy, but still isn't good enough for digits, I might try training up some 8kHz TIDIGITS models.

      Signed,

      Will, who's happy he got this far, but still doesn't feel up to the task of being able to answer many questions about SphinxTrain.  :-)
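
To illustrate what downsampling 16kHz audio to 8kHz involves, here is a deliberately crude sketch that halves the sample rate by averaging each pair of adjacent samples. This is an assumption for illustration only: the thread does not say which tool or filter was used to prepare the training data, and a real resampler would apply a proper anti-aliasing low-pass filter before discarding samples. The class name Downsample and the method halve are hypothetical.

```java
// Hypothetical helper class; not part of Sphinx-4 or SphinxTrain.
class Downsample {
    /**
     * Halves the sample rate by averaging each pair of adjacent samples.
     * The averaging acts as a very rough low-pass filter; a trailing
     * unpaired sample is dropped.
     */
    static short[] halve(short[] in) {
        short[] out = new short[in.length / 2];
        for (int i = 0; i < out.length; i++) {
            // The int sum cannot overflow; the average always fits in a short.
            out[i] = (short) ((in[2 * i] + in[2 * i + 1]) / 2);
        }
        return out;
    }

    public static void main(String[] args) {
        short[] sixteenKhz = {0, 100, 200, 300, -50, -150};
        short[] eightKhz = halve(sixteenKhz);
        System.out.println(java.util.Arrays.toString(eightKhz)); // prints [50, 250, -100]
    }
}
```

Audio produced this way only narrows the bandwidth of the original recording. It does not add the codec and line distortions of a real telephone channel, which is the limitation noted above for models trained on downsampled data.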

       
