sphinx2-demo

2000-01-31
2012-09-22
  • R. Paul McCarty

    R. Paul McCarty - 2000-01-31

    I've installed the sphinx2 package on my Linux-2.0.35 system, using Crystal CS4236 sound and OSS, and I've been running the demo program (sphinx2-demo) with some success.  It gets most of the words correct if I say them one at a time, but it has trouble with multiple words, usually getting them correct less than 30% of the time.  Has anyone had better experience? Interestingly, the mistakes seem to resemble the corpus, as I would expect. For example, if I say:

    go backwards -> home to the to the
    go right -> go a

    I guess that's not too surprising since there is a high probability of the bigram "to the".  I've tried adjusting the microphone volume using xmixer and it doesn't seem to change anything. Also, key clicks frequently trigger "the" and "to". I'm sure this is a simple adjustment of the mic location.
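To make the bigram remark concrete, here is a minimal sketch of a maximum-likelihood bigram estimate over a token sequence. It is only an illustration of the idea, not the way sphinx2 builds or stores its language model; the function name `bigram_prob` and the token array are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Maximum-likelihood estimate of P(next | prev): the number of times
   "prev next" occurs divided by the number of times prev is followed
   by any word. Returns 0.0 if prev never has a successor.
   (Hypothetical helper, not part of sphinx2.) */
double bigram_prob(const char **tokens, int n,
                   const char *prev, const char *next)
{
    int prev_count = 0;   /* times prev is followed by some word */
    int pair_count = 0;   /* times prev is followed by next */
    int i;

    assert(tokens != NULL);
    for (i = 0; i + 1 < n; i++) {
        if (strcmp(tokens[i], prev) == 0) {
            prev_count++;
            if (strcmp(tokens[i + 1], next) == 0)
                pair_count++;
        }
    }
    return prev_count ? (double)pair_count / prev_count : 0.0;
}
```

If "to" is almost always followed by "the" in the training text, this estimate is close to 1, so a recognizer with weak acoustic evidence would tend to fall back on that word pair.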

    Is this pretty normal for out-of-the-box performance? I'm taking a course in statistical language models and we have to do a term project; this might be a good set of libraries to work on.  If I could come up with a project that could help the distribution, that would be wonderful.

    Cheers.
    -Paul

     
    • Ricky Houghton

      Ricky Houghton - 2000-01-31

      Sphinx is somewhat microphone sensitive. If you are not using a close-talking microphone, this is to be expected. It is reasonable that we could collect non-close-talking microphone data and then train or adapt on this type of data.

      Regarding microphone levels: If you are triggering the system on key clicks, the microphone volume is probably too high.

      These results are not typical, not even with these low-quality acoustic models. If the volume really is too high, then the recognizer will run slower and output basically pseudo-random stuff... pseudo because the LM comes into play.

      Ricky

      A few comments if you are interested:

      cont_ad.c outputs some debugging information. If you open a text file for writing and pass the FILE pointer to cont_ad_set_logfp (FILE *fp), information about the audio levels will be written to it. Alternatively, calls to cont_ad_read require a pointer to a cont_ad_t. After each call to cont_ad_read, you can print the signal level with

      cont_ad_t foo;
      cont_ad_read (&foo, buf, 2048);
      printf ("Signal level: %d\n", foo.siglvl);

      This value should fluctuate between 0 and 97. I believe that ~70 is a reasonable level; however, I will have to verify that this is the real target.  This should be the max dB of the sample just collected. It is a useful guide.

      I have a graphic display of this value within the Win32 OCX. It would be nice to have someone provide the same thing on the Linux side.

       
    • Kevin A. Lenzo

      Kevin A. Lenzo - 2000-01-31

      I've had success with the CS4236 in the IBM ThinkPad 600.  The performance was OK when we amplified the input signal... the recognition rate got much better with some signal amplification.  I'm not sure if you've got a laptop, but performance has been better overall on the desktop machines -- one with an SB16 and one with a Creative PCI card.

      Adjusting the mic volume in xmixer should change things, actually.  It should make a pretty big difference.  You might try installing ALSA (http://www.alsa-project.org); I find it useful because it supports full duplex so well.

      kevin

      A decent example language model that we could put out for people to try would be a good contribution, though -- and a decent project.

       
