Poor recognition after training a word model

Forum: Speech Recognition Theory

Creator: Anonymous

Created: 2002-05-23

Updated: 2012-09-22

Anonymous - 2002-05-23

Hi,
I trained a word model for 26 words by running all the Perl scripts mentioned in tinydoc.txt,
then converted the result to S2 format using make_s2_models.pl. I need some clarifications.

1. After conversion I got .ccode, .xcode, .d2code and .p3code files in addition to the .chmm, phone, map and sendump files. In the acoustic model downloaded from Sphinx (the model/hmm/6k directory), these .ccode, .xcode, .d2code and .p3code files are NOT present. Are these files needed, and if so, what are they used for?

2. I used the resulting S2 model for speech recognition, but the result was disastrous: whatever word I spoke, I got the same word back as the result. Do all the Perl scripts mentioned in tinydoc.txt (make_feat, the scripts 00.verify, 01.slave.VQ.pl ... make_s2_models.pl) have to be run for a WORD model?

Thanks
Edison

- Anonymous - 2002-05-23
  
  On running sphinx2-batch (forced alignment) I got the following output:
  
  Use CMU Sphinx2 to perform phonetic alignment of an audio file
  to a known text transcription. Shows an example.
  D:\sphinx\sphinx2-0.4\src\libsphinx2\time_align.c(478): rcsid == $Id: time_align.c,v 1.13 2001/12/11 00:24:48 lenzo Exp $
  D:\sphinx\sphinx2-0.4\src\libsphinx2\time_align.c(495): state_bp_table size 937K
  D:\sphinx\sphinx2-0.4\src\libsphinx2\time_align.c(499): phone_bp_table size 156K
  D:\sphinx\sphinx2-0.4\src\libsphinx2\time_align.c(503): word_bp_table size 15K
  0 compound words found
  0.248 = AGC MAX
  0.275 = AGC MAX
  0.206 = AGC MAX
  0.278 = AGC MAX
  0.214 = AGC MAX
  0.259 = AGC MAX
  0.302 = AGC MAX
  0.294 = AGC MAX
  0.258 = AGC MAX
  0.241 = AGC MAX
  0.253 = AGC MAX
  0.186 = AGC MAX
  0.252 = AGC MAX
  0.251 = AGC MAX
  0.235 = AGC MAX
  0.193 = AGC MAX
  0.300 = AGC MAX
  0.276 = AGC MAX
  0.284 = AGC MAX
  0.293 = AGC MAX
  0.253 = AGC MAX
  0.230 = AGC MAX
  0.337 = AGC MAX
  0.197 = AGC MAX
  0.259 = AGC MAX
  ...
  
  Can someone tell me whether this output is normal (healthy)?
  
  Thanks in advance
  Edison
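
For anyone who wants to compare the AGC MAX values across runs, here is a minimal sketch that pulls them out of the alignment output and summarises them. It assumes the output has been saved as text and that each value appears on its own line in the `<number> = AGC MAX` form shown above. The function names `agc_max_values` and `summarize` are made up for this example and are not part of Sphinx2. The sketch only reports the numbers; it does not say what range is healthy.

```python
import re
import statistics

# Matches lines such as "0.248 = AGC MAX", the form seen in the log above.
AGC_LINE = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*=\s*AGC MAX\s*$")


def agc_max_values(log_text):
    """Return the AGC MAX values found in log_text, in order of appearance."""
    values = []
    for line in log_text.splitlines():
        match = AGC_LINE.match(line)
        if match:
            values.append(float(match.group(1)))
    return values


def summarize(values):
    """Return (count, minimum, maximum, mean) for a non-empty list of values."""
    return len(values), min(values), max(values), statistics.mean(values)


# A few lines copied from the output above, used as sample input.
sample = """\
0 compound words found
0.248 = AGC MAX
0.275 = AGC MAX
0.206 = AGC MAX
"""

count, lowest, highest, mean = summarize(agc_max_values(sample))
print(f"{count} values, min={lowest:.3f} max={highest:.3f} mean={mean:.3f}")
```

To run it on a full log, read the saved file into a string and pass that to `agc_max_values` in place of `sample`.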
  