CMU Sphinx / Forums / Help: Normalization error when training

Speech Recognition Toolkit

Normalization error when training

Forum: Help

Creator: roof

Created: 2005-12-08

Updated: 2012-09-22

roof - 2005-12-08

Hello,

I am training with SphinxTrain on Linux, using a small subsample (200 audio files) of my data to check that everything works before I run the same training over the entire dataset (4000+ audio files).

When training, I receive a stream of errors during the "Normalization for Iteration 1" step, both for the Context Dependent Models and for the Context Independent Models:
ERROR: ERROR: "gauden.c", line 1418: var (mgau= 1082, feat= 0, density=7, component=9) < 0

ERROR: ERROR: "gauden.c", line 1418: var (mgau= 1082, feat= 0, density=7, component=16) < 0

ERROR: ERROR: "gauden.c", line 1418: var (mgau= 1082, feat= 0, density=7, component=18) < 0

ERROR: ERROR: "gauden.c", line 1418: var (mgau= 1082, feat= 0, density=7, component=19) < 0

There are literally hundreds of these errors, and I have no idea what they mean.
They appear after the Baum-Welch step for iteration 1 has completed (usually without error).
They only seem to occur during the FIRST iteration; on subsequent iterations the log files show the following kind of messages instead:

WARNING: "gauden.c", line 1376: (mgau= 2886, feat= 0, density= 0) never observed
WARNING: "gauden.c", line 1376: (mgau= 2887, feat= 0, density= 0) never observed
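
In case it helps anyone reading the same flood of messages, here is a minimal sketch of a script that groups them by Gaussian, so it is easier to see how many distinct (mgau, feat, density) entries are affected. The regular expressions are only an assumption based on the lines quoted above, and `summarize` is a hypothetical helper, not part of SphinxTrain.

```python
import re
from collections import Counter

# Patterns inferred from the log lines quoted in this post (an assumption;
# other SphinxTrain versions may format these messages differently).
NEG_VAR = re.compile(
    r"var \(mgau=\s*(\d+), feat=\s*(\d+), density=\s*(\d+), component=\s*(\d+)\) < 0"
)
NEVER_SEEN = re.compile(
    r"\(mgau=\s*(\d+), feat=\s*(\d+), density=\s*(\d+)\) never observed"
)


def summarize(lines):
    """Hypothetical helper: count negative-variance components per density
    and collect the densities reported as never observed."""
    neg_var = Counter()
    never_seen = set()
    for line in lines:
        m = NEG_VAR.search(line)
        if m:
            mgau, feat, density, _component = map(int, m.groups())
            neg_var[(mgau, feat, density)] += 1
            continue
        m = NEVER_SEEN.search(line)
        if m:
            never_seen.add(tuple(map(int, m.groups())))
    return neg_var, never_seen


# Sample input copied from the messages above.
sample = [
    'ERROR: ERROR: "gauden.c", line 1418: var (mgau= 1082, feat= 0, density=7, component=9) < 0',
    'ERROR: ERROR: "gauden.c", line 1418: var (mgau= 1082, feat= 0, density=7, component=16) < 0',
    'ERROR: ERROR: "gauden.c", line 1418: var (mgau= 1082, feat= 0, density=7, component=18) < 0',
    'ERROR: ERROR: "gauden.c", line 1418: var (mgau= 1082, feat= 0, density=7, component=19) < 0',
    'WARNING: "gauden.c", line 1376: (mgau= 2886, feat= 0, density= 0) never observed',
    'WARNING: "gauden.c", line 1376: (mgau= 2887, feat= 0, density= 0) never observed',
]

neg_var, never_seen = summarize(sample)
for key, count in sorted(neg_var.items()):
    print("negative variance:", key, "components affected:", count)
for key in sorted(never_seen):
    print("never observed:", key)
```

To run it on a real log, pass an open file instead of `sample`, e.g. `summarize(open(path))`, where `path` is whichever normalization log file you want to inspect.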

Has anyone observed similar errors and know where to start fixing them?

- roof - 2005-12-08
  
  Ah, not to worry. After going through the forums once again, I found a similar thread, already answered, which said that this error is caused by having a very small training set.
  
  Since I have a large set available, I intend to try training on the whole lot, but can anyone confirm that these issues are indeed cleared up by including over 10 hours of audio data?
  
  I am a little worried about taking days or weeks to train on the whole lot, only to find that the error still persists!
  
  Cheers!
  