Have a few basic questions about some commonly seen terms in speech recognition/Sphinx. A couple of lines of explanation, or a pointer to a link describing these, will be enough.
Are MMIE and MLE basically different ways to arrive at HMM parameters? Does the decoder need to know whether the acoustic model was trained using MMIE or MLE algorithms?
What exactly is the LDA/MLLT feature transformation? Does the decoder need to know whether the acoustic model was trained using LDA/MLLT? Do these go together, or is there such a thing as an LDA-only or MLLT-only transformation?
Can all of these techniques be applied to semicontinuous acoustic models?
Which of the above techniques were used to train the hub4wsj_sc_8k model?
What does the 8k in hub4wsj_sc_8k signify? Is it the number of senones or the sampling frequency used to create the model?
Thanks and regards,
Are MMIE and MLE basically different ways to arrive at HMM parameters?
They are different ways to estimate the parameters.
Does the decoder need to know whether the acoustic model was trained using MMIE or MLE algorithms?
No.
What exactly is the LDA/MLLT feature transformation?
The feature vector is multiplied by a matrix. In the LDA case this matrix is chosen to reduce the dimension of the feature vector by selecting its principal components. In the MLLT case an additional property, diagonal covariance, is improved.
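As a sketch, the transform is just a matrix-vector product that shortens the feature vector. The matrix below is a random stand-in, not a real trained LDA transform, and the dimensions (39 in, 32 out) are illustrative assumptions:

```python
import numpy as np

# Illustrative only: random stand-ins, not a real trained LDA matrix.
rng = np.random.default_rng(0)
frame = rng.standard_normal(39)        # one frame of 39-dim features
lda = rng.standard_normal((32, 39))    # projection estimated during training

reduced = lda @ frame                  # the decoder applies the same matrix
print(reduced.shape)                   # (32,)
```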
Does the decoder need to know whether the acoustic model was trained using LDA/MLLT?
Yes.
Do these go together, or is there such a thing as an LDA-only or MLLT-only transformation?
There can be LDA only or MLLT only, but in practice a single matrix is used, which is the product of the MLLT matrix and the LDA matrix.
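A minimal sketch of that composition (the dimensions and random values are illustrative stand-ins): applying the single combined matrix gives the same result as applying LDA and then MLLT separately, so only the product needs to be stored with the model.

```python
import numpy as np

rng = np.random.default_rng(1)
lda = rng.standard_normal((32, 39))    # stand-in LDA: reduces 39 -> 32 dims
mllt = rng.standard_normal((32, 32))   # stand-in MLLT: square, same dimension

combined = mllt @ lda                  # the one matrix shipped with a model
x = rng.standard_normal(39)

# Same result either way: combined = MLLT applied after LDA.
assert np.allclose(combined @ x, mllt @ (lda @ x))
print(combined.shape)                  # (32, 39)
```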
Can all of these techniques be applied to semicontinuous acoustic models?
No.
Which of the above techniques were used to train the hub4wsj_sc_8k model?
MMIE probably; I'm not sure. As a semicontinuous model it doesn't use a feature-space transformation.
What does the 8k in hub4wsj_sc_8k signify? Is it the number of senones or the sampling frequency used to create the model?
The sample rate. See -upperf 4000 in feat.params, which means that 8 kHz audio can be decoded with this model.
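In other words, the top mel-filter edge has to fit below the Nyquist frequency (half the sample rate). A quick sanity check:

```python
# The top filter edge must not exceed half the sample rate (Nyquist).
def upperf_fits(upperf_hz, sample_rate_hz):
    return upperf_hz <= sample_rate_hz / 2

print(upperf_fits(4000, 8000))    # True: -upperf 4000 works for 8 kHz audio
print(upperf_fits(6855, 8000))    # False: such a model needs 16 kHz audio
```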
What is the bit width of the data (8-bit or 16-bit) used to create the hub4wsj_sc_8k model? For optimum decoding accuracy, the bit width of the data to be decoded should match the width of the training data, right? Is this value specified somewhere in the model definition files?
For training an acoustic model using SphinxTrain, where exactly do I need to specify the parameters (say, in order to get a feat.params exactly like that of hub4wsj_sc_8k)? I have 16-bit audio recorded at 16 kHz.
Thanks and regards,
What is the bit width of the data (8-bit or 16-bit) used to create the hub4wsj_sc_8k model? For optimum decoding accuracy, the bit width of the data to be decoded should match the width of the training data, right? Is this value specified somewhere in the model definition files?
The bit width must always be 16. There is no such configuration because 8-bit is simply not supported.
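A small way to check this before training, using Python's standard wave module. The in-memory file here is synthesized only so the example is self-contained; with real data you would open the training WAV instead:

```python
import io
import wave

# Build a tiny 16-bit, 16 kHz mono WAV in memory, then inspect its format
# the same way you would check a real training file.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)                  # 2 bytes per sample = 16-bit
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 160)

buf.seek(0)
with wave.open(buf, "rb") as w:
    bits = w.getsampwidth() * 8
    rate = w.getframerate()

assert bits == 16, "training audio must be 16-bit"
print(bits, "bit,", rate, "Hz")
```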
For training an acoustic model using SphinxTrain, where exactly do I need to specify the parameters (say, in order to get a feat.params exactly like that of hub4wsj_sc_8k)? I have 16-bit audio recorded at 16 kHz.
Right now you need to edit ./scripts_pl/make_feats.pl.
I'm trying to train a semicontinuous acoustic model for PocketSphinx with exactly the same parameters as hub4wsj_sc_8k. Looking at ./scripts_pl/make_feats.pl and ./etc/sphinx_train.cfg, I have the following questions:
Where do I specify
-transform dct
-round_filters no
-remove_dc yes
-svspec 0-12/13-25/26-38
-cmninit 56,-3,1
What do -svspec and -cmninit specify?
An acoustic model generated from the default an4 settings has parameters like
-alpha 0.97
-dither yes
-doublebw no
-ncep 13
which are not seen in the hub4wsj_sc_8k parameters. Why?
What is the meaning of streams? Does 1s_c_d_dd specify one stream?
Is 1s_c_d_dd interpreted as 1 stream, cepstral coefficients, delta and double delta?
The feature vector length in this case will be 39, right?
Thanks and Regards,
PS:
FYI...
hub4wsj_sc_8k parameters:
-nfilt 20
-lowerf 1
-upperf 4000
-wlen 0.025
-transform dct
-round_filters no
-remove_dc yes
-svspec 0-12/13-25/26-38
-feat 1s_c_d_dd
-agc none
-cmn current
-cmninit 56,-3,1
-varnorm no
Default an4.cd_semi_1000 parameters:
-alpha 0.97
-dither yes
-doublebw no
-nfilt 40
-ncep 13
-lowerf 133.33334
-upperf 6855.4976
-nfft 512
-wlen 0.0256
-transform legacy
-feat s2_4x
-agc none
-cmn current
-varnorm no
Where do I specify -transform dct, -round_filters no, -remove_dc yes?
In make_feats.pl.
-svspec 0-12/13-25/26-38?
In sphinx_train.cfg, in the configuration variable $CFG_SVSPEC.
-cmninit 56,-3,1?
In feat.params, after training.
What do -svspec and -cmninit specify?
svspec is the specification for subvector quantization; it specifies which features to put in each stream.
cmninit is the initial value for live CMN. CMN values are printed for each utterance; for CMN to converge on a good value quickly, the initial value should be close to the average cepstral mean.
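For illustration, here is a hypothetical parser for the -svspec syntax (not Sphinx's own code), splitting one 39-dim frame into the three 13-dim streams, plus a cmninit-style estimate computed from toy data standing in for real cepstra:

```python
import numpy as np

def parse_svspec(spec):
    """Turn e.g. '0-12/13-25/26-38' into per-stream index lists."""
    streams = []
    for part in spec.split("/"):
        lo, hi = (int(x) for x in part.split("-"))
        streams.append(list(range(lo, hi + 1)))
    return streams

streams = parse_svspec("0-12/13-25/26-38")
frame = np.arange(39.0)                  # one 39-dim feature frame
subvectors = [frame[idx] for idx in streams]
print([len(sv) for sv in subvectors])    # [13, 13, 13]

# cmninit sketch: start live CMN near the average cepstral mean of some
# representative frames (random toy data, mean of c0 shifted to ~50).
cepstra = np.random.default_rng(2).standard_normal((100, 13)) + 50.0
cmn_init = cepstra.mean(axis=0)
```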
An acoustic model generated from the default an4 settings has parameters like -alpha 0.97 -dither yes -doublebw no -ncep 13 which are not seen in the hub4wsj_sc_8k parameters. Why?
They are defaults; there is no need to specify them.
What is the meaning of streams?
Each stream is modelled with its own Gaussian distribution, so if parts of the feature vector are theoretically independent, it makes sense to use streams. You can find more information in a textbook.
Does 1s_c_d_dd specify one stream?
Yes.
Is 1s_c_d_dd interpreted as 1 stream, cepstral coefficients, delta and double delta?
Yes, but it's only an abbreviation; 1s_c_dd, for example, has no meaning.
The feature vector length in this case will be 39, right?
Yes.
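A rough sketch of that layout: 13 cepstra plus their deltas and double deltas, concatenated into one 39-dim stream per frame. np.gradient is a simplistic stand-in for the actual delta window Sphinx uses; the point is only the dimensionality.

```python
import numpy as np

rng = np.random.default_rng(3)
cepstra = rng.standard_normal((50, 13))    # 50 frames x 13 cepstra

delta = np.gradient(cepstra, axis=0)       # crude delta estimate
delta_delta = np.gradient(delta, axis=0)   # crude double-delta estimate

features = np.hstack([cepstra, delta, delta_delta])
print(features.shape)                      # (50, 39): 13 + 13 + 13
```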
How to interpret s2_4x?
4 streams, 51 coefficients:
Cepstrum without c0 (12 coefficients)
Deltas with step 2 + deltas with step 4 (24 coefficients)
c0, delta c0, delta-delta c0 (3 coefficients)
Delta-delta without c0 (12 coefficients)
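Summing those widths as a quick check of the layout described above:

```python
# The four s2_4x streams and their widths, as listed above.
s2_4x = [
    ("cepstrum without c0", 12),
    ("deltas, step 2 and step 4", 24),
    ("c0, delta c0, delta-delta c0", 3),
    ("delta-delta without c0", 12),
]
total = sum(width for _, width in s2_4x)
print(len(s2_4x), "streams,", total, "coefficients")   # 4 streams, 51
```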
I have some more questions and have clubbed them together:
The parameters remove_dc, transform, round_filters and cmninit seem to be applicable only in the decoding phase (I'm not able to find them anywhere in the SphinxTrain directory hierarchy). Is this understanding correct?
If yes, can they all be put in feat.params after training is done (you already mentioned that cmninit needs to be put there after training)?
Do the following settings look OK in sphinx_train.cfg (for a hub4wsj_sc_8k-like semicontinuous model for PocketSphinx)?
$CFG_VECTOR_LENGTH = 39;
$CFG_FEATURE = "1s_c_d_dd";
$CFG_NUM_STREAMS = 4; # or should it be 1?
$CFG_INITIAL_NUM_DENSITIES = 256;
$CFG_FINAL_NUM_DENSITIES = 256;
Thanks and regards,
The parameters remove_dc, transform, round_filters and cmninit seem to be applicable only in the decoding phase (I'm not able to find them anywhere in the SphinxTrain directory hierarchy). Is this understanding correct?
Please don't ask me questions you can answer yourself.
If yes, can they all be put in feat.params after training is done (you already mentioned that cmninit needs to be put there after training)?
Yes, I already mentioned that.
Do the following settings look OK in sphinx_train.cfg (for a hub4wsj_sc_8k-like semicontinuous model for PocketSphinx)? $CFG_VECTOR_LENGTH = 39; $CFG_FEATURE = "1s_c_d_dd"; $CFG_NUM_STREAMS = 4; (or should it be 1?) $CFG_INITIAL_NUM_DENSITIES = 256; $CFG_FINAL_NUM_DENSITIES = 256;
No idea; why don't you just try and see if it works.
Wanted to be doubly sure (grepping in Win7 is behaving weirdly)... will be careful from next time. Sorry about this.
Then I set:
$CFG_VECTOR_LENGTH = 13;
$CFG_FEATURE = "1s_c_d_dd";
$CFG_NUM_STREAMS = 1;
$CFG_INITIAL_NUM_DENSITIES = 256;
$CFG_FINAL_NUM_DENSITIES = 256;
My training goes fine; however, when I set $CFG_VECTOR_LENGTH = 39 (which I think should be the case because $CFG_FEATURE = "1s_c_d_dd") I get the message "Expected vector length of 39, got 26" and the training aborts.
In either of the above cases, when I set $CFG_SVSPEC = 0-12/13-25/26-38 the training aborts and I get the following message in the logfile:
ERROR: "........\src\libs\libcommon\cmd_ln.c", line 525: Expecting 'C:\Users\Amit\my_data\amit\Technology\Speech\Sphinx\PocketSphinx\hub4wsj_type_local_model\an4\bin\bw.exe -switch_1 <arg_1> -switch_2 <arg_2> ...'
I also see a value of "-svspec -39.8846153846154" in this logfile, as if 0-12/13-25/26-38 had been evaluated as a mathematical expression and the result put in for svspec.
Any hints, comments or suggestions (note: as I mentioned earlier, my aim is to have a model with parameter compatibility with hub4wsj_sc_8k)?
The vector length is the length of the cepstrum vector, not the feature vector. It should be 13, not 39.
When I set $CFG_SVSPEC = 0-12/13-25/26-38
You need quotes, don't you:
$CFG_SVSPEC = "0-12/13-25/26-38";
Thanks NS. Due to my fixation with the vector length and my limited knowledge of Perl, I totally missed it.
With all the changes discussed above I'm able to train the model now. One last hitch remains, it seems.
When I try to run this model (the one with the same parameters as hub4wsj_sc_8k), the decoder crashes when I run a decoding session.
If I just change "-transform dct" to "-transform legacy" in feat.params, decoding works perfectly fine with excellent accuracy.
PS: PocketSphinx 0.6, SphinxTrain nightly build (dated 12 July 2010). I have data for 4 speakers (100 short utterances each), totalling about 0.27 hours: 16 kHz, mono, 16-bit. With this data I had earlier trained an AN4-like model (a model with parameters similar to those of the default AN4). That also works perfectly fine, but if I change the transform to dct in this model (not sure whether it is permissible to do this, however...), this model too starts crashing like the other one, as if there were an issue with the dct transform. hub4wsj_sc_8k decodes perfectly fine without any issues.
Any clues?
Thanks and regards,
Amit.
When I try to run this model (the one with the same parameters as hub4wsj_sc_8k), the decoder crashes when I run a decoding session.
You need to provide details of the crash if you want help with this; at the very least, provide a backtrace.
If I just change "-transform dct" to "-transform legacy" in feat.params, decoding works perfectly fine with excellent accuracy.
-transform must be in scripts_pl/make_feats.pl from the early feature-extraction stage. It seems you forgot to put it there and trained your model with the legacy transform instead of the dct one.
Do round_filters and remove_dc also need to match between training and decoding?
You need to provide details of the crash
I had tried doing some tracing. The failure seems to happen in fsg_search.c (lines 1062 to 1069), in the while loop "while (frm == last_frm)": in "fl = fsg_hist_entry_fsglink(hist_entry)", fl becomes a NULL pointer, and the line "if ((!final) || fsg_link_to_state(fl) == fsg_model_final_state(fsg))" results in an error because fl is NULL. This happens when "bpidx" goes to 0.
But I suspect the fundamental problem is somewhere in the make_feats.pl file.
Another observation (though it might be irrelevant here): if I run this decoding session with the hub4wsj_sc_8k model, decoding works perfectly. When I change the transform to "htk" in feat.params (of hub4wsj_sc_8k), the decoding still works perfectly, but when I change it to "legacy", it gives me more than 100% WER (it doesn't crash, though).
Regards,
Amit.
I've put my scripts_pl/make_feats.pl at the following link; I'm putting -transform dct there. Is there something else wrong with the script? http://www.mediafire.com/file/8go1e96680df646/make_feats.pl
The transform option should go together with the upperf, lowerf and nfilt options. You placed it incorrectly.
Do round_filters and remove_dc also need to match for training and decoding?
Yes.
results in error due to fl becoming a NULL pointer
OK, this issue is now fixed in trunk.
When I change the transform to "htk" in feat.params (of hub4wsj_sc_8k), the decoding still works perfectly, but when I change it to "legacy", it gives me more than 100% WER (it doesn't crash, though).
dct and htk are actually identical, so this behaves as it should.
The transform option should go together with the upperf, lowerf and nfilt options. You placed it incorrectly.
When I put the transform option together with the options you listed above, I get errors: ERROR: "........\src\libs\libcommon\cmd_ln.c", line 551: Unknown switch -transform seen" and "ERROR: "........\src\libs\libcommon\cmd_ln.c", line 525: Expecting 'bin/wave2feat -switch_1 <arg_1> -switch_2 <arg_2> ...'"
This had puzzled me earlier as well: parameters like -transform, -remove_dc and -round_filters don't seem to be valid arguments for wave2feat, whereas they are valid for sphinx_fe (from SphinxBase). make_feats.pl tries to run wave2feat.
What might be missing in my setup?
Regards,
PS: Is sphinxbase also needed for training? Currently I have only an4 and SphinxTrain in my training setup.
This had puzzled me earlier as well: parameters like -transform, -remove_dc and -round_filters don't seem to be valid arguments for wave2feat, whereas they are valid for sphinx_fe (from SphinxBase). make_feats.pl tries to run wave2feat.
Yes, you need to run sphinx_fe from sphinxbase instead of wave2feat.
Yes, you need to run sphinx_fe from sphinxbase instead of wave2feat
Thanks NS. This was the missing link that caused all the doubts... Everything is working fine now.
I'm observing that when I train the model with dct, the "current overall likelihood per frame" comes out on the order of -58; however, when I train with legacy it comes out on the order of +15. WER is good (on the training set) in both cases. Just curious what this value actually signifies and whether it is related to the quality of the model.
Regards,