Hello,
I have been using Sphinx for a year and am familiar with some of its
functionality. First, I would like to thank all the researchers who
contribute to developing such a toolkit.
I have two questions:
First, I would like to train an HMM with many states (more than 40). I know
that the number of states is set in sphinxtrain.cfg with CFG_STATESPERHMM.
However, when I run the monophone training Perl script with 40 states and two
HMMs (SIL and UBM), the parameters after the 20th state become zero or
meaningless. Is this normal, or am I doing something wrong?
My second question is about merging HMM parameters. I would like to merge
different HMM parameters into a single HMM file to make the recognition
process faster. I have learnt the HMM parameter structure in Sphinx (I am
aware of printp.exe and have also written functions to convert text HMM
parameters to binary format), and I have managed to merge HMM files and
change the model definition to match the merged HMM. However, when I run
sphinx3_decode with the big merged HMM (which has more than 900 monophone
models and 2700 Gaussian states), the decoder gives an error and quits. The
error is something like "Calloc failed from ...". Is this normal, or am I
doing something wrong?
Any suggestion would be helpful.
Thanks in advance.
> However, when I run the monophone training Perl script with 40 states and
> two HMMs (SIL and UBM), the parameters after the 20th state become zero or
> meaningless. Is this normal, or am I doing something wrong?
I'm not sure whether a bug caused this. We need a way to reproduce the
problem to say more; it might be floating-point precision errors, for
example. In general, I don't see any sense in training 40 states for a
silence model.
> The error is something like "Calloc failed from ...". Is this normal, or
> am I doing something wrong?
Please provide exact information about the error. For example, a large
number in the precise error message could be a hint that you made an error
in byte-order conversion. With your excerpt it's impossible to say anything.
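To illustrate the byte-order point: a count written on a little-endian
machine and read without swapping on a big-endian one (or vice versa) turns
a small value into an enormous one, and that enormous value is what
eventually reaches calloc. A minimal, self-contained sketch (illustration
only, not Sphinx code):

```c
#include <stdio.h>
#include <stdint.h>

/* Swap the four bytes of a 32-bit value. */
static uint32_t swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

int main(void)
{
    uint32_t n = 910;  /* a plausible model count read from a binary header */
    printf("intended: %u\n", n);              /* 910        */
    printf("byte-swapped: %u\n", swap32(n));  /* 2382561280 */
    return 0;
}
```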
Thank you very much for your answer
> I'm not sure whether a bug caused this. We need a way to reproduce the
> problem to say more; it might be floating-point precision errors, for
> example. In general, I don't see any sense in training 40 states for a
> silence model.
In fact, I am not trying to train a 40-state silence model but a 40-state
HMM for a specific sentence. How can I provide extra information to
reproduce the problem?
> Please provide exact information about the error. For example, a large
> number in the precise error message could be a hint that you made an error
> in byte-order conversion. With your excerpt it's impossible to say anything.
I re-ran my script and noted the message. It is something like:
INFO: dict2pid.c(599): Building PID tables for dictionary
calloc(753571000,4) failed from
......\src\libs3decoder\libsearch\dict2pid.c(611)
I hope this information is helpful for finding the problem.
Thanks in advance.
> In fact, I am not trying to train a 40-state silence model but a 40-state HMM
> and two HMMs (SIL and UBM)
What's SIL then? You could work around that by having one phone for SIL
words and 8 word-dependent phones for UBM. That will probably help, but not
much. In general, I'd rather read about the task you are trying to
accomplish than discuss details without knowing the rest.
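For example, the pronunciation entries could look something like this
(UBM1 ... UBM8 are hypothetical word-dependent phone names, purely for
illustration):

```
SIL   SIL
UBM   UBM1 UBM2 UBM3 UBM4 UBM5 UBM6 UBM7 UBM8
```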
> calloc(753571000,4) failed from
> ......\src\libs3decoder\libsearch\dict2pid.c(611)
Well, now it's more or less clear what happens. At least much clearer than
before with "some calloc error".
Do you have 910 phones in your phoneset? The cube root of 753,571,000 is
exactly 910. Sphinx can't handle that; it would need rework to be applicable
to such a large number of phones.
Thank you very much for your answer
> What's SIL then? You could work around that by having one phone for SIL
> words and 8 word-dependent phones for UBM. That will probably help, but not
> much. In general, I'd rather read about the task you are trying to
> accomplish than discuss details without knowing the rest.
I include the SIL model since it is impossible to train an HMM without it.
But as you suggested, I may solve this problem with word-dependent models
for UBM and a normal 3-state SIL model. Thank you for your suggestion.
> Well, now it's more or less clear what happens. At least much clearer than
> before with "some calloc error". Do you have 910 phones in your phoneset?
> Sphinx can't handle that; it would need rework to be applicable to such a
> large number of phones.
Yes, I have 910 phones in my phoneset. I merged different HMMs into a single
HMM to make recognition faster, giving incremental numbers to each merged
HMM. As far as I can see from my previous experience, Sphinx can handle many
Gaussians for triphone models (more than 100,000 Gaussians seems to be
possible). Why does such a problem occur when a large number of monophones
is used instead of triphones? Is there a workaround for this problem, too?
My .mdef looks like the one below after merging:
0.3
910 n_base
0 n_tri
3640 n_state_map
2730 n_tied_state
2730 n_tied_ci_state
910 n_tied_tmat
Columns definitions
base lft rt p attrib tmat ... state id's ...
B_1 - - - n/a 0 0 1 2 N
E_1 - - - n/a 1 3 4 5 N
F_1 - - - n/a 2 6 7 8 N
I1_1 - - - n/a 3 9 10 11 N
M_1 - - - n/a 4 12 13 14 N
N_1 - - - n/a 5 15 16 17 N
R_1 - - - n/a 6 18 19 20 N
S_1 - - - n/a 7 21 22 23 N
S1_1 - - - n/a 8 24 25 26 N
B_2 - - - n/a 9 27 28 29 N
E_2 - - - n/a 10 30 31 32 N
F_2 - - - n/a 11 33 34 35 N
I1_2 - - - n/a 12 36 37 38 N
M_2 - - - n/a 13 39 40 41 N
N_2 - - - n/a 14 42 43 44 N
R_2 - - - n/a 15 45 46 47 N
S_2 - - - n/a 16 48 49 50 N
S1_2 - - - n/a 17 51 52 53 N
B_3 - - - n/a 18 54 55 56 N
E_3 - - - n/a 19 57 58 59 N
F_3 - - - n/a 20 60 61 62 N
I1_3 - - - n/a 21 63 64 65 N
M_3 - - - n/a 22 66 67 68 N
N_3 - - - n/a 23 69 70 71 N
R_3 - - - n/a 24 72 73 74 N
S_3 - - - n/a 25 75 76 77 N
S1_3 - - - n/a 26 78 79 80 N
B_4 - - - n/a 27 81 82 83 N
E_4 - - - n/a 28 84 85 86 N
F_4 - - - n/a 29 87 88 89 N
I1_4 - - - n/a 30 90 91 92 N
M_4 - - - n/a 31 93 94 95 N
N_4 - - - n/a 32 96 97 98 N
R_4 - - - n/a 33 99 100 101 N
S_4 - - - n/a 34 102 103 104 N
S1_4 - - - n/a 35 105 106 107 N
.
.
.
and continues up to the 910th monophone.
Do you think I could solve the problem if I change the tmat column as below?
What might be the effect of this change? Then I would have just 9 transition
matrices instead of 910 (see also the small sketch after the table).
Columns definitions
base lft rt p attrib tmat ... state id's ...
B_1 - - - n/a 0 0 1 2 N
E_1 - - - n/a 1 3 4 5 N
F_1 - - - n/a 2 6 7 8 N
I1_1 - - - n/a 3 9 10 11 N
M_1 - - - n/a 4 12 13 14 N
N_1 - - - n/a 5 15 16 17 N
R_1 - - - n/a 6 18 19 20 N
S_1 - - - n/a 7 21 22 23 N
S1_1 - - - n/a 8 24 25 26 N
B_2 - - - n/a 0 27 28 29 N
E_2 - - - n/a 1 30 31 32 N
F_2 - - - n/a 2 33 34 35 N
I1_2 - - - n/a 3 36 37 38 N
M_2 - - - n/a 4 39 40 41 N
N_2 - - - n/a 5 42 43 44 N
R_2 - - - n/a 6 45 46 47 N
S_2 - - - n/a 7 48 49 50 N
S1_2 - - - n/a 8 51 52 53 N
B_3 - - - n/a 0 54 55 56 N
E_3 - - - n/a 1 57 58 59 N
F_3 - - - n/a 2 60 61 62 N
I1_3 - - - n/a 3 63 64 65 N
M_3 - - - n/a 4 66 67 68 N
N_3 - - - n/a 5 69 70 71 N
R_3 - - - n/a 6 72 73 74 N
S_3 - - - n/a 7 75 76 77 N
S1_3 - - - n/a 8 78 79 80 N
B_4 - - - n/a 0 81 82 83 N
E_4 - - - n/a 1 84 85 86 N
F_4 - - - n/a 2 87 88 89 N
I1_4 - - - n/a 3 90 91 92 N
M_4 - - - n/a 4 93 94 95 N
N_4 - - - n/a 5 96 97 98 N
R_4 - - - n/a 6 99 100 101 N
S_4 - - - n/a 7 102 103 104 N
S1_4 - - - n/a 8 105 106 107 N
.
.
.
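To be explicit, the remapping shown above amounts to this (a sketch of the
mapping only, not actual Sphinx code):

```c
/* With 9 shared per-position topologies, each of the 910 merged phones
 * points at transition matrix (phone_id % 9) instead of owning its own. */
int shared_tmat_id(int merged_phone_id)
{
    return merged_phone_id % 9;   /* e.g. phone 9 (B_2) -> tmat 0 */
}
```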
Thanks in advance
> Why does such a problem occur when a large number of monophones is used
> instead of triphones? Is there a workaround for this problem, too?
For fast access, Sphinx creates a lookup array with one entry per phone
triple, i.e. N×N×N entries. If you have 910 phones, that is 910×910×910 =
753,571,000 entries, roughly 3 GB at 4 bytes each, which is obviously too
much for your system. There is no way to solve this problem except to rework
the source to use another access scheme, a hash table for example.
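A minimal sketch of what such a hash-based access scheme could look like (an
illustration of the idea, not the actual Sphinx source):

```c
#include <stdlib.h>
#include <stdint.h>

/* Replace the dense N*N*N table with a chained hash keyed on the
 * (base, left, right) phone triple.  Only triples that actually occur
 * get an entry, so memory grows with the dictionary, not with N^3. */
typedef struct tri_entry_s {
    uint16_t base, left, right;   /* phone ids                 */
    int32_t  value;               /* e.g. a senone-sequence id */
    struct tri_entry_s *next;     /* chaining on collision     */
} tri_entry_t;

#define TRI_HASH_SIZE 65536
static tri_entry_t *buckets[TRI_HASH_SIZE];

static unsigned tri_hash(unsigned b, unsigned l, unsigned r)
{
    /* any reasonable mixing of the three ids will do */
    return (b * 31u * 31u + l * 31u + r) % TRI_HASH_SIZE;
}

void tri_put(unsigned b, unsigned l, unsigned r, int32_t value)
{
    unsigned h = tri_hash(b, l, r);
    tri_entry_t *e = malloc(sizeof(*e));
    e->base = (uint16_t)b; e->left = (uint16_t)l; e->right = (uint16_t)r;
    e->value = value;
    e->next = buckets[h];
    buckets[h] = e;
}

int32_t tri_get(unsigned b, unsigned l, unsigned r)
{
    for (tri_entry_t *e = buckets[tri_hash(b, l, r)]; e; e = e->next)
        if (e->base == b && e->left == l && e->right == r)
            return e->value;
    return -1;   /* triple not present */
}
```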