
Merging HMMs and training a model with many states?

  • osman b

    osman b - 2009-09-24

    Hello,
    I have been using Sphinx for a year and am familiar with some of its
    functionality. First, I would like to thank all the researchers who have
    contributed to developing such a toolkit.

    I have two questions.
    First, I would like to train an HMM with many states (more than 40). I
    know that the number of states is set in sphinx_train.cfg with
    $CFG_STATESPERHMM. However, when I run the monophone training Perl script
    with 40 states and two HMMs (SIL and UBM), the parameters after the 20th
    state become 0 or meaningless. Do you think this is normal? Or am I doing
    something wrong? (The config line I use is shown below.)
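
    For reference, this is how the setting looks in my sphinx_train.cfg (the
    file is Perl syntax; 40 is the value I am trying):

        $CFG_STATESPERHMM = 40;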

    My second question is about merging HMM parameters. I would like to merge
    different HMM parameter sets into a single HMM file to make the
    recognition process faster. I have learned the HMM parameter structure in
    Sphinx (I am aware of printp.exe and have also written functions to
    convert text HMM parameters to binary format). I also managed to merge HMM
    files and to change the model definition to match the merged HMM. However,
    when I run sphinx3_decode with the big merged HMM (which has more than 900
    monophone models and 2700 Gaussian states), the decoder prints an error
    and quits. The error is something like "Calloc failed from ...". Do you
    think this is normal? Or am I doing something wrong?

    Any suggestions would be helpful.

    Thanks in advance

  • Nickolay V. Shmyrev

    > However, when I run the monophone training Perl script with 40 states
    > and two HMMs (SIL and UBM), the parameters after the 20th state become 0
    > or meaningless. Do you think this is normal? Or am I doing something
    > wrong?

    I'm not sure whether some bug caused this; we would need a way to
    reproduce the problem to say more. It might be floating-point precision
    errors, for example. In general, I don't see any sense in training 40
    states for a silence model.

    > The error is something like "Calloc failed from ...". Do you think this
    > is normal? Or am I doing something wrong?

    Please learn to provide exact information about the error. For example, a
    large number in the precise error message could hint that you made a
    mistake in the byte-order conversion. With your excerpt it's impossible to
    say anything.
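
    For example, s3 binary model files carry a 4-byte byte-order magic right
    after the text header; if your converter writes values in the wrong order,
    every count after it reads back scrambled and you get absurd allocation
    sizes. A minimal sketch of the check, assuming your converter writes the
    standard s3 header (the helper name here is mine, not a Sphinx API):

        #include <stdio.h>
        #include <stdint.h>

        #define BYTE_ORDER_MAGIC 0x11223344u  /* magic written by s3 files */

        static uint32_t swap32(uint32_t x)
        {
            return (x >> 24) | ((x >> 8) & 0xff00u)
                 | ((x << 8) & 0xff0000u) | (x << 24);
        }

        /* Returns 0 if the file is in native order, 1 if the values need
         * swapping, -1 if the magic matches neither order (broken header). */
        static int check_byte_order(FILE *fp)
        {
            uint32_t magic;
            if (fread(&magic, sizeof(magic), 1, fp) != 1)
                return -1;
            if (magic == BYTE_ORDER_MAGIC)
                return 0;
            if (swap32(magic) == BYTE_ORDER_MAGIC)
                return 1;
            return -1;
        }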

  • osman b

    osman b - 2009-10-05

    Thank you very much for your answer.

    > I'm not sure whether some bug caused this; we would need a way to
    > reproduce the problem to say more. It might be floating-point precision
    > errors, for example. In general, I don't see any sense in training 40
    > states for a silence model.

    In fact, I am not trying to train a 40-state silence model but a 40-state
    HMM for a specific sentence. How can I provide extra information to
    reproduce the problem?

    > Please learn to provide exact information about the error. For example,
    > a large number in the precise error message could hint that you made a
    > mistake in the byte-order conversion. With your excerpt it's impossible
    > to say anything.

    I re-ran my script and noted down the message. It is something like:

    INFO: dict2pid.c(599): Building PID tables for dictionary

    calloc(753571000,4) failed from
    ......\src\libs3decoder\libsearch\dict2pid.c(611)

    I hope this information is helpful for finding the problem.

    Thanks in advance

  • Nickolay V. Shmyrev

    > In fact, I am not trying to train a 40-state silence model but a 40-state
    > HMM ... and two HMMs (SIL and UBM)

    What's SIL then? You could work around that by having one phone for the
    SIL words and 8 word-dependent phones for UBM. That will probably help,
    but not much. In general, I'd rather read about the task you are trying to
    accomplish than discuss details without knowing the rest.

    > calloc(753571000,4) failed from
    ......\src\libs3decoder\libsearch\dict2pid.c(611)

    Well, now it's more or less clear what happens. At least much clearer than
    before with "some calloc error".

    Do you have 910 phones in your phoneset? Sphinx can't handle that; it
    would require rework to support such a large number of phones.

  • osman b

    osman b - 2009-10-15

    Thank you very much for your answer.

    > What's SIL then? You could work around that by having one phone for the
    > SIL words and 8 word-dependent phones for UBM. That will probably help,
    > but not much. In general, I'd rather read about the task you are trying
    > to accomplish than discuss details without knowing the rest.

    I include the SIL model since it is impossible to train the HMMs without
    it. But as you suggested, I may solve this problem with word-dependent
    phones for UBM and a normal 3-state SIL model, something like the
    dictionary sketch below. Thank you for your suggestion.
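
    If I understand the suggestion correctly, the dictionary would then look
    something like this (the U1 ... U8 phone names are placeholders I would
    pick myself):

        SIL    SIL
        UBM    U1 U2 U3 U4 U5 U6 U7 U8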

    > Well, now it's more or less clear what happens. At least much clearer
    > than before with "some calloc error". Do you have 910 phones in your
    > phoneset? Sphinx can't handle that; it would require rework to support
    > such a large number of phones.

    Yes, I have 910 phones in my phoneset. I merge different HMMs into a
    single HMM to make recognition faster, and I give incremental numbers to
    each merged HMM. As far as I can tell from my previous experiments, Sphinx
    can handle many Gaussians for triphone models (more than 100,000 Gaussians
    seems to be possible). Why does such a problem occur when a large number
    of monophones is used instead of triphones? Is there a workaround for this
    problem, too?

    My .mdef looks like this after merging:

    0.3
    910 n_base
    0 n_tri
    3640 n_state_map
    2730 n_tied_state
    2730 n_tied_ci_state
    910 n_tied_tmat

    Columns definitions

    base lft rt p attrib tmat ... state id's ...

    B_1 - - - n/a 0 0 1 2 N
    E_1 - - - n/a 1 3 4 5 N
    F_1 - - - n/a 2 6 7 8 N
    I1_1 - - - n/a 3 9 10 11 N
    M_1 - - - n/a 4 12 13 14 N
    N_1 - - - n/a 5 15 16 17 N
    R_1 - - - n/a 6 18 19 20 N
    S_1 - - - n/a 7 21 22 23 N
    S1_1 - - - n/a 8 24 25 26 N
    B_2 - - - n/a 9 27 28 29 N
    E_2 - - - n/a 10 30 31 32 N
    F_2 - - - n/a 11 33 34 35 N
    I1_2 - - - n/a 12 36 37 38 N
    M_2 - - - n/a 13 39 40 41 N
    N_2 - - - n/a 14 42 43 44 N
    R_2 - - - n/a 15 45 46 47 N
    S_2 - - - n/a 16 48 49 50 N
    S1_2 - - - n/a 17 51 52 53 N
    B_3 - - - n/a 18 54 55 56 N
    E_3 - - - n/a 19 57 58 59 N
    F_3 - - - n/a 20 60 61 62 N
    I1_3 - - - n/a 21 63 64 65 N
    M_3 - - - n/a 22 66 67 68 N
    N_3 - - - n/a 23 69 70 71 N
    R_3 - - - n/a 24 72 73 74 N
    S_3 - - - n/a 25 75 76 77 N
    S1_3 - - - n/a 26 78 79 80 N
    B_4 - - - n/a 27 81 82 83 N
    E_4 - - - n/a 28 84 85 86 N
    F_4 - - - n/a 29 87 88 89 N
    I1_4 - - - n/a 30 90 91 92 N
    M_4 - - - n/a 31 93 94 95 N
    N_4 - - - n/a 32 96 97 98 N
    R_4 - - - n/a 33 99 100 101 N
    S_4 - - - n/a 34 102 103 104 N
    S1_4 - - - n/a 35 105 106 107 N
    .
    .
    .
    and continues up to the 910th monophone.
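
    (For what it's worth, the header numbers are self-consistent: 910 models
    x 4 states each, 3 emitting plus 1 non-emitting, gives the 3640
    n_state_map entries, and 910 x 3 emitting states gives the 2730 tied
    states.)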

    Do you think I could solve the problem by changing the tmat column as
    below? What would be the effect of this change? I would then have just 9
    transition matrices instead of 910.

    Columns definitions

    base lft rt p attrib tmat ... state id's ...

    B_1 - - - n/a 0 0 1 2 N
    E_1 - - - n/a 1 3 4 5 N
    F_1 - - - n/a 2 6 7 8 N
    I1_1 - - - n/a 3 9 10 11 N
    M_1 - - - n/a 4 12 13 14 N
    N_1 - - - n/a 5 15 16 17 N
    R_1 - - - n/a 6 18 19 20 N
    S_1 - - - n/a 7 21 22 23 N
    S1_1 - - - n/a 8 24 25 26 N
    B_2 - - - n/a 0 27 28 29 N
    E_2 - - - n/a 1 30 31 32 N
    F_2 - - - n/a 2 33 34 35 N
    I1_2 - - - n/a 3 36 37 38 N
    M_2 - - - n/a 4 39 40 41 N
    N_2 - - - n/a 5 42 43 44 N
    R_2 - - - n/a 6 45 46 47 N
    S_2 - - - n/a 7 48 49 50 N
    S1_2 - - - n/a 8 51 52 53 N
    B_3 - - - n/a 0 54 55 56 N
    E_3 - - - n/a 1 57 58 59 N
    F_3 - - - n/a 2 60 61 62 N
    I1_3 - - - n/a 3 63 64 65 N
    M_3 - - - n/a 4 66 67 68 N
    N_3 - - - n/a 5 69 70 71 N
    R_3 - - - n/a 6 72 73 74 N
    S_3 - - - n/a 7 75 76 77 N
    S1_3 - - - n/a 8 78 79 80 N
    B_4 - - - n/a 0 81 82 83 N
    E_4 - - - n/a 1 84 85 86 N
    F_4 - - - n/a 2 87 88 89 N
    I1_4 - - - n/a 3 90 91 92 N
    M_4 - - - n/a 4 93 94 95 N
    N_4 - - - n/a 5 96 97 98 N
    R_4 - - - n/a 6 99 100 101 N
    S_4 - - - n/a 7 102 103 104 N
    S1_4 - - - n/a 8 105 106 107 N
    .
    .
    .

    Thanks in advance

  • Nickolay V. Shmyrev

    > Why does such a problem occur when a large number of monophones is used
    > instead of triphones? Is there a workaround for this problem, too?

    For fast access, sphinx creates an array of size NxNxN, where N is the
    number of base phones. If you have 910 phones, that is 910x910x910 =
    753,571,000 entries, exactly the count in your calloc message, which is
    obviously too much for your system. There is no way to solve this problem
    except to rework the source to use another access scheme, a hashtable for
    example.
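
    A quick back-of-the-envelope sketch, plus the kind of sparse lookup that
    would avoid the dense table (the names here are illustrative, not the
    actual Sphinx internals):

        #include <stdio.h>
        #include <stdint.h>

        int main(void)
        {
            uint64_t n = 910;              /* number of base phones */
            uint64_t entries = n * n * n;  /* dense (base,left,right) table */

            /* 753571000 entries * 4 bytes each is about 2.8 GB: more than
             * a 32-bit process can calloc in a single block. */
            printf("entries: %llu\n", (unsigned long long)entries);
            printf("bytes:   %.1f GB\n", entries * 4 / 1073741824.0);

            /* Sparse alternative: store only the triphones that actually
             * occur, keyed by a packed (base,left,right) integer that any
             * hashtable can use. */
            uint32_t base = 12, left = 7, right = 903;
            uint64_t key = (base * n + left) * n + right; /* unique per triple */
            printf("key(%u,%u,%u) = %llu\n", base, left, right,
                   (unsigned long long)key);
            return 0;
        }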


