Question about audio file length for training

Help
2012-02-03
2012-09-22
  • nguyen duy nam

    nguyen duy nam - 2012-02-03

    Hi everybody,
    I read the tutorial at http://cmusphinx.sourceforge.net/wiki/tutorialam and
    found this:
    "Optimal length is not less than 5 seconds and not more than 30 seconds."
    But I record only a single sound, "a", per file, so each audio file is only
    2 s long. Is that acceptable or not?
    Thanks.
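
A quick way to see how far recordings fall from the 5 to 30 second guideline quoted above is to measure them. This is only a sketch: it assumes the recordings are plain PCM WAV files readable by Python's standard `wave` module, and the function names and thresholds are illustrative, not part of SphinxTrain.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def classify_duration(seconds, low=5.0, high=30.0):
    """Label a duration relative to the tutorial's 5-30 s guideline."""
    if seconds < low:
        return "short"
    if seconds > high:
        return "long"
    return "ok"
```

Calling `classify_duration(wav_duration_seconds(path))` on each recording gives a rough picture; "short" only means the file is below the optimal range, not that training must fail.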

     
  • Nickolay V. Shmyrev

    Do you know the meaning of the word "optimal"? If not, please consult a
    dictionary.

     
  • nguyen duy nam

    nguyen duy nam - 2012-02-03

    Thanks, I know.
    When I start training, I run ./scripts_pl/RunAll.pl

    and the terminal shows:
    ./scripts_pl/RunAll.pl
    MODULE: 00 verify training files
    O.S. is case sensitive ("A" != "a").
    Phones will be treated as case sensitive.
    Phase 1: DICT - Checking to see if the dict and filler dict agrees with the
    phonelist file.
    Found 41 words using 36 phones
    Phase 2: DICT - Checking to make sure there are not duplicate entries in the
    dictionary
    Phase 3: CTL - Check general format; utterance length (must be positive);
    files exist
    Phase 4: CTL - Checking number of lines in the transcript should match lines
    in control file
    Phase 5: CTL - Determine amount of training data, see if n_tied_states seems
    reasonable.
    Estimated Total Hours Training: 0.321669444444444
    This is a small amount of data, no comment at this time
    Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in the
    dictionary
    Words in dictionary: 38
    Words in filler dictionary: 3
    WARNING: Utterance ID mismatch on line 2: mswav/aw vs b
    WARNING: Utterance ID mismatch on line 3: mswav/aa vs ba
    WARNING: Utterance ID mismatch on line 4: mswav/b vs bay
    WARNING: Utterance ID mismatch on line 5: mswav/c vs bon
    WARNING: Utterance ID mismatch on line 6: mswav/d vs c
    WARNING: Utterance ID mismatch on line 7: mswav/dd vs chin
    WARNING: Utterance ID mismatch on line 8: mswav/e vs d
    WARNING: Utterance ID mismatch on line 9: mswav/ee vs e
    WARNING: Bad line in transcript:
    HAI (hai) I
    WARNING: Utterance ID mismatch on line 12: mswav/i vs
    WARNING: Utterance ID mismatch on line 13: mswav/k vs i
    WARNING: Utterance ID mismatch on line 14: mswav/l vs k
    WARNING: Utterance ID mismatch on line 15: mswav/m vs khong
    WARNING: Utterance ID mismatch on line 16: mswav/n vs l
    WARNING: Utterance ID mismatch on line 17: mswav/o vs m
    WARNING: Utterance ID mismatch on line 18: mswav/oo vs mot
    WARNING: Utterance ID mismatch on line 19: mswav/ow vs nam
    WARNING: Utterance ID mismatch on line 20: mswav/p vs o
    WARNING: Utterance ID mismatch on line 21: mswav/q vs p
    WARNING: Utterance ID mismatch on line 22: mswav/r vs q
    WARNING: Utterance ID mismatch on line 23: mswav/s vs r
    WARNING: Utterance ID mismatch on line 24: mswav/t vs s
    WARNING: Utterance ID mismatch on line 25: mswav/u vs sau
    WARNING: Utterance ID mismatch on line 26: mswav/uw vs t
    WARNING: Utterance ID mismatch on line 27: mswav/v vs tam
    WARNING: Utterance ID mismatch on line 28: mswav/y vs u
    WARNING: Utterance ID mismatch on line 29: so/mot vs v
    WARNING: Utterance ID mismatch on line 30: so/hai vs x
    WARNING: Utterance ID mismatch on line 31: so/ba vs y
    WARNING: Utterance ID mismatch on line 32: so/bon vs aa
    WARNING: Utterance ID mismatch on line 33: so/nam vs ee
    WARNING: Utterance ID mismatch on line 34: so/sau vs oo
    WARNING: Utterance ID mismatch on line 35: so/bay vs aw
    WARNING: Utterance ID mismatch on line 36: so/tam vs dd
    WARNING: Utterance ID mismatch on line 37: so/chin vs ow
    WARNING: Utterance ID mismatch on line 38: so/khong vs uw
    Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in
    the phonelist, and all phones in the phonelist appear at least once
    WARNING: This phone (SIL) occurs in the phonelist
    (/Application/Data/speechtotext/demo/datatest2/etc/words.phone), but not in
    any word in the transcription
    (/Application/Data/speechtotext/demo/datatest2/etc/words_train.transcription)
    Something failed: (/Application/Data/speechtotext/demo/datatest2/scripts_pl/00.verify/verify_all.pl)

    Can you show me what is wrong? Thanks.

     
  • nguyen duy nam

    nguyen duy nam - 2012-02-03

    I forgot to include the words.fileids file:

    mswav/a
    mswav/aw
    mswav/aa
    mswav/b
    mswav/c
    mswav/d
    mswav/dd
    mswav/e
    mswav/ee
    mswav/g
    mswav/h
    mswav/i
    mswav/k
    mswav/l
    mswav/m
    mswav/n
    mswav/o
    mswav/oo
    mswav/ow
    mswav/p
    mswav/q
    mswav/r
    mswav/s
    mswav/t
    mswav/u
    mswav/uw
    mswav/v
    mswav/y
    so/mot
    so/hai
    so/ba
    so/bon
    so/nam
    so/sau
    so/bay
    so/tam
    so/chin
    so/khong

    and the transcription file:

    A (a)
    B (b)
    BA (ba)
    BẢY (bay)
    BỐN (bon)
    C (c)
    CHÍN (chin)
    D (d)
    E (e)
    G (g)
    H (h)
    HAI (hai) I
    I (i)
    K (k)
    KHÔNG (khong)
    L (l)
    M (m)
    MỘT (mot)
    NĂM (nam)
    O (o)
    P (p)
    Q (q)
    R (r)
    S (s)
    SÁU (sau)
    T (t)
    TÁM (tam)
    U (u)
    V (v)
    X (x)
    Y (y)
    Â (aa)
    Ê (ee)
    Ô (oo)
    Ă (aw)
    Đ (dd)
    Ơ (ow)
    Ư (uw)

    and the dictionary file:
    A A
    B B
    BA B A
    BẢY B AR Y
    BỐN B OOS N
    C C
    CHÍN CH IS N
    D D
    E E
    G G
    H H
    HAI H A I
    I I
    K K
    KHÔNG K H OO N G
    L L
    M M
    MỘT M OOJ T
    NĂM N AW M
    O O
    P P
    Q Q
    R R
    S S
    SÁU S AS U
    T T
    TÁM T AS M
    U U
    V V
    X X
    Y Y
    Â AA
    Ê EE
    Ô OO
    Ă AW
    Đ DD
    Ơ OW
    Ư UW


     
  • Nickolay V. Shmyrev

    The script tells you exactly what is wrong. The utterance IDs in the
    transcription file and the fileids file do not match on line 2:

    mswav/aw
    

    vs

    B   (b)
    

    You should have a transcription for file aw on the second line.

    There is also an incorrectly formatted line in the transcription file:

    HAI (hai) I
    

    It's not correct because there is an extra letter "I" after the utterance ID.
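
The two checks described above can be sketched in a few lines of Python. This is an illustration, not the actual verify script: it assumes that line N of the fileids file must pair with line N of the transcription, that the utterance ID in parentheses is the last path component of the fileids entry (which is what the "mswav/aw vs b" warnings suggest), and that nothing may follow the closing parenthesis. The function name `check_alignment` is hypothetical.

```python
import re

# A transcript line is "WORDS (utterance_id)" with nothing after the ")".
TRANSCRIPT_LINE = re.compile(r"^(?P<words>.*?)\s*\((?P<utt>[^()\s]+)\)$")


def check_alignment(fileids, transcript):
    """Return a list of (line_number, message) problems, in line order."""
    problems = []
    if len(fileids) != len(transcript):
        problems.append(
            (0, "line counts differ: %d vs %d" % (len(fileids), len(transcript)))
        )
    for number, (file_id, line) in enumerate(zip(fileids, transcript), start=1):
        match = TRANSCRIPT_LINE.match(line.strip())
        if match is None:
            problems.append((number, "bad transcript line: %r" % line))
            continue
        # Assumption: only the part after the last "/" is compared.
        expected = file_id.strip().split("/")[-1]
        if match.group("utt") != expected:
            problems.append(
                (number, "id mismatch: %s vs %s" % (file_id.strip(), match.group("utt")))
            )
    return problems
```

Fed the first lines of the two files from this thread, it flags line 2 as an ID mismatch and the `HAI (hai) I` line as badly formatted.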

     
  • nguyen duy nam

    nguyen duy nam - 2012-02-03

    OK, thanks, I got past that. But when I run it again, I get this error:

    ./scripts_pl/RunAll.pl
    MODULE: 00 verify training files
    O.S. is case sensitive ("A" != "a").
    Phones will be treated as case sensitive.
    Phase 1: DICT - Checking to see if the dict and filler dict agrees with the
    phonelist file.
    Found 41 words using 36 phones
    Phase 2: DICT - Checking to make sure there are not duplicate entries in the
    dictionary
    Phase 3: CTL - Check general format; utterance length (must be positive);
    files exist
    Phase 4: CTL - Checking number of lines in the transcript should match lines
    in control file
    Phase 5: CTL - Determine amount of training data, see if n_tied_states seems
    reasonable.
    Estimated Total Hours Training: 0.320019444444444
    This is a small amount of data, no comment at this time
    Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in the
    dictionary
    Words in dictionary: 38
    Words in filler dictionary: 3
    Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in
    the phonelist, and all phones in the phonelist appear at least once
    WARNING: This phone (SIL) occurs in the phonelist
    (/Application/Data/speechtotext/demo/datatest2/etc/words.phone), but not in
    any word in the transcription
    (/Application/Data/speechtotext/demo/datatest2/etc/words_train.transcription)
    MODULE: 01 Train LDA transformation
    Skipped (set $CFG_LDA_MLLT = 'yes' to enable)
    MODULE: 02 Train MLLT transformation
    Skipped (set $CFG_LDA_MLLT = 'yes' to enable)
    MODULE: 05 Vector Quantization
    Skipped for continuous models
    MODULE: 10 Training Context Independent models for forced alignment and VTLN
    Skipped: $ST::CFG_FORCEDALIGN set to 'no' in sphinx_train.cfg
    Skipped: $ST::CFG_VTLN set to 'no' in sphinx_train.cfg
    MODULE: 11 Force-aligning transcripts
    Skipped: $ST::CFG_FORCEDALIGN set to 'no' in sphinx_train.cfg
    MODULE: 12 Force-aligning data for VTLN
    Skipped: $ST::CFG_VTLN set to 'no' in sphinx_train.cfg
    MODULE: 20 Training Context Independent models
    Phase 1: Cleaning up directories:
    accumulator...logs...qmanager...models...
    Phase 2: Flat initialize
    Phase 3: Forward-Backward
    Baum welch starting for 1 Gaussian(s), iteration: 1 (1 of 1)
    0% 10% 50% 60% 100%
    WARNING: This step had 0 ERROR messages and 1 WARNING messages. Please check
    the log file for details.
    Normalization for iteration: 1
    WARNING: This step had 0 ERROR messages and 33 WARNING messages. Please check
    the log file for details.
    Current Overall Likelihood Per Frame = 3.5063077764372
    Baum welch starting for 1 Gaussian(s), iteration: 2 (1 of 1)
    0% 10% 50% 60% 100%
    WARNING: This step had 0 ERROR messages and 1 WARNING messages. Please check
    the log file for details.
    Normalization for iteration: 2
    WARNING: This step had 0 ERROR messages and 33 WARNING messages. Please check
    the log file for details.
    Current Overall Likelihood Per Frame = 4.27383926323921
    Convergence Ratio = 0.767531486802013
    Baum welch starting for 1 Gaussian(s), iteration: 3 (1 of 1)
    0% 10% 50% 60% 100%
    WARNING: This step had 0 ERROR messages and 1 WARNING messages. Please check
    the log file for details.
    Normalization for iteration: 3
    WARNING: This step had 0 ERROR messages and 33 WARNING messages. Please check
    the log file for details.
    Current Overall Likelihood Per Frame = 4.98022429192671
    Convergence Ratio = 0.706385028687496
    Baum welch starting for 1 Gaussian(s), iteration: 4 (1 of 1)
    0% 10% 50% 60% 100%
    WARNING: This step had 0 ERROR messages and 1 WARNING messages. Please check
    the log file for details.
    Normalization for iteration: 4
    WARNING: This step had 0 ERROR messages and 33 WARNING messages. Please check
    the log file for details.
    WARNING: WARNING: NEGATIVE CONVERGENCE RATIO AT ITER 4! CHECK BW AND NORM LOGFILES
    Current Overall Likelihood Per Frame = 4.89577022229552
    Training completed after 4 iterations
    MODULE: 30 Training Context Dependent models
    Phase 1: Cleaning up directories:
    accumulator...logs...qmanager...
    Phase 2: Initialization
    This step had 1 ERROR messages and 0 WARNING messages. Please check the log
    file for details.
    Phase 3: Forward-Backward
    Baum welch starting for iteration: 1 (1 of 1)
    0%
    This step had 1 ERROR messages and 0 WARNING messages. Please check the log
    file for details.
    Only 0 parts of 1 of Baum Welch were successfully completed
    Parts 1 failed to run!
    Training failed in iteration 1
    Something failed: (/Application/Data/speechtotext/demo/datatest2/scripts_pl/30.cd_hmm_untied/slave_convg.pl)

    Can you show me how to solve it? Thanks indeed.

     
  • nguyen duy nam

    nguyen duy nam - 2012-02-03

    Please help me.
    It does not generate any files in the directory
    model_parameters/words.cd_cont_untied. Here is the log file:

    INFO: main.c(282): Reading: /Application/Data/speechtotext/demo/datatest2/model_parameters/words.cd_cont_untied/mixture_weights
    WARN: "s3io.c", line 256: Unable to open /Application/Data/speechtotext/demo/datatest2/model_parameters/words.cd_cont_untied/mixture_weights for reading; No such file or directory
    FATAL_ERROR: "main.c", line 771: Initialization failed

    What did I do wrong? Please help.

     
  • Nickolay V. Shmyrev

    What did I do wrong? Please help.

    You ignored the warnings in the training tool output. They clearly hint
    that the input data is still not properly prepared.
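
One warning that appears in both runs is the SIL one: the phone is in the phone list but never occurs in the transcription. A possible reading, and it is only an assumption because the filler dictionary is not shown in this thread, is that the transcript lines lack the `<s>` and `</s>` markers used in the tutorial's example transcripts, which a filler dictionary typically maps to SIL. The hypothetical helper below sketches how such markers could be added to `WORDS (utt_id)` lines; whether this resolves the later training failure is not established here.

```python
def add_sentence_markers(line):
    """Wrap the words of 'WORDS (utt_id)' in <s> ... </s>, keeping the id."""
    stripped = line.strip()
    words, separator, utt = stripped.rpartition(" (")
    if not separator or not utt.endswith(")"):
        raise ValueError("not a 'WORDS (utt_id)' line: %r" % line)
    if words.startswith("<s>"):
        return stripped  # already marked, leave unchanged
    return "<s> %s </s> (%s" % (words, utt)
```

For example, `add_sentence_markers("HAI (hai)")` produces `<s> HAI </s> (hai)`, while a malformed line such as `HAI (hai) I` raises `ValueError` instead of being silently rewritten.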

     
  • nguyen duy nam

    nguyen duy nam - 2012-02-04

    Thanks. Can you show me what I did wrong?
    Here are my files and the steps I used to create them.
    File words.txt:

    a
    ă
    â
    b
    c
    d
    đ
    e
    ê
    g
    h
    i
    k
    l
    m
    l
    o
    ô
    ơ
    p
    q
    r
    s
    t
    u
    ư
    v
    x
    y
    một
    hai
    ba
    bốn
    năm
    sáu
    bảy
    tám
    chín
    không

    Then I created the vocab file:

    text2wfreq < words.txt | wfreq2vocab > words.tmp.vocab

    Then I deleted the comments and renamed words.tmp.vocab to words.vocab.
    File words.vocab:

    a
    b
    ba
    bảy
    bốn
    c
    chín
    d
    e
    g
    h
    hai
    i
    k
    không
    l
    m
    một
    năm
    o
    p
    q
    r
    s
    sáu
    t
    tám
    u
    v
    x
    y
    â
    ê
    ô
    ă
    đ
    ơ
    ư

    I created the idngram file:

    text2idngram -vocab words.vocab -idngram words.idngram < words.txt

    I created the ARPA file:

    idngram2lm -vocab_type 0 -idngram words.idngram -vocab words.vocab -arpa words.arpa

    File words.arpa:

    #######################################################################

    Ronald Rosenfeld and Philip Clarkson

    Contributors includes Wen Xu, Ananlada Chotimongkol,

    David Huggins-Daines, Arthur Chan and Alan Black

    #######################################################################

    =============================================================================
    =============== This file was produced by the CMU-Cambridge ===============
    =============== Statistical Language Modeling Toolkit ===============
    =============================================================================
    This is a 3-gram language model, based on a vocabulary of 38 words,
    which begins "a", "b", "ba"...
    This is a CLOSED-vocabulary model
    (OOVs eliminated from training data and are forbidden in test data)
    Good-Turing discounting was applied.
    1-gram frequency of frequency : 1
    2-gram frequency of frequency : 1 0 0 0 0 0 0
    3-gram frequency of frequency : 1 0 0 0 0 0 0
    1-gram discounting ratios : 0.03
    2-gram discounting ratios :
    3-gram discounting ratios :
    This file is in the ARPA-standard format introduced by Doug Paul.

    p(wd3|wd1,wd2)= if(trigram exists) p_3(wd1,wd2,wd3)
    else if(bigram w1,w2 exists) bo_wt_2(w1,w2)*p(wd3|wd2)
    else p(wd3|w2)

    p(wd2|wd1)= if(bigram exists) p_2(wd1,wd2)
    else bo_wt_1(wd1)*p_1(wd2)

    All probs and back-off weights (bo_wt) are given in log10 form.

    Data formats:

    Beginning of data mark: \data\
    ngram 1=nr # number of 1-grams
    ngram 2=nr # number of 2-grams
    ngram 3=nr # number of 3-grams

    \1-grams:
    p_1 wd_1 bo_wt_1
    \2-grams:
    p_2 wd_1 wd_2 bo_wt_2
    \3-grams:
    p_3 wd_1 wd_2 wd_3

    end of data mark: \end\

    \data\
    ngram 1=38
    ngram 2=1
    ngram 3=1

    \1-grams:
    -1.5798 a 0.0000
    -1.5798 b 0.0000
    -1.5798 ba 0.0000
    -1.5798 bảy 0.0000
    -1.5798 bốn 0.0000
    -1.5798 c 0.0000
    -1.5798 chín 0.0000
    -1.5798 d 0.0000
    -1.5798 e 0.0000
    -1.5798 g 0.0000
    -1.5798 h 0.0000
    -1.5798 hai 0.0000
    -1.5798 i 0.0000
    -1.5798 k 0.0000
    -1.5798 không 0.0000
    -1.5798 l 0.0000
    -1.5798 m 0.0000
    -1.5798 một 0.0000
    -1.5798 năm 0.0000
    -1.5798 o 0.0000
    -1.5798 p 0.0000
    -1.5798 q 0.0000
    -1.5798 r 0.0000
    -1.5798 s 0.0000
    -1.5798 sáu 0.0000
    -1.5798 t 0.0000
    -1.5798 tám 0.0000
    -1.5798 u 0.0000
    -1.5798 v 0.0000
    -1.5798 x 0.0000
    -1.5798 y 0.0000
    -1.5798 â 0.0000
    -1.5798 ê 0.0000
    -1.5798 ô 0.0000
    -1.5798 ă 0.0000
    -1.5798 đ 0.0000
    -1.5798 ơ 0.0000
    -1.5798 ư -0.4771

    \2-grams:
    -0.1761 ư <unk> -0.3010

    \3-grams:
    -0.3010 ư <unk> <unk>

    \end\

    I created the DMP file:

    sphinx_lm_convert -i words.arpa -o words.lm.DMP

    What did I do wrong? Thanks.

     
  • nguyen duy nam

    nguyen duy nam - 2012-02-04

    I forgot to add:
    I made the dictionary file by hand:

    A A
    B B
    BA B A
    BẢY B AR Y
    BỐN B OOS N
    C C
    CHÍN CH IS N
    D D
    E E
    G G
    H H
    HAI H A I
    I I
    K K
    KHÔNG K H OO N G
    L L
    M M
    MỘT M OOJ T
    NĂM N AW M
    O O
    P P
    Q Q
    R R
    S S
    SÁU S AS U
    T T
    TÁM T AS M
    U U
    V V
    X X
    Y Y
    Â AA
    Ê EE
    Ô OO
    Ă AW
    Đ DD
    Ơ OW
    Ư UW

    I created the phone file with:
    http://bakuzen.com/extractphoneme.php

    A
    A
    AA
    AR
    AS
    AW
    AW
    B
    B
    C
    CH
    D
    DD
    E
    EE
    G
    H
    H
    I
    IS
    K
    K
    L
    M
    M
    N
    N
    O
    OO
    OO
    OOJ
    OOS
    OW
    P
    Q
    R
    S
    S
    T
    T
    U
    UW
    V
    X
    Y
    SIL
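
The phone list above repeats several entries (for example A, AW and B). As far as I know, a SphinxTrain phone list is expected to name each phone once, one per line, so it may be worth regenerating it. The sketch below derives such a list directly from dictionary lines of the form `WORD PHONE PHONE ...`; the function name and the choice to add SIL as an extra entry are assumptions for illustration, not the method used by the site linked above.

```python
def phones_from_dictionary(dict_lines, extra=("SIL",)):
    """Collect each phone used in the dictionary once, sorted, plus extras."""
    phones = set(extra)
    for line in dict_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed entries
        phones.update(fields[1:])  # fields[0] is the word itself
    return sorted(phones)
```

Writing the returned list one phone per line gives a file without duplicates that stays consistent with the dictionary it was built from.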

    I created the fileids file by hand:

    mswav/a
    mswav/b
    so/ba
    so/bay
    so/bon
    mswav/c
    so/chin
    mswav/d
    mswav/e
    mswav/g
    mswav/h
    so/hai
    mswav/i
    mswav/k
    so/khong
    mswav/l
    mswav/m
    so/mot
    so/nam
    mswav/o
    mswav/p
    mswav/q
    mswav/r
    mswav/s
    so/sau
    mswav/t
    so/tam
    mswav/u
    mswav/v
    mswav/x
    mswav/y
    mswav/aa
    mswav/ee
    mswav/oo
    mswav/aw
    mswav/dd
    mswav/ow
    mswav/uw

    and I created the transcription file by hand:

    A (a)
    B (b)
    BA (ba)
    BẢY (bay)
    BỐN (bon)
    C (c)
    CHÍN (chin)
    D (d)
    E (e)
    G (g)
    H (h)
    HAI (hai)
    I (i)
    K (k)
    KHÔNG (khong)
    L (l)
    M (m)
    MỘT (mot)
    NĂM (nam)
    O (o)
    P (p)
    Q (q)
    R (r)
    S (s)
    SÁU (sau)
    T (t)
    TÁM (tam)
    U (u)
    V (v)
    X (x)
    Y (y)
    Â (aa)
    Ê (ee)
    Ô (oo)
    Ă (aw)
    Đ (dd)
    Ơ (ow)
    Ư (uw)

    thanks

     
  • Nickolay V. Shmyrev

    INFO: main.c(282): Reading: /Application/Data/speechtotext/demo/datatest2/model_parameters/words.cd_cont_untied/mixture_weights
    WARN: "s3io.c", line 256: Unable to open /Application/Data/speechtotext/demo/datatest2/model_parameters/words.cd_cont_untied/mixture_weights for reading; No such file or directory
    FATAL_ERROR: "main.c", line 771: Initialization failed

    What did I do wrong? Please help.

    If you want to understand the reason for this error, you need to check all
    the log files, not just the last one.
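
Checking every log by hand is tedious, so a small script can collect the suspicious lines in one pass. This is a rough sketch under stated assumptions: that the training logs are text files ending in `.log` somewhere under a single directory (the directory to pass in depends on your setup), and that problem lines contain one of the markers seen in this thread (`WARN`, `ERROR`, `FATAL`). The function name is hypothetical.

```python
import os


def find_problem_lines(log_root, markers=("ERROR", "FATAL", "WARN")):
    """Yield (path, line_number, text) for each .log line containing a marker."""
    for directory, subdirs, files in os.walk(log_root):
        subdirs.sort()  # visit subdirectories in a stable order
        for name in sorted(files):
            if not name.endswith(".log"):
                continue
            path = os.path.join(directory, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, text in enumerate(handle, start=1):
                    if any(marker in text for marker in markers):
                        yield path, number, text.rstrip("\n")
```

Reading the hits from the earliest training stage onward usually shows the first real problem, which is often several steps before the fatal error that ends the run.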

     
