
Building the acoustic model process

clbin, 2011-06-01 to 2012-09-22
  • clbin

    clbin - 2011-06-01

    First, prepare the data:
    1. etc/6965.dic:

    0   Z IY R OW
    1   W AH N
    2   T UW
    3   TH R IY
    4   F OW R
    5   F AY V
    6   S IH K S
    7   S EH V AH N
    8   EY T
    9   N AY N
    

    2. etc/6965.filler:

    <s>             SIL
    </s>            SIL
    <sil>           SIL
    
    3. etc/6965.lm
    4. etc/6965.lm.dmp
    5. etc/6965.phone:

      AH
      AY
      EH
      EY
      F
      IH
      IY
      K
      N
      OW
      R
      S
      SIL
      T
      TH
      UW
      V
      W
      Z

    6. etc/6965_train.fileids:

      gen_fest_0001
      gen_fest_0002
      gen_fest_0003
      gen_fest_0004
      gen_fest_0005
      gen_fest_0006
      gen_fest_0007
      gen_fest_0008
      gen_fest_0009
      gen_fest_0010

    7. etc/6965_train.transcription:

    <s> 0 </s> (gen_fest_0001)
    <s> 1 </s> (gen_fest_0002)
    <s> 2 </s> (gen_fest_0003)
    <s> 3 </s> (gen_fest_0004)
    <s> 4 </s> (gen_fest_0005)
    <s> 5 </s> (gen_fest_0006)
    <s> 6 </s> (gen_fest_0007)
    <s> 7 </s> (gen_fest_0008)
    <s> 8 </s> (gen_fest_0009)
    <s> 9 </s> (gen_fest_0010)
    

    8. wav/gen_fest_0001.wav ... wav/gen_fest_0010.wav
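    These files must be mutually consistent: every phone the dictionary uses has to appear in the phone list, and the (uttid) at the end of each transcription line has to match the corresponding line of the fileids file. A small sanity-check sketch (file contents are inlined from the listings above; for real use, read them from etc/):

```python
import re

def check_phone_coverage(dic_text, phone_text):
    """Phones used in the dictionary but missing from the phone list
    (should be empty; SIL may appear only via the filler dictionary)."""
    used = set()
    for line in dic_text.splitlines():
        parts = line.split()
        if len(parts) >= 2:          # word followed by its phones
            used.update(parts[1:])
    return used - set(phone_text.split())

def check_transcripts(fileids_text, transcription_text):
    """1-based line numbers where the trailing (uttid) does not match
    the same line of the fileids file."""
    ids = fileids_text.split()
    bad = []
    for i, line in enumerate(transcription_text.strip().splitlines()):
        m = re.search(r"\(([^)]+)\)\s*$", line)
        if m is None or i >= len(ids) or m.group(1) != ids[i]:
            bad.append(i + 1)
    return bad

dic = "0   Z IY R OW\n1   W AH N\n"
phones = "AH IY N OW R SIL W Z"
fileids = "gen_fest_0001\ngen_fest_0002\n"
trans = "<s> 0 </s> (gen_fest_0001)\n<s> 1 </s> (gen_fest_0002)\n"
print(check_phone_coverage(dic, phones))   # -> set()
print(check_transcripts(fileids, trans))   # -> []
```

    An empty set and an empty list mean the four files agree; this is essentially what the 00.verify stage checks later.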

    Commands run:
    1. perl ../pocketsphinx/scripts/setup_sphinx.pl -task 6965
       perl ../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task 6965

    After this, 6965 has the following folder structure:

      bin
      bwaccumdir 
      etc
      feat
      logdir
      model_parameters
      model_architecture   
      scripts_pl
      wav
    

    2. Copy this folder from SphinxTrain manually.
    3. Update etc/sphinx_train.cfg:

    # Configuration script for sphinx trainer                  -*-mode:Perl-*-
    
    $CFG_VERBOSE = 1;       # Determines how much goes to the screen.
    
    # These are filled in at configuration time
    $CFG_DB_NAME = "6965";
    $CFG_BASE_DIR = "/home/king/cmuclmtk/6965";
    $CFG_SPHINXTRAIN_DIR = "../sphinxtrain";
    
    # Directory containing SphinxTrain binaries
    $CFG_BIN_DIR = "$CFG_BASE_DIR/bin";
    $CFG_GIF_DIR = "$CFG_BASE_DIR/gifs";
    $CFG_SCRIPT_DIR = "$CFG_BASE_DIR/scripts_pl";
    
    # Experiment name, will be used to name model files and log files
    $CFG_EXPTNAME = "$CFG_DB_NAME";
    
    # Audio waveform and feature file information
    $CFG_WAVFILES_DIR = "$CFG_BASE_DIR/wav";
    $CFG_WAVFILE_EXTENSION = 'wav';
    $CFG_WAVFILE_TYPE = 'mswav'; # one of nist, mswav, raw
    $CFG_FEATFILES_DIR = "$CFG_BASE_DIR/feat";
    $CFG_FEATFILE_EXTENSION = 'mfc';
    $CFG_VECTOR_LENGTH = 13;
    
    $CFG_MIN_ITERATIONS = 1;  # BW Iterate at least this many times
    $CFG_MAX_ITERATIONS = 10; # BW Don't iterate more than this, somethings likely wrong.
    
    # (none/max) Type of AGC to apply to input files
    $CFG_AGC = 'none';
    # (current/none) Type of cepstral mean subtraction/normalization
    # to apply to input files
    $CFG_CMN = 'current';
    # (yes/no) Normalize variance of input files to 1.0
    $CFG_VARNORM = 'no';
    # (yes/no) Use letter-to-sound rules to guess pronunciations of
    # unknown words (English, 40-phone specific)
    $CFG_LTSOOV = 'no';
    # (yes/no) Train full covariance matrices
    $CFG_FULLVAR = 'no';
    # (yes/no) Use diagonals only of full covariance matrices for
    # Forward-Backward evaluation (recommended if CFG_FULLVAR is yes)
    $CFG_DIAGFULL = 'no';
    
    # (yes/no) Perform vocal tract length normalization in training.  This
    # will result in a "normalized" model which requires VTLN to be done
    # during decoding as well.
    $CFG_VTLN = 'no';
    # Starting warp factor for VTLN
    $CFG_VTLN_START = 0.80;
    # Ending warp factor for VTLN
    $CFG_VTLN_END = 1.40;
    # Step size of warping factors
    $CFG_VTLN_STEP = 0.05;
    
    # Directory to write queue manager logs to
    $CFG_QMGR_DIR = "$CFG_BASE_DIR/qmanager";
    # Directory to write training logs to
    $CFG_LOG_DIR = "$CFG_BASE_DIR/logdir";
    # Directory for re-estimation counts
    $CFG_BWACCUM_DIR = "$CFG_BASE_DIR/bwaccumdir";
    # Directory to write model parameter files to
    $CFG_MODEL_DIR = "$CFG_BASE_DIR/model_parameters";
    
    # Directory containing transcripts and control files for
    # speaker-adaptive training
    $CFG_LIST_DIR = "$CFG_BASE_DIR/etc";
    
    # Decoding variables for MMIE training
    $CFG_LANGUAGEWEIGHT = "11.5";
    $CFG_BEAMWIDTH      = "1e-100";
    $CFG_WORDBEAM       = "1e-80";
    $CFG_LANGUAGEMODEL  = "$CFG_LIST_DIR/$CFG_DB_NAME.lm.DMP";
    $CFG_WORDPENALTY    = "0.2";
    
    # Lattice pruning variables
    $CFG_ABEAM              = "1e-50";
    $CFG_NBEAM              = "1e-10";
    $CFG_PRUNED_DENLAT_DIR  = "$CFG_BASE_DIR/pruned_denlat";
    
    # MMIE training related variables
    $CFG_MMIE = "no";
    $CFG_MMIE_MAX_ITERATIONS = 5;
    $CFG_LATTICE_DIR = "$CFG_BASE_DIR/lattice";
    $CFG_MMIE_TYPE   = "rand"; # Valid values are "rand", "best" or "ci"
    $CFG_MMIE_CONSTE = "3.0";
    $CFG_NUMLAT_DIR  = "$CFG_BASE_DIR/numlat";
    $CFG_DENLAT_DIR  = "$CFG_BASE_DIR/denlat";
    
    # Variables used in main training of models
    $CFG_DICTIONARY     = "$CFG_LIST_DIR/$CFG_DB_NAME.dic";
    $CFG_RAWPHONEFILE   = "$CFG_LIST_DIR/$CFG_DB_NAME.phone";
    $CFG_FILLERDICT     = "$CFG_LIST_DIR/$CFG_DB_NAME.filler";
    $CFG_LISTOFFILES    = "$CFG_LIST_DIR/${CFG_DB_NAME}_train.fileids";
    $CFG_TRANSCRIPTFILE = "$CFG_LIST_DIR/${CFG_DB_NAME}_train.transcription";
    $CFG_FEATPARAMS     = "$CFG_LIST_DIR/feat.params";
    
    # Variables used in characterizing models
    
    $CFG_HMM_TYPE = '.cont.'; # Sphinx III
    #$CFG_HMM_TYPE  = '.semi.'; # PocketSphinx and Sphinx II
    #$CFG_HMM_TYPE  = '.ptm.'; # PocketSphinx (larger data sets)
    
    if (($CFG_HMM_TYPE ne ".semi.")
        and ($CFG_HMM_TYPE ne ".ptm.")
        and ($CFG_HMM_TYPE ne ".cont.")) {
      die "Please choose one CFG_HMM_TYPE out of '.cont.', '.ptm.', or '.semi.', " .
        "currently $CFG_HMM_TYPE\n";
    }
    
    # This configuration is fastest and best for most acoustic models in
    # PocketSphinx and Sphinx-III.  See below for Sphinx-II.
    $CFG_STATESPERHMM = 3;
    $CFG_SKIPSTATE = 'no';
    
    if ($CFG_HMM_TYPE eq '.semi.') {
      $CFG_DIRLABEL = 'semi';
    # Four stream features for PocketSphinx
      $CFG_FEATURE = "s2_4x";
      $CFG_NUM_STREAMS = 4;
      $CFG_INITIAL_NUM_DENSITIES = 256;
      $CFG_FINAL_NUM_DENSITIES = 256;
      die "For semi continuous models, the initial and final models have the same density" 
        if ($CFG_INITIAL_NUM_DENSITIES != $CFG_FINAL_NUM_DENSITIES);
    } elsif ($CFG_HMM_TYPE eq '.ptm.') {
      $CFG_DIRLABEL = 'ptm';
    # Four stream features for PocketSphinx
      $CFG_FEATURE = "s2_4x";
      $CFG_NUM_STREAMS = 4;
      $CFG_INITIAL_NUM_DENSITIES = 64;
      $CFG_FINAL_NUM_DENSITIES = 64;
      die "For phonetically tied models, the initial and final models have the same density" 
        if ($CFG_INITIAL_NUM_DENSITIES != $CFG_FINAL_NUM_DENSITIES);
    } elsif ($CFG_HMM_TYPE eq '.cont.') {
      $CFG_DIRLABEL = 'cont';
    # Single stream features - Sphinx 3
      $CFG_FEATURE = "1s_c_d_dd";
      $CFG_NUM_STREAMS = 1;
      $CFG_INITIAL_NUM_DENSITIES = 1;
      $CFG_FINAL_NUM_DENSITIES = 2;
      die "The initial has to be less than the final number of densities" 
        if ($CFG_INITIAL_NUM_DENSITIES > $CFG_FINAL_NUM_DENSITIES);
    }
    
    # Number of top gaussians to score a frame. A little bit less accurate computations
    # make training significantly faster. Uncomment to apply this during the training
    # For good accuracy make sure you are using the same setting in decoder
    # In theory this can be different for various training stages. For example 4 for
    # CI stage and 16 for CD stage
    # $CFG_CI_NTOP = 4;
    # $CFG_CD_NTOP = 16;
    
    # (yes/no) Train multiple-gaussian context-independent models (useful
    # for alignment, use 'no' otherwise) in the models created
    # specifically for forced alignment
    $CFG_FALIGN_CI_MGAU = 'no';
    # (yes/no) Train multiple-gaussian context-independent models (useful
    # for alignment, use 'no' otherwise)
    $CFG_CI_MGAU = 'no';
    # Number of tied states (senones) to create in decision-tree clustering
    $CFG_N_TIED_STATES = 50;
    # How many parts to run Forward-Backward estimatinon in
    $CFG_NPART = 1;
    
    # (yes/no) Train a single decision tree for all phones (actually one
    # per state) (useful for grapheme-based models, use 'no' otherwise)
    $CFG_CROSS_PHONE_TREES = 'no';
    
    # Use force-aligned transcripts (if available) as input to training
    $CFG_FORCEDALIGN = 'no';
    
    # Use a specific set of models for force alignment.  If not defined,
    # context-independent models for the current experiment will be used.
    $CFG_FORCE_ALIGN_MDEF = "$CFG_BASE_DIR/model_architecture/$CFG_EXPTNAME.falign_ci.mdef";
    $CFG_FORCE_ALIGN_MODELDIR = "$CFG_MODEL_DIR/$CFG_EXPTNAME.falign_ci_$CFG_DIRLABEL";
    
    # Use a specific dictionary and filler dictionary for force alignment.
    # If these are not defined, a dictionary and filler dictionary will be
    # created from $CFG_DICTIONARY and $CFG_FILLERDICT, with noise words
    # removed from the filler dictionary and added to the dictionary (this
    # is because the force alignment is not very good at inserting them)
    
    # $CFG_FORCE_ALIGN_DICTIONARY = "$ST::CFG_BASE_DIR/falignout$ST::CFG_EXPTNAME.falign.dict";;
    # $CFG_FORCE_ALIGN_FILLERDICT = "$ST::CFG_BASE_DIR/falignout/$ST::CFG_EXPTNAME.falign.fdict";;
    
    # Use a particular beam width for force alignment.  The wider
    # (i.e. smaller numerically) the beam, the fewer sentences will be
    # rejected for bad alignment.
    $CFG_FORCE_ALIGN_BEAM = 1e-60;
    
    # Calculate an LDA/MLLT transform?
    $CFG_LDA_MLLT = 'no';
    # Dimensionality of LDA/MLLT output
    $CFG_LDA_DIMENSION = 29;
    
    # This is actually just a difference in log space (it doesn't make
    # sense otherwise, because different feature parameters have very
    # different likelihoods)
    $CFG_CONVERGENCE_RATIO = 0.1;
    
    # Queue::POSIX for multiple CPUs on a local machine
    # Queue::PBS to use a PBS/TORQUE queue
    $CFG_QUEUE_TYPE = "Queue";
    
    # Name of queue to use for PBS/TORQUE
    $CFG_QUEUE_NAME = "workq";
    
    # (yes/no) Build questions for decision tree clustering automatically
    $CFG_MAKE_QUESTS = "yes";
    # If CFG_MAKE_QUESTS is yes, questions are written to this file.
    # If CFG_MAKE_QUESTS is no, questions are read from this file.
    $CFG_QUESTION_SET = "${CFG_BASE_DIR}/model_architecture/${CFG_EXPTNAME}.tree_questions";
    #$CFG_QUESTION_SET = "${CFG_BASE_DIR}/linguistic_questions";
    
    $CFG_CP_OPERATION = "${CFG_BASE_DIR}/model_architecture/${CFG_EXPTNAME}.cpmeanvar";
    
    # This variable has to be defined, otherwise utils.pl will not load.
    $CFG_DONE = 1;
    
    return 1;
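    Most of the early failures in threads like this come down to a file named in sphinx_train.cfg not existing where the config says it is. A small sketch (not part of SphinxTrain) that mirrors $CFG_DICTIONARY, $CFG_RAWPHONEFILE, $CFG_FILLERDICT, $CFG_LISTOFFILES, and $CFG_TRANSCRIPTFILE from the config above and reports any missing etc/ files before training starts:

```python
import os

def missing_training_files(base_dir, db_name):
    """Return the etc/ files required by sphinx_train.cfg that are absent."""
    etc = os.path.join(base_dir, "etc")
    required = [
        db_name + ".dic",
        db_name + ".phone",
        db_name + ".filler",
        db_name + "_train.fileids",
        db_name + "_train.transcription",
    ]
    return [f for f in required if not os.path.isfile(os.path.join(etc, f))]

# e.g. missing_training_files("/home/king/cmuclmtk/6965", "6965")
# should return [] before you run the training scripts.
```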
    
    4. Run: perl scripts_pl/make_feats.pl -ctl etc/6965_train.fileids
       This populates the feat directory:

      feat/gen_fest_(0001...0010).mfc
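    Each generated .mfc file can be checked for internal consistency: a Sphinx feature file starts with a 4-byte integer header holding the number of float32 values that follow, written in the byte order of the machine that produced it. A sketch (assuming the default 13-dimensional cepstra configured by $CFG_VECTOR_LENGTH) that derives the frame count and raises if the header disagrees with the file size:

```python
import struct

def mfc_frame_count(data, ceplen=13):
    """Number of frames in a Sphinx .mfc feature file given its raw bytes.
    The header's byte order is machine-dependent, so try both."""
    for fmt in ("<i", ">i"):
        (n,) = struct.unpack(fmt, data[:4])
        if n == (len(data) - 4) // 4:      # header agrees with file size
            return n // ceplen
    raise ValueError("header does not match file size in either byte order")

# Example: a fabricated 2-frame file (2 * 13 float32 values)
fake = struct.pack("<i", 26) + b"\x00" * (26 * 4)
print(mfc_frame_count(fake))  # -> 2
```

    A file whose header does not match its size usually means feature extraction was interrupted or the wav input was malformed.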

    5. Run the full training:

    sudo  perl scripts_pl/RunAll.pl
    
    This runs the following stage scripts in order:

      sudo perl scripts_pl/00.verify/verify_all.pl
      sudo perl scripts_pl/10.vector_quantize/slave.VQ.pl
      sudo perl scripts_pl/20.ci_hmm/slave_convg.pl
      sudo perl scripts_pl/30.cd_hmm_untied/slave_convg.pl
      sudo perl scripts_pl/40.buildtrees/slave.treebuilder.pl
      sudo perl scripts_pl/45.prunetree/slave-state-tying.pl
      sudo perl scripts_pl/50.cd_hmm_tied/slave_convg.pl
      sudo perl scripts_pl/90.deleted_interpolation/deleted_interpolation.pl
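    RunAll.pl is essentially a driver that executes the stage scripts above in sequence and stops at the first failure; running them one by one does the same thing. A sketch of that control flow (illustrative only; it assumes it is launched from the task directory, e.g. /home/king/cmuclmtk/6965):

```python
import subprocess
import sys

# Stage scripts in the order RunAll.pl executes them (from the listing above).
STAGES = [
    "scripts_pl/00.verify/verify_all.pl",
    "scripts_pl/10.vector_quantize/slave.VQ.pl",
    "scripts_pl/20.ci_hmm/slave_convg.pl",
    "scripts_pl/30.cd_hmm_untied/slave_convg.pl",
    "scripts_pl/40.buildtrees/slave.treebuilder.pl",
    "scripts_pl/45.prunetree/slave-state-tying.pl",
    "scripts_pl/50.cd_hmm_tied/slave_convg.pl",
    "scripts_pl/90.deleted_interpolation/deleted_interpolation.pl",
]

def run_stages(stages=STAGES):
    """Run each stage with perl; abort on the first non-zero exit code."""
    for stage in stages:
        print("MODULE:", stage)
        if subprocess.call(["perl", stage]) != 0:
            sys.exit("stage failed: " + stage)
```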

    Now the 6965 directory structure is as follows:

    bin
    bwaccumdir
    denlat
    etc
    feat
    lattice
    logdir
    model_architecture
    model_parameters
    numlat
    pruned_denlat
    python
    qmanager
    result
    scripts_pl
    trees
    wav
    6965.html
    

    model_parameters now contains:

    6965.cd_cont_50
    6965.cd_cont_50_1
    6965.cd_cont_50_2
    6965.cd_cont_initial
    6965.cd_cont_untied
    6965.ci_cont
    6965.ci_cont_flatinitial
    6965.ci_lda
    6965.ci_lda_flatinitial
    6965.ci_semi_flatinitial
    

    OK, I applied the model in the Android program:

            c.setString("-hmm",
                    "/sdcard/Android/data/edu.cmu.pocketsphinx/6965/model_parameters/6965.cd_cont_50");//hub4opensrc.cd_continuous_8gau
            c.setString("-dict",
                    "/sdcard/Android/data/edu.cmu.pocketsphinx/6965/etc/6965.dic");//cmu07a tidigits.dic
            c.setString("-lm",
                    "/sdcard/Android/data/edu.cmu.pocketsphinx/6965/etc/6965.lm.dmp");
    

    Running the program, I got:

    06-01 09:43:05.594: INFO/ActivityManager(1097): Process edu.cmu.pocketsphinx.demo (pid 2753) has died.
    

    pocketsphinx.log:

    INFO: cmd_ln.c(512): Parsing command line:
    
    
    Current configuration:
    [NAME]      [DEFLT]     [VALUE]
    -agc        none        none
    -agcthresh  2.0     2.000000e+00
    -alpha      0.97        9.700000e-01
    -ascale     20.0        2.000000e+01
    -aw     1       1
    -backtrace  no      no
    -beam       1e-48       1.000000e-48
    -bestpath   yes     yes
    -bestpathlw 9.5     9.500000e+00
    -bghist     no      no
    -ceplen     13      13
    -cmn        current     current
    -cmninit    8.0     8.0
    -compallsen no      no
    -debug              0
    -dict               
    -dictcase   no      no
    -dither     no      no
    -doublebw   no      no
    -ds     1       1
    -fdict              
    -feat       1s_c_d_dd   1s_c_d_dd
    -featparams         
    -fillprob   1e-8        1.000000e-08
    -frate      100     100
    -fsg                
    -fsgusealtpron  yes     yes
    -fsgusefiller   yes     yes
    -fwdflat    yes     yes
    -fwdflatbeam    1e-64       1.000000e-64
    -fwdflatefwid   4       4
    -fwdflatlw  8.5     8.500000e+00
    -fwdflatsfwin   25      25
    -fwdflatwbeam   7e-29       7.000000e-29
    -fwdtree    yes     yes
    -hmm                
    -input_endian   little      little
    -jsgf               
    -kdmaxbbi   -1      -1
    -kdmaxdepth 0       0
    -kdtree             
    -latsize    5000        5000
    -lda                
    -ldadim     0       0
    -lextreedump    0       0
    -lifter     0       0
    -lm             
    -lmctl              
    -lmname     default     default
    -logbase    1.0001      1.000100e+00
    -logfn              
    -logspec    no      no
    -lowerf     133.33334   1.333333e+02
    -lpbeam     1e-40       1.000000e-40
    -lponlybeam 7e-29       7.000000e-29
    -lw     6.5     6.500000e+00
    -maxhmmpf   -1      -1
    -maxnewoov  20      20
    -maxwpf     -1      -1
    -mdef               
    -mean               
    -mfclogdir          
    -min_endfr  0       0
    -mixw               
    -mixwfloor  0.0000001   1.000000e-07
    -mllr               
    -mmap       yes     yes
    -ncep       13      13
    -nfft       512     512
    -nfilt      40      40
    -nwpen      1.0     1.000000e+00
    -pbeam      1e-48       1.000000e-48
    -pip        1.0     1.000000e+00
    -pl_beam    1e-10       1.000000e-10
    -pl_pbeam   1e-5        1.000000e-05
    -pl_window  0       0
    -rawlogdir          
    -remove_dc  no      no
    -round_filters  yes     yes
    -samprate   16000       1.600000e+04
    -seed       -1      -1
    -sendump            
    -senlogdir          
    -senmgau            
    -silprob    0.005       5.000000e-03
    -smoothspec no      no
    -svspec             
    -tmat               
    -tmatfloor  0.0001      1.000000e-04
    -topn       4       4
    -topn_beam  0       0
    -toprule            
    -transform  legacy      legacy
    -unit_area  yes     yes
    -upperf     6855.4976   6.855498e+03
    -usewdphones    no      no
    -uw     1.0     1.000000e+00
    -var                
    -varfloor   0.0001      1.000000e-04
    -varnorm    no      no
    -verbose    no      no
    -warp_params            
    -warp_type  inverse_linear  inverse_linear
    -wbeam      7e-29       7.000000e-29
    -wip        0.65        6.500000e-01
    -wlen       0.025625    2.562500e-02
    
    INFO: cmd_ln.c(512): Parsing command line:
    \
        -alpha 0.97 \
        -doublebw no \
        -nfilt 40 \
        -ncep 13 \
        -lowerf 133.33334 \
        -upperf 6855.4976 \
        -nfft 512 \
        -wlen 0.0256 \
        -transform legacy \
        -feat 1s_c_d_dd \
        -agc none \
        -cmn current \
        -varnorm no
    
    Current configuration:
    [NAME]      [DEFLT]     [VALUE]
    -agc        none        none
    -agcthresh  2.0     2.000000e+00
    -alpha      0.97        9.700000e-01
    -ceplen     13      13
    -cmn        current     current
    -cmninit    8.0     8.0
    -dither     no      no
    -doublebw   no      no
    -feat       1s_c_d_dd   1s_c_d_dd
    -frate      100     100
    -input_endian   little      little
    -lda                
    -ldadim     0       0
    -lifter     0       0
    -logspec    no      no
    -lowerf     133.33334   1.333333e+02
    -ncep       13      13
    -nfft       512     512
    -nfilt      40      40
    -remove_dc  no      no
    -round_filters  yes     yes
    -samprate   16000       1.600000e+04
    -seed       -1      -1
    -smoothspec no      no
    -svspec             
    -transform  legacy      legacy
    -unit_area  yes     yes
    -upperf     6855.4976   6.855498e+03
    -varnorm    no      no
    -verbose    no      no
    -warp_params            
    -warp_type  inverse_linear  inverse_linear
    -wlen       0.025625    2.560000e-02
    
    INFO: acmod.c(238): Parsed model-specific feature parameters from /sdcard/Android/data/edu.cmu.pocketsphinx/6965/model_parameters/6965.cd_cont_50/feat.params
    INFO: feat.c(860): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
    INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
    INFO: mdef.c(520): Reading model definition: /sdcard/Android/data/edu.cmu.pocketsphinx/6965/model_parameters/6965.cd_cont_50/mdef
    INFO: bin_mdef.c(173): Allocating 356 * 8 bytes (2 KiB) for CD tree
    INFO: tmat.c(205): Reading HMM transition probability matrices: /sdcard/Android/data/edu.cmu.pocketsphinx/6965/model_parameters/6965.cd_cont_50/transition_matrices
    INFO: acmod.c(117): Attempting to use SCHMM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /sdcard/Android/data/edu.cmu.pocketsphinx/6965/model_parameters/6965.cd_cont_50/means
    INFO: ms_gauden.c(292): 111 codebook, 1 feature, size: INFO: ms_gauden.c(294):  2x39INFO: ms_gauden.c(295): 
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /sdcard/Android/data/edu.cmu.pocketsphinx/6965/model_parameters/6965.cd_cont_50/variances
    INFO: ms_gauden.c(292): 111 codebook, 1 feature, size: INFO: ms_gauden.c(294):  2x39INFO: ms_gauden.c(295): 
    INFO: ms_gauden.c(356): 2485 variance values floored
    INFO: acmod.c(119): Attempting to use PTHMM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /sdcard/Android/data/edu.cmu.pocketsphinx/6965/model_parameters/6965.cd_cont_50/means
    INFO: ms_gauden.c(292): 111 codebook, 1 feature, size: INFO: ms_gauden.c(294):  2x39INFO: ms_gauden.c(295): 
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /sdcard/Android/data/edu.cmu.pocketsphinx/6965/model_parameters/6965.cd_cont_50/variances
    INFO: ms_gauden.c(292): 111 codebook, 1 feature, size: INFO: ms_gauden.c(294):  2x39INFO: ms_gauden.c(295): 
    INFO: ms_gauden.c(356): 2485 variance values floored
    INFO: ptm_mgau.c(670): Reading mixture weights file '/sdcard/Android/data/edu.cmu.pocketsphinx/6965/model_parameters/6965.cd_cont_50/mixture_weights'
    INFO: ptm_mgau.c(764): Read 111 x 1 x 2 mixture weights
    INFO: ptm_mgau.c(830): Maximum top-N: 4
    INFO: phone_loop_search.c(105): State beam -230231 Phone exit beam -115115 Insertion penalty 0
    INFO: dict.c(306): Allocating 4109 * 20 bytes (80 KiB) for word entries
    INFO: dict.c(321): Reading main dictionary: /sdcard/Android/data/edu.cmu.pocketsphinx/6965/etc/6965.dic
    INFO: dict.c(212): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(324): 10 words read
    INFO: dict.c(330): Reading filler dictionary: /sdcard/Android/data/edu.cmu.pocketsphinx/6965/model_parameters/6965.cd_cont_50/noisedict
    INFO: dict.c(212): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(333): 3 words read
    INFO: dict2pid.c(396): Building PID tables for dictionary
    INFO: dict2pid.c(404): Allocating 19^3 * 2 bytes (13 KiB) for word-initial triphones
    INFO: dict2pid.c(131): Allocated 4408 bytes (4 KiB) for word-final triphones
    INFO: dict2pid.c(195): Allocated 4408 bytes (4 KiB) for single-phone word triphones
    INFO: ngram_model_arpa.c(77): No \data\ mark in LM file
    INFO: ngram_model_dmp.c(142): Will use memory-mapped I/O for LM file
    INFO: ngram_model_dmp.c(196): ngrams 1=12, 2=20, 3=10
    INFO: ngram_model_dmp.c(242):       12 = LM.unigrams(+trailer) read
    INFO: ngram_model_dmp.c(290):       20 = LM.bigrams(+trailer) read
    INFO: ngram_model_dmp.c(315):       10 = LM.trigrams read
    INFO: ngram_model_dmp.c(339):        3 = LM.prob2 entries read
    INFO: ngram_model_dmp.c(358):        3 = LM.bo_wt2 entries read
    INFO: ngram_model_dmp.c(378):        2 = LM.prob3 entries read
    INFO: ngram_model_dmp.c(406):        1 = LM.tseg_base entries read
    INFO: ngram_model_dmp.c(462):       12 = ascii word strings read
    INFO: ngram_search_fwdtree.c(99): 10 unique initial diphones
    INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 4 single-phone words
    INFO: ngram_search_fwdtree.c(186): Creating search tree
    INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 4 single-phone words
    INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 140
    INFO: ngram_search_fwdtree.c(338): after: 10 root, 12 non-root channels, 3 single-phone words
    INFO: ngram_search_fwdflat.c(156): fwdflat: min_ef_width = 4, max_sf_win = 25
    

    Sorry for the relatively long-winded post, but I really need help. Thanks in
    advance, everyone. Where is my mistake?

     
  • Nickolay V. Shmyrev

    I already told you what to do in the previous post: try to decode with your
    model on Linux, where you trained it, and see if it works. Then decode on
    the phone.

    The tutorial also explains this step, which you skipped for some reason:

    http://cmusphinx.sourceforge.net/wiki/tutorialam#decoding

     
  • clbin

    clbin - 2011-06-01

    Hello nshmyrev,

    Sorry, that was careless of me; at first I did not understand "try to decode
    with your model in Linux where you trained your model and see if it works.
    Then decode on phone."
    I then ran:

    ./scripts_pl/decode/slave.pl
    

    but I got:

    MODULE: DECODE Decoding using models previously trained
            Decoding 10 segments starting at 0 (part 1 of 1) 
    Could not find executable for /home/king/cmuclmtk/6965/bin/pocketsphinx_batch at /home/king/cmuclmtk/6965/scripts_pl/decode/../lib/SphinxTrain/Util.pm line 299.
            Aligning results to find error rate
    Can't open /home/king/cmuclmtk/6965/result/6965-1-1.match
    word_align.pl failed with error code 65280 at ./scripts_pl/decode/slave.pl line 173.
    

    I thought this step was not necessary, so I skipped it; that was foolish of me.
    Please help me. I will keep trying on my own as well.
    Thanks

     
  • clbin

    clbin - 2011-06-02

    Hi,
    I copied pocketsphinx_batch from pocketsphinx into the bin folder. Then I ran:

    ./scripts_pl/decode/slave.pl
    

    I got:

    king@ubuntu:~/cmuclmtk/6965$ sudo perl ./scripts_pl/decode/slave.pl
    MODULE: DECODE Decoding using models previously trained
            Decoding 10 segments starting at 0 (part 1 of 1) 
            0% 
    WARNING: This step had 0 ERROR messages and 1 WARNING messages.  Please check the log file for details.
            Aligning results to find error rate
            SENTENCE ERROR: 30.0% (3/10)   WORD ERROR RATE: 30.0% (3/10)
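    The WORD ERROR RATE reported above is the standard edit-distance measure: (substitutions + deletions + insertions) divided by the number of reference words. A minimal sketch of that computation (the hypothesis values below are invented for illustration; they are not the actual decoder output):

```python
def wer(ref, hyp):
    """Word error rate: Levenshtein distance between the word sequences
    divided by the number of reference words."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution or match
    return d[len(r)][len(h)] / len(r)

# 3 errors over 10 one-word utterances -> 30% WER, as in the output above
refs = "0 1 2 3 4 5 6 7 8 9"
hyps = "0 1 2 8 4 5 6 1 8 0"   # hypothetical misrecognitions
print(f"{wer(refs, hyps):.1%}")  # -> 30.0%
```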
    

    I think the decoding succeeded, so I tried recognition on Android again, but
    the program still died. Did I get something else wrong?

    Thanks

    6965_test.transcription:

    0 (gen_fest_0001)
    1 (gen_fest_0002)
    2 (gen_fest_0003)
    3 (gen_fest_0004)
    4 (gen_fest_0005)
    5 (gen_fest_0006)
    6 (gen_fest_0007)
    7 (gen_fest_0008)
    8 (gen_fest_0009)
    9 (gen_fest_0010)
    

    6965_test.fileids:

    gen_fest_0001
    gen_fest_0002
    gen_fest_0003
    gen_fest_0004
    gen_fest_0005
    gen_fest_0006
    gen_fest_0007
    gen_fest_0008
    gen_fest_0009
    gen_fest_0010
    

    Thanks in advance!

     
  • clbin

    clbin - 2011-06-02

    Hello nshmyrev,
    I have carefully checked my folders but found nothing unusual.
    How can I send you my acoustic model folder?
    Thanks

     
  • clbin

    clbin - 2011-06-02

    Hello,
    I've uploaded my model to a file-sharing service. I hope you can take a look. Thank you

    [url]http://sharesend.com/8079m[/url]
    
     
  • Nickolay V. Shmyrev

    You do not have enough data to train the acoustic model. See the tutorial,
    which describes how much data is enough:

    http://cmusphinx.sourceforge.net/wiki/tutorialam

     
  • clbin

    clbin - 2011-06-09

    Hello nshmyrev,
    Thank you very much. I prepared more audio files for training the model and
    deployed it to Android again. The app no longer dies, but it does not
    recognize anything.
    On Linux, with:

    -lm ../../../4751/etc/4751.lm -dict ../../../4751/etc/4751.dic -hmm ../../../4751/model_parameters/4751.cd_cont_1000
    

    it recognizes correctly.

     
  • Nickolay V. Shmyrev

    For Android you need to train your model using 8 kHz audio. The tutorial
    covers this process in detail.

     
  • clbin

    clbin - 2011-06-10

    Hello, I converted the wav files to 8 kHz, 16-bit, and modified feat.params:

    -alpha 0.97
    -doublebw no
    -nfilt 31
    -ncep 13
    -lowerf 133.33334
    -upperf 3500.00
    -nfft 512
    -wlen 0.0256
    -transform legacy
    -feat __CFG_FEATURE__
    -svspec __CFG_SVSPEC__
    -agc __CFG_AGC__
    -cmn __CFG_CMN__
    -varnorm __CFG_VARNORM__
    -samprate 8000.0
    -dither yes
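    Before retraining, it is worth verifying that every training wav really is 8 kHz, 16-bit, mono; a single mismatched file corrupts the extracted features. A quick check with Python's standard wave module (the expected values match the feat.params above; the path argument is up to you):

```python
import wave

def check_wav(path, want_rate=8000, want_width=2, want_channels=1):
    """Return a list of format problems for a training wav (empty = OK)."""
    problems = []
    with wave.open(path, "rb") as w:
        if w.getframerate() != want_rate:
            problems.append(f"sample rate {w.getframerate()} != {want_rate}")
        if w.getsampwidth() != want_width:
            problems.append(f"sample width {w.getsampwidth() * 8} bits != 16")
        if w.getnchannels() != want_channels:
            problems.append(f"{w.getnchannels()} channels != mono")
    return problems
```

    Running it over every file listed in the fileids before training catches resampling mistakes early.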
    

    I regenerated the model, but the Android program still does not recognize
    anything. My recording settings:

                this.rec = new AudioRecord(MediaRecorder.AudioSource.DEFAULT, 8000,
                        AudioFormat.CHANNEL_IN_MONO,
                        AudioFormat.ENCODING_PCM_16BIT, 8192);
            c.setFloat("-samprate", 8000.0);
    

    thanks.

     
  • Nickolay V. Shmyrev

    Try to dump the recorded audio on Android before you feed it to the
    recognizer. Then try to recognize that audio on Linux using your model.
    Also share the audio so I can take a look.
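    One way to follow this advice, assuming the app dumps headerless 16-bit mono PCM at the 8000 Hz configured in the AudioRecord code above: pull the .raw file off the device and wrap it in a wav container so desktop tools (and pocketsphinx) can play and inspect it. A standard-library sketch:

```python
import wave

def raw_to_wav(raw_path, wav_path, rate=8000):
    """Wrap headerless 16-bit mono PCM in a wav container."""
    with open(raw_path, "rb") as f:
        pcm = f.read()
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(1)     # mono, as recorded by AudioRecord
        w.setsampwidth(2)     # 16-bit samples
        w.setframerate(rate)  # must match the recording rate
        w.writeframes(pcm)
```

    Listening to the result immediately reveals problems like silence, clipping, or a wrong sample rate (speech sounding too fast or too slow).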

     
  • clbin

    clbin - 2011-06-13

    Hi Nickolay!
    I tried to execute this Linux command:

    pocketsphinx_continuous -lm ../../../4751/etc/4751.lm -dict ../../../4751/etc/4751.dic -hmm ../../../4751/model_parameters/4751.cd_cont_1000 -infile ../../../4751/raw/000000000.raw
    

    I got:

    ERROR: "cmd_ln.c", line 602: Unknown argument name '-infile'
    ERROR: "cmd_ln.c", line 713: Failed to parse arguments list
    

    Does the "-infile" parameter not exist?
    But I have seen people on the forum use -infile successfully.

    thanks

     
  • clbin

    clbin - 2011-06-13

    Hello, Nickolay!
    I have uploaded my model and the audio here; the audio files are in the raw
    folder.

    [url]http://sharesend.com/q4kva[/url]
    

    Thank you in advance,Nickolay!

     
  • Nickolay V. Shmyrev

    Does the "-infile" parameter not exist?

    It's present only in newer versions; maybe you downloaded an older one.

    I have uploaded my model and the audio here; the audio files are in the raw
    folder.

    And what should I do with it?

     
  • clbin

    clbin - 2011-06-13

    Hello

    Maybe you downloaded older one
    

    I am using pocketsphinx-0.7; isn't that the latest version?

    Try to share this audio so I can also take a look.
    

    How should I do that? I do not quite understand. Thank you for your patience.

     
  • Nickolay V. Shmyrev

    Hello

    The situation as I see it now is:

    1. You trained the model, but it doesn't recognize the raw files. That means
       you trained the model incorrectly.
    2. Your pocketsphinx has no -infile option. That means you are using an old
       pocketsphinx; it may be installed in parallel with the new one, and
       somehow the old one is being used.

    In this situation you should do the following:

    1. Find out whether there is an old pocketsphinx on your system that is being
       used instead of the new one.
    2. Train the model again from a clean folder with the data you have, to make
       sure you did everything correctly. Then upload the new folder again.
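    For the first point, one way is to list every copy of the binary on your PATH (the shell command `which -a pocketsphinx_continuous` does the same). A sketch:

```python
import os

def find_on_path(name, path_env=None):
    """List every directory on PATH containing an executable `name`.
    More than one hit suggests parallel installs, e.g. an old
    pocketsphinx_continuous shadowing a new one."""
    hits = []
    for d in (path_env or os.environ.get("PATH", "")).split(os.pathsep):
        full = os.path.join(d, name)
        if os.path.isfile(full) and os.access(full, os.X_OK):
            hits.append(full)
    return hits
```

    If find_on_path("pocketsphinx_continuous") returns more than one path, the first entry is the one the shell runs; remove or reorder as needed.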
     
  • Nickolay V. Shmyrev

    Hm, I also see that you are using TTS to generate the training data while
    trying to recognize real speech. I don't think that will work: the model
    will be overtrained to recognize TTS speech rather than your own.

     
  • clbin

    clbin - 2011-06-13

    I may not be translating your words correctly, but let me make sure I
    understand: a model trained on TTS audio will only recognize TTS audio, so
    if I want to recognize my own speech, I have to record my own audio and use
    that as the training data?
    Thanks, Nickolay!

     
  • Nickolay V. Shmyrev

    if I want to recognize my own speech, I have to record my own audio and use
    that as the training data?

    Yes

     
  • clbin

    clbin - 2011-06-13

    OK, I'll go try it, and hopefully I'll have good news for you soon. I
    believe you want me to succeed so you can be rid of me soon, ha ha. Just a
    joke.

    Thanks

     
  • Nickolay V. Shmyrev

    My experience tells me you'll have more questions

     
  • clbin

    clbin - 2011-06-13

    Hi Nickolay, I have good news: I succeeded! This is very cool, and I am very
    excited, even though it is already late at night.
    Thank you very much for helping me this month. I hope we can become friends
    across borders, whatever views of China exist internationally. I am very
    grateful. Maybe this is not the place to say it, but really, thank you. I
    will probably run into more problems and talk to you again.

     
