First, prepare the data:
1. etc/6965.dic:
2. etc/6965.filler:
3. etc/6965.lm
4. etc/6965.lm.dmp
5. etc/6965.phone:
AH
AY
EH
EY
F
IH
IY
K
N
OW
R
S
SIL
T
TH
UW
V
W
Z
6. etc/6965_train.fileids:
gen_fest_0001
gen_fest_0002
gen_fest_0003
gen_fest_0004
gen_fest_0005
gen_fest_0006
gen_fest_0007
gen_fest_0008
gen_fest_0009
gen_fest_0010
7. etc/6965_train.transcription:
8. wav/gen_fest_0001.wav … gen_fest_0010.wav
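The contents of the dictionary, filler, and transcription files are omitted above. For reference, the standard SphinxTrain formats look like the following; the words here are illustrative only, not the actual 6965 data:

etc/6965.dic — one word per line, followed by its phone sequence:

FIVE F AY V
SEVEN S EH V AH N

etc/6965.filler — maps filler tokens to silence:

<s> SIL
</s> SIL
<sil> SIL

etc/6965_train.transcription — one utterance per line, with the file id in parentheses:

<s> five seven </s> (gen_fest_0001)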
Then I ran the following commands:
1. perl ../pocketsphinx/scripts/setup_sphinx.pl -task 6965
perl ../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task 6965
After setup, 6965 gets the following folder structure:

bin
bwaccumdir
etc
feat
logdir
model_parameters
model_architecture
scripts_pl
wav
2. Copy this folder from SphinxTrain manually.
3. Update etc/sphinx_train.cfg:

# Configuration script for sphinx trainer -*-mode:Perl-*-

$CFG_VERBOSE = 1;  # Determines how much goes to the screen.

# These are filled in at configuration time
$CFG_DB_NAME = "6965";
$CFG_BASE_DIR = "/home/king/cmuclmtk/6965";
$CFG_SPHINXTRAIN_DIR = "../sphinxtrain";

# Directory containing SphinxTrain binaries
$CFG_BIN_DIR = "$CFG_BASE_DIR/bin";
$CFG_GIF_DIR = "$CFG_BASE_DIR/gifs";
$CFG_SCRIPT_DIR = "$CFG_BASE_DIR/scripts_pl";

# Experiment name, will be used to name model files and log files
$CFG_EXPTNAME = "$CFG_DB_NAME";

# Audio waveform and feature file information
$CFG_WAVFILES_DIR = "$CFG_BASE_DIR/wav";
$CFG_WAVFILE_EXTENSION = 'wav';
$CFG_WAVFILE_TYPE = 'mswav'; # one of nist, mswav, raw
$CFG_FEATFILES_DIR = "$CFG_BASE_DIR/feat";
$CFG_FEATFILE_EXTENSION = 'mfc';
$CFG_VECTOR_LENGTH = 13;

$CFG_MIN_ITERATIONS = 1;  # BW Iterate at least this many times
$CFG_MAX_ITERATIONS = 10; # BW Don't iterate more than this, something's likely wrong.

# (none/max) Type of AGC to apply to input files
$CFG_AGC = 'none';
# (current/none) Type of cepstral mean subtraction/normalization
# to apply to input files
$CFG_CMN = 'current';
# (yes/no) Normalize variance of input files to 1.0
$CFG_VARNORM = 'no';
# (yes/no) Use letter-to-sound rules to guess pronunciations of
# unknown words (English, 40-phone specific)
$CFG_LTSOOV = 'no';
# (yes/no) Train full covariance matrices
$CFG_FULLVAR = 'no';
# (yes/no) Use diagonals only of full covariance matrices for
# Forward-Backward evaluation (recommended if CFG_FULLVAR is yes)
$CFG_DIAGFULL = 'no';

# (yes/no) Perform vocal tract length normalization in training. This
# will result in a "normalized" model which requires VTLN to be done
# during decoding as well.
$CFG_VTLN = 'no';
# Starting warp factor for VTLN
$CFG_VTLN_START = 0.80;
# Ending warp factor for VTLN
$CFG_VTLN_END = 1.40;
# Step size of warping factors
$CFG_VTLN_STEP = 0.05;

# Directory to write queue manager logs to
$CFG_QMGR_DIR = "$CFG_BASE_DIR/qmanager";
# Directory to write training logs to
$CFG_LOG_DIR = "$CFG_BASE_DIR/logdir";
# Directory for re-estimation counts
$CFG_BWACCUM_DIR = "$CFG_BASE_DIR/bwaccumdir";
# Directory to write model parameter files to
$CFG_MODEL_DIR = "$CFG_BASE_DIR/model_parameters";

# Directory containing transcripts and control files for
# speaker-adaptive training
$CFG_LIST_DIR = "$CFG_BASE_DIR/etc";

# Decoding variables for MMIE training
$CFG_LANGUAGEWEIGHT = "11.5";
$CFG_BEAMWIDTH = "1e-100";
$CFG_WORDBEAM = "1e-80";
$CFG_LANGUAGEMODEL = "$CFG_LIST_DIR/$CFG_DB_NAME.lm.DMP";
$CFG_WORDPENALTY = "0.2";

# Lattice pruning variables
$CFG_ABEAM = "1e-50";
$CFG_NBEAM = "1e-10";
$CFG_PRUNED_DENLAT_DIR = "$CFG_BASE_DIR/pruned_denlat";

# MMIE training related variables
$CFG_MMIE = "no";
$CFG_MMIE_MAX_ITERATIONS = 5;
$CFG_LATTICE_DIR = "$CFG_BASE_DIR/lattice";
$CFG_MMIE_TYPE = "rand"; # Valid values are "rand", "best" or "ci"
$CFG_MMIE_CONSTE = "3.0";
$CFG_NUMLAT_DIR = "$CFG_BASE_DIR/numlat";
$CFG_DENLAT_DIR = "$CFG_BASE_DIR/denlat";

# Variables used in main training of models
$CFG_DICTIONARY = "$CFG_LIST_DIR/$CFG_DB_NAME.dic";
$CFG_RAWPHONEFILE = "$CFG_LIST_DIR/$CFG_DB_NAME.phone";
$CFG_FILLERDICT = "$CFG_LIST_DIR/$CFG_DB_NAME.filler";
$CFG_LISTOFFILES = "$CFG_LIST_DIR/${CFG_DB_NAME}_train.fileids";
$CFG_TRANSCRIPTFILE = "$CFG_LIST_DIR/${CFG_DB_NAME}_train.transcription";
$CFG_FEATPARAMS = "$CFG_LIST_DIR/feat.params";

# Variables used in characterizing models
$CFG_HMM_TYPE = '.cont.'; # Sphinx III
#$CFG_HMM_TYPE = '.semi.'; # PocketSphinx and Sphinx II
#$CFG_HMM_TYPE = '.ptm.'; # PocketSphinx (larger data sets)

if (($CFG_HMM_TYPE ne ".semi.")
    and ($CFG_HMM_TYPE ne ".ptm.")
    and ($CFG_HMM_TYPE ne ".cont.")) {
    die "Please choose one CFG_HMM_TYPE out of '.cont.', '.ptm.', or '.semi.', " .
        "currently $CFG_HMM_TYPE\n";
}

# This configuration is fastest and best for most acoustic models in
# PocketSphinx and Sphinx-III. See below for Sphinx-II.
$CFG_STATESPERHMM = 3;
$CFG_SKIPSTATE = 'no';

if ($CFG_HMM_TYPE eq '.semi.') {
    $CFG_DIRLABEL = 'semi';
    # Four stream features for PocketSphinx
    $CFG_FEATURE = "s2_4x";
    $CFG_NUM_STREAMS = 4;
    $CFG_INITIAL_NUM_DENSITIES = 256;
    $CFG_FINAL_NUM_DENSITIES = 256;
    die "For semi continuous models, the initial and final models have the same density"
        if ($CFG_INITIAL_NUM_DENSITIES != $CFG_FINAL_NUM_DENSITIES);
} elsif ($CFG_HMM_TYPE eq '.ptm.') {
    $CFG_DIRLABEL = 'ptm';
    # Four stream features for PocketSphinx
    $CFG_FEATURE = "s2_4x";
    $CFG_NUM_STREAMS = 4;
    $CFG_INITIAL_NUM_DENSITIES = 64;
    $CFG_FINAL_NUM_DENSITIES = 64;
    die "For phonetically tied models, the initial and final models have the same density"
        if ($CFG_INITIAL_NUM_DENSITIES != $CFG_FINAL_NUM_DENSITIES);
} elsif ($CFG_HMM_TYPE eq '.cont.') {
    $CFG_DIRLABEL = 'cont';
    # Single stream features - Sphinx 3
    $CFG_FEATURE = "1s_c_d_dd";
    $CFG_NUM_STREAMS = 1;
    $CFG_INITIAL_NUM_DENSITIES = 1;
    $CFG_FINAL_NUM_DENSITIES = 2;
    die "The initial has to be less than the final number of densities"
        if ($CFG_INITIAL_NUM_DENSITIES > $CFG_FINAL_NUM_DENSITIES);
}

# Number of top gaussians to score a frame. A little bit less accurate computations
# make training significantly faster. Uncomment to apply this during the training
# For good accuracy make sure you are using the same setting in decoder
# In theory this can be different for various training stages. For example 4 for
# CI stage and 16 for CD stage
# $CFG_CI_NTOP = 4;
# $CFG_CD_NTOP = 16;

# (yes/no) Train multiple-gaussian context-independent models (useful
# for alignment, use 'no' otherwise) in the models created
# specifically for forced alignment
$CFG_FALIGN_CI_MGAU = 'no';
# (yes/no) Train multiple-gaussian context-independent models (useful
# for alignment, use 'no' otherwise)
$CFG_CI_MGAU = 'no';
# Number of tied states (senones) to create in decision-tree clustering
$CFG_N_TIED_STATES = 50;

# How many parts to run Forward-Backward estimation in
$CFG_NPART = 1;

# (yes/no) Train a single decision tree for all phones (actually one
# per state) (useful for grapheme-based models, use 'no' otherwise)
$CFG_CROSS_PHONE_TREES = 'no';

# Use force-aligned transcripts (if available) as input to training
$CFG_FORCEDALIGN = 'no';

# Use a specific set of models for force alignment. If not defined,
# context-independent models for the current experiment will be used.
$CFG_FORCE_ALIGN_MDEF = "$CFG_BASE_DIR/model_architecture/$CFG_EXPTNAME.falign_ci.mdef";
$CFG_FORCE_ALIGN_MODELDIR = "$CFG_MODEL_DIR/$CFG_EXPTNAME.falign_ci_$CFG_DIRLABEL";

# Use a specific dictionary and filler dictionary for force alignment.
# If these are not defined, a dictionary and filler dictionary will be
# created from $CFG_DICTIONARY and $CFG_FILLERDICT, with noise words
# removed from the filler dictionary and added to the dictionary (this
# is because the force alignment is not very good at inserting them)
# $CFG_FORCE_ALIGN_DICTIONARY = "$ST::CFG_BASE_DIR/falignout$ST::CFG_EXPTNAME.falign.dict";
# $CFG_FORCE_ALIGN_FILLERDICT = "$ST::CFG_BASE_DIR/falignout/$ST::CFG_EXPTNAME.falign.fdict";

# Use a particular beam width for force alignment. The wider
# (i.e. smaller numerically) the beam, the fewer sentences will be
# rejected for bad alignment.
$CFG_FORCE_ALIGN_BEAM = 1e-60;

# Calculate an LDA/MLLT transform?
$CFG_LDA_MLLT = 'no';
# Dimensionality of LDA/MLLT output
$CFG_LDA_DIMENSION = 29;

# This is actually just a difference in log space (it doesn't make
# sense otherwise, because different feature parameters have very
# different likelihoods)
$CFG_CONVERGENCE_RATIO = 0.1;

# Queue::POSIX for multiple CPUs on a local machine
# Queue::PBS to use a PBS/TORQUE queue
$CFG_QUEUE_TYPE = "Queue";

# Name of queue to use for PBS/TORQUE
$CFG_QUEUE_NAME = "workq";

# (yes/no) Build questions for decision tree clustering automatically
$CFG_MAKE_QUESTS = "yes";
# If CFG_MAKE_QUESTS is yes, questions are written to this file.
# If CFG_MAKE_QUESTS is no, questions are read from this file.
$CFG_QUESTION_SET = "${CFG_BASE_DIR}/model_architecture/${CFG_EXPTNAME}.tree_questions";
#$CFG_QUESTION_SET = "${CFG_BASE_DIR}/linguistic_questions";

$CFG_CP_OPERATION = "${CFG_BASE_DIR}/model_architecture/${CFG_EXPTNAME}.cpmeanvar";

# This variable has to be defined, otherwise utils.pl will not load.
$CFG_DONE = 1;

return 1;
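As a quick sanity check (my own habit, not a step from this thread), you can verify the edited file still compiles, since it is plain Perl ending in return 1;:

perl -e 'require "./etc/sphinx_train.cfg" and print "sphinx_train.cfg loads OK\n"'

Run it from the 6965 directory; require fails loudly if the file has a syntax error or does not return a true value.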
4. Run:

perl scripts_pl/make_feats.pl -ctl etc/6965_train.fileids

This produces the following feature files under feat:
feat/gen_fest_(0001...0010).mfc
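For reference, make_feats.pl is essentially a thin wrapper around the sphinx_fe feature extractor. A roughly equivalent direct invocation would look like this (assuming sphinx_fe from sphinxbase is on PATH; the flags below are the usual ones for 16kHz MS WAV input, not copied from this thread):

sphinx_fe -samprate 16000 -c etc/6965_train.fileids -di wav -do feat -ei wav -eo mfc -mswav yes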
5. Run the training scripts:
sudo perl scripts_pl/00.verify/verify_all.pl
sudo perl scripts_pl/10.vector_quantize/slave.VQ.pl
sudo perl scripts_pl/20.ci_hmm/slave_convg.pl
sudo perl scripts_pl/30.cd_hmm_untied/slave_convg.pl
sudo perl scripts_pl/40.buildtrees/slave.treebuilder.pl
sudo perl scripts_pl/45.prunetree/slave-state-tying.pl
sudo perl scripts_pl/50.cd_hmm_tied/slave_convg.pl
sudo perl scripts_pl/90.deleted_interpolation/deleted_interpolation.pl
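An aside on these commands: sudo should not be necessary if your user owns the 6965 directory, and running the trainer as root can leave root-owned files that break later stages. Also, if your SphinxTrain version ships scripts_pl/RunAll.pl (an assumption about the version, so check that the file exists), the same stages can be run in order with one command:

perl scripts_pl/RunAll.pl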
Now the 6965 folder structure is as follows:
model_parameters gets the following structure:
OK, I applied the model on Android. When I ran the program, I got the following in pocketsphinx.log:
Sorry for the relatively long-winded post, but I really need help. Thanks in advance, everyone. Where did I go wrong?
I already told you what to do in the previous post: try to decode with your
model in Linux where you trained your model and see if it works. Then decode
on phone.
The tutorial also explains this step, which you skipped for some reason:
http://cmusphinx.sourceforge.net/wiki/tutorialam#decoding
Hello, nshmyrev.
Sorry, I was being slow; at first I did not understand "try to decode with your model in Linux where you trained your model and see if it works. Then decode on phone."
Earlier I had already run:

./scripts_pl/decode/slave.pl

but I got:

MODULE: DECODE Decoding using models previously trained
Decoding 10 segments starting at 0 (part 1 of 1)
Could not find executable for /home/king/cmuclmtk/6965/bin/pocketsphinx_batch at /home/king/cmuclmtk/6965/scripts_pl/decode/../lib/SphinxTrain/Util.pm line 299.
Aligning results to find error rate
Can't open /home/king/cmuclmtk/6965/result/6965-1-1.match
word_align.pl failed with error code 65280 at ./scripts_pl/decode/slave.pl line 173.

I thought this step was not necessary, so I skipped it; that was foolish of me. Please help me; I will keep trying on my own as well.
Thanks
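The key line in that log is the missing bin/pocketsphinx_batch executable. A plausible fix, assuming pocketsphinx is installed and pocketsphinx_batch is on PATH, is to copy (or symlink) it into the experiment's bin directory, which is effectively what the next post does:

which pocketsphinx_batch
cp "$(which pocketsphinx_batch)" /home/king/cmuclmtk/6965/bin/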
Hi,
I copied pocketsphinx_batch from pocketsphinx into the bin folder. After that:
I got:
I think the decoding succeeded, so I then tried recognition on Android. The program still died. Did I go wrong somewhere else?
Thanks
6965_test.transcription:
6965_test.fileids:
Thanks in advance!
See also http://cmusphinx.sourceforge.net/wiki/tutorialam#troubleshooting
Hello, nshmyrev.
I have carefully checked my folders, but found nothing unusual.
How can I send you my acoustic model folder?
Thanks
Hello,
I've uploaded my model to a network drive. I hope to get your help. Thank you.
You do not have enough data to train the acoustic model. See the tutorial, which describes how much data is enough:
http://cmusphinx.sourceforge.net/wiki/tutorialam
Hello, nshmyrev.
Very grateful. I prepared more audio files for training the model and applied it on Android again. Android no longer dies, but it does not recognize anything.
When I use Linux:
It can recognize.
For Android you need to train your model using 8kHz audio. The tutorial covers this process in detail.
Hello, I converted the wav files to 8kHz 16-bit.
I modified feat.params:
I regenerated the model.
The Android program still does not recognize anything.
Thanks.
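For reference, the downsampling and the matching feature settings might look like this; the sox command assumes sox is installed, and the feat.params values are the ones commonly recommended for 8kHz models in the CMUSphinx tutorial, not necessarily what was used here:

sox input.wav -r 8000 -b 16 -c 1 output.wav

# typical etc/feat.params changes for 8kHz audio:
-samprate 8000
-lowerf 200
-upperf 3500
-nfilt 31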
Try to dump the recorded audio on Android before you feed it to the recognizer. Then try to recognize this audio in Linux using your model. Try to share this audio so I can also take a look.
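If the dumped file is headerless PCM, it can be converted to a normal wav on Linux for listening and testing, for example with sox (assuming 8kHz 16-bit signed mono, which is what the recognizer on the phone would expect):

sox -t raw -r 8000 -e signed-integer -b 16 -c 1 dump.raw dump.wav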
Hi Nickolay!
I tried to execute this Linux command:
I got:
Does the "-infile" parameter not exist? But I have seen people in the forum use -infile successfully.
Thanks
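For reference, a single-file decode with a newer pocketsphinx looks roughly like the following; the model directory name is a guess based on the SphinxTrain naming convention (experiment name, .cd_cont_, number of tied states), so adjust the paths to what training actually produced:

pocketsphinx_continuous -hmm model_parameters/6965.cd_cont_50 -lm etc/6965.lm.dmp -dict etc/6965.dic -infile test.wav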
Hello, Nickolay!
I have uploaded my model and the audio here; the raw audio is inside the audio folder.
Thank you in advance, Nickolay!
It's present only in new versions. Maybe you downloaded an older one.
And what should I do with it?
Hello
I am using pocketsphinx-0.7; is that not the latest version? What should I do? I do not quite understand. Thank you for your patience.
Hello
The situation as I see it now is:

1. You trained the model but it doesn't recognize raw files. It means you trained the model incorrectly.
2. Your pocketsphinx has no -infile option. It means you are using an old pocketsphinx. You can have it installed in parallel with the new one, and somehow the old pocketsphinx is being used.
In this situation you should do the following:

1. Find out if there is an old pocketsphinx on your system which you are using instead of the new one.
2. Try to train the model again, using the data you have, from a clean folder, to make sure you did everything correctly. Upload the new folder again.
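For the first point, one way to check (assuming a typical Linux shell) is to list every copy of the tools on PATH; the first hit is the one actually being run:

which -a pocketsphinx_continuous pocketsphinx_batch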
Hm, I also see you are using TTS to build the data for training, and you are trying to recognize real speech. I think it will not work this way. The model will be overtrained to recognize TTS speech and not your own.
I may not be translating your words exactly, but do I understand correctly: a model trained on TTS audio will only recognize TTS audio, and if I want to recognize my own voice, I have to record my own audio and use that data to train the model?
Thanks, Nickolay!
Yes
OK, I'll go try it, and I hope to have good news to tell you soon. I believe you want me to succeed, so you can be rid of me soon, ha ha. Just a joke.
Thanks
My experience tells me you'll have more questions.
Hi, Nickolay, I have good news for you: I was successful. This is very cool. I'm very excited, although it is already late at night.
Thank you very much for helping me this month. I hope we can become friends across borders, whatever the various international views of China may be. Very grateful. Maybe I should not say this here, but really, thank you. I will probably come to you with more problems yet.