I am using Windows Vista (32-bit).
Here is my log file, with a DMP file in my etc folder:
http://www.megaupload.com/?d=BYRTVZRB
The log shows an error in "wid.c", followed by many errors saying that a word
is not in the dictionary. I think the cause is the DMP file in the etc folder
of my database, which I am fairly sure comes from an4. Without the DMP file I
get one error, followed by a system error because it is trying to find the DMP
file. I don't know how to create a DMP file from my own dictionary.
I went through the CMU tutorial and set up my own database (modeled on an4;
mine is called LongDB) with what I hope are the correct files in the correct
directories. When I got to the training part, I ran
"perl scripts_pl\copy_setup.pl -task LongDB". Next I ran
"perl scripts_pl/make_feats.pl -ctl etc/LongDB_test.fileids". Both commands ran
without errors. Then I ran "perl scripts_pl/decode/slave.pl", and both the WER
and the SE came out at 100%.
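For reference, a rough sketch of what those two scores usually mean: WER is the number of word substitutions, deletions, and insertions divided by the number of reference words, and SE is the share of sentences containing at least one error. Scores of 100% for both are, for example, what you would get if the decoder produced no words at all for any utterance. This is a generic Python illustration, not the SphinxTrain scoring script; `word_errors` and `error_rates` are hypothetical names.

```python
def word_errors(ref: str, hyp: str) -> int:
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    r, h = ref.split(), hyp.split()
    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (starts as row 0).
    prev = list(range(len(h) + 1))
    for i in range(1, len(r) + 1):
        cur = [i] + [0] * len(h)
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            cur[j] = min(prev[j] + 1,          # deletion
                         cur[j - 1] + 1,       # insertion
                         prev[j - 1] + cost)   # substitution or match
        prev = cur
    return prev[len(h)]


def error_rates(pairs: list[tuple[str, str]]) -> tuple[float, float]:
    """Return (WER, SE) in percent for a list of (reference, hypothesis) pairs."""
    total_words = sum(len(ref.split()) for ref, _ in pairs)
    total_errors = sum(word_errors(ref, hyp) for ref, hyp in pairs)
    wrong_sentences = sum(1 for ref, hyp in pairs if ref.split() != hyp.split())
    wer = 100.0 * total_errors / total_words
    se = 100.0 * wrong_sentences / len(pairs)
    return wer, se
```

For example, `error_rates([("go home", "go hum")])` gives 50% WER (one wrong word out of two) but 100% SE (the only sentence is wrong).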
Here is my result folder with its files:
http://www.megaupload.com/?d=P49TY9H9
I would appreciate it if someone could help me with this problem. Thank you
for your time.
To get help, you need to provide more information. In particular, you need to
upload your whole training folder instead of just a few logs that only show
the error.
Here is my database with everything in it:
http://www.megaupload.com/?d=PGVS6JWS
Thank you, nsh.
There was no need to do that.
You should use 16 kHz WAV files instead. All you had to do was change the
input format in sphinx_train.cfg:
$CFG_WAVFILE_EXTENSION = 'wav';
$CFG_WAVFILE_TYPE = 'mswav'; # one of nist, mswav, raw
16 kHz is mandatory, though.
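Since 16 kHz is mandatory, it may be worth checking every recording before training. A minimal sketch using Python's standard `wave` module, assuming plain PCM WAV files (the module cannot open compressed or float WAV files); the mono and 16-bit checks are an extra assumption about what the feature extraction expects, and `check_wav` is a hypothetical helper.

```python
import sys
import wave


def check_wav(path: str, expected_rate: int = 16000) -> list[str]:
    """Return a list of problems found in a WAV file (empty if it looks usable)."""
    problems = []
    with wave.open(path, "rb") as w:
        if w.getframerate() != expected_rate:
            problems.append(f"sample rate {w.getframerate()} Hz, expected {expected_rate} Hz")
        # Assumption: mono, 16-bit PCM input.
        if w.getnchannels() != 1:
            problems.append(f"{w.getnchannels()} channels, expected mono")
        if w.getsampwidth() != 2:
            problems.append(f"{8 * w.getsampwidth()}-bit samples, expected 16-bit")
    return problems


if __name__ == "__main__":
    # Hypothetical usage: python check_wav.py wav/LongDB/*.wav
    for p in sys.argv[1:]:
        for problem in check_wav(p):
            print(f"{p}: {problem}")
```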
That's not enough audio. A practical database starts at about 1 hour of audio.
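To see how far a database is from that figure, one rough approach is to add up the duration of every file listed in the training fileids. A Python sketch, assuming fileids entries are paths relative to the wav folder without the extension (as appears to be the case in an4) and that the recordings are PCM WAV; `total_hours` and the command line shown are hypothetical.

```python
import os
import sys
import wave


def total_hours(fileids_path: str, wav_dir: str, ext: str = "wav") -> float:
    """Sum the duration of every WAV file listed in a fileids file, in hours."""
    seconds = 0.0
    with open(fileids_path) as f:
        for line in f:
            fileid = line.strip()
            if not fileid:
                continue  # skip blank lines
            path = os.path.join(wav_dir, f"{fileid}.{ext}")
            with wave.open(path, "rb") as w:
                seconds += w.getnframes() / w.getframerate()
    return seconds / 3600.0


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python total_hours.py etc/LongDB_train.fileids wav
    print(f"{total_hours(sys.argv[1], sys.argv[2]):.2f} hours")
```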
You don't need a DMP file. In the decode configuration you can use the ARPA
language model directly, without converting it to DMP:
$DEC_CFG_LANGUAGEMODEL = "$DEC_CFG_LANGUAGEMODEL_DIR/LongDB.ug.lm";
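If the "word is not in the dictionary" messages from wid.c come from language model words that the dictionary lacks (an assumption, but it would fit an an4 DMP file being used with the LongDB dictionary), cross-checking the ARPA model against the dictionaries shows which words are involved. A minimal Python sketch; the helper names and file names are hypothetical, and the filler dictionary is passed in so that `<s>` and `</s>` are not reported.

```python
import re
import sys


def arpa_unigrams(lm_path: str) -> list[str]:
    """Return the words listed in the \\1-grams: section of an ARPA language model."""
    words, in_unigrams = [], False
    with open(lm_path) as f:
        for line in f:
            line = line.strip()
            if line == "\\1-grams:":
                in_unigrams = True
            elif line.startswith("\\"):
                in_unigrams = False  # \data\, \2-grams:, \end\, ...
            elif in_unigrams and line:
                # Each entry is: log10-prob word [backoff weight]
                words.append(line.split()[1])
    return words


def dict_words(dict_path: str) -> set[str]:
    """Return the words in a Sphinx-style dictionary, folding WORD(2) into WORD."""
    words = set()
    with open(dict_path) as f:
        for line in f:
            fields = line.split()
            if fields:
                words.add(re.sub(r"\(\d+\)$", "", fields[0]))
    return words


def missing_words(lm_path: str, *dict_paths: str) -> list[str]:
    """List LM unigrams that appear in none of the given dictionaries."""
    known = set().union(*(dict_words(p) for p in dict_paths))
    return [w for w in arpa_unigrams(lm_path) if w not in known]


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Hypothetical usage: python check_lm.py etc/LongDB.ug.lm etc/LongDB.dic etc/LongDB.filler
    for w in missing_words(sys.argv[1], *sys.argv[2:]):
        print(w)
```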
The files must be listed in the same order in the fileids file and in the
transcription file, and the utterance ID in each transcription line must match
the corresponding file name in fileids.
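A rough way to check both conditions at once, assuming the SphinxTrain transcription layout used by an4, where each line ends with the utterance ID in parentheses (e.g. `<s> HELLO WORLD </s> (utt01)`), and fileids entries that are relative paths without an extension. `check_alignment` and the command line are hypothetical.

```python
import re
import sys
from itertools import zip_longest

# Utterance ID in parentheses at the very end of a transcription line.
UTTID = re.compile(r"\(([^()\s]+)\)\s*$")


def check_alignment(fileids_path: str, transcription_path: str) -> list[str]:
    """Compare fileids and transcription line by line; return a list of mismatches."""
    with open(fileids_path) as f:
        fileids = [line.strip() for line in f if line.strip()]
    with open(transcription_path) as f:
        transcripts = [line.strip() for line in f if line.strip()]
    problems = []
    for n, (fid, trans) in enumerate(zip_longest(fileids, transcripts), start=1):
        if fid is None or trans is None:
            problems.append(f"line {n}: the two files have different numbers of lines")
            break
        m = UTTID.search(trans)
        # Compare the uttid with the last path component of the fileid.
        name = fid.replace("\\", "/").rsplit("/", 1)[-1]
        if m is None:
            problems.append(f"line {n}: no (uttid) at the end of the transcription")
        elif m.group(1) != name:
            problems.append(f"line {n}: fileid {fid!r} but uttid {m.group(1)!r}")
    return problems


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_alignment.py etc/LongDB_train.fileids etc/LongDB_train.transcription
    for problem in check_alignment(sys.argv[1], sys.argv[2]):
        print(problem)
```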
Hello nsh,
I have a quick question: is it possible to add my words to the an4 dictionary
and add my audio files to the wav folders as well? That way I would have more
than 1 hour of audio for training and decoding.
It's possible.
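Going that route means combining the an4 dictionary with your own so that every word in both transcriptions is covered. A minimal Python sketch, assuming the plain `WORD PHONE PHONE ...` dictionary format with alternates written as `WORD(2)`; on a conflict it keeps the an4 pronunciation and reports the word, which is only one possible policy. The helper names and file names are hypothetical.

```python
import sys


def read_dict(path: str) -> dict[str, str]:
    """Map each dictionary entry (e.g. HELLO or HELLO(2)) to its phone string."""
    entries = {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            if fields:
                entries[fields[0]] = " ".join(fields[1:])
    return entries


def merge_dicts(base: dict[str, str], extra: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Add entries from `extra` to `base`; keep the base pronunciation on conflicts."""
    merged = dict(base)
    conflicts = []
    for word, phones in extra.items():
        if word not in merged:
            merged[word] = phones
        elif merged[word] != phones:
            conflicts.append(word)
    return merged, conflicts


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: python merge_dict.py an4.dic LongDB.dic merged.dic
    merged, conflicts = merge_dicts(read_dict(sys.argv[1]), read_dict(sys.argv[2]))
    with open(sys.argv[3], "w") as out:
        for word in sorted(merged):
            out.write(f"{word}\t{merged[word]}\n")
    for word in conflicts:
        print(f"conflict, kept {sys.argv[1]} version: {word}", file=sys.stderr)
```

The new recordings would also need their entries appended to the training fileids and transcription files in matching order, and any phone used in the merged dictionary has to appear in the phone list.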