100% error after decoding with Sphinx3

Speech Recognition Toolkit

Forum: Help

Creator: Long Hoang

Created: 2010-03-29

Updated: 2012-09-22

Long Hoang - 2010-03-29

I am using Windows Vista (32-bit).

Here is my log file, along with the DMP file from my etc folder:
http://www.megaupload.com/?d=BYRTVZRB

The log reports an error in "wid.c", followed by many errors saying that a
word is not in the dictionary. I think the cause is the DMP file in the etc
folder of my database. Pretty sure that this DMP file is from an4. Without the
DMP file I get one error, followed by a system error where it is trying to
find the DMP file. I don't know how to create this DMP file with my
dictionary.

I went through the CMU tutorial and set up my own database (modeled on an4; I
named mine LongDB) with what I hope are the correct files in the correct
directories. When I got to the training part I ran "perl
scripts_pl\copy_setup.pl -task LongDB". Next I ran "perl
scripts_pl/make_feats.pl -ctl etc/LongDB_test.fileids". Both commands ran
with no errors. Then I ran "perl scripts_pl/decode/slave.pl" and got both WER
and SE at 100%.

Here is my result folder with files
http://www.megaupload.com/?d=P49TY9H9

I would appreciate it if someone helped me out with this problem. Thank you
for your time.


Nickolay V. Shmyrev - 2010-03-29

> I would appreciate it if someone helped me out with this problem. Thank you
> for your time.

To get help you need to provide more information. In particular, you need to
upload your whole training folder instead of just a few logs that only show
the error.


Long Hoang - 2010-03-30

http://www.megaupload.com/?d=PGVS6JWS
Here is my database with everything in it. Thank you nsh.


Nickolay V. Shmyrev - 2010-03-30

> I went through the CMU tutorial and set up my own database

There was no need to do that.

You used sph recordings at 44 kHz.

You should use 16 kHz wav files instead. You also have to change the input
format in sphinx_train.cfg:

$CFG_WAVFILE_EXTENSION = 'wav';
$CFG_WAVFILE_TYPE = 'mswav'; # one of nist, mswav, raw

A 16 kHz sample rate is mandatory, though.
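
The sample-rate requirement can be checked before training. The following is a minimal sketch, not part of SphinxTrain: it uses Python's standard `wave` module to list every .wav file whose rate is not 16000 Hz. The function names and the `wav` directory name are assumptions for illustration, and the check only reads MS WAV (RIFF) files, not sph recordings.

```python
import wave
from pathlib import Path

REQUIRED_RATE = 16000  # Hz, the rate called mandatory in this thread


def wav_sample_rate(path):
    """Return the sample rate in Hz of a RIFF/WAV file."""
    with wave.open(str(path), "rb") as wav:
        return wav.getframerate()


def find_wrong_rate(wav_dir, required=REQUIRED_RATE):
    """Return (path, rate) pairs for .wav files whose rate differs from `required`."""
    bad = []
    for path in sorted(Path(wav_dir).rglob("*.wav")):
        rate = wav_sample_rate(path)
        if rate != required:
            bad.append((str(path), rate))
    return bad


if __name__ == "__main__":
    # "wav" is an assumed directory name; point this at your own audio folder.
    for path, rate in find_wrong_rate("wav"):
        print(f"{path}: {rate} Hz, expected {REQUIRED_RATE} Hz")
```

Any file the script reports would need to be resampled to 16 kHz before features are extracted.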

You used only 0.02 hours of data.

That's not enough. A practical database starts with 1 hour of audio.

> Pretty sure that this DMP file is from an4.

You don't need a DMP file. In the decode configuration you can use an ARPA
language model without converting it to DMP:

$DEC_CFG_LANGUAGEMODEL = "$DEC_CFG_LANGUAGEMODEL_DIR/LongDB.ug.lm";

The order of lines in fileids and transcription was incorrect.

The files must appear in the same order in fileids and in the transcription,
and the uttid in each transcription line should match the file name in fileids.
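
The ordering rule lends itself to an automated check. This is a rough sketch under stated assumptions: each transcription line ends with the uttid in parentheses (as in the an4 files), each fileids line is a path whose last component is the file name, and the transcription file is called etc/LongDB_test.transcription (that name is a guess; only the fileids name appears in this thread). The function names are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Utterance id in parentheses at the end of a transcription line.
UTTID_RE = re.compile(r"\(([^()\s]+)\)\s*$")


def read_fileids(lines):
    """Return the file name (last path component) of each non-empty fileids line."""
    return [PurePosixPath(line.strip().replace("\\", "/")).name
            for line in lines if line.strip()]


def read_uttids(lines):
    """Return the trailing uttid of each non-empty transcription line (None if missing)."""
    ids = []
    for line in lines:
        if not line.strip():
            continue
        match = UTTID_RE.search(line)
        ids.append(match.group(1) if match else None)
    return ids


def find_mismatches(fileid_lines, transcription_lines):
    """Compare the two files entry by entry and describe every disagreement."""
    names = read_fileids(fileid_lines)
    uttids = read_uttids(transcription_lines)
    problems = []
    if len(names) != len(uttids):
        problems.append(
            f"line count differs: {len(names)} fileids vs {len(uttids)} transcriptions")
    for number, (name, uttid) in enumerate(zip(names, uttids), start=1):
        if name != uttid:
            problems.append(
                f"entry {number}: fileids has {name!r}, transcription has {uttid!r}")
    return problems


if __name__ == "__main__":
    # The transcription file name below is an assumption; adjust to your setup.
    with open("etc/LongDB_test.fileids") as f, \
            open("etc/LongDB_test.transcription") as t:
        for problem in find_mismatches(f, t):
            print(problem)
```

An empty result means both files list the same utterances in the same order.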


Long Hoang - 2010-04-06

Hello nsh,

I have a quick question. Is it possible to add my words in the an4 dictionary
and add my audio files into the wav folders also? That way I would have more
than 1 hour's worth of audio for the program to train and decode with.


Nickolay V. Shmyrev - 2010-04-06

> Is it possible to add my words in the an4 dictionary and add my audio files
> into the wav folders also?

It's possible.
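
One way to do the dictionary part is sketched below. It is an illustration only, assuming the usual one-entry-per-line layout of a word followed by its phones; `parse_dict` and `merge_dicts` are hypothetical helpers, not SphinxTrain tools. Existing an4 entries are kept, and only words that an4 lacks are added.

```python
def parse_dict(lines):
    """Map each word to its list of phones, skipping blank or incomplete lines."""
    entries = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        entries[parts[0]] = parts[1:]
    return entries


def merge_dicts(base_lines, extra_lines):
    """Return sorted dictionary lines: all base entries plus words only in extra."""
    merged = parse_dict(base_lines)
    for word, phones in parse_dict(extra_lines).items():
        # Keep the base pronunciation when a word exists in both dictionaries.
        merged.setdefault(word, phones)
    return [f"{word} {' '.join(phones)}" for word, phones in sorted(merged.items())]


if __name__ == "__main__":
    base = ["ONE W AH N", "TWO T UW"]
    extra = ["TWO T OW", "HELLO HH AH L OW"]
    for line in merge_dicts(base, extra):
        print(line)
```

The new recordings would also need matching entries in fileids and transcription, in the same order, as described earlier in the thread.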
