
100% error after decoding with Sphinx3

Help
Long Hoang
2010-03-29
2012-09-22
  • Long Hoang

    Long Hoang - 2010-03-29

    I am using Windows Vista (32-bit).

    Here is my log file, along with the DMP file from my etc folder:
    http://www.megaupload.com/?d=BYRTVZRB

    The log reports an error in "wid.c", followed by many errors saying that a
    word is not in the dictionary. I think the cause is the DMP file in the etc
    folder of my database; I am pretty sure this DMP file is from an4. Without
    the DMP file I get one error, followed by a system error when it tries to
    find the DMP file. I don't know how to create this DMP file from my own
    dictionary.

    I went through the CMU tutorial and set up my own database (modeled on an4;
    mine is called LongDB) with what I hope are the correct files in the correct
    directories. When I got to the training part, I ran "perl
    scripts_pl\copy_setup.pl -task LongDB". Next I ran "perl
    scripts_pl/make_feats.pl -ctl etc/LongDB_test.fileids". Both commands ran
    with no errors. Then I ran "perl scripts_pl/decode/slave.pl" and got both
    WER and SE at 100%.

    Here is my result folder with its files:
    http://www.megaupload.com/?d=P49TY9H9

    I would appreciate it if someone helped me out with this problem. Thank you
    for your time.

     
  • Nickolay V. Shmyrev

    > I would appreciate it if someone helped me out with this problem. Thank you
    > for your time.

    To get help you need to provide more information. In particular, you need to
    upload your entire training folder instead of just a few logs that only show
    the error.

     
  • Long Hoang

    Long Hoang - 2010-03-30

    http://www.megaupload.com/?d=PGVS6JWS
    Here is my database with everything in it. Thank you nsh.

     
  • Nickolay V. Shmyrev

    > I went through the CMU tutorial and set up my own database

    There was no need to do that.

    You used sph recordings at 44 kHz.

    You should use 16 kHz wav files instead. You also need to change the input
    format in sphinx_train.cfg:

    $CFG_WAVFILE_EXTENSION = 'wav';
    $CFG_WAVFILE_TYPE = 'mswav'; # one of nist, mswav, raw

    A 16 kHz sampling rate is mandatory, though.
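
    A minimal sketch for checking the sampling rate of converted recordings,
    assuming they are standard MS WAV files readable by Python's built-in wave
    module. The function name find_wrong_rate and the directory name "wav" in
    the usage note are illustrative, not part of SphinxTrain.

```python
import wave
from pathlib import Path

EXPECTED_RATE = 16000  # Hz, the rate required above


def find_wrong_rate(wav_dir, expected_rate=EXPECTED_RATE):
    """Return (path, rate) pairs for .wav files whose sample rate differs."""
    mismatches = []
    for path in sorted(Path(wav_dir).rglob("*.wav")):
        # wave.open reads only the header here; no audio data is decoded.
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
        if rate != expected_rate:
            mismatches.append((str(path), rate))
    return mismatches
```

    Calling find_wrong_rate("wav") from the database root would list every file
    that still needs resampling; an empty list means all files are at 16 kHz.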

    You used only 0.02 hours of data.

    That's not enough. A practical database starts with 1 hour of audio.
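
    To see how much audio a database actually contains, the durations of the
    WAV files can be summed. This is only a sketch, assuming MS WAV files under
    a single directory; total_hours is a hypothetical helper name.

```python
import wave
from pathlib import Path


def total_hours(wav_dir):
    """Sum the duration of every .wav file under wav_dir, in hours."""
    seconds = 0.0
    for path in Path(wav_dir).rglob("*.wav"):
        with wave.open(str(path), "rb") as wav:
            # Duration of one file = number of frames / frames per second.
            seconds += wav.getnframes() / wav.getframerate()
    return seconds / 3600.0
```

    A result far below 1.0 means there is less than the suggested hour of
    training audio.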

    > Pretty sure that this DMP file is from an4.

    You don't need a DMP file. In the decode configuration you can use an ARPA
    language model without converting it to DMP:

    $DEC_CFG_LANGUAGEMODEL = "$DEC_CFG_LANGUAGEMODEL_DIR/LongDB.ug.lm";
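
    If the "word is not in the dictionary" errors come from a language model
    built on a different vocabulary, comparing the two files may help confirm
    it. The sketch below assumes a text ARPA model with a "\1-grams:" section
    and a dictionary with one "WORD PHONE PHONE ..." entry per line, where
    alternate pronunciations are written as WORD(2). The function names and the
    ignored tokens (<s>, </s>, <UNK>) are assumptions for illustration.

```python
import re


def read_dictionary_words(dic_path):
    """Collect the words of a pronunciation dictionary as a set."""
    words = set()
    with open(dic_path, encoding="utf-8") as dic:
        for line in dic:
            fields = line.split()
            if fields:
                # Strip the "(2)", "(3)", ... suffix of alternate pronunciations.
                words.add(re.sub(r"\(\d+\)$", "", fields[0]))
    return words


def read_arpa_unigrams(lm_path):
    """Return the words listed in the \\1-grams: section of an ARPA file."""
    words = []
    in_unigrams = False
    with open(lm_path, encoding="utf-8") as lm:
        for line in lm:
            line = line.strip()
            if line.startswith("\\"):
                # Section markers: \data\, \1-grams:, \2-grams:, \end\ ...
                in_unigrams = line == "\\1-grams:"
                continue
            fields = line.split()
            # Unigram lines look like: logprob word [backoff]
            if in_unigrams and len(fields) >= 2:
                words.append(fields[1])
    return words


def missing_words(lm_path, dic_path, ignore=("<s>", "</s>", "<UNK>")):
    """List language-model words that have no dictionary entry."""
    dictionary = read_dictionary_words(dic_path)
    return [w for w in read_arpa_unigrams(lm_path)
            if w not in ignore and w not in dictionary]
```

    The comparison is case-sensitive, so a dictionary in upper case and a model
    in lower case would report every word as missing.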

    The order of lines in the fileids and transcription files was incorrect.

    The files must appear in the same order in fileids and in the
    transcriptions. The uttid in each transcription line should match the file
    name in fileids.
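
    A quick consistency check can be sketched as follows. It assumes each
    transcription line ends with the utterance id in parentheses, as in the an4
    files, and that the uttid equals the last path component of the matching
    fileids line. find_mismatches is a hypothetical helper, not a SphinxTrain
    script.

```python
import re

# Matches a trailing "(uttid)" at the end of a transcription line.
UTTID_RE = re.compile(r"\(([^()\s]+)\)\s*$")


def find_mismatches(fileids_path, transcription_path):
    """Return (line_number, message) pairs where the two files disagree."""
    with open(fileids_path, encoding="utf-8") as f:
        fileids = [line.strip() for line in f if line.strip()]
    with open(transcription_path, encoding="utf-8") as f:
        transcripts = [line.strip() for line in f if line.strip()]

    problems = []
    if len(fileids) != len(transcripts):
        problems.append(
            (0, f"{len(fileids)} fileids vs {len(transcripts)} transcripts"))

    # Compare line by line, so a wrong order shows up as a mismatch.
    for number, (fileid, text) in enumerate(zip(fileids, transcripts), start=1):
        match = UTTID_RE.search(text)
        uttid = match.group(1) if match else None
        # Keep only the file name; accept either slash direction.
        basename = fileid.replace("\\", "/").rsplit("/", 1)[-1]
        if uttid != basename:
            problems.append((number, f"fileid {fileid!r} vs uttid {uttid!r}"))
    return problems
```

    An empty result means both files list the same utterances in the same
    order; line number 0 is used for a difference in the number of lines.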

     
  • Long Hoang

    Long Hoang - 2010-04-06

    Hello nsh,

    I have a quick question. Is it possible to add my words to the an4 dictionary
    and add my audio files to the wav folders as well? This way I would have
    more than 1 hour's worth of audio for the program to train and decode.

     
  • Nickolay V. Shmyrev

    > Is it possible to add my words in the an4 dictionary and add my audio files
    > into the wav folders also?

    It's possible.

     
