
Sphinxtrain problems

Help
sandy
2009-10-01
2012-09-22
  • sandy

    sandy - 2009-10-01

    Hi everyone,
    I have many doubts after following the robust group tutorial. I want to
    train for the Telugu language. I have prepared the Telugu dictionary, filler
    dictionary and phoneme set as described in the tutorial.

    1) When I run the script "perl scripts_pl/make_feats.pl -ctl
    etc/an4_train.fileids", nothing prompts me to
    speak. At what point will it ask me to speak and record my voice?

    2) Do I have to train for phonemes or for words?

    I am just a beginner, can you please help me, friends...

    Thank you,

    Sandhya,
    Mumbai, India.

     
  • Nickolay V. Shmyrev

    > 1) when I run the script "perl scripts_pl/make_feats.pl -ctl
    etc/an4_train.fileids" I could not find any assistance asking for me to
    speak. At what instance it will ask me to speak and record my voice.

    The audio files are recorded beforehand and placed into a wav folder. You can
    find the files from an4 there already. Add your own if you need to. Don't forget to
    update etc/train.fileids and etc/train.transcriptions after that.
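
As an illustration of the bookkeeping described above, here is a minimal Python sketch that lists recordings found under a wav folder and formats matching transcript lines. It is not part of SphinxTrain. The function names are hypothetical, and the layout it assumes (file IDs are paths relative to the wav folder without extension; transcript lines look like `<s> TEXT </s> (utterance_id)`) follows the an4 sample files, so check it against your own etc/ files before relying on it.

```python
from pathlib import Path


def build_fileids(wav_dir, extension=".wav"):
    """Return sorted file IDs: paths relative to wav_dir, without extension.

    Assumption: the control file lists one such ID per line.
    """
    root = Path(wav_dir)
    ids = []
    for path in root.rglob("*" + extension):
        ids.append(path.relative_to(root).with_suffix("").as_posix())
    return sorted(ids)


def transcription_line(text, fileid):
    """Format one transcript line as '<s> TEXT </s> (utterance_id)'.

    Assumption: the utterance ID is the last path component of the file ID.
    """
    utt_id = fileid.rsplit("/", 1)[-1]
    return "<s> {} </s> ({})".format(text.strip(), utt_id)
```

Writing both files from the same list of IDs keeps their order and line counts in step.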

    > 2) I have to train for phonemes or for words?

    I don't understand this question. You are just training the model; you aren't
    training it for something or with something. If you are asking whether your
    dictionary must be phone-based or word-based, it depends on the size of your
    vocabulary.

     
  • sandy

    sandy - 2009-10-02

    Thank you, sir, for your reply.
    I have now completed the acoustic model training, and the next step is to build the
    language model. I used the LMtool to build the language model. I now want
    to use it in sphinx3, which requires the .lm file to be converted into .dmp
    format.

    My doubts are:

    1) I have seen in a forum that lm_convert is the tool to convert to DMP
    format. How do I use it? What is the command to convert a .lm file to a .dmp
    file? And in which folder do I have to be in order to execute that
    command?

    2) After the first step, what do I have to do?

    Waiting for your reply, sir.

     
  • sandy

    sandy - 2009-10-02

    Hello Nickolay,

    // 1) I have seen in a forum that lm_convert is the tool to convert to DMP
    format. How do I use it? What is the command to convert a .lm file to a .dmp
    file? And in which folder do I have to be in order to execute that
    command? Solved by using lm3g2dmp.

    ** Presently I have an acoustic model and a language model.
    1) How do I integrate the acoustic and language models to form a speech
    recognition system?

     
  • sandy

    sandy - 2009-10-06

    Hello Nickolay, I have some doubts regarding the training data. I have searched
    through the forums but couldn't understand well. Sorry if this was answered
    before...

    1) I have a dictionary with 100 words. How many times do I have to train
    each word to get an accurate model?

    2) And what is the minimum database needed for training to recognise speech?

     
  • Nickolay V. Shmyrev

    > 1) I have the dictionary with 100 words, How many times I have to train
    the each word to get the accurate model..

    Nobody will tell you this because basically no such estimate exists. You
    should even avoid phrases like "accurate model". Learn to use
    quantitative estimates instead.
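
One common quantitative estimate in speech recognition is word error rate (WER): the word-level edit distance between the reference transcript and the decoder output, divided by the number of reference words. The sketch below is a plain Python illustration of that definition. It is not the word_align.pl script from the tutorial, and the function name is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1] / len(ref)
```

Measuring WER on a fixed test set before and after adding training data gives a number to compare, rather than a judgement such as "accurate".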

    Also learn by example: a similar task is covered by the RM1 Resource Management
    database. Find its description here:

    http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC93S3B

    > 2) And what is the minimum Database to be trained to recognise speech?

    Again, there is no such thing. If you are interested in dictation, you can
    find a description of a dictation database here:

    http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC94S13A

     
  • sandy

    sandy - 2009-10-06

    Thank you, sir, for helping me so far and clarifying my doubts.

    In the robust tutorial for the AN4 database, the .wav files are
    converted to .sph format, whereas the quickstart subwiki manual uses
    .raw format.

    1) Which format should I follow? (At present I am following the robust
    tutorial and the AN4 database.)

     
  • Nickolay V. Shmyrev

    an4 uses the sph format because it was originally available in this
    format. There is no need to convert audio from wav to sph; you can just record
    your database in wav files and use the configuration in etc/sphinx_train.cfg to
    specify the format of the files. At the top of that file there are settings to
    choose MSWAV instead of nist. You also need to change the extension from sph to
    wav in the configuration file.
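
The two changes described above can be sketched as a small text substitution. This is only an illustration: the variable names `$CFG_WAVFILE_EXTENSION` and `$CFG_WAVFILE_TYPE` and the value `'mswav'` are assumptions about how etc/sphinx_train.cfg spells these settings, so compare them with your own copy of the file. Editing the two lines by hand works just as well.

```python
import re


def switch_to_wav(cfg_text):
    """Return cfg_text with the audio extension and type set for MS WAV files.

    Assumption: the settings are single-quoted Perl assignments named
    $CFG_WAVFILE_EXTENSION and $CFG_WAVFILE_TYPE. Lines that do not match
    are left unchanged.
    """
    cfg_text = re.sub(r"(\$CFG_WAVFILE_EXTENSION\s*=\s*)'[^']*'",
                      r"\g<1>'wav'", cfg_text)
    cfg_text = re.sub(r"(\$CFG_WAVFILE_TYPE\s*=\s*)'[^']*'",
                      r"\g<1>'mswav'", cfg_text)
    return cfg_text
```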

     
  • sandy

    sandy - 2009-10-07

    1) what is the reason for this?

    Phones will be treated as case sensitive.
    Phase 1: DICT - Checking to see if the dict and filler dict agrees with the
    phonelist file.
    Found 195 words using 29 phones
    Phase 2: DICT - Checking to make sure there are not duplicate entries in the
    dictionary
    Phase 3: CTL - Check general format; utterance length (must be positive);
    files exist
    WARNING: CTL line does not parse correctly:

    Phase 4: CTL - Checking number of lines in the transcript should match lines
    in control file
    Phase 5: CTL - Determine amount of training data, see if n_tied_states seems
    reasonable.
    Total Hours Training: 0.074827564102564
    This is a small amount of data, no comment at this time
    Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in the
    dictionary
    Words in dictionary: 192
    Words in filler dictionary: 3
    Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in
    the phonelist, and all phones in the phonelist appear at least once
    Something failed:
    (/home/taruns/Desktop/tutorial/telugu/scripts_pl/00.verify/verify_all.pl)

     
  • Nickolay V. Shmyrev

    > WARNING: CTL line does not parse correctly:

    This line tells you that your etc/<db>_train.fileids has an empty
    line. You should remove it.
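
A minimal Python sketch of that cleanup, for illustration only: it drops empty lines from a control file's contents and compares the remaining count with the transcript, which is what Phase 4 of the verification output checks. The function names are hypothetical and not part of SphinxTrain.

```python
def drop_blank_lines(lines):
    """Return the lines with surrounding whitespace removed and empty ones dropped."""
    return [line.strip() for line in lines if line.strip()]


def counts_match(fileid_lines, transcript_lines):
    """Return True when both files have the same number of non-empty lines."""
    return len(drop_blank_lines(fileid_lines)) == len(drop_blank_lines(transcript_lines))
```

For example, `drop_blank_lines(open("etc/an4_train.fileids"))` returns the entries to write back, one per line.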

     
  • sandy

    sandy - 2009-10-07

    For decoding, I recorded 2 sentences and followed the
    commands in the robust group tutorial, and I got this error:

    MODULE: DECODE Decoding using models previously trained
    Decoding 3 segments starting at 0 (part 1 of 1)
    0%
    This step had 3 ERROR messages and 4 WARNING messages. Please check the log
    file for details.
    Aligning results to find error rate
    word_align.pl failed with error code 65280 at scripts_pl/decode/slave.pl line
    173.

    How can I overcome this problem?

     
  • sandy

    sandy - 2009-10-07

    Previous problem solved.

     
  • sandy

    sandy - 2009-10-07

    Sorry, Nickolay, if my questions are very simple to solve.
    My question:

    1) In my test transcript file, I have used the same audio files that I
    trained on before, and I am expecting 100% correct since the audio files are the same in
    both cases. But my error rate is 75%. Why did this happen?

    Thanks for being patient in answering my questions.

     
  • sandy

    sandy - 2009-10-07

    Can I know about these decoding errors:

    1) ERROR: "cont_mgau.c", line 666: Weight normalization failed for 3
    senones

    2)ERROR: "vithist.c", line 818: No word exit in frame 323, using
    exits from frame 122

    3)ERROR: "vithist.c", line 818: No word exit in frame 179, using
    exits from frame 74

     
  • Nickolay V. Shmyrev

    > In my test- transcript file, I have given the same audio files what I
    have trained before and iam expecting 100% correct as the audio files are same
    in both the cases. But my error rate is 75%... why it was happened like
    this???

    It's expected. Accuracy is never 100%; that's the basic rule you need to learn.

    > 1) ERROR: "cont_mgau.c", line 666: Weight normalization failed
    for 3 senone

    You have too many senones for the small amount of training data.

    > 3)ERROR: "vithist.c", line 818: No word exit in frame 179,
    using exits from frame 74

    Your transcription doesn't match the recorded audio.
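
To put a number on "small amount of training data": the verification log earlier in the thread reports 0.0748 hours, which is about four and a half minutes of audio. The sketch below shows one way to measure the total duration of a folder of WAV recordings with Python's standard `wave` module. It is an illustration, not how SphinxTrain computes its figure, and the function name is hypothetical.

```python
import wave
from pathlib import Path


def total_hours(wav_dir):
    """Sum the durations of all .wav files under wav_dir, in hours."""
    seconds = 0.0
    for path in Path(wav_dir).rglob("*.wav"):
        with wave.open(str(path), "rb") as recording:
            # Duration = number of sample frames / frames per second.
            seconds += recording.getnframes() / recording.getframerate()
    return seconds / 3600.0
```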

     
