
Error while training using sphinxtrain

Help
tmt
2016-01-19
2020-05-19
  • tmt

    tmt - 2016-01-19

    I created a database for the Indian language Malayalam, with all the necessary files, as described in http://cmusphinx.sourceforge.net/wiki/tutorialam.

    After creating this database I ran the command "sphinxtrain -t setup sample".
    The following warnings were obtained:

    thanky@thanky-HP-15-Notebook-PC:~/mproj/db/sample$ sphinxtrain -t setup sample
    Sphinxtrain path: /usr/local/lib/sphinxtrain
    Sphinxtrain binaries path: /usr/local/libexec/sphinxtrain
    Running the training
    MODULE: 000 Computing feature from audio files
    Extracting features from segments starting at (part 1 of 1)
    Extracting features from segments starting at (part 1 of 1)
    Feature extraction is done
    MODULE: 00 verify training files
    Phase 1: Checking to see if the dict and filler dict agrees with the phonelist file.
    Found 71 words using 46 phones
    Phase 2: Checking to make sure there are not duplicate entries in the dictionary
    Phase 3: Check general format for the fileids file; utterance length (must be positive); files exist
    Phase 4: Checking number of lines in the transcript file should match lines in fileids file
    Phase 5: Determine amount of training data, see if n_tied_states seems reasonable.
    Estimated Total Hours Training: 0.0420722222222222
    ERROR: Not enough data for the training, we can only train CI models (set CFG_CD_TRAIN to "no")
    Phase 6: Checking that all the words in the transcript are in the dictionary
    Words in dictionary: 68
    Words in filler dictionary: 3
    WARNING: Bad line in transcript:
    <sil> AVASARAMORUKKAN <sil> PRADHANAMANTHRIYODUM <sil> KENDRTHIRANJEDUPPU <sil> (a12)
    WARNING: Utterance ID mismatch on line 13: speaker_1/a12 vs
    WARNING: Bad line in transcript:
    <s<sil> COMMISIONODUM <sil> AAVASHYAPEDUMENNU <sil> MANTHRI <sil> ARIYICHU <sil> (a13)
    WARNING: Utterance ID mismatch on line 14: speaker_1/a13 vs
    WARNING: Bad line in transcript:
    <sil> AVASARAMORUKKAN PRADHANAMANTHRIYODUM KENDRA THIRANJEDUPPU COMMISIONODUM AAVASHYAPEDUMENNU MANTHRI ARIYICHU <sil> (b9)
    WARNING: Utterance ID mismatch on line 23: speaker_2/b9 vs
    Phase 7: Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
    WARNING: This phone (dhh) occurs in the phonelist (/home/thanky/mproj/db/sample/etc/sample.phone), but not in any word in the transcription (/home/thanky/mproj/db/sample/etc/sample_train.transcription)
    WARNING: This phone (hh) occurs in the phonelist (/home/thanky/mproj/db/sample/etc/sample.phone), but not in any word in the transcription (/home/thanky/mproj/db/sample/etc/sample_train.transcription)
    WARNING: This phone (oh) occurs in the phonelist (/home/thanky/mproj/db/sample/etc/sample.phone), but not in any word in the transcription (/home/thanky/mproj/db/sample/etc/sample_train.transcription)

    The phones "dhh", "hh", "oh", etc. are used in the .dic file. These phonemes won't be present in the transcript files, right? After all, we write normal words in the transcript files rather than their phonetic transcription. So how can I resolve the above warnings, which say that these particular phonemes are not present in the transcription files?

     
    • Nickolay V. Shmyrev

      WARNING: Bad line in transcript:
      <sil> AVASARAMORUKKAN <sil> PRADHANAMANTHRIYODUM <sil> KENDRTHIRANJEDUPPU <sil> (a12)
      WARNING: Utterance ID mismatch on line 13: speaker_1/a12 vs

      You need to fix this error first
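
One way to locate such lines before rerunning the trainer is sketched below. It assumes the tutorial's convention that every transcript line ends with the utterance ID in parentheses, and that this ID equals the last path component of the matching line in the fileids file. The function name `check_transcript` and the sample data are made up for illustration; this is not SphinxTrain's own verification code. The empty ID after "vs" in the warning would be consistent with stray characters after the closing parenthesis, but that is only a guess.

```python
import re

# Text, optional spaces, then "(utt_id)" as the very last thing on the line.
LINE_RE = re.compile(r"^(?P<text>.*?)\s*\((?P<utt_id>[^()\s]+)\)$")


def check_transcript(transcript_lines, fileid_lines):
    """Return a list of (line_number, problem) tuples; empty means no problem found."""
    problems = []
    if len(transcript_lines) != len(fileid_lines):
        problems.append((0, "transcript and fileids differ in length"))
    pairs = zip(transcript_lines, fileid_lines)
    for n, (line, fileid) in enumerate(pairs, start=1):
        line = line.rstrip("\n")  # keep other trailing whitespace so it is detected
        expected = fileid.strip().split("/")[-1]
        match = LINE_RE.match(line)
        if match is None:
            problems.append((n, "line does not end with (utt_id): %r" % line))
        elif match.group("utt_id") != expected:
            problems.append(
                (n, "ID %r does not match fileid %r" % (match.group("utt_id"), expected))
            )
    return problems


# Illustrative data only (words borrowed from the log above).
transcript = [
    "<sil> MANTHRI <sil> ARIYICHU <sil> (a13)\n",
    "<sil> MANTHRI ARIYICHU <sil> (b9) \n",  # trailing space after ")"
    "<sil> MANTHRI <sil> (a12)\n",            # ID differs from the fileids entry
]
fileids = ["speaker_1/a13\n", "speaker_2/b9\n", "speaker_1/a14\n"]

problems = check_transcript(transcript, fileids)
for line_no, problem in problems:
    print(line_no, problem)
```

With real data, the two lists would come from reading etc/sample_train.transcription and the corresponding fileids file line by line.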

       
      • Nickolay V. Shmyrev

        So how can I resolve the above warnings, which say that these particular phonemes are not present in the transcription files?

        Make sure that words with those phonemes are present in the transcripts. Because of the earlier errors, such words might have been excluded.
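
The check behind that warning can be sketched as follows: a phone counts as covered only if some word that actually occurs in the transcript has it in its dictionary pronunciation. The function `unused_phones`, the pronunciations, and the word `WORD_WITH_HH` are hypothetical placeholders, not entries from the poster's files.

```python
def unused_phones(phonelist, dictionary, transcript_words):
    """Return (phones never used by a transcribed word, words missing from the dictionary)."""
    used = set()
    missing_words = set()
    for word in transcript_words:
        if word in dictionary:
            used.update(dictionary[word])
        else:
            missing_words.add(word)
    return sorted(set(phonelist) - used), sorted(missing_words)


# Hypothetical phone set and pronunciations, for illustration only.
phonelist = ["SIL", "a", "hh", "i", "m", "n", "r", "th"]
dictionary = {
    "<sil>": ["SIL"],
    "MANTHRI": ["m", "a", "n", "th", "r", "i"],
    "WORD_WITH_HH": ["hh", "a"],
}

# The word containing "hh" is in the dictionary but never transcribed.
before = unused_phones(phonelist, dictionary, ["<sil>", "MANTHRI", "<sil>"])
# Once a transcript line uses that word, the phone is covered.
after = unused_phones(phonelist, dictionary, ["<sil>", "MANTHRI", "WORD_WITH_HH", "<sil>"])

print(before)
print(after)
```

This also shows why a rejected transcript line matters: if the only utterance containing a word is dropped as a "bad line", every phone that only that word uses ends up reported as unused.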

         
  • tmt

    tmt - 2016-02-15

    Thank you.
    I have solved those issues. But the .html file that was created has strikethrough over all the text. I have attached a screenshot of the .html file. Is this an indication of some error?

    I was trying to build the acoustic model for the Indian language Malayalam, following the steps described in http://cmusphinx.sourceforge.net/wiki/tutorialam. Creating the acoustic model was successful when I tried it with the an4 database. But when I followed the same steps with my own database, the required folders

    model_parameters
    model_architecture
    result

    were not created. Is this connected to the above-mentioned error?

     
    • Nickolay V. Shmyrev

      You do not have enough data; add more data for training.
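
For scale, the "Estimated Total Hours Training: 0.0420722222222222" line in the first log corresponds to 0.0420722 × 3600 ≈ 151 seconds, about two and a half minutes of audio. A rough way to measure a corpus yourself is sketched below, assuming the recordings are plain WAV files readable by Python's standard `wave` module; the helper name `total_hours` and the demo file are illustrative.

```python
import os
import tempfile
import wave


def total_hours(wav_paths):
    """Sum the durations of the given WAV files and return the total in hours."""
    seconds = 0.0
    for path in wav_paths:
        with wave.open(path, "rb") as wav:
            seconds += wav.getnframes() / wav.getframerate()
    return seconds / 3600.0


# Demo: write one second of 16 kHz, 16-bit mono silence and measure it.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.wav")
    with wave.open(demo_path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(16000)
        wav.writeframes(b"\x00\x00" * 16000)
    demo_hours = total_hours([demo_path])

print(demo_hours * 3600, "seconds")
```

For a real corpus, the list of paths would be built from the fileids file plus the wav directory and file extension used by the training setup.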

       
  • tmt

    tmt - 2016-02-16

    I increased the training data to 5 hours, and after training the following required folders were created:
    model_parameters
    model_architecture
    result

    But the strikethrough text in the .html file is still not resolved, and some errors popped up at the end of training:

    Training for 8 Gaussian(s) completed after 7 iterations
    MODULE: 60 Lattice Generation
    Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
    MODULE: 61 Lattice Pruning
    Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
    MODULE: 62 Lattice Format Conversion
    Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
    MODULE: 65 MMIE Training
    Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
    MODULE: 90 deleted interpolation
    Skipped for continuous models
    MODULE: DECODE Decoding using models previously trained
    Decoding 14 segments starting at 0 (part 1 of 1)
    0%
    Aligning results to find error rate
    Can't open /home/thanky/mproj/db/sample/result/sample-1-1.match
    word_align.pl failed with error code 65280 at /usr/local/lib/sphinxtrain/scripts/decode/slave.pl line 173.

    How can I solve this?

     

    Last edit: tmt 2016-02-16
    • Nickolay V. Shmyrev

      You can find the details in a log file in the logdir/decode folder.
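
A small sketch for pulling the suspicious lines out of that folder is shown below. It assumes the logs are plain text and simply matches a few marker strings; the marker list, the helper name `find_error_lines`, and the demo file name are assumptions, not part of SphinxTrain.

```python
import os
import tempfile


def find_error_lines(log_dir, markers=("ERROR", "FATAL", "error while loading")):
    """Return (path, line_number, text) for every log line containing a marker."""
    hits = []
    for root, _dirs, files in os.walk(log_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            with open(path, "r", encoding="utf-8", errors="replace") as handle:
                for line_no, line in enumerate(handle, start=1):
                    if any(marker in line for marker in markers):
                        hits.append((path, line_no, line.rstrip()))
    return hits


# Demo with a throwaway directory standing in for logdir/decode.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "demo.log"), "w", encoding="utf-8") as handle:
        handle.write("Decoding 14 segments starting at 0 (part 1 of 1)\n")
        handle.write(
            "pocketsphinx_batch: error while loading shared libraries: "
            "libpocketsphinx.so.3: cannot open shared object file\n"
        )
    hits = find_error_lines(tmp)

for path, line_no, text in hits:
    print(os.path.basename(path), line_no, text)
```

In a real training directory, `find_error_lines("logdir/decode")` would be the call to try first.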

       
      • tmt

        tmt - 2016-02-18

        pocketsphinx_batch: error while loading shared libraries: libpocketsphinx.so.3: cannot open shared object file: No such file or directory

         

        Last edit: tmt 2016-02-18
        • Nickolay V. Shmyrev

          You need to configure the linker with LD_LIBRARY_PATH or with /etc/ld.so.conf to load libraries from /usr/local.
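
As a minimal sketch of the LD_LIBRARY_PATH route: the variable is a colon-separated list of directories that the dynamic linker searches when a program starts. The helper below builds an environment with a library directory prepended, assuming libpocketsphinx.so.3 was installed under /usr/local/lib (the usual location for a default source install, but check your system). The name `env_with_library_dir` is made up for illustration.

```python
import os


def env_with_library_dir(lib_dir="/usr/local/lib", base_env=None):
    """Return a copy of the environment with lib_dir first in LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    parts = [p for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    if lib_dir not in parts:
        parts.insert(0, lib_dir)
    env["LD_LIBRARY_PATH"] = ":".join(parts)
    return env


# Demo on a fixed starting environment so the result does not depend on the machine.
env = env_with_library_dir(base_env={"LD_LIBRARY_PATH": "/opt/lib"})
print(env["LD_LIBRARY_PATH"])

# To run a command with this setting, pass the dictionary to subprocess, e.g.:
#   import subprocess
#   subprocess.run(["sphinxtrain", "-s", "decode", "run"], env=env_with_library_dir())
```

The setting only affects processes started with that environment. The /etc/ld.so.conf route is system-wide instead and normally requires running ldconfig as root after editing.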

           
          • tmt

            tmt - 2016-02-18

            How do I do that? I checked this link: http://stackoverflow.com/questions/4743233/is-usr-local-lib-searched-for-shared-libraries

            But I was not able to follow it. Please help.

             

            Last edit: tmt 2016-02-18
            • Nickolay V. Shmyrev

              Ask Google

               
    • abdelkbir

      abdelkbir - 2020-05-19

      Please, I have the same problem:

      MODULE: DECODE Decoding using models previously trained
      Decoding 0 segments starting at 0 (part 1 of 1)
      Aligning results to find error rate
      word_align.pl failed with error code 65280 at C:\ProjectSphinx\sphinxtrain\scripts\decode\slave.pl line 173.

      How did you solve that?

       
  • tmt

    tmt - 2016-02-26

    On running "sphinxtrain -s decode run",
    the result obtained was:

    MODULE: DECODE Decoding using models previously trained
    Decoding 14 segments starting at 0 (part 1 of 1)
    0%
    Aligning results to find error rate
    SENTENCE ERROR: 14.3% (2/14) WORD ERROR RATE: 14.0% (8/57)

    I assume that the work up to this step is correct. As a next step, I have to perform live recognition. What steps should I follow for live recognition?
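
The two percentages in the last log line appear to be simply the counts in parentheses expressed as ratios: 2 of 14 sentences contained at least one error, and 8 errors were counted against 57 reference words. A quick check of that arithmetic (the helper name is illustrative):

```python
def error_rate(errors, total):
    """Percentage of errors, rounded to one decimal place as in the log."""
    return round(100.0 * errors / total, 1)


sentence_error = error_rate(2, 14)
word_error = error_rate(8, 57)
print(sentence_error, word_error)
```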

     
  • tmt

    tmt - 2016-03-01

    Thank you. Now I am able to do live recognition.
    Until now I have been recognizing Malayalam by writing an English transcription for every Malayalam word I used. Is it possible to use Sphinx with Unicode characters, i.e., to write my text corpus, dictionary, and transcription files all in Malayalam Unicode? If it is possible, please help me.

     
    • Nickolay V. Shmyrev

      Is it possible to use Sphinx with Unicode characters, i.e., to write my text corpus, dictionary, and transcription files all in Malayalam Unicode?

      Yes, you can use UTF-8 encoding.

       
  • tmt

    tmt - 2016-03-02

    Can you please suggest a tutorial on working with UTF-8 encoding?

     
  • pannam

    pannam - 2017-01-18

    It is simple. I also use it. Just replace the English letters with UTF-8 text. It's that simple.
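
As a rough illustration of what that looks like in practice, the sketch below writes a one-entry pronunciation dictionary with the word in Malayalam script and saves it as UTF-8 without a byte-order mark. The phone sequence is a hypothetical placeholder, not taken from the poster's dictionary, and the helper name `write_dictionary` is made up. The transcript and text corpus would then need to spell the word identically.

```python
import os
import tempfile

# Hypothetical entry: the word in Malayalam script, followed by made-up phones.
entries = {"മന്ത്രി": ["m", "a", "n", "th", "r", "i"]}


def write_dictionary(path, entries):
    """Write 'WORD phone phone ...' lines as UTF-8 (no BOM), one entry per line."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for word in sorted(entries):
            handle.write(word + " " + " ".join(entries[word]) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    dict_path = os.path.join(tmp, "demo.dic")
    write_dictionary(dict_path, entries)
    with open(dict_path, "rb") as handle:
        raw = handle.read()

read_back = raw.decode("utf-8")
print(read_back, end="")
```

Saving every file with the same encoding matters because the trainer matches transcript words against dictionary words; the same visible word stored in two different encodings would not match.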

     
