I created a database for the Indian language Malayalam, with all the necessary files, as described in http://cmusphinx.sourceforge.net/wiki/tutorialam .
After creating this database, I ran the command "sphinxtrain -t setup sample".
The following warnings were obtained:
thanky@thanky-HP-15-Notebook-PC:~/mproj/db/sample$ sphinxtrain -t setup sample
Sphinxtrain path: /usr/local/lib/sphinxtrain
Sphinxtrain binaries path: /usr/local/libexec/sphinxtrain
Running the training
MODULE: 000 Computing feature from audio files
Extracting features from segments starting at (part 1 of 1)
Extracting features from segments starting at (part 1 of 1)
Feature extraction is done
MODULE: 00 verify training files
Phase 1: Checking to see if the dict and filler dict agrees with the phonelist file.
Found 71 words using 46 phones
Phase 2: Checking to make sure there are not duplicate entries in the dictionary
Phase 3: Check general format for the fileids file; utterance length (must be positive); files exist
Phase 4: Checking number of lines in the transcript file should match lines in fileids file
Phase 5: Determine amount of training data, see if n_tied_states seems reasonable.
Estimated Total Hours Training: 0.0420722222222222
ERROR: Not enough data for the training, we can only train CI models (set CFG_CD_TRAIN to "no")
Phase 6: Checking that all the words in the transcript are in the dictionary
Words in dictionary: 68
Words in filler dictionary: 3
WARNING: Bad line in transcript:
<sil> AVASARAMORUKKAN <sil> PRADHANAMANTHRIYODUM <sil> KENDRTHIRANJEDUPPU <sil> (a12)
WARNING: Utterance ID mismatch on line 13: speaker_1/a12 vs
WARNING: Bad line in transcript:
<s<sil> COMMISIONODUM <sil> AAVASHYAPEDUMENNU <sil> MANTHRI <sil> ARIYICHU <sil> (a13)
WARNING: Utterance ID mismatch on line 14: speaker_1/a13 vs
WARNING: Bad line in transcript:
<sil> AVASARAMORUKKAN PRADHANAMANTHRIYODUM KENDRA THIRANJEDUPPU COMMISIONODUM AAVASHYAPEDUMENNU MANTHRI ARIYICHU <sil> (b9)
WARNING: Utterance ID mismatch on line 23: speaker_2/b9 vs
Phase 7: Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
WARNING: This phone (dhh) occurs in the phonelist (/home/thanky/mproj/db/sample/etc/sample.phone), but not in any word in the transcription (/home/thanky/mproj/db/sample/etc/sample_train.transcription)
WARNING: This phone (hh) occurs in the phonelist (/home/thanky/mproj/db/sample/etc/sample.phone), but not in any word in the transcription (/home/thanky/mproj/db/sample/etc/sample_train.transcription)
WARNING: This phone (oh) occurs in the phonelist (/home/thanky/mproj/db/sample/etc/sample.phone), but not in any word in the transcription (/home/thanky/mproj/db/sample/etc/sample_train.transcription)
The phones "dhh", "hh", "oh", etc. are used in the .dic file. These phonemes won't be present in the transcript files, right? We write the normal words in the transcript files rather than the phonetic transcription. So how can I resolve the above warnings, which say that these particular phonemes are not present in the transcription files?
You need to fix this error first
Make sure that words with those phonemes are present in transcripts. Due to earlier errors such words might be excluded.
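One way to check this is to cross-reference the dictionary against the transcript. The sketch below (assuming the usual CMU-style dictionary format of `WORD PH1 PH2 ...`, one entry per line; the file paths you pass in are your own) lists every phone in the phonelist that is not reachable through any word actually used in the transcript:

```python
def unused_phones(dic_path, phone_path, trans_path):
    """Return phones listed in the phonelist but never used by any
    transcript word, given a CMU-style dictionary (WORD PH1 PH2 ...)."""
    # word -> its phone sequence, from the dictionary
    word_phones = {}
    with open(dic_path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if parts:
                word_phones[parts[0]] = parts[1:]

    # collect the phones of every word that appears in the transcript
    used = set()
    with open(trans_path, encoding="utf-8") as f:
        for line in f:
            for word in line.split():
                # tokens not in the dictionary (markers, utterance ids) are skipped
                used.update(word_phones.get(word, []))

    with open(phone_path, encoding="utf-8") as f:
        phones = {ln.strip() for ln in f if ln.strip()}

    return sorted(phones - used)
```

Any phone this reports should either gain a transcript word that uses it, or be dropped from the phonelist.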
Thank you.
I have solved those issues. But the .html file created has strikethrough over all the text. I have attached a screenshot of the .html file. Is this an indication of some error?
I was trying to build an acoustic model for the Indian language Malayalam, following the steps described in http://cmusphinx.sourceforge.net/wiki/tutorialam . Creation of the acoustic model succeeded when I tried it with the an4 database. But when I followed the same steps on my own database, the required folders:
model_parameters
model_architecture
result
were not created. Does this have a connection with the above-mentioned error?
You do not have enough data; add more data for training.
I increased the training data to 5 hours, and on training the following required folders were created:
model_parameters
model_architecture
result
But the strikethrough text in the .html file is still not resolved, and at the end of training some errors popped up:
Training for 8 Gaussian(s) completed after 7 iterations
MODULE: 60 Lattice Generation
Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
MODULE: 61 Lattice Pruning
Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
MODULE: 62 Lattice Format Conversion
Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
MODULE: 65 MMIE Training
Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
MODULE: 90 deleted interpolation
Skipped for continuous models
MODULE: DECODE Decoding using models previously trained
Decoding 14 segments starting at 0 (part 1 of 1)
0%
Aligning results to find error rate
Can't open /home/thanky/mproj/db/sample/result/sample-1-1.match
word_align.pl failed with error code 65280 at /usr/local/lib/sphinxtrain/scripts/decode/slave.pl line 173.
How do I solve this?
Last edit: tmt 2016-02-16
You can find details in a log file in the logdir/decode folder.
pocketsphinx_batch: error while loading shared libraries: libpocketsphinx.so.3: cannot open shared object file: No such file or directory
Last edit: tmt 2016-02-18
You need to configure linker with LD_LIBRARY_PATH or with /etc/ld.so.conf to load libraries from /usr/local
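Concretely, assuming the Sphinx libraries were installed under /usr/local/lib, either of these approaches should let the loader find libpocketsphinx.so.3:

```shell
# Option 1: extend the loader search path for the current shell session
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

# Option 2 (one-time, system-wide): as root, register the path and
# refresh the linker cache:
#   echo "/usr/local/lib" > /etc/ld.so.conf.d/usrlocal.conf
#   ldconfig
```

Option 1 must be repeated in every new shell (or added to ~/.bashrc); Option 2 is permanent.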
How do I do that? I checked this link http://stackoverflow.com/questions/4743233/is-usr-local-lib-searched-for-shared-libraries
but I was not able to follow it. Please help.
Last edit: tmt 2016-02-18
Ask Google
Please, I have the same problem:
(MODULE: DECODE Decoding using models previously trained
Decoding 0 segments starting at 0 (part 1 of 1)
Aligning results to find error rate
word_align.pl failed with error code 65280 at C:\ProjectSphinx\sphinxtrain\scripts\decode\slave.pl line 173.)
How did you solve that?
On running "sphinxtrain -s decode run", the result obtained was:
MODULE: DECODE Decoding using models previously trained
Decoding 14 segments starting at 0 (part 1 of 1)
0%
Aligning results to find error rate
SENTENCE ERROR: 14.3% (2/14) WORD ERROR RATE: 14.0% (8/57)
I assume that the work up to this step is correct. As a next step, I have to perform live recognition. What are the steps to be followed for live recognition?
This question is answered in the following section of our tutorial
http://cmusphinx.sourceforge.net/wiki/tutorialam#using_the_model
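For reference, live recognition from the microphone with a newly trained model can be launched roughly like this; the model, language model, and dictionary paths below are placeholders for your own files, and the exact model directory name depends on your training configuration:

```shell
pocketsphinx_continuous -inmic yes \
    -hmm ~/mproj/db/sample/model_parameters/sample.cd_cont_200 \
    -lm ~/mproj/db/sample/etc/sample.lm \
    -dict ~/mproj/db/sample/etc/sample.dic
```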
Thank you. Now I am able to do live recognition.
Right now I am doing recognition of the Malayalam language by writing an English transliteration for every Malayalam word I use. Is it possible to make Sphinx work with Unicode characters, i.e., to write my text corpus, dictionary, and transcription files entirely in Malayalam Unicode? If it is possible, please do help me.
Yes, you can use utf-8 encoding.
Can you please suggest a tutorial for working with UTF-8 encoding?
It is simple. I also use it. Just replace the English transliterations with UTF-8 text. It's that simple.
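Before training, it can be worth confirming that every corpus file really is valid UTF-8, since a single stray byte in a dictionary or transcript can make training fail in confusing ways. A minimal sketch (the commented file list at the bottom is a hypothetical example):

```python
def is_valid_utf8(path):
    """Return True if the file at `path` decodes cleanly as UTF-8."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Hypothetical file list for a database named "sample":
# for name in ["etc/sample.dic", "etc/sample_train.transcription"]:
#     print(name, is_valid_utf8(name))
```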