I created a database for the Indian language Malayalam, with all the necessary files, as described in http://cmusphinx.sourceforge.net/wiki/tutorialam .
After creating this database, I ran the command "sphinxtrain -t setup sample".
The following warnings were obtained:
thanky@thanky-HP-15-Notebook-PC:~/mproj/db/sample$ sphinxtrain -t setup sample
Sphinxtrain path: /usr/local/lib/sphinxtrain
Sphinxtrain binaries path: /usr/local/libexec/sphinxtrain
Running the training
MODULE: 000 Computing feature from audio files
Extracting features from segments starting at (part 1 of 1)
Extracting features from segments starting at (part 1 of 1)
Feature extraction is done
MODULE: 00 verify training files
Phase 1: Checking to see if the dict and filler dict agrees with the phonelist file.
Found 71 words using 46 phones
Phase 2: Checking to make sure there are not duplicate entries in the dictionary
Phase 3: Check general format for the fileids file; utterance length (must be positive); files exist
Phase 4: Checking number of lines in the transcript file should match lines in fileids file
Phase 5: Determine amount of training data, see if n_tied_states seems reasonable.
Estimated Total Hours Training: 0.0420722222222222
ERROR: Not enough data for the training, we can only train CI models (set CFG_CD_TRAIN to "no")
Phase 6: Checking that all the words in the transcript are in the dictionary
Words in dictionary: 68
Words in filler dictionary: 3
WARNING: Bad line in transcript:
<sil> AVASARAMORUKKAN <sil> PRADHANAMANTHRIYODUM <sil> KENDRTHIRANJEDUPPU <sil> (a12)
WARNING: Utterance ID mismatch on line 13: speaker_1/a12 vs
WARNING: Bad line in transcript:
<s<sil> COMMISIONODUM <sil> AAVASHYAPEDUMENNU <sil> MANTHRI <sil> ARIYICHU <sil> (a13)
WARNING: Utterance ID mismatch on line 14: speaker_1/a13 vs
WARNING: Bad line in transcript:
<sil> AVASARAMORUKKAN PRADHANAMANTHRIYODUM KENDRA THIRANJEDUPPU COMMISIONODUM AAVASHYAPEDUMENNU MANTHRI ARIYICHU <sil> (b9)
WARNING: Utterance ID mismatch on line 23: speaker_2/b9 vs
Phase 7: Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
WARNING: This phone (dhh) occurs in the phonelist (/home/thanky/mproj/db/sample/etc/sample.phone), but not in any word in the transcription (/home/thanky/mproj/db/sample/etc/sample_train.transcription)
WARNING: This phone (hh) occurs in the phonelist (/home/thanky/mproj/db/sample/etc/sample.phone), but not in any word in the transcription (/home/thanky/mproj/db/sample/etc/sample_train.transcription)
WARNING: This phone (oh) occurs in the phonelist (/home/thanky/mproj/db/sample/etc/sample.phone), but not in any word in the transcription (/home/thanky/mproj/db/sample/etc/sample_train.transcription)
The phones "dhh", "hh", "oh", etc. are used in the .dic file. These phonemes won't be present in the transcript files, right? We write the normal words in the transcript files rather than the phonetic transcription. So how can I resolve the above warnings, which say that these particular phonemes are not present in the transcription files?
You need to fix this error first
Make sure that words with those phonemes are present in transcripts. Due to earlier errors such words might be excluded.
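One way to check this is to cross-reference the dictionary against the transcript. The sketch below (assuming the usual CMU-style dictionary format of `WORD PH1 PH2 ...`, one entry per line; the file paths you pass in are your own) lists every phone in the phonelist that is not reachable through any word actually used in the transcript:

```python
def unused_phones(dic_path, phone_path, trans_path):
    """Return phones listed in the phonelist but never used by any
    transcript word, given a CMU-style dictionary (WORD PH1 PH2 ...)."""
    # word -> its phone sequence, from the dictionary
    word_phones = {}
    with open(dic_path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if parts:
                word_phones[parts[0]] = parts[1:]

    # collect the phones of every word that appears in the transcript
    used = set()
    with open(trans_path, encoding="utf-8") as f:
        for line in f:
            for word in line.split():
                # tokens not in the dictionary (markers, utterance ids) are skipped
                used.update(word_phones.get(word, []))

    with open(phone_path, encoding="utf-8") as f:
        phones = {ln.strip() for ln in f if ln.strip()}

    return sorted(phones - used)
```

Any phone this reports should either gain a transcript word that uses it, or be dropped from the phonelist.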
Thank you.
I have solved those issues. But the .html file created has strikethrough over all the text. I have attached a screenshot of the .html file. Is this an indication of some error?
I was trying to build an acoustic model for the Indian language Malayalam, following the steps described in http://cmusphinx.sourceforge.net/wiki/tutorialam . Creation of the acoustic model succeeded when I tried it with the an4 database. But when I followed the same steps on my own database, the required folders:
model_parameters
model_architecture
result
were not created. Does this have a connection with the above-mentioned error?
You do not have enough data; add more data for training.
I increased the training data to 5 hours, and on training the following required folders were created:
model_parameters
model_architecture
result
But the strikethrough text in the .html file is still not resolved, and at the end of training some errors popped up:
Training for 8 Gaussian(s) completed after 7 iterations
MODULE: 60 Lattice Generation
Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
MODULE: 61 Lattice Pruning
Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
MODULE: 62 Lattice Format Conversion
Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
MODULE: 65 MMIE Training
Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
MODULE: 90 deleted interpolation
Skipped for continuous models
MODULE: DECODE Decoding using models previously trained
Decoding 14 segments starting at 0 (part 1 of 1)
0%
Aligning results to find error rate
Can't open /home/thanky/mproj/db/sample/result/sample-1-1.match
word_align.pl failed with error code 65280 at /usr/local/lib/sphinxtrain/scripts/decode/slave.pl line 173.
How do I solve this?
Last edit: tmt 2016-02-16
You can find details in a log file in the logdir/decode folder.
pocketsphinx_batch: error while loading shared libraries: libpocketsphinx.so.3: cannot open shared object file: No such file or directory
Last edit: tmt 2016-02-18
You need to configure linker with LD_LIBRARY_PATH or with /etc/ld.so.conf to load libraries from /usr/local
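Concretely, assuming the Sphinx libraries were installed under /usr/local/lib, either of these approaches should let the loader find libpocketsphinx.so.3:

```shell
# Option 1: extend the loader search path for the current shell session
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

# Option 2 (one-time, system-wide): as root, register the path and
# refresh the linker cache:
#   echo "/usr/local/lib" > /etc/ld.so.conf.d/usrlocal.conf
#   ldconfig
```

Option 1 must be repeated in every new shell (or added to ~/.bashrc); Option 2 is permanent.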
How do I do that? I checked this link http://stackoverflow.com/questions/4743233/is-usr-local-lib-searched-for-shared-libraries
but I was not able to follow it. Please help.
Last edit: tmt 2016-02-18
Ask Google
Please, I have the same problem:
(MODULE: DECODE Decoding using models previously trained
Decoding 0 segments starting at 0 (part 1 of 1)
Aligning results to find error rate
word_align.pl failed with error code 65280 at C:\ProjectSphinx\sphinxtrain\scripts\decode\slave.pl line 173.)
How did you solve that?
On running "sphinxtrain -s decode run", the result obtained was:
MODULE: DECODE Decoding using models previously trained
Decoding 14 segments starting at 0 (part 1 of 1)
0%
Aligning results to find error rate
SENTENCE ERROR: 14.3% (2/14) WORD ERROR RATE: 14.0% (8/57)
I assume that the work up to this step is correct. As a next step, I have to perform live recognition. What are the steps to be followed for live recognition?
This question is answered in the following section of our tutorial
http://cmusphinx.sourceforge.net/wiki/tutorialam#using_the_model
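For reference, live recognition from the microphone with a newly trained model can be launched roughly like this; the model, language model, and dictionary paths below are placeholders for your own files, and the exact model directory name depends on your training configuration:

```shell
pocketsphinx_continuous -inmic yes \
    -hmm ~/mproj/db/sample/model_parameters/sample.cd_cont_200 \
    -lm ~/mproj/db/sample/etc/sample.lm \
    -dict ~/mproj/db/sample/etc/sample.dic
```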
Thank you. Now I am able to do live recognition.
Right now I am doing recognition of the Malayalam language by writing an English transliteration for every Malayalam word I use. Is it possible to make Sphinx work with Unicode characters, i.e., to write my text corpus, dictionary, and transcription files entirely in Malayalam Unicode? If it is possible, please do help me.
Yes, you can use utf-8 encoding.
Can you please suggest a tutorial for working with UTF-8 encoding?
It is simple. I also use it. Just replace the English transliterations with UTF-8 text. It's that simple.
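Before training, it can be worth confirming that every corpus file really is valid UTF-8, since a single stray byte in a dictionary or transcript can make training fail in confusing ways. A minimal sketch (the commented file list at the bottom is a hypothetical example):

```python
def is_valid_utf8(path):
    """Return True if the file at `path` decodes cleanly as UTF-8."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Hypothetical file list for a database named "sample":
# for name in ["etc/sample.dic", "etc/sample_train.transcription"]:
#     print(name, is_valid_utf8(name))
```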