CMU Sphinx / Forums / Help: doing correct?

rastinrastini - 2016-10-18

Hi,
Please tell me whether I am training correctly. My steps:
1- Split the wav files.
2- Prepared a text file with tags.
3- text2wfreq < hafez.txt | wfreq2vocab > hafez.tmp.vocab
4- text2idngram -vocab hafez.tmp.vocab -idngram hafez.idngram < hafez.txt
5- idngram2lm -vocab_type 0 -idngram hafez.idngram -vocab hafez.tmp.vocab -arpa hafez.lm
6- ngram-count -kndiscount -interpolate -text data.txt -lm data.lm
7- sphinxtrain -t hafez setup
8- Created the dic file.
9- Created the fileids file.
10- Added numbers to the txt file and renamed it to hafez_train.transcription.
11- Added this to the feat.params file: "-lowerf 130 -upperf 6800 -nfilt 25 -transform dct -lifter 22 -feat 1s_c_d_dd -svspec 0-12/13-25/26-38 -agc none -cmn current -varnorm no"
12- Created hafez.filler with this content: "<s> SIL </s> SIL <sil> SIL"
13- Created hafez.phone with this content: "AH AX DH IX SIL آ ا ب ت ث پ ج چ ح خ د ذ ر ز ژ س ش ص ض ظ ع"
14- Set matrix and mma to true in the cfg.
15- Ran sphinxtrain.

Now I only have these files: "hafez.align hafez.match hafez.match12600", and there is no matrix file.
The test doesn't work.
Is this correct?
Where is my problem?
Can anyone help?
Thanks.
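
For readers following step 10 above: the transcription file pairs each line of text with an utterance ID. A minimal Python sketch of that step, assuming the `<s> text </s> (utt_id)` layout used in the CMU Sphinx training tutorial and assuming the utterance ID is the last path component of the matching fileids entry. The function name and sample IDs are hypothetical.

```python
def build_transcription(fileids, sentences):
    """Pair each fileid with its sentence as '<s> text </s> (utt_id)'."""
    if len(fileids) != len(sentences):
        raise ValueError("fileids and sentences must have the same length")
    lines = []
    for fileid, sentence in zip(fileids, sentences):
        # Assumption: the utterance ID is the last path component of the fileid.
        utt_id = fileid.strip().rsplit("/", 1)[-1]
        # Collapse stray whitespace inside the sentence.
        text = " ".join(sentence.split())
        lines.append(f"<s> {text} </s> ({utt_id})")
    return lines


if __name__ == "__main__":
    # Hypothetical IDs and text, for illustration only.
    for line in build_transcription(["speaker_1/utt_0001"], ["hello world"]):
        print(line)
```

Generating the fileids and the transcription from the same list keeps the two files in the same order, which the later verification phases rely on.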

- Nickolay V. Shmyrev - 2016-10-18
  
  3- text2wfreq < hafez.txt | wfreq2vocab > hafez.tmp.vocab
  4- text2idngram -vocab hafez.tmp.vocab -idngram hafez.idngram < hafez.txt
  5- idngram2lm -vocab_type 0 -idngram hafez.idngram -vocab hafez.tmp.vocab -arpa hafez.lm
  6- ngram-count -kndiscount -interpolate -text data.txt -lm data.lm
  
  You can use either cmuclmtk or srilm, not both.
  
  11- add this to feat.params file: "-lowerf 130 -upperf 6800 -nfilt 25 -transform dct -lifter 22 -feat 1s_c_d_dd -svspec 0-12/13-25/26-38 -agc none -cmn current -varnorm no"
  
  Not a good idea; it was not in the tutorial.
  
  13- hafez.phone with this content:"AH AX DH IX SIL آ ا ب ت ث پ ج چ ح خ د ذ ر ز ژ س ش ص ض ظ ع "
  
  Phone names should use English letters.
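
A quick way to act on this advice is to list which entries in the phone file are not plain ASCII. The Python sketch below is illustrative only: the accepted character set (letters, digits, `_`, `+`) is an assumption, not a rule taken from the sphinxtrain documentation, and the function name is hypothetical.

```python
import re

# Assumption: an acceptable phone name is ASCII letters, digits, '_' or '+'.
ASCII_PHONE = re.compile(r"[A-Za-z0-9_+]+")


def non_ascii_phones(phones):
    """Return the phone names that fall outside the assumed ASCII set, in order."""
    return [p for p in phones if not ASCII_PHONE.fullmatch(p)]


if __name__ == "__main__":
    # A shortened copy of the phone list quoted above.
    phones = "AH AX DH IX SIL آ ا ب".split()
    for phone in non_ascii_phones(phones):
        # Print code points so the offending symbols can be identified exactly.
        print(" ".join(f"U+{ord(ch):04X}" for ch in phone))
```

Each phone reported this way would need an ASCII name of the user's choosing, applied consistently in both the phone file and the dictionary.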
  
  The test doesn't work.
  Is this correct?
  
  No. If the tests do not work, you made a mistake somewhere.
  
  Where is my problem?
  
  You didn't provide enough data to get an answer on this question. You need to provide an acoustic model training folder.
  
  Can anyone help?
  
  Sure, as soon as you provide the required information.
  

rastinrastini - 2016-10-18

Thanks for helping.
Then what should I write in feat.params?
Should I give you a screenshot of my training folder?

I now use only cmuclmtk.
I did not change the feat.params file.

In phase 5 I get: "This is a small amount of data, no comment at this time".
Phase 7 fails.

I attached the training folder without the feat and wav folders to keep it small.
Where is my problem?
Is any more information needed?

Last edit: rastinrastini 2016-10-18

hafez.tar.gz


rastinrastini - 2016-10-18

MODULE: 000 Computing feature from audio files (2016-10-18 19:28)

Extracting features from segments starting at (part 1 of 1)

sphinx_fe Log File
completed

Extracting features from segments starting at (part 1 of 1)

sphinx_fe Log File
completed

Feature extraction is done
MODULE: 00 verify training files (2016-10-18 19:28)

Phase 1: Checking to see if the dict and filler dict agrees with the phonelist file.

Found 2331 words using 36 phones

WARNING: This phone (<U+200C>) occurs in the dictionary (/media/rastinrastini/3090AD6429530FCB/projects/speech/sphinx/hafez/etc/hafez.dic), but not in the phonelist (/media/rastinrastini/3090AD6429530FCB/projects/speech/sphinx/hafez/etc/hafez.phone)
passed
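
The phone flagged in the Phase 1 warning looks like U+200C (zero-width non-joiner), a character that is common in Persian text and invisible in most editors. The sketch below shows one way to find such stray symbols, assuming each dictionary line is a word followed by whitespace-separated phones. The function names and sample entries are hypothetical.

```python
def phones_missing_from_phonelist(dict_lines, phonelist):
    """Collect phones used in dictionary lines that the phonelist does not contain."""
    known = set(phonelist)
    missing = set()
    for line in dict_lines:
        fields = line.split()
        # The first field is the word; the remaining fields are its phones.
        for phone in fields[1:]:
            if phone not in known:
                missing.add(phone)
    return missing


def describe(phone):
    """Render a phone as code points so invisible characters become visible."""
    return " ".join(f"U+{ord(ch):04X}" for ch in phone)


if __name__ == "__main__":
    # Hypothetical entries; the second has a stray U+200C among its phones.
    lines = ["salam S A L A M", "miravam M I \u200c R A V A M"]
    for phone in phones_missing_from_phonelist(lines, ["S", "A", "L", "M", "I", "R", "V"]):
        print(describe(phone))
```

Python's `str.split()` does not treat U+200C as whitespace, so a zero-width non-joiner standing alone between spaces survives as its own token, much like the phone reported in the warning.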

Phase 2: Checking to make sure there are not duplicate entries in the dictionary
passed

Phase 3: Check general format for the fileids file; utterance length (must be positive); files exist
passed

Phase 4: Checking number of lines in the transcript file should match lines in fileids file
passed
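
Phase 4 only compares line counts. A slightly stricter check, sketched here in Python, also confirms that line N of the transcript ends with the utterance ID of line N in the fileids file. It assumes the `(utt_id)` suffix convention from the CMU Sphinx tutorial; the function name and sample data are hypothetical.

```python
import re

# Trailing "(utt_id)" at the end of a transcript line.
UTT_ID = re.compile(r"\(([^()\s]+)\)\s*$")


def mismatched_lines(fileids, transcript_lines):
    """Return (line_number, fileid, utt_id) for each pair that does not line up."""
    problems = []
    if len(fileids) != len(transcript_lines):
        # Line number 0 marks a count mismatch rather than a specific line.
        problems.append((0, f"{len(fileids)} fileids",
                         f"{len(transcript_lines)} transcript lines"))
    for n, (fileid, line) in enumerate(zip(fileids, transcript_lines), start=1):
        match = UTT_ID.search(line)
        utt_id = match.group(1) if match else None
        # Assumption: the utterance ID is the last path component of the fileid.
        if utt_id != fileid.strip().rsplit("/", 1)[-1]:
            problems.append((n, fileid.strip(), utt_id))
    return problems


if __name__ == "__main__":
    print(mismatched_lines(["a/u1", "a/u2"],
                           ["<s> x </s> (u1)", "<s> y </s> (u3)"]))
```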

Phase 5: Determine amount of training data, see if n_tied_states seems reasonable.

Estimated Total Hours Training: 1.07844722222222

This is a small amount of data, no comment at this time
WARNING
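
For scale, 1.07844722222222 hours × 3600 is roughly 3882.4 seconds, or about 64.7 minutes of audio. The figure can be cross-checked independently by summing the durations of the training wav files. The Python sketch below does this with the standard `wave` module; it is not how the verification script computes its estimate, and it assumes uncompressed PCM wav input.

```python
import wave


def wav_seconds(source):
    """Duration in seconds of one PCM wav file (a path or a binary file object)."""
    with wave.open(source, "rb") as w:
        return w.getnframes() / w.getframerate()


def total_hours(sources):
    """Total duration of all given wav files, in hours."""
    return sum(wav_seconds(s) for s in sources) / 3600.0


if __name__ == "__main__":
    import glob
    # Hypothetical layout: wav files anywhere under ./wav
    paths = glob.glob("wav/**/*.wav", recursive=True)
    print(f"{len(paths)} files, {total_hours(paths):.3f} hours")
```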

Phase 6: Checking that all the words in the transcript are in the dictionary

Words in dictionary: 2328

Words in filler dictionary: 3
passed

Phase 7: Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
passed

Last edit: rastinrastini 2016-10-18


rastinrastini - 2016-10-18

Thanks, my problem is solved.
