Am I doing this correctly?

Help
2016-10-18
  • rastinrastini

    rastinrastini - 2016-10-18

    Hi,
    Please tell me whether I am training correctly or not.
    1- split the wav files.
    2- text file with tags.
    3- text2wfreq < hafez.txt | wfreq2vocab > hafez.tmp.vocab
    4- text2idngram -vocab hafez.tmp.vocab -idngram hafez.idngram < hafez.txt
    5- idngram2lm -vocab_type 0 -idngram hafez.idngram -vocab hafez.tmp.vocab -arpa hafez.lm
    6- ngram-count -kndiscount -interpolate -text data.txt -lm data.lm
    7- sphinxtrain -t hafez setup
    8- create dic file
    9- create fileids file
    10- add numbers to txt file and rename to hafez_train.transcription
    11- add this to feat.param file: "-lowerf 130 -upperf 6800 -nfilt 25 -transform dct -lifter 22 -feat 1s_c_d_dd -svspec 0-12/13-25/26-38 -agc none -cmn current -varnorm no"
    12- hafez.filler with this content: "<s> SIL </s> SIL <sil> SIL"
    13- hafez.phone with this content:"AH AX DH IX SIL آ ا ب ت ث پ ج چ ح خ د ذ ر ز ژ س ش ص ض ظ ع "
    14- matrix and mma in cfg to true.
    15- sphinxtrain run.
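
Steps 9 and 10 can be sketched in Python. This is only an illustration, assuming the layout from the CMU Sphinx training tutorial: one path per line (without extension) in the fileids file, and `<s> text </s> (utterance_id)` per line in the transcription file. The function name `build_training_lists` and the sample utterances are hypothetical.

```python
from pathlib import Path


def build_training_lists(utterances):
    """Build fileids and transcription lines from (path, text) pairs.

    `utterances` is a list of (relative wav path without extension, text).
    The id in parentheses is the file name without its directory.
    """
    fileids, transcription = [], []
    for rel_path, text in utterances:
        utt_id = Path(rel_path).name
        fileids.append(rel_path)
        transcription.append(f"<s> {text.strip()} </s> ({utt_id})")
    return fileids, transcription


# Hypothetical sample data, not taken from the hafez corpus.
utts = [
    ("speaker_1/utt_0001", "hello world"),
    ("speaker_1/utt_0002", "good morning "),
]
fileids, transcription = build_training_lists(utts)
print("\n".join(fileids))
print("\n".join(transcription))
```

Both lists must have the same number of lines and the same order, which is what Phase 4 of the verification step checks.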

    Now I only have these files: "hafez.align hafez.match hafez.match12600", and I don't have a matrix file.
    The test doesn't work.
    Is this correct?
    Where is my problem?
    Can anyone help?
    Thanks.

     
    • Nickolay V. Shmyrev

      3- text2wfreq < hafez.txt | wfreq2vocab > hafez.tmp.vocab
      4- text2idngram -vocab hafez.tmp.vocab -idngram hafez.idngram < hafez.txt
      5- idngram2lm -vocab_type 0 -idngram hafez.idngram -vocab hafez.tmp.vocab -arpa hafez.lm
      6- ngram-count -kndiscount -interpolate -text data.txt -lm data.lm

      You can use either cmuclmtk or SRILM (ngram-count), not both.
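
A cmuclmtk-only version of steps 3 to 5 might look like the sketch below. It only assembles the commands already quoted in this thread and leaves out the SRILM `ngram-count` call. The helper names `cmuclmtk_commands` and `build_lm` are hypothetical, and actually running the commands assumes the cmuclmtk tools are on the PATH.

```python
import subprocess


def cmuclmtk_commands(name):
    """Return the cmuclmtk shell commands from steps 3-5 for corpus `name`."""
    text = f"{name}.txt"
    vocab = f"{name}.tmp.vocab"
    idngram = f"{name}.idngram"
    arpa = f"{name}.lm"
    return [
        f"text2wfreq < {text} | wfreq2vocab > {vocab}",
        f"text2idngram -vocab {vocab} -idngram {idngram} < {text}",
        f"idngram2lm -vocab_type 0 -idngram {idngram} -vocab {vocab} -arpa {arpa}",
    ]


def build_lm(name, dry_run=True):
    """Print the commands, or run them through the shell when dry_run is False."""
    for cmd in cmuclmtk_commands(name):
        if dry_run:
            print(cmd)
        else:
            # shell=True because the commands use pipes and redirection
            subprocess.run(cmd, shell=True, check=True)


build_lm("hafez")  # dry run: only prints the three commands
```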

      11- add this to feat.param file: "-lowerf 130 -upperf 6800 -nfilt 25 -transform dct -lifter 22 -feat 1s_c_d_dd -svspec 0-12/13-25/26-38 -agc none -cmn current -varnorm no"

      Not a good idea; it was not in the tutorial.

      13- hafez.phone with this content:"AH AX DH IX SIL آ ا ب ت ث پ ج چ ح خ د ذ ر ز ژ س ش ص ض ظ ع "

      It is better to write phones with English letters.
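
One way to follow this advice is to rewrite the dictionary with a table from Persian letters to ASCII phone names. The sketch below is only an illustration: the mapping in `PHONE_MAP` is hypothetical and incomplete, not a recommended Persian phone set, and `romanize_entry` is a made-up helper name. The same names would also have to be used in the .phone file.

```python
# Hypothetical, incomplete mapping from Persian letters to ASCII phone names.
PHONE_MAP = {
    "\u0622": "AA",  # alef with madda
    "\u0628": "B",   # beh
    "\u062a": "T",   # teh
    "\u067e": "P",   # peh
    "\u062c": "JH",  # jeem
    "\u0686": "CH",  # tcheh
    "\u062f": "D",   # dal
    "\u0631": "R",   # reh
    "\u0632": "Z",   # zain
    "\u0698": "ZH",  # jeh
    "\u0633": "S",   # seen
    "\u0634": "SH",  # sheen
}


def romanize_entry(line, phone_map=PHONE_MAP):
    """Rewrite one dictionary line 'word ph1 ph2 ...' with ASCII phone names.

    The word itself is kept as-is; phones that are already ASCII pass through.
    """
    word, *phones = line.split()
    out = []
    for phone in phones:
        if phone.isascii():
            out.append(phone)
        elif phone in phone_map:
            out.append(phone_map[phone])
        else:
            raise KeyError(f"no ASCII name defined for phone {phone!r}")
    return " ".join([word, *out])


# A word spelled beh-alef-dal with the hypothetical phones beh, alef-madda, dal.
entry = "\u0628\u0627\u062f \u0628 \u0622 \u062f"
print(romanize_entry(entry))
```

Raising an error on an unmapped phone is deliberate, so that no Persian phone slips through silently.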

      The test doesn't work.
      Is this correct?

      No. If the tests do not work, you made a mistake somewhere.

      Where is my problem?

      You didn't provide enough data to answer this question. You need to provide the acoustic model training folder.

      Can anyone help?

      Sure, as soon as you provide the required information.

       
  • rastinrastini

    rastinrastini - 2016-10-18

    Thanks for helping.
    Then what must I write in feat.param?
    Should I give you a screenshot of my training folder?

    Now I use only cmuclmtk.
    I didn't change the feat.params file.

    In phase 5: "This is a small amount of data, no comment at this time"
    Phase 7 failed.

    I attached the training folder without the feat and wav folders to reduce its size.
    Where is my problem?
    Is any more information needed?
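
To cross-check the Phase 5 message about the amount of data, the total audio duration can be summed from the wav files. This is a rough sketch with Python's standard `wave` module and a hypothetical helper name `total_hours`. It is not how sphinxtrain computes its own estimate, and it assumes plain PCM wav files.

```python
import wave


def total_hours(wav_paths):
    """Sum the duration of the given PCM wav files and return it in hours."""
    seconds = 0.0
    for path in wav_paths:
        with wave.open(str(path), "rb") as wav:
            seconds += wav.getnframes() / wav.getframerate()
    return seconds / 3600.0


# Usage (the paths are hypothetical):
# print(total_hours(["wav/speaker_1/utt_0001.wav", "wav/speaker_1/utt_0002.wav"]))
```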

     

    Last edit: rastinrastini 2016-10-18
  • rastinrastini

    rastinrastini - 2016-10-18

    MODULE: 000 Computing feature from audio files (2016-10-18 19:28)

    Extracting features from segments starting at (part 1 of 1)

    sphinx_fe Log File
    completed

    Extracting features from segments starting at (part 1 of 1)

    sphinx_fe Log File
    completed

    Feature extraction is done
    MODULE: 00 verify training files (2016-10-18 19:28)

    Phase 1: Checking to see if the dict and filler dict agrees with the phonelist file.

    Found 2331 words using 36 phones

    WARNING: This phone (‌) occurs in the dictionary (/media/rastinrastini/3090AD6429530FCB/projects/speech/sphinx/hafez/etc/hafez.dic), but not in the phonelist (/media/rastinrastini/3090AD6429530FCB/projects/speech/sphinx/hafez/etc/hafez.phone)
    passed

    Phase 2: Checking to make sure there are not duplicate entries in the dictionary
    passed

    Phase 3: Check general format for the fileids file; utterance length (must be positive); files exist
    passed

    Phase 4: Checking number of lines in the transcript file should match lines in fileids file
    passed

    Phase 5: Determine amount of training data, see if n_tied_states seems reasonable.

    Estimated Total Hours Training: 1.07844722222222

    This is a small amount of data, no comment at this time
    WARNING

    Phase 6: Checking that all the words in the transcript are in the dictionary

    Words in dictionary: 2328

    Words in filler dictionary: 3
    passed

    Phase 7: Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
    passed
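
The Phase 1 warning in the log above prints the offending phone as what looks like an empty pair of parentheses, which suggests an invisible character in the dictionary. A zero-width non-joiner (U+200C), which is common in Persian text, is one plausible candidate, but that is an assumption. The sketch below lists the phones used in the dictionary but missing from the phonelist and prints their code points so invisible characters become visible. The helper names and the sample lines are hypothetical.

```python
import unicodedata


def describe(token):
    """Spell out every character of a token as 'U+XXXX NAME'."""
    return ", ".join(
        f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}" for ch in token
    )


def phones_missing_from_list(dic_lines, phone_list):
    """Map each phone used in the dictionary but absent from the phonelist
    to the words that use it."""
    known = set(phone_list)
    missing = {}
    for line in dic_lines:
        parts = line.split()  # U+200C is not whitespace, so it survives as a token
        if len(parts) < 2:
            continue
        word, phones = parts[0], parts[1:]
        for phone in phones:
            if phone not in known:
                missing.setdefault(phone, []).append(word)
    return missing


# Hypothetical sample: the second entry contains a stray zero-width non-joiner.
dic = ["word1 AH SIL", "word2 AH \u200c DH"]
phones = ["AH", "DH", "SIL"]
for phone, words in phones_missing_from_list(dic, phones).items():
    print(f"{describe(phone)} used by {words}")
```

If such a character turns up, removing it from the dictionary (or from the text the dictionary was generated from) should make the warning go away.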

     

    Last edit: rastinrastini 2016-10-18
  • rastinrastini

    rastinrastini - 2016-10-18

    Thanks, my problem is solved.

     
