Adapting vs Training

Zain
2017-01-08
2017-05-21
1 2 > >> (Page 1 of 2)
  • Zain

    Zain - 2017-01-08

    I want to build a pocketsphinx-based speech recognition system for Arabic continuous speech. I have a corpus of more than 10 hours, and I am not sure whether it is better to use the adaptation technique or to train the system from scratch.

     
    • Nickolay V. Shmyrev

      You need to train and you need much more than 10 hours of data. Ideally you need 200-300 hours.

       
  • Zain

    Zain - 2017-01-08

    Thank you.
    After starting training, it gave me the following warnings, and the training did not complete.
    WARNING: Utterance ID mismatch on line 6143: 202/202-96 vs
    WARNING: Bad line in transcript:
    (Arabic transcript line; characters garbled in the original post)
    ...
    WARNING: This phone (Z) occurs in the phonelist (/home/.../trial1/etc/trial1.phone), but not in any word in the transcription (/home/.../trial1/etc/trial1_train.transcription)

    Any help?

     

    Last edit: Zain 2017-01-08
    • Nickolay V. Shmyrev

      You need to prepare the data in the required format as described in the tutorial. If the format has errors, training will not proceed.

      If you need further help you can share your database.
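
      The warnings above usually mean the .fileids and .transcription files are out of sync: SphinxTrain expects each transcription line to end with its utterance ID in parentheses, matching the corresponding line of the fileids file. A minimal sketch of a consistency check (the function names and file layout are my assumptions, not part of SphinxTrain):

      ```python
      import re

      def transcript_id(line):
          """Extract the utterance ID from a transcript line like
          '<s> word word </s> (202-96)'; return None if it is missing."""
          m = re.search(r"\(([^()\s]+)\)\s*$", line)
          return m.group(1) if m else None

      def find_mismatches(fileids, transcript_lines):
          """Pair fileids entries (e.g. '202/202-96') with transcript lines
          and report lines whose ID does not match the fileid's basename."""
          bad = []
          for n, (fid, line) in enumerate(zip(fileids, transcript_lines), 1):
              tid = transcript_id(line)
              if tid is None or fid.rsplit("/", 1)[-1] != tid:
                  bad.append((n, fid, tid))
          return bad

      # Example: a matching pair and a transcript line missing its (id)
      print(find_mismatches(["202/202-96"], ["<s> hello </s> (202-96)"]))
      print(find_mismatches(["202/202-96"], ["<s> hello </s>"]))
      ```

      Running this over the real trial1.fileids and trial1_train.transcription files (read line by line) would point at the first line where the two files drift apart.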

       
  • Zain

    Zain - 2017-01-08

    Just finished training. It also gave results, which seem extremely low. However, there are some errors as shown below:
    What is the reason for this error, which frequently appeared during training?
    Is this result reasonable?


    ERROR: This step had 11206 ERROR messages and 0 WARNING messages. Please check the log file for details.
    Normalization for iteration: 6
    Current Overall Likelihood Per Frame = -145.917409811125
    Training for 8 Gaussian(s) completed after 6 iterations
    MODULE: 60 Lattice Generation
    Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
    MODULE: 61 Lattice Pruning
    Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
    MODULE: 62 Lattice Format Conversion
    Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
    MODULE: 65 MMIE Training
    Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
    MODULE: 90 deleted interpolation
    Skipped for continuous models
    MODULE: DECODE Decoding using models previously trained
    Decoding 400 segments starting at 0 (part 1 of 1)
    0%
    Aligning results to find error rate
    SENTENCE ERROR: 90.2% (361/400) WORD ERROR RATE: 52.4% (1877/3585)
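
    Not from the thread, but for interpreting that last line: both figures are plain ratios from the alignment step.

    ```python
    # Sentence error rate: sentences with at least one word error / total sentences
    ser = 361 / 400 * 100       # 90.25, reported as 90.2%
    # Word error rate: edit errors (substitutions + deletions + insertions)
    # divided by the number of reference words
    wer = 1877 / 3585 * 100     # about 52.36, reported as 52.4%
    print(f"SER {ser:.2f}%  WER {wer:.2f}%")
    ```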


     

    Last edit: Zain 2017-01-08
    • Nickolay V. Shmyrev

      What is the reason for this error, which frequently appeared during training?

      Answered in the troubleshooting section of the tutorial.

      Is this result reasonable?

      Yes.

       
  • Zain

    Zain - 2017-01-09

    I noticed that the system trained using Sphinx 3 gives better accuracy than Pocketsphinx. Is there any reason for such a difference?

     
    • Nickolay V. Shmyrev

      Our models are all trained with sphinxtrain; you probably mean the models trained for sphinx4/sphinx3 (continuous) and pocketsphinx (semi-continuous and PTM). Continuous models are expected to be more accurate, but they are also slower to decode. You can learn more on the wiki:

      http://cmusphinx.sourceforge.net/wiki/acousticmodeltypes

       
  • Zain

    Zain - 2017-01-10

    I used the following command to check a speech file. Could you please let me know how to interpret the results? What does the 92.9% below mean?

    $ for i in *.wav; do play $i; done

    arctic_0001.wav:

    File Size: 176k Bit Rate: 256k
    Encoding: Signed PCM
    Channels: 1 @ 16-bit
    Samplerate: 16000Hz
    Replaygain: off
    Duration: 00:00:05.51

    In:92.9% 00:00:05.12 [00:00:00.39] Out:81.9k [ =====|===== ] Hd:5.6 Clip:0 Segmentation fault (core dumped)

     
    • Nickolay V. Shmyrev

      The play command crashed due to a bug, maybe a bug in the driver, maybe something else. It is not really related to pocketsphinx.

       
  • Zain

    Zain - 2017-01-11

    How do I find the best choice of CFG_N_TIED_STATES and CFG_FINAL_NUM_DENSITIES for a particular speech collection? Could you please send me a tutorial link for preparing language models for CMU Sphinx?
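
    For reference, these two settings live in sphinx_train.cfg. The values below are purely illustrative assumptions, not recommendations from this thread; the tutorial suggests choosing them based on corpus size and then checking WER on a development set:

    ```perl
    # sphinx_train.cfg (excerpt) -- illustrative values only, tune on a dev set
    $CFG_N_TIED_STATES = 1000;       # number of tied states (senones)
    $CFG_FINAL_NUM_DENSITIES = 8;    # Gaussians per state for continuous models
    ```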

     
  • Zain

    Zain - 2017-01-12

    Is it OK for the CMU Sphinx training dictionary to have only lowercase letters, or can it contain both lowercase and capital letters for phoneme representation?

     
    • Nickolay V. Shmyrev

      It is ok but not recommended.

       
  • Zain

    Zain - 2017-01-13

    I have a problem preparing the language model; it gives the following error message:

    hash_add: Error: [AistiqTaAbi] hash conflict
    There are two entries in the dictionary for [AistiqTaAbi]
    Please change or remove one of them and re-run.

    However, these entries belong to two different words, as shown in the dictionary:
    AistiqTAabi A i s t i q T A a b i
    AistiqTaAbi A i s t i q T aA b i

    It seems that this problem is related to case sensitivity. Any help?
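
    A quick way to check a dictionary for headwords that collide only by letter case (a sketch of the idea, not the actual tool's hashing logic):

    ```python
    from collections import defaultdict

    def case_collisions(dict_lines):
        """Group dictionary headwords by their lowercased form and return
        the groups that contain more than one distinct original spelling."""
        groups = defaultdict(set)
        for line in dict_lines:
            if line.strip():
                word = line.split()[0]   # headword is the first field
                groups[word.lower()].add(word)
        return {k: sorted(v) for k, v in groups.items() if len(v) > 1}

    # The two entries from the error message above collide case-insensitively
    print(case_collisions([
        "AistiqTAabi A i s t i q T A a b i",
        "AistiqTaAbi A i s t i q T aA b i",
    ]))
    ```

    Renaming one of the colliding headwords (e.g. with a numeric suffix) is a common way to work around such conflicts.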

     

    Last edit: Zain 2017-01-13
    • Nickolay V. Shmyrev

      It is not quite clear what software you run.

       
  • Zain

    Zain - 2017-01-13

    cmuclmtk and lm3g2dmp.

     
    • Nickolay V. Shmyrev

      Neither of them requires a dictionary. You need to be more precise in describing your problems. The more details you provide, the faster you get an answer.

      http://catb.org/~esr/faqs/smart-questions.html

       
  • Zain

    Zain - 2017-03-26

    Is there any method to find the execution time of the training and decoding?
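
    The thread does not answer this, but one generic approach is simply to time the command externally; a sketch (the command shown is a placeholder, not the real training invocation):

    ```python
    import subprocess
    import sys
    import time

    def timed_run(cmd):
        """Run a command and return (exit_code, elapsed_seconds)."""
        start = time.perf_counter()
        rc = subprocess.call(cmd)
        return rc, time.perf_counter() - start

    # Placeholder command; substitute the actual sphinxtrain / pocketsphinx call
    rc, secs = timed_run([sys.executable, "-c", "pass"])
    print(f"exit={rc} elapsed={secs:.2f}s")
    ```

    On Unix, prefixing the command with `time` achieves the same thing from the shell.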

     
  • Zain

    Zain - 2017-03-30

    I performed some experiments and found that PTM and semi-continuous have the same WER, while the continuous acoustic model has a higher WER. Is this reasonable?

     
  • Zain

    Zain - 2017-04-12

    How can I find the number of triphones used in my pocketsphinx system?

     
    • Nickolay V. Shmyrev

      Open the mdef file in a text editor.
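
      If the mdef is the text version, the counts appear as header lines near the top (e.g. a line like "118004 n_tri"). A small sketch of pulling them out; the format assumption is mine, so check it against your actual file:

      ```python
      def mdef_counts(lines):
          """Collect '<number> <name>' header lines from a text mdef,
          e.g. '42 n_base' (base phones) or '118004 n_tri' (triphones)."""
          counts = {}
          for line in lines:
              parts = line.split()
              if len(parts) == 2 and parts[0].isdigit():
                  counts[parts[1]] = int(parts[0])
          return counts

      # Illustrative header fragment, not from a real model
      sample = ["0.3", "42 n_base", "118004 n_tri", "# triphone table follows"]
      print(mdef_counts(sample))
      ```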

       
  • Zain

    Zain - 2017-04-21

    Could you please let me know whether $CFG_N_TIED_STATES means the number of triphones?

     
    • Nickolay V. Shmyrev

      The idea of tied states is explained in our tutorial:

      http://cmusphinx.sourceforge.net/wiki/tutorialconcepts

      For computational purposes it is helpful to detect parts of triphones instead of triphones as a whole, for example, to create a detector for the beginning of a triphone and share it across many triphones. The whole variety of sound detectors can be represented by a small number of distinct short sound detectors. Usually we use 4000 distinct short sound detectors to compose detectors for triphones. We call those detectors senones. A senone's dependence on context can be more complex than just the left and right context: it can be a rather complex function defined by a decision tree, or in some other way.
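
      As a back-of-the-envelope illustration of why tying helps (the 40-phone set and 3 states per triphone are assumptions for illustration, not figures from this thread):

      ```python
      phones = 40                  # assumed size of the phone set
      triphones = phones ** 3      # 64000 possible context-dependent triphones
      states = triphones * 3       # 192000 HMM states before any tying
      senones = 4000               # typical tied-state pool from the tutorial
      print(states // senones)     # each senone serves ~48 states on average
      ```

      The tying function (decision tree or otherwise) decides which of those 192000 logical states map onto which of the 4000 senones.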

       
  • Zain

    Zain - 2017-04-25

    Is it possible to run CMU PocketSphinx using an “any-word” language model such as:
    $WORD = (X | Y | Z );
    (SENT-START <$WORD> SENT-END)

    That is, I want to evaluate the performance using this language model instead of the probabilistic N-Grams.
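
    For what it's worth, PocketSphinx can load a finite-state grammar in JSGF form (via the -jsgf option), which can express this kind of word-loop model; a sketch, with the grammar name and word list as placeholders:

    ```jsgf
    #JSGF V1.0;
    grammar anyword;
    public <utterance> = ( X | Y | Z )+;
    ```

    The `+` plays the role of the `<...>` repetition in the HTK-style grammar above; the start/end silences are handled by the decoder rather than the grammar.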

     