I want to build a pocketsphinx-based speech recognition system for continuous Arabic speech. I have a corpus of more than 10 hours, and I am not sure whether it is better to use an adaptation technique or to train the system from scratch.
You need to train from scratch, and you need much more than 10 hours of data. Ideally you need 200-300 hours.
Thank you.
After starting training, it gave me the following warnings, and the training did not complete.
WARNING: Utterance ID mismatch on line 6143: 202/202-96 vs
WARNING: Bad line in transcript:
[Arabic transcript line, garbled in the forum display] ...
WARNING: This phone (Z) occurs in the phonelist (/home/.../trial1/etc/trial1.phone), but not in any word in the transcription (/home/.../trial1/etc/trial1_train.transcription)
Any help?
You need to prepare the data in the required format as described in the tutorial. If the format has errors, training will not proceed.
If you need further help, you can share your database.
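As a quick first check, both warnings above can be caught from the shell. This is only a sketch and assumes the usual sphinxtrain layout; trial1.phone and trial1_train.transcription appear in your warnings, while trial1_train.fileids and trial1.dic are guesses at the matching file names:

$ # utterance ids in (...) at the end of each transcription line must
$ # match the basenames in the fileids file, line by line
$ sed 's/.*(\(.*\))$/\1/' etc/trial1_train.transcription > /tmp/trans.ids
$ sed 's:.*/::' etc/trial1_train.fileids > /tmp/file.ids
$ diff /tmp/trans.ids /tmp/file.ids

$ # every phone in the phone list should be used by at least one
$ # dictionary word (a rough check; lines starting with < are unused)
$ awk '{for (i = 2; i <= NF; i++) print $i}' etc/trial1.dic | sort -u > /tmp/used.phones
$ diff <(sort -u etc/trial1.phone) /tmp/used.phones | grep '^<'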
Just finished training. It also gives the results, which are extremely low. However, there are some errors, as shown below:
What is the reason for this error, which frequently appeared during training?
Is this result reasonable?
ERROR: This step had 11206 ERROR messages and 0 WARNING messages. Please check the log file for details.
Normalization for iteration: 6
Current Overall Likelihood Per Frame = -145.917409811125
Training for 8 Gaussian(s) completed after 6 iterations
MODULE: 60 Lattice Generation
Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
MODULE: 61 Lattice Pruning
Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
MODULE: 62 Lattice Format Conversion
Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
MODULE: 65 MMIE Training
Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
MODULE: 90 deleted interpolation
Skipped for continuous models
MODULE: DECODE Decoding using models previously trained
Decoding 400 segments starting at 0 (part 1 of 1)
0%
Aligning results to find error rate
SENTENCE ERROR: 90.2% (361/400) WORD ERROR RATE: 52.4% (1877/3585)
Answered in the troubleshooting section of the tutorial.
Yes
I noticed that the system trained using Sphinx 3 gives better accuracy than Pocketsphinx. Is there any reason for such a difference?
Our models are all trained with sphinxtrain; you probably mean the models trained for sphinx4/sphinx3 (continuous) and pocketsphinx (semi-continuous and PTM). Continuous models are expected to be more accurate, but they are also slower to decode. You can learn more on the wiki:
http://cmusphinx.sourceforge.net/wiki/acousticmodeltypes
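If you want to compare the model types on your own data, the decoding command looks roughly like this; the flags are standard pocketsphinx_continuous options, and the model, LM and dictionary paths are placeholders for your own:

$ # swap -hmm between the continuous, semi-continuous and ptm models
$ # to compare speed and accuracy on the same test file
$ pocketsphinx_continuous -hmm model_parameters/trial1.cd_cont_2000 -lm etc/trial1.lm.DMP -dict etc/trial1.dic -infile test.wav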
I used the following command to check a speech file. Could you please let me know how to interpret the results? What does the 92.9% below mean?
$ for i in *.wav; do play $i; done
arctic_0001.wav:
File Size: 176k Bit Rate: 256k
Encoding: Signed PCM
Channels: 1 @ 16-bit
Samplerate: 16000Hz
Replaygain: off
Duration: 00:00:05.51
In:92.9% 00:00:05.12 [00:00:00.39] Out:81.9k [ =====|===== ] Hd:5.6 Clip:0 Segmentation fault (core dumped)
The play command crashed due to a bug, maybe in the driver, maybe something else. It is not really related to pocketsphinx.
How do I determine the best choice of CFG_N_TIED_STATES and CFG_FINAL_NUM_DENSITIES for a particular speech collection? Could you please send me a tutorial link for preparing language models for CMU Sphinx?
Covered in a table in http://cmusphinx.sourceforge.net/wiki/tutorialam#configure_model_type_and_model_parameters
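For reference, both parameters live in etc/sphinx_train.cfg. The values below are purely illustrative; read the actual numbers off the table at that link, based on your vocabulary size and hours of audio:

$CFG_N_TIED_STATES = 2000;      # tied states (senones); illustrative value
$CFG_FINAL_NUM_DENSITIES = 8;   # Gaussians per state (continuous models); illustrative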
http://cmusphinx.sourceforge.net/wiki/tutoriallm
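In short, the usual cmuclmtk pipeline looks like the following, where corpus.txt stands for your training text, one sentence per line:

$ # count word frequencies and build the vocabulary
$ text2wfreq < corpus.txt | wfreq2vocab > corpus.vocab
$ # convert the text to id n-grams, then estimate the ARPA model
$ text2idngram -vocab corpus.vocab -idngram corpus.idngram < corpus.txt
$ idngram2lm -vocab_type 0 -idngram corpus.idngram -vocab corpus.vocab -arpa corpus.arpa

The resulting ARPA file can then be converted to the binary DMP format if your decoder needs it, for example with lm3g2dmp (mentioned below).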
Is it OK for the CMU Sphinx training dictionary to have only lowercase letters, or can it contain both lowercase and capital letters for phoneme representation?
It is OK but not recommended.
I have a problem in preparing the language model; it gives the following error message:
hash_add: Error: [AistiqTaAbi] hash conflict
There are two entries in the dictionary for [AistiqTaAbi]
Please change or remove one of them and re-run.
However, this entry belongs to two different words as shown in the dictionary:
AistiqTAabi A i s t i q T A a b i
AistiqTaAbi A i s t i q T aA b i
It seems that this problem is related to case sensitivity. Any help?
It is not quite clear what software you run.
cmuclmtk and lm3g2dmp.
Neither of them requires a dictionary. You need to be more precise in the description of your problems. The more details you provide, the faster you get an answer.
http://catb.org/~esr/faqs/smart-questions.html
Is there any method to find the execution time of the training and decoding?
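One simple way, assuming you launch training with the sphinxtrain wrapper from the tutorial, is the shell's time builtin:

$ # prints wall-clock, user and system time for the whole run
$ time sphinxtrain run

If your sphinxtrain version supports running a single stage, the same works for decoding alone, e.g. time sphinxtrain -s decode run.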
I performed some experiments and found that PTM and semi-continuous models have the same WER, while the continuous acoustic model has a higher WER. Is this reasonable?
How can I find the number of triphones used in my pocketsphinx system?
Open the mdef file in a text editor.
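The counts sit in the mdef header, so you do not even have to scroll; something like this works (the model path is a guess at the usual sphinxtrain output layout):

$ # n_tri is the number of triphones, n_tied_state the number of senones
$ head model_parameters/trial1.cd_cont_2000/mdef | grep -E 'n_tri|n_tied_state'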
Could you please let me know if $CFG_N_TIED_STATES means the number of triphones?
The idea of tied states is explained in our tutorial:
http://cmusphinx.sourceforge.net/wiki/tutorialconcepts
For computational purposes it is helpful to detect parts of triphones instead of triphones as a whole; for example, to create a detector for the beginning of a triphone and share it across many triphones. The whole variety of sound detectors can be represented by a small number of distinct short sound detectors. Usually we use 4000 distinct short sound detectors to compose detectors for triphones. We call those detectors senones. A senone's dependence on context can be more complex than just the left and right context. It can be a rather complex function defined by a decision tree, or in some other way.
Is it possible to run CMU PocketSphinx using an "any-word" language model such as:
$WORD = (X | Y | Z );
(SENT-START <$WORD> SENT-END)
That is, I want to evaluate the performance using this language model instead of the probabilistic N-grams.
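That should be possible with a JSGF grammar, which pocketsphinx accepts in place of an N-gram model. A minimal sketch, saved as words.gram, with X, Y and Z standing for words that must also exist in your dictionary:

#JSGF V1.0;
grammar anyword;
public <utt> = ( X | Y | Z )+;

$ # -jsgf replaces -lm; model and dictionary paths are placeholders
$ pocketsphinx_continuous -hmm model_parameters/trial1.cd_cont_2000 -dict etc/trial1.dic -jsgf words.gram -infile test.wav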