CMU Sphinx / Forums / Help: How many different words/tri-phones required for training/adaptation

Oren G. - 2016-11-13

Hello,
The tutorial regarding training a new acoustic model for dictation does not say how many different words or tri-phones the audio files should contain. For example, two databases may have the same number of hours and speakers, but the first one has 1000 different words, and the second has 10,000 different words.

I have a database of 130 hours from 1100 speakers. How many different words/tri-phones do I need?
Is there a formula I can use to calculate this for other databases?
Will more different words give better accuracy?
How do tri-phones compare to words in this respect?

I have the same question about adapting the default English model with 5 minutes of audio.

Thanks

- Oren G. - 2016-11-13
  
  In case I'm not clear about "different words":
  
  A database may contain 5 wav files. In each of them you hear the sentence "it is here". This database has 3 different words.
  
  Another database contains 2 wav files. In the first one you hear "it is here". In the second one you hear "How are you". This database has 6 different words.
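  To make the two counts concrete, here is a minimal Python sketch that counts distinct words and distinct tri-phones for the two example databases above. The pronunciations in TOY_DICT are CMUdict-style entries assumed for illustration (a real setup would take them from the training dictionary), and the tri-phone count is simplified: phones are joined across word boundaries, each utterance is padded with an assumed SIL phone, and any word-position distinctions a trainer might make are ignored.

```python
# Toy CMUdict-style pronunciations, assumed for illustration only.
TOY_DICT = {
    "it": ["IH", "T"],
    "is": ["IH", "Z"],
    "here": ["HH", "IY", "R"],
    "how": ["HH", "AW"],
    "are": ["AA", "R"],
    "you": ["Y", "UW"],
}


def distinct_words(transcripts):
    """Return the set of distinct words (case-insensitive) in all transcripts."""
    return {word for line in transcripts for word in line.lower().split()}


def distinct_triphones(transcripts, lexicon, boundary="SIL"):
    """Return the set of (left, center, right) phone contexts in the transcripts.

    Simplification: phones are joined across word boundaries and every
    utterance is padded with a boundary (silence) phone on both sides.
    """
    triphones = set()
    for line in transcripts:
        phones = [boundary]
        for word in line.lower().split():
            phones.extend(lexicon[word])
        phones.append(boundary)
        # Each real phone is paired with its left and right neighbours.
        for i in range(1, len(phones) - 1):
            triphones.add((phones[i - 1], phones[i], phones[i + 1]))
    return triphones


if __name__ == "__main__":
    db1 = ["it is here"] * 5
    db2 = ["it is here", "How are you"]
    for name, db in (("db1", db1), ("db2", db2)):
        # prints "db1 3 7" and then "db2 6 13"
        print(name, len(distinct_words(db)), len(distinct_triphones(db, TOY_DICT)))
```

  Both counts measure variety rather than amount: repeating the same sentence five times adds audio but no new words or tri-phones, while one new sentence adds three words and six tri-phones in this toy example.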
  
  Last edit: Oren G. 2016-11-13
  
  - Nickolay V. Shmyrev - 2016-11-13
    
    Didn't you see the table in the tutorial?
    
    http://cmusphinx.sourceforge.net/wiki/tutorialam#configure_model_type_and_model_parameters
    
    Vocabulary  Hours in db  Senones  Densities  Example
    20          5            200      8          Tidigits Digits Recognition
    100         20           2000     8          RM1 Command and Control
    5000        30           4000     16         WSJ1 5k Small Dictation
    20000       80           4000     32         WSJ1 20k Big Dictation
    60000       200          6000     16         HUB4 Broadcast News
    60000       2000         12000    64         Fisher Rich Telephone Transcription
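    The table works as a lookup from vocabulary size to rough training settings. As a sketch of one way to read it, the hypothetical helper row_for_vocabulary below returns the first row whose vocabulary is at least as large as the target vocabulary, falling back to the last row for anything larger; this "round up" rule is an assumption, not something the tutorial prescribes.

```python
from typing import NamedTuple


class Row(NamedTuple):
    vocabulary: int  # number of distinct words
    hours: int       # hours of audio in the training database
    senones: int
    densities: int
    example: str


# Values copied from the tutorial table above.
TABLE = [
    Row(20, 5, 200, 8, "Tidigits Digits Recognition"),
    Row(100, 20, 2000, 8, "RM1 Command and Control"),
    Row(5000, 30, 4000, 16, "WSJ1 5k Small Dictation"),
    Row(20000, 80, 4000, 32, "WSJ1 20k Big Dictation"),
    Row(60000, 200, 6000, 16, "HUB4 Broadcast News"),
    Row(60000, 2000, 12000, 64, "Fisher Rich Telephone Transcription"),
]


def row_for_vocabulary(vocabulary: int) -> Row:
    """Return the first table row whose vocabulary covers the given size.

    Assumed heuristic: round up to the next listed vocabulary; sizes above
    the largest listed vocabulary fall back to the last row.
    """
    for row in TABLE:
        if row.vocabulary >= vocabulary:
            return row
    return TABLE[-1]


if __name__ == "__main__":
    row = row_for_vocabulary(5000)
    # prints "30 4000 16 WSJ1 5k Small Dictation"
    print(row.hours, row.senones, row.densities, row.example)
```

    For a vocabulary of about 5000 words this returns the WSJ1 5k row, which matches the "5000 words = 30 hours" reading of the table.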
    
    - Oren G. - 2016-11-13
      
      OK. My corpus has 130 hours but only about 5000 different words. Should I use only part of the corpus? (The table says 5000 words = 30 hours.) Also, I want to use PTM, not continuous models.
      
      - Nickolay V. Shmyrev - 2016-11-14
        
        Yes
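        If you do train on only part of the corpus, one simple way to pick roughly 30 hours out of 130 is a seeded random sample of utterances, sketched below in Python. The helper select_subset and its input format, a list of (utterance_id, duration_in_seconds) pairs, are hypothetical; the durations would have to come from your own audio files. A plain random draw is only one option: with 1100 speakers you might prefer to sample per speaker so that speaker variety is preserved.

```python
import random


def select_subset(utterances, target_hours, seed=0):
    """Randomly pick utterances until their total duration reaches target_hours.

    utterances: list of (utterance_id, duration_in_seconds) pairs (assumed format).
    Returns the chosen ids and their total duration in hours.
    """
    rng = random.Random(seed)  # fixed seed makes the selection reproducible
    pool = list(utterances)
    rng.shuffle(pool)
    target_sec = target_hours * 3600
    chosen, total_sec = [], 0.0
    for utt_id, duration in pool:
        if total_sec >= target_sec:
            break
        chosen.append(utt_id)
        total_sec += duration
    return chosen, total_sec / 3600


if __name__ == "__main__":
    # Hypothetical corpus: 30000 utterances of 5 seconds each (about 41.7 hours).
    corpus = [(f"utt{i:05d}", 5.0) for i in range(30000)]
    ids, hours = select_subset(corpus, target_hours=30)
    print(len(ids), hours)  # prints "21600 30.0"
```

        Because the loop stops as soon as the target is reached, the selected total overshoots the target by less than one utterance's duration.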
        