CMU Sphinx / Forums / Help: Different estimate total hours training using same parameters across different computers

Speech Recognition Toolkit

Different estimate total hours training using same parameters across different computers

Forum: Help

Creator: safia hammad

Created: 2015-05-27

Updated: 2015-05-28

safia hammad - 2015-05-27

Hello,

I am using Pocketsphinx to develop my ASR engine. I run various experiments on different computers using the same WAV files and the same sampling rate (16 kHz, mono). However, on one machine I got the following in Phase 5:

"Phase 5: Determine amount of training data, see if n_tied_states seems reasonable. Estimated Total Hours Training: 3.61094722222222 This is a small amount of data, no comment at this time"

With the same setup and the same WAV files on other computers, however, I got the following in Phase 5:

"Phase 5: Determine amount of training data, see if n_tied_states seems reasonable. Estimated Total Hours Training: 4.07851944444444 This is a small amount of data, no comment at this time"

I believe the correct estimate is 4.07851944444444 hours, as in the second case, because this number appears on many other computers.
What could cause the difference? The WAV files, the feature-extraction parameters, and the sampling rate are all the same.

Thank you

- bic-user - 2015-05-28
  
  I suspect you're using different versions of sphinxbase on different machines. Up-to-date sphinxbase applies a voice activity detector during feature extraction, which filters out silence, so less audio remains for training.
  
  - safia hammad - 2015-05-28
    
    Thank you very much for your reply. I actually used the same packages and tools (sphinxbase-0.8.tar; pocketsphinx-0.8.tar; sphinxtrain-1.0.8.tar) on all machines, so I don't think this is the reason.
    
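One way to check which estimate reflects the raw audio is to sum the WAV durations directly and compare the total with the Phase 5 figure. The sketch below assumes the Phase 5 estimate is derived from the number of feature frames at 100 frames per second (a 10 ms frame shift). Both reported figures are consistent with that assumption, since they correspond to exactly 1,299,941 and 1,468,267 frames. The helper names `total_wav_hours` and `frames_to_hours` are hypothetical and use only the Python standard library; they are not part of sphinxtrain.

```python
import wave
from pathlib import Path


def total_wav_hours(wav_dir: str) -> float:
    """Sum the duration of every .wav file under wav_dir (recursively), in hours."""
    total_seconds = 0.0
    for path in sorted(Path(wav_dir).rglob("*.wav")):
        with wave.open(str(path), "rb") as w:
            # Duration = number of sample frames / sample rate (e.g. 16000 Hz).
            total_seconds += w.getnframes() / w.getframerate()
    return total_seconds / 3600.0


def frames_to_hours(n_frames: int, frames_per_second: int = 100) -> float:
    """Convert a count of feature frames to hours (assumes 100 frames/s, i.e. 10 ms shift)."""
    return n_frames / frames_per_second / 3600.0
```

For example, `print(f"{total_wav_hours('wav'):.4f} hours")` prints the raw audio total for a hypothetical `wav` directory. If that total is close to 4.0785 hours, the machine reporting 3.6109 hours is producing about 168,326 fewer feature frames (roughly 0.47 hours, or 28 minutes, of audio). That would point at feature extraction on that machine, for example silence being filtered out as suggested above, rather than at the WAV files themselves.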
