CMU Sphinx / Forums / Help: recognition speed sphinxtrain 1.0.8 vs. 5prealpha

bekoe - 2015-06-23

Hi guys,

After upgrading from sphinx* 1.0.8 to 5prealpha, I noticed that recognition (via the sphinxtrain decode stage and also pocketsphinx_batch itself) is much slower. The config and parameters stayed the same. The same goes for the latest Subversion version. The log file shows nothing suspicious; decoding just seems to be very slow, about 10 sentences in several minutes...

Has anyone else experienced this? Are weights and beam values interpreted differently? How should the config and parameters be changed when upgrading to the latest version?

The AM build does not seem to be affected by this.

Thanks for your help,

Benjamin

- Nickolay V. Shmyrev - 2015-06-23
  
  5prealpha is expected to be faster and significantly more accurate.
  
  You are welcome to provide the data needed to reproduce your problem: the decoder configuration, the files, and the exact times you see.
  

bekoe - 2015-06-24

Hi Nickolay,

thanks for your help. Recognition of the test set (45 sentences) took about 2.5 h with 5prealpha. Decoding the same files with the same AM and config on the older sphinx version takes about 2-3 min.

Sharing the files for a build will take a while. Would the acoustic model be of any help? Here is the config file.

Also, I'm using an lmctl file, but the same happens in a setup with a plain language model.

Thanks!

Last edit: bekoe 2015-06-24

sphinx_train.cfg

- Nickolay V. Shmyrev - 2015-06-24
  
  I'm sorry, without test sentences I can't help you.
  

bekoe - 2015-06-24

Do you mean the test sentences or the audio files for training?

- Nickolay V. Shmyrev - 2015-06-24
  
  I meant "test sentences", the ones you are running on. I also need your acoustic model. I need to reproduce your problems.
  

bekoe - 2015-06-24

Alright, there you go:
https://www.dropbox.com/sh/b7rpjryneydb632/AAACMBRzMCzlL6Hy66YlQ6jKa?dl=0

Do you need anything else?

- Nickolay V. Shmyrev - 2015-06-25
  
  Sorry, there is no phonetic dictionary in the archive. I can't run the sample without the dictionary.
  

bekoe - 2015-06-25

Oh I forgot that. I've uploaded it just now.

- Nickolay V. Shmyrev - 2015-06-25
  
  I ran:
  
  pocketsphinx_batch -adcin yes -cepdir . -cepext .wav -lmctl Ld.lmctl -dict ld.dic.51k.txt -hmm ld.cd_cont_200 -ctl ld_test.fileids -lmname ld1 -hyp ld.hyp -wbeam 1e-40 -beam 1e-80
  
  My results for 5prealpha
  
  ~~~~~~~~~~~~~~
  INFO: batch.c(777): TOTAL 92.32 seconds speech, 11.45 seconds CPU, 11.46 seconds wall
  INFO: batch.c(779): AVERAGE 0.12 xRT (CPU), 0.12 xRT (elapsed)
  INFO: ngram_search_fwdtree.c(432): TOTAL fwdtree 7.98 CPU 0.086 xRT
  INFO: ngram_search_fwdtree.c(435): TOTAL fwdtree 7.99 wall 0.087 xRT
  INFO: ngram_search_fwdflat.c(176): TOTAL fwdflat 2.96 CPU 0.032 xRT
  INFO: ngram_search_fwdflat.c(179): TOTAL fwdflat 2.97 wall 0.032 xRT
  INFO: ngram_search.c(303): TOTAL bestpath 0.51 CPU 0.005 xRT
  INFO: ngram_search.c(306): TOTAL bestpath 0.51 wall 0.005 xRT
  ~~~~~~~~~~~~~~
  
  My result for 0.8
  
  ~~~~~~~~~~~~~~
  INFO: batch.c(774): TOTAL 107.00 seconds speech, 13.44 seconds CPU, 13.45 seconds wall
  INFO: batch.c(776): AVERAGE 0.13 xRT (CPU), 0.13 xRT (elapsed)
  INFO: ngram_search_fwdtree.c(430): TOTAL fwdtree 8.77 CPU 0.082 xRT
  INFO: ngram_search_fwdtree.c(433): TOTAL fwdtree 8.78 wall 0.082 xRT
  INFO: ngram_search_fwdflat.c(174): TOTAL fwdflat 3.64 CPU 0.034 xRT
  INFO: ngram_search_fwdflat.c(177): TOTAL fwdflat 3.65 wall 0.034 xRT
  INFO: ngram_search.c(317): TOTAL bestpath 1.02 CPU 0.010 xRT
  INFO: ngram_search.c(320): TOTAL bestpath 1.02 wall 0.010 xRT
  ~~~~~~~~~~~~~~
  
  As expected, 5prealpha is faster.
  
  You probably want to provide your decoding log if you are able to reproduce the original problem.
  
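The xRT figures in the two logs above can be checked by hand: they appear to be processing time divided by the duration of the speech that was decoded, so values below 1.0 mean faster than real time. The following is a minimal sketch of that arithmetic, assuming the `TOTAL ... seconds speech` line format shown in the logs; `xrt_from_total_line` is a hypothetical helper, not part of pocketsphinx.

```python
import re

# Matches the summary line that pocketsphinx_batch prints in the logs above.
TOTAL_RE = re.compile(
    r"TOTAL (?P<speech>[\d.]+) seconds speech, "
    r"(?P<cpu>[\d.]+) seconds CPU, (?P<wall>[\d.]+) seconds wall"
)


def xrt_from_total_line(line):
    """Return (cpu_xrt, wall_xrt) computed from a batch TOTAL log line."""
    match = TOTAL_RE.search(line)
    if match is None:
        raise ValueError("not a batch TOTAL line: " + line)
    speech = float(match["speech"])
    if speech <= 0:
        raise ValueError("speech duration must be positive")
    return float(match["cpu"]) / speech, float(match["wall"]) / speech


# Lines copied verbatim from the two logs in the post above.
LOG_5PREALPHA = "INFO: batch.c(777): TOTAL 92.32 seconds speech, 11.45 seconds CPU, 11.46 seconds wall"
LOG_0_8 = "INFO: batch.c(774): TOTAL 107.00 seconds speech, 13.44 seconds CPU, 13.45 seconds wall"

for name, line in (("5prealpha", LOG_5PREALPHA), ("0.8", LOG_0_8)):
    cpu_xrt, wall_xrt = xrt_from_total_line(line)
    print(f"{name}: {cpu_xrt:.2f} xRT (CPU), {wall_xrt:.2f} xRT (elapsed)")
```

Rounded to two decimals, the results agree with the AVERAGE lines in the logs (0.12 and 0.13 xRT). For scale, a run that needs hours for roughly 100 seconds of audio would have an xRT far above 1.0.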

bekoe - 2015-06-26

Hi Nickolay,

thanks for your reply. Running the command you provided, I only get "er" as the result for all files. I get the same results from pocketsphinx_continuous with the -infile option.
When I extract the audio files' features with make_feats.pl and run pocketsphinx_batch with -cepext set to mfc instead of wav, the results are fine, but recognition just takes too long, on the order of several minutes.

So is there something wrong with the feature extraction?

Last edit: bekoe 2015-06-26

- Nickolay V. Shmyrev - 2015-06-26
  
  Yes, you need to add these lines to feat.params:
  
  -transform dct
  -lifter 22
  
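The fix above can also be applied with a small script. This is a minimal sketch, assuming feat.params is a plain text file with one "-option value" pair per line, as the two quoted lines suggest. `ensure_feat_params` is a hypothetical helper, not part of sphinxtrain or pocketsphinx, and the `-nfilt 25` line in the demo is illustrative sample content only.

```python
import tempfile
from pathlib import Path

# The two options named in the reply above.
REQUIRED = {"-transform": "dct", "-lifter": "22"}


def ensure_feat_params(path, required=REQUIRED):
    """Append any missing options to feat.params; return the flags that were added.

    Options that are already present are left untouched, even if their
    value differs from the one in `required`.
    """
    path = Path(path)
    lines = path.read_text().splitlines() if path.exists() else []
    present = {line.split()[0] for line in lines if line.split()}
    added = []
    for flag, value in required.items():
        if flag not in present:
            lines.append(f"{flag} {value}")
            added.append(flag)
    if added:
        path.write_text("\n".join(lines) + "\n")
    return added


# Demo on a throwaway file so that no real model directory is modified.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "feat.params"
    demo.write_text("-nfilt 25\n")  # illustrative existing content
    print("added:", ensure_feat_params(demo))
    print(demo.read_text(), end="")
```

The helper only appends and never rewrites existing lines, so running it twice on the same file changes nothing the second time. Back up the model directory before pointing it at a real feat.params.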

bekoe - 2015-06-26

Worked like a charm! Thank you so much, Nickolay.

