recognition speed sphinxtrain 1.0.8 vs. 5prealpha

Help
bekoe
2015-06-23
2015-06-26
  • bekoe

    bekoe - 2015-06-23

    Hi guys,

    After upgrading from sphinx* 1.0.8 to 5prealpha, I noticed that recognition (via sphinxtrain decode and also pocketsphinx_batch itself) is much slower. The config and parameters stayed the same. The same goes for the latest Subversion version. The log file shows nothing suspicious; it just seems to be very slow, around 10 sentences in several minutes...

    Has anyone experienced this as well? Are weights and beam values interpreted differently? How should the config and parameters be changed when upgrading to the latest version?

    The AM build does not seem to be affected by this.

    Thanks for your help,

    Benjamin

     
    • Nickolay V. Shmyrev

      5prealpha is expected to be faster and significantly more accurate.

      You are welcome to provide data to reproduce your problem: decoder configuration, files, and the exact times you see.

       
  • bekoe

    bekoe - 2015-06-24

    Hi Nickolay,

    Thanks for your help. Recognition of the test set (45 sentences) took about 2.5 h with 5prealpha. Decoding the same files with the same AM and config on the older sphinx version takes about 2-3 min.

    Sharing the files for a build will take a while. Would the acoustic model be of any help? Here is the config file.

    Also, I'm using an lmctl file, but the same happens with a plain language model.

    Thanks!

     

    Last edit: bekoe 2015-06-24
    • Nickolay V. Shmyrev

      I'm sorry, without test sentences I can't help you.

       
  • bekoe

    bekoe - 2015-06-24

    Do you mean the test sentences or the audio files for training?

     
    • Nickolay V. Shmyrev

      I meant "test sentences", the ones you are running on. I also need your acoustic model. I need to reproduce your problems.

       
  • bekoe

    bekoe - 2015-06-24

    Alright, there you go:
    https://www.dropbox.com/sh/b7rpjryneydb632/AAACMBRzMCzlL6Hy66YlQ6jKa?dl=0

    Do you need anything else?

     
    • Nickolay V. Shmyrev

      Sorry, there is no phonetic dictionary in the archive. I can't run the sample without the dictionary.

       
  • bekoe

    bekoe - 2015-06-25

    Oh I forgot that. I've uploaded it just now.

     
    • Nickolay V. Shmyrev

      I ran:

            pocketsphinx_batch -adcin yes -cepdir . -cepext .wav -lmctl Ld.lmctl -dict ld.dic.51k.txt -hmm ld.cd_cont_200 -ctl ld_test.fileids -lmname ld1 -hyp ld.hyp -wbeam 1e-40 -beam 1e-80
      

      My results for 5prealpha:

      ~~~~~~~~~~~~~~
      INFO: batch.c(777): TOTAL 92.32 seconds speech, 11.45 seconds CPU, 11.46 seconds wall
      INFO: batch.c(779): AVERAGE 0.12 xRT (CPU), 0.12 xRT (elapsed)
      INFO: ngram_search_fwdtree.c(432): TOTAL fwdtree 7.98 CPU 0.086 xRT
      INFO: ngram_search_fwdtree.c(435): TOTAL fwdtree 7.99 wall 0.087 xRT
      INFO: ngram_search_fwdflat.c(176): TOTAL fwdflat 2.96 CPU 0.032 xRT
      INFO: ngram_search_fwdflat.c(179): TOTAL fwdflat 2.97 wall 0.032 xRT
      INFO: ngram_search.c(303): TOTAL bestpath 0.51 CPU 0.005 xRT
      INFO: ngram_search.c(306): TOTAL bestpath 0.51 wall 0.005 xRT
      ~~~~~~~~~~~~~~

      My results for 0.8:

      ~~~~~~~~~~~~~~
      INFO: batch.c(774): TOTAL 107.00 seconds speech, 13.44 seconds CPU, 13.45 seconds wall
      INFO: batch.c(776): AVERAGE 0.13 xRT (CPU), 0.13 xRT (elapsed)
      INFO: ngram_search_fwdtree.c(430): TOTAL fwdtree 8.77 CPU 0.082 xRT
      INFO: ngram_search_fwdtree.c(433): TOTAL fwdtree 8.78 wall 0.082 xRT
      INFO: ngram_search_fwdflat.c(174): TOTAL fwdflat 3.64 CPU 0.034 xRT
      INFO: ngram_search_fwdflat.c(177): TOTAL fwdflat 3.65 wall 0.034 xRT
      INFO: ngram_search.c(317): TOTAL bestpath 1.02 CPU 0.010 xRT
      INFO: ngram_search.c(320): TOTAL bestpath 1.02 wall 0.010 xRT
      ~~~~~~~~~~~~~~

      As expected, 5prealpha is faster.
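For readers comparing their own runs against these numbers: the xRT figure in the AVERAGE line is processing time divided by the duration of the decoded audio, so values below 1 mean faster than real time. The sketch below recomputes it from a TOTAL line with awk. The helper name `xrt_from_log` is an assumption made for illustration and is not part of pocketsphinx; the field positions are assumed from the log lines quoted in this thread.

```shell
#!/bin/sh
# Recompute the real-time factor (xRT) from a pocketsphinx_batch "TOTAL" log line.
# xrt_from_log is a hypothetical helper; it reads log text on stdin.
xrt_from_log() {
    awk '/TOTAL .* seconds speech/ {
        for (i = 1; i <= NF; i++) {
            if ($i == "TOTAL") {
                speech = $(i + 1)   # seconds of audio decoded
                cpu    = $(i + 4)   # CPU seconds spent
                wall   = $(i + 7)   # wall-clock seconds spent
            }
        }
        printf "%.2f xRT (CPU), %.2f xRT (elapsed)\n", cpu / speech, wall / speech
    }'
}

# Line copied from the 5prealpha log in this thread.
echo 'INFO: batch.c(777): TOTAL 92.32 seconds speech, 11.45 seconds CPU, 11.46 seconds wall' | xrt_from_log
```

For the quoted line this gives 11.45 / 92.32, which rounds to the 0.12 xRT reported in the AVERAGE line.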

      You probably want to provide your decoding log if you are able to reproduce the original problem.
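One possible way to collect such a log, as a rough sketch: rerun the decode with the `-logfn` option so the output goes to a file, then pull out the timing lines to share. The helper name `timing_summary` and the default file name `decode.log` are assumptions made for illustration; the grep pattern is assumed from the log lines quoted in this thread.

```shell
#!/bin/sh
# Print only the timing summary lines from a pocketsphinx_batch log.
# timing_summary is a hypothetical helper, not part of pocketsphinx.
timing_summary() {
    grep -E 'batch\.c.*(TOTAL|AVERAGE)|TOTAL (fwdtree|fwdflat|bestpath)' "$1"
}

# Assumed workflow: rerun the decode so the log is written to a file, e.g.
#   pocketsphinx_batch <same options as before> -logfn decode.log
LOG=${LOG:-decode.log}
if [ -f "$LOG" ]; then
    timing_summary "$LOG"
else
    echo "no log file at $LOG" >&2
fi
```

The full log is still the more useful thing to attach; the summary only makes it quicker to compare two versions side by side.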

       
  • bekoe

    bekoe - 2015-06-26

    Hi Nickolay,

    Thanks for your reply. Running the command you provided, I only get "er" as the result, for all files. I get the same results from pocketsphinx_continuous with the -infile option.
    When I extract the audio files' features with make_feats.pl and run pocketsphinx_batch with -cepext set to mfc instead of wav, the results are fine, but recognition just takes too long, several minutes.

    So is there something wrong with the feature extraction?

     

    Last edit: bekoe 2015-06-26
    • Nickolay V. Shmyrev

      Yes, you need to add these lines to feat.params:

      -transform dct
      -lifter 22
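A minimal sketch of applying that change from the shell, assuming feat.params sits in the model directory named in the command earlier in the thread (`ld.cd_cont_200`). The helper name `ensure_feat_param` and the `FEAT_PARAMS` variable are made up for illustration; the helper appends a setting only when the file does not already contain it, so running it twice is harmless.

```shell
#!/bin/sh
# Append a "-name value" line to a feat.params file unless that name is already set.
# ensure_feat_param is a hypothetical helper: ensure_feat_param FILE NAME VALUE
ensure_feat_param() {
    file=$1
    name=$2
    value=$3
    # -e marks the pattern explicitly, since parameter names begin with a dash.
    if grep -q -e "^${name}[[:space:]]" "$file"; then
        echo "$name already set in $file, leaving it unchanged"
        return 0
    fi
    # Make sure the file ends with a newline before appending.
    if [ -n "$(tail -c 1 "$file")" ]; then
        echo >> "$file"
    fi
    printf '%s %s\n' "$name" "$value" >> "$file"
    echo "added: $name $value"
}

# Assumed location, based on the -hmm argument used earlier; adjust as needed.
FEAT_PARAMS=${FEAT_PARAMS:-ld.cd_cont_200/feat.params}
if [ -f "$FEAT_PARAMS" ]; then
    ensure_feat_param "$FEAT_PARAMS" -transform dct
    ensure_feat_param "$FEAT_PARAMS" -lifter 22
else
    echo "no feat.params at $FEAT_PARAMS" >&2
fi
```

Keeping a backup copy of feat.params before editing is a sensible precaution, since the decoder reads its feature settings from that file.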

       
  • bekoe

    bekoe - 2015-06-26

    Worked like a charm! Thank you so much, Nickolay.

     
