CMU Sphinx / Forums / Help: How to bulk test phoneme recognition with pocketsphinx

Toine db - 2016-01-27

The phoneme recognition test with pocketsphinx_continuous works fine, but only for one file at a time.
http://cmusphinx.sourceforge.net/wiki/phonemerecognition

Any suggestions on how to approach a bulk test for this?
(Read a whole directory, or a file listing the raw file locations?)

PS: I'm using Windows as my OS.

Nickolay V. Shmyrev - 2016-01-27

Use pocketsphinx_batch with a control file (-ctl file.ctl), as shown at the end of the adaptation tutorial:

http://cmusphinx.sourceforge.net/wiki/tutorialadapt#testing_the_adaptation
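
For a whole directory of recordings, the control file can be generated rather than typed by hand. The following is a minimal sketch, assuming the control file holds one file ID per line, given relative to the -cepdir directory and without the extension (this matches the 'file1' ID that appears in the batch log later in this thread). The function name write_ctl and its parameters are made up for illustration.

```python
from pathlib import Path


def write_ctl(wav_dir, ctl_path, ext=".wav"):
    """Write a control file listing every recording under wav_dir.

    Each line is a file ID: the path relative to wav_dir, with forward
    slashes and without the extension. Returns the list of IDs written.
    """
    wav_dir = Path(wav_dir)
    ids = sorted(
        p.relative_to(wav_dir).with_suffix("").as_posix()
        for p in wav_dir.rglob("*" + ext)
        if p.is_file()
    )
    # newline="\n" avoids CRLF line endings on Windows; whether the
    # decoder tolerates CRLF is not something this sketch relies on.
    with open(ctl_path, "w", newline="\n") as f:
        for file_id in ids:
            f.write(file_id + "\n")
    return ids
```

Calling write_ctl("testwavs", "testfileids.txt") would then produce a file that can be passed as -ctl testfileids.txt together with -cepdir testwavs and -cepext .wav.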

- Toine db - 2016-02-05
  
  Thanks,
  
  But pocketsphinx_batch gives different results than pocketsphinx_continuous. Why could that be?
  
  I'm using the following commands:
  
  pocketsphinx_continuous
  -hmm model/en-us/en-us
  -allphone model/en-us/en-us-phone.lm.bin
  -beam 1e-20
  -pbeam 1e-20
  -lw 2.0
  -infile testwavs/file1.wav
  -allphone_ci yes
  
  Result: SIL AY AE
  
  Batch Command
  pocketsphinx_batch
  -adcin yes
  -hmm model/en-us/en-us
  -allphone model/en-us/en-us-phone.lm.bin
  -beam 1e-20
  -pbeam 1e-20
  -lw 2.0
  -cepdir testwavs
  -cepext .wav
  -ctl test/testfileids.txt
  -allphone_ci yes
  
  Result: SIL NG HH
  
  FYI: without -adcin I get these errors:
  INFO: batch.c(729): Decoding 'file1'
  ERROR: "batch.c", line 207: File length mismatch: 0x52494646 != 0xd7a, maybe it's not MFCC file
  ERROR: "batch.c", line 422: Failed to read MFCC from the file 'testwavs/file1.wav'
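
The first number in that error message is a useful hint. Read as four ASCII bytes, 0x52494646 spells "RIFF", the tag that opens a WAV file, so without -adcin the batch tool appears to be interpreting the WAV header as the length field of an MFCC feature file. The sketch below decodes the value and checks whether a file starts with a RIFF/WAVE header. The helper names decode_tag and looks_like_wav are hypothetical.

```python
def decode_tag(value: int) -> str:
    """Interpret a 32-bit integer as four big-endian ASCII characters."""
    return value.to_bytes(4, "big").decode("ascii", errors="replace")


def looks_like_wav(path) -> bool:
    """Return True if the file begins with a RIFF header of type WAVE."""
    with open(path, "rb") as f:
        header = f.read(12)
    # Bytes 0-3 hold "RIFF", bytes 4-7 the chunk size, bytes 8-11 "WAVE".
    return len(header) == 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE"


print(decode_tag(0x52494646))  # RIFF
```

A check like looks_like_wav can be run over the test files before a bulk run to confirm that they are WAV audio and therefore need -adcin yes.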
  
  Last edit: Toine db 2016-02-05
  
- Toine db - 2016-02-09
  
  @Nickolay, any thoughts on my problem?
  
  - Nickolay V. Shmyrev - 2016-02-10
    
    Hi Toine
    
    I'm sorry, but batch and continuous process audio differently. Batch treats the audio as a whole and applies cepstral mean normalization (CMN, roughly a volume normalization) over the entire file. Continuous processes the audio frame by frame and normalizes using only the frames it has already seen. This is not optimal, as has been discussed many times on the forum, and unfortunately it needs a proper initial CMN estimate. You can set the -cmninit parameter to the values you see in the batch log output, and you will get similar results between batch and continuous. That's why we recommend decoding longer files with continuous.
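
The difference between the two normalization modes can be illustrated with a toy example. This is a simplified sketch, not pocketsphinx's actual implementation: each frame is reduced to a single number, the live mean is a plain cumulative average, and the names batch_cmn, live_cmn, init_mean and init_weight are all assumptions made for illustration. The init_mean argument plays the role that -cmninit plays in the explanation above.

```python
def batch_cmn(frames):
    """Subtract the mean of the whole utterance from every frame."""
    mean = sum(frames) / len(frames)
    return [x - mean for x in frames]


def live_cmn(frames, init_mean=0.0, init_weight=1.0):
    """Subtract a running mean built only from frames already seen.

    init_mean is the prior guess for the mean; init_weight says how many
    frames' worth of evidence that guess counts for (illustrative only).
    """
    total = init_mean * init_weight
    count = init_weight
    out = []
    for x in frames:
        out.append(x - total / count)  # normalize with the estimate so far
        total += x                     # then fold this frame into the estimate
        count += 1
    return out


frames = [10.0, 12.0, 11.0, 9.0, 13.0]
print(batch_cmn(frames))                 # mean of the whole file is known up front
print(live_cmn(frames, init_mean=0.0))   # poor initial guess: early frames are far off
print(live_cmn(frames, init_mean=11.0))  # good initial guess: close to the batch result
```

With a poor initial estimate the first frames are normalized badly, which matters most for short files. A good initial estimate brings the live result close to the batch result from the first frame.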
    
    We will work on a solution to make continuous more accurate from the start; it's just not there yet.
    
    - Toine db - 2016-02-11
      
      Thanks for the reply Nickolay,
      
      If I understand correctly, batch is the closest thing to pocketsphinx output on phones?
      
      In my phone apps I currently have a custom, self-built system that detects a specific type of sound.
      When that sound is detected, the system sends it to pocketsphinx for recognition, so pocketsphinx is started and stopped again and again.
      
      Is pocketsphinx able to normalize volume in this phone scenario?
      (or are the processed frames cleared after I stop the recognition in between)
      
      Hope to hear from you
      
      - Nickolay V. Shmyrev - 2016-02-11
        
        "As I understand correctly the batch is closest thing to pocketsphinx output on phones?"
        
        No, ps_process_raw calls do continuous processing.
        
        "When that sound is detected it sends it to pocketsphinx for recognition, therefore starts and stops the pocketsphinx again and again."
        
        It is ok to stop and restart, just do not reinit the decoder.
        
        "Is pocketsphinx able to normalize volume in this phone scenario?
        (or are the processed frames cleared after I stop the recognition in between)"
        
        Volume is reset when you reinit the decoder or when you call ps_start_stream.
        
        Toine db - 2016-02-12
        
        Thanks for the clear answer.
        
        Now I know, functionally, how to work with the decoder, but I'm not really sure how to implement it.
        
        I do not understand what you're trying to say with the following comment:
        
        "No, ps_process_raw calls do continuous processing"
        Do I need to use batch or continuous to get results that are representative of the phone?
        
        Currently I'm using ps_start_utt and ps_end_utt in between, but I'm not sure whether this reinitializes the decoder and resets the volume... the part that I really want to....
        
        Can you give me a hint about which method(s) I need to use to pause and restart pocketsphinx without triggering a reset?
        
        I ask because I'm thinking of creating the following system:
        1. In the background, I want to keep PocketSphinx running, listening to and decoding incoming audio, to keep the volume/CMN at a good level.
        2. When my custom system detects my sound, I want to:
        ...2.1 Pause background pocketsphinx
        ...2.2 Pause the background running pocketsphinx
        ...2.3 Clear any current recognition (without clearing or reinitializing the volume)
        ...2.4 Request recognition for my detected sound
        ...2.5 Continue background pocketsphinx
        
        Last question:
        Does -cmninit work like a kickstart, setting an initial value that PocketSphinx then adjusts during decoding, or does it hold PocketSphinx at a constant level?
        
        PS: I think I wrote the Windows Phone example so that it reinitializes the decoder again and again. I'll check this later and possibly send a pull request with adjustments.
        
        Hope to hear from you. Sorry for the many questions about this topic, but I want to get it as good as possible. Thanks again for the last reply.
        
        Last edit: Toine db 2016-02-12
        
        Toine db - 2016-02-23
        
        Nickolay,
        
        Sorry to bother you, but could you give a short answer to the 3 questions so I can improve the Windows Phone example?
        (The 3 questions are at the bottom of this thread.)
        
        I'm planning to adjust two parts:
        + Pause (PocketSphinx in a kind of idle mode) without resetting the whole decoder, which is probably the current issue
        + Add nbest as a return value
        
        Hope to hear from you,
        
        Toine
        
        Last edit: Toine db 2016-02-23
        
        Nickolay V. Shmyrev - 2016-02-23
        
        What do you mean with this: "No, ps_process_raw calls do continuous processing
        Do I need to use batch or continuous to get results representative of the phone"
        
        Both types of processing give you representative results. Use batch for batch testing and continuous for decoding on the phone.
        
        What command do I need to use to pause and restart the decoder, without losing the calibrated CMN?
        
        ps_end_utt stops the decoder and ps_start_utt starts the search. The CMN estimate is kept.
        
        Is -cmninit a kickstart or set to be a constant?
        
        The initial value is used only for the first utterance; for the next utterance the CMN is recalculated.
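
The behaviour described in these answers can be pictured with a small toy model. It is only a sketch under stated assumptions, not the pocketsphinx API or its real update rule: the class ToyCmnState is hypothetical, each frame is a single number, and the mean is simply replaced by the average of the utterance that just ended. Its method names mirror ps_start_utt and ps_end_utt purely to show where the state would be kept.

```python
class ToyCmnState:
    """Toy model of a CMN estimate that survives across utterances."""

    def __init__(self, cmninit):
        # The initial value only matters until real audio has been seen.
        self.mean = cmninit
        self._sum = 0.0
        self._n = 0

    def start_utt(self):
        # Starting an utterance resets the per-utterance statistics,
        # but NOT the mean carried over from earlier audio.
        self._sum = 0.0
        self._n = 0

    def process(self, frames):
        # Normalize with the current estimate and collect statistics.
        self._sum += sum(frames)
        self._n += len(frames)
        return [x - self.mean for x in frames]

    def end_utt(self):
        # Recalculate the estimate from the utterance that just ended.
        if self._n:
            self.mean = self._sum / self._n


state = ToyCmnState(cmninit=40.0)
state.start_utt()
first = state.process([10.0, 12.0])   # normalized with the initial value
state.end_utt()
state.start_utt()
second = state.process([11.0])        # normalized with the recalculated mean
state.end_utt()
```

In this model, creating a new ToyCmnState corresponds to reinitializing the decoder: the estimate falls back to the initial value, which is why restarting utterances is harmless while reinitializing is not.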
        
        Toine db - 2016-02-24
        
        Thanks for the answers, I'll adjust the Windows Phone example accordingly.
        
Toine db - 2016-02-17

@Nickolay, the text was probably a little too long. I hope you can still answer the questions I have (mainly about the mechanism).

In short:
1. What do you mean with this: "No, ps_process_raw calls do continuous processing"? Do I need to use batch or continuous to get results representative of the phone?
2. What command do I need to use to pause and restart the decoder without losing the calibrated CMN?
3. Is -cmninit a kickstart, or is it set as a constant?

Hope to hear from you.
