CMU Sphinx / Forums / Help: PocketSphinx Diff in recognition of keyword between command line and python

Hi,
I am using the sample code in Python to recognize recorded message. I get no recognition.

However if I try the same phrase from the command line, I get the following output:

!pocketsphinx_continuous -infile file1.wav -hmm /Users/naka/anaconda3/lib/python3.6/site-packages/pocketsphinx/model/en-us -kws_threshold 1e-20 -keyphrase "tea we" -time yes -logfn /dev/null

tea we tea we
tea we 4.840 5.030 0.826032
tea we 4.510 4.680 0.835336
tea we
tea we 19.500 19.690 0.837511
tea we
tea we 39.550 39.700 0.822898

Example code for keyword recogntion:

#not matching command line output
import sys, os
from pocketsphinx import *
from sphinxbase import *


modeldir = get_model_path()
datadir = "../../../test/data"
print(modeldir)
# Create a decoder with certain model
config = Decoder.default_config()
config.set_string('-hmm', os.path.join(modeldir, 'en-us'))
#config.set_string('-dict', os.path.join(modeldir, 'cmudict-en-us.dict'))
config.set_string('-keyphrase', 'tea we')
config.set_float('-kws_threshold', 1e+20)


# Open file to read the data
stream = open("file1.wav", "rb")

# Alternatively you can read from microphone
# import pyaudio
# 
# p = pyaudio.PyAudio()
# stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=1024)
# stream.start_stream()

# Process audio chunk by chunk. On keyphrase detected perform action and restart search
decoder = Decoder(config)
decoder.start_utt()
while True:
    buf = stream.read(1024)
    if buf:
        decoder.process_raw(buf, False, False)
    else:
         break
    if decoder.hyp() != None:
        print ([(seg.word, seg.prob, seg.start_frame, seg.end_frame) for seg in decoder.seg()])
        print ("Detected keyphrase, restarting search")
        decoder.end_utt()
        decoder.start_utt()

What am I setting up wrong? The Threshld is ie20 in both cases

ALso if you could point me to documentation to understand the output firmat from command line, I would be most grateful!

In command line you have threshold 1e-20, in python code 1e+20, those are different.

OOps! I changed it but there is still no recognition. WOuld u also help me understand the output in the command line? I assume it is the timestamp on the audio while between whiich the keyword is found and then the confidence level?

#not matching command line output
import sys, os
from pocketsphinx import *
from sphinxbase import *


modeldir = get_model_path()
datadir = "../../../test/data"
print(modeldir)
# Create a decoder with certain model
config = Decoder.default_config()
config.set_string('-hmm', os.path.join(modeldir, 'en-us'))
#config.set_string('-dict', os.path.join(modeldir, 'cmudict-en-us.dict'))
config.set_string('-keyphrase', 'tea we')
config.set_float('-kws_threshold', 1e-20)


# Open file to read the data
stream = open("file1.wav", "rb")

# Alternatively you can read from microphone
# import pyaudio
# 
# p = pyaudio.PyAudio()
# stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=1024)
# stream.start_stream()

# Process audio chunk by chunk. On keyphrase detected perform action and restart search
decoder = Decoder(config)
decoder.start_utt()
while True:
    buf = stream.read(1024)
    if buf:
        decoder.process_raw(buf, False, False)
    else:
         break
    if decoder.hyp() != None:
        print ([(seg.word, seg.prob, seg.start_frame, seg.end_frame) for seg in decoder.seg()])
        print ("Detected keyphrase, restarting search")
        decoder.end_utt()
        decoder.start_utt()

Nickolay V. Shmyrev - 2019-10-24

I changed it but there is still no recognition.

You can share the file to get help on this

WOuld u also help me understand the output in the command line? I assume it is the timestamp on the audio while between whiich the keyword is found and then the confidence level?

Yes

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

kp ks - 2019-10-24

The file for which the command line works is attached .
I am trying to recognise the phrase "TVS Way 20000"

file1.wav

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Nickolay V. Shmyrev - 2019-10-24
  
  The file is 8khz, it is not going to work with 16khz model. Also you need en-in model, not en-us. Third, for such task it is better to use large vocabulary transcription and analyze the results.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - kp ks - 2019-10-24
    
    Thanks for your insight.
    I had actually used ffmpeg to specifically convert 8k into 16k files for this task. I have to dig nto why that is not working.
    eg SYntax used for conversion:
    
    ffmpeg -i testWav1.wav -ac 1 -ar 16000 testWay.wav
    
    Not al the conversation will be in ENglish. It could be in any of 22 distinct languages. The nly guarenteed english phrase, so to say, is TVS Way and the number. Is there any alternative to complete transcription? 90% of the conv cannot be recognised!
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
    - Nickolay V. Shmyrev - 2019-10-24
      
      I had actually used ffmpeg to specifically convert 8k into 16k files for this task. I have to dig nto why that is not working.
      
      It does not make file 16khz
      
      You need to use 8khz audio and try with this model https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/Indian%20English/cmusphinx-en-in-8khz-5.2.tar.gz/download
      
      Overall, your task has to be solved differently.
      
      If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Hi, I had previously installed indian language, not much diff in recognition:
!pocketsphinx_continuous -infile file1.wav -hmm /Users/naka/anaconda3/lib/python3.6/site-packages/pocketsphinx/model/en_in.cd_cont_5000 -dict /Users/naka/anaconda3/lib/python3.6/site-packages/pocketsphinx/model/en_in.dic -samprate 16000.0 -kws_threshold 1e-20 -keyphrase "tea we" -time yes -logfn /dev/null

However, the sampling rate of 8Hz results in no recognition. COmmand line as always, works with 16Hz correctly. Python didnt work with either

tea we
tea we 2.620 2.810 0.827686
tea we tea we
tea we 4.830 5.030 0.835503
tea we 4.510 4.640 0.824051
tea we tea we tea we
tea we 8.500 9.350 0.848811
tea we 7.990 8.280 0.840953
tea we 7.230 7.390 0.826527
tea we tea we tea we tea we tea we
tea we 14.330 14.510 0.828017
tea we 13.980 14.220 0.826941
tea we 11.240 11.450 0.824711
tea we 10.890 11.080 0.833584
tea we 10.500 10.650 0.826114
tea we tea we tea we tea we
tea we 20.760 20.920 0.829343
tea we 20.090 20.270 0.825949
tea we 19.080 19.370 0.845845
tea we 18.600 18.800 0.840112
tea we
tea we 23.040 23.200 0.826197
tea we
tea we 25.050 25.760 0.898245
tea we tea we tea we
tea we 29.130 29.320 0.822898
tea we 28.050 28.310 0.854176
tea we 27.020 27.300 0.843311
tea we tea we
tea we 39.790 40.120 0.828680
tea we 39.350 39.620 0.840784
tea we tea we tea we
tea we 43.870 44.130 0.829177
tea we 43.640 43.760 0.828265
tea we 42.660 43.200 0.861468

I also tested the file with afinfo:

File:           file1.wav
File type ID:   WAVE
Num Tracks:     1
----
Data format:     1 ch,  16000 Hz, 'lpcm' (0x0000000C) 16-bit little-endian signed integer
                no channel layout.
estimated duration: 45.056000 sec
audio bytes: 1441792
audio packets: 720896
bit rate: 256000 bits per second
packet size upper bound: 2
maximum packet size: 2
audio data file offset: 78
optimized
source bit depth: I16

Last edit: kp ks 2019-10-24

PocketSphinx Diff in recognition of keyword between command line and python

Speech Recognition Toolkit

Forums

Help

PocketSphinx Diff in recognition of keyword between command line and python document.SUBSCRIPTION_OPTIONS = { "thing": "topic", "subscribed": false, "url": "subscribe", "icon": { "css": "fa fa-envelope-o" } };

PocketSphinx Diff in recognition of keyword between command line and python