Hi,

I am using the sample code in Python to recognize a recorded message. I get no recognition.

However, if I try the same phrase from the command line, I get the following output:

!pocketsphinx_continuous -infile file1.wav -hmm /Users/naka/anaconda3/lib/python3.6/site-packages/pocketsphinx/model/en-us -kws_threshold 1e-20 -keyphrase "tea we" -time yes -logfn /dev/null

Example code for keyword recognition:

```python
# not matching command line output
import sys, os
from pocketsphinx import *
from sphinxbase import *

modeldir = get_model_path()
datadir = "../../../test/data"
print(modeldir)

# Create a decoder with a certain model
config = Decoder.default_config()
config.set_string('-hmm', os.path.join(modeldir, 'en-us'))
# config.set_string('-dict', os.path.join(modeldir, 'cmudict-en-us.dict'))
config.set_string('-keyphrase', 'tea we')
config.set_float('-kws_threshold', 1e+20)

# Open file to read the data
stream = open("file1.wav", "rb")

# Alternatively you can read from the microphone
# import pyaudio
#
# p = pyaudio.PyAudio()
# stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=1024)
# stream.start_stream()

# Process audio chunk by chunk. On keyphrase detected, perform action and restart search
decoder = Decoder(config)
decoder.start_utt()
while True:
    buf = stream.read(1024)
    if buf:
        decoder.process_raw(buf, False, False)
    else:
        break
    if decoder.hyp() != None:
        print([(seg.word, seg.prob, seg.start_frame, seg.end_frame) for seg in decoder.seg()])
        print("Detected keyphrase, restarting search")
        decoder.end_utt()
        decoder.start_utt()
```

What am I setting up wrong? The threshold is 1e20 in both cases.

Also, if you could point me to documentation to understand the output format from the command line, I would be most grateful!

In the command line you have threshold 1e-20; in the Python code, 1e+20. Those are different.

Oops! I changed it, but there is still no recognition. Would you also help me understand the output in the command line? I assume it is the timestamps in the audio between which the keyword is found, and then the confidence level?

```python
# Changed from the previous version; the rest of the code is unchanged:
config.set_float('-kws_threshold', 1e-20)
```
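Before adjusting thresholds further, it may be worth ruling out a format mismatch: process_raw() trusts that the audio already matches the model (16 kHz, mono, 16-bit for en-us), and a mismatch also shows up as silent non-recognition. A minimal stdlib-only sketch of that check (the helper name is my own; the 16 kHz expectation matches the en-us model used above):

```python
import wave

def check_wav_format(path, expected_rate=16000):
    # Read the WAV header and compare it against what the decoder expects:
    # mono, 16-bit samples, and the model's sample rate.
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
        sampwidth = w.getsampwidth()  # bytes per sample; 2 means 16-bit
    matches = rate == expected_rate and channels == 1 and sampwidth == 2
    return rate, channels, sampwidth, matches
```

Running `check_wav_format("file1.wav")` before the decode loop makes it obvious whether the file actually is 16 kHz mono, rather than relying on the decoder to fail silently.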
You can share the file to get help on this
Yes
The file for which the command line works is attached.
I am trying to recognise the phrase "TVS Way 20000"
The file is 8 kHz; it is not going to work with a 16 kHz model. Also, you need the en-in model, not en-us. Third, for such a task it is better to use large-vocabulary transcription and analyze the results.
Thanks for your insight.
I had actually used ffmpeg to specifically convert the 8 kHz files into 16 kHz for this task. I have to dig into why that is not working.
E.g., the syntax used for conversion:
ffmpeg -i testWav1.wav -ac 1 -ar 16000 testWay.wav
Not all the conversation will be in English. It could be in any of 22 distinct languages. The only guaranteed English phrase, so to say, is "TVS Way" and the number. Is there any alternative to complete transcription? 90% of the conversation cannot be recognised!
That does not make the file 16 kHz; upsampling does not restore the high-frequency information that was never recorded. You need to use the original 8 kHz audio and try it with this model: https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/Indian%20English/cmusphinx-en-in-8khz-5.2.tar.gz/download
Overall, your task has to be solved differently.
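Following the suggestion above, the command-line test would change to something like this. This is only a sketch: the directory name assumes the linked tarball extracts to cmusphinx-en-in-8khz-5.2 (the actual layout inside the archive may differ), and file1_8k.wav stands in for the original, unconverted 8 kHz mono recording:

```shell
# Sketch: keyword spotting with the 8 kHz Indian English model.
# The -hmm path and the input filename are assumptions, not verified paths.
pocketsphinx_continuous \
    -infile file1_8k.wav \
    -hmm ./cmusphinx-en-in-8khz-5.2 \
    -samprate 8000 \
    -kws_threshold 1e-20 \
    -keyphrase "tea we" \
    -time yes \
    -logfn /dev/null
```

The key changes from the earlier invocation are the 8 kHz acoustic model and -samprate 8000, so the model and the audio finally agree.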
Hi, I had previously installed the Indian English model; not much difference in recognition:
!pocketsphinx_continuous -infile file1.wav -hmm /Users/naka/anaconda3/lib/python3.6/site-packages/pocketsphinx/model/en_in.cd_cont_5000 -dict /Users/naka/anaconda3/lib/python3.6/site-packages/pocketsphinx/model/en_in.dic -samprate 16000.0 -kws_threshold 1e-20 -keyphrase "tea we" -time yes -logfn /dev/null
However, a sampling rate of 8 kHz results in no recognition. The command line, as always, works correctly with 16 kHz. Python didn't work with either.
I also tested the file with afinfo:
Last edit: kp ks 2019-10-24
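One difference between the two setups may also matter here: pocketsphinx_continuous parses the WAV header itself, while the Python example feeds the whole file, header included, to process_raw() and simply trusts the configured sample rate. A stdlib-only sketch that yields header-free PCM chunks instead (the helper name is made up; decoder.process_raw is the call from the example above):

```python
import wave

def wav_pcm_chunks(path, chunk_frames=1024):
    # Yield raw 16-bit mono PCM chunks from a WAV file with the header
    # stripped, suitable for feeding to something like decoder.process_raw().
    with wave.open(path, "rb") as w:
        if w.getnchannels() != 1 or w.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono audio")
        while True:
            buf = w.readframes(chunk_frames)
            if not buf:
                return
            yield buf
```

With this, the decode loop becomes `for buf in wav_pcm_chunks("file1.wav"): decoder.process_raw(buf, False, False)`, and the file's declared sample rate can be read from the same wave object before deciding which model to use.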