Example code for keyword recognition:
# not matching command line output
import sys, os
from pocketsphinx import *
from sphinxbase import *

modeldir = get_model_path()
datadir = "../../../test/data"
print(modeldir)

# Create a decoder with certain model
config = Decoder.default_config()
config.set_string('-hmm', os.path.join(modeldir, 'en-us'))
# config.set_string('-dict', os.path.join(modeldir, 'cmudict-en-us.dict'))
config.set_string('-keyphrase', 'tea we')
config.set_float('-kws_threshold', 1e+20)

# Open file to read the data
stream = open("file1.wav", "rb")

# Alternatively you can read from microphone
# import pyaudio
# p = pyaudio.PyAudio()
# stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=1024)
# stream.start_stream()

# Process audio chunk by chunk. On keyphrase detected perform action and restart search
decoder = Decoder(config)
decoder.start_utt()
while True:
    buf = stream.read(1024)
    if buf:
        decoder.process_raw(buf, False, False)
    else:
        break
    if decoder.hyp() is not None:
        print([(seg.word, seg.prob, seg.start_frame, seg.end_frame) for seg in decoder.seg()])
        print("Detected keyphrase, restarting search")
        decoder.end_utt()
        decoder.start_utt()
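Note that the command line with `-time yes` reports times in seconds, while the Python segments expose `start_frame`/`end_frame`. Assuming pocketsphinx's default analysis rate of 100 frames per second (the default `-frate`; this is an assumption, check your config), the frame indices can be mapped to seconds like this:

```python
# Convert pocketsphinx segment frame indices to seconds, assuming the
# default analysis rate of 100 frames per second (-frate 100).
FRAMES_PER_SECOND = 100

def frames_to_seconds(frame_index, frate=FRAMES_PER_SECOND):
    """Map a decoder frame index to a timestamp in seconds."""
    return frame_index / frate

# Example: a segment spanning frames 483..503 corresponds to the
# 4.83 s - 5.03 s range seen in command-line detections.
print(frames_to_seconds(483), frames_to_seconds(503))
```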
tea we
tea we 2.620 2.810 0.827686
tea we tea we
tea we 4.830 5.030 0.835503
tea we 4.510 4.640 0.824051
tea we tea we tea we
tea we 8.500 9.350 0.848811
tea we 7.990 8.280 0.840953
tea we 7.230 7.390 0.826527
tea we tea we tea we tea we tea we
tea we 14.330 14.510 0.828017
tea we 13.980 14.220 0.826941
tea we 11.240 11.450 0.824711
tea we 10.890 11.080 0.833584
tea we 10.500 10.650 0.826114
tea we tea we tea we tea we
tea we 20.760 20.920 0.829343
tea we 20.090 20.270 0.825949
tea we 19.080 19.370 0.845845
tea we 18.600 18.800 0.840112
tea we
tea we 23.040 23.200 0.826197
tea we
tea we 25.050 25.760 0.898245
tea we tea we tea we
tea we 29.130 29.320 0.822898
tea we 28.050 28.310 0.854176
tea we 27.020 27.300 0.843311
tea we tea we
tea we 39.790 40.120 0.828680
tea we 39.350 39.620 0.840784
tea we tea we tea we
tea we 43.870 44.130 0.829177
tea we 43.640 43.760 0.828265
tea we 42.660 43.200 0.861468
Hi,
I am using the sample code in Python to recognize a recorded message. I get no recognition.
However, if I try the same phrase from the command line, I get the output shown above.
The example code for keyword recognition is posted at the top.
What am I setting up wrong? The threshold is 1e20 in both cases.
Also, if you could point me to documentation to understand the output format from the command line, I would be most grateful!
In the command line you have threshold 1e-20; in the Python code it is 1e+20. Those are different.
Oops! I changed it, but there is still no recognition. Would you also help me understand the output in the command line? I assume it is the timestamps in the audio between which the keyword is found, and then the confidence level?
You can share the file to get help on this.
Yes
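In other words, each numeric line in the output above is `<keyphrase> <start seconds> <end seconds> <confidence>`, and the bare repeated-keyphrase lines are the hypothesis strings. A small sketch (assuming that format) that pulls the detections out of captured output:

```python
import re

# Parse pocketsphinx `-time yes` keyword-spotting output lines of the form
#   <keyphrase> <start_s> <end_s> <confidence>
# Bare hypothesis lines (keyphrase repeated, no numbers) are skipped.
LINE_RE = re.compile(
    r'^(?P<word>.+?)\s+(?P<start>\d+\.\d+)\s+(?P<end>\d+\.\d+)\s+(?P<conf>\d+\.\d+)\s*$'
)

def parse_detections(lines):
    """Return a list of (keyphrase, start_s, end_s, confidence) tuples."""
    detections = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            detections.append((m.group('word'),
                               float(m.group('start')),
                               float(m.group('end')),
                               float(m.group('conf'))))
    return detections

output = [
    "tea we tea we",
    "tea we 4.830 5.030 0.835503",
    "tea we 4.510 4.640 0.824051",
]
for word, start, end, conf in parse_detections(output):
    print(f"{word!r} from {start:.2f}s to {end:.2f}s (confidence {conf:.3f})")
```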
The file for which the command line works is attached.
I am trying to recognise the phrase "TVS Way 20000"
The file is 8 kHz; it is not going to work with a 16 kHz model. Also, you need the en-in model, not en-us. Third, for such a task it is better to use large-vocabulary transcription and analyze the results.
Thanks for your insight.
I had actually used ffmpeg specifically to convert the 8 kHz files into 16 kHz for this task. I will have to dig into why that is not working.
E.g., syntax used for conversion:
Not all of the conversation will be in English. It could be in any of 22 distinct languages. The only guaranteed English phrase, so to say, is "TVS Way" and the number. Is there any alternative to complete transcription? 90% of the conversation cannot be recognised!
It does not make the file 16 kHz.
You need to use 8 kHz audio and try with this model: https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/Indian%20English/cmusphinx-en-in-8khz-5.2.tar.gz/download
Overall, your task has to be solved differently.
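One quick way to confirm whether a conversion actually produced the sample rate a model expects is to read the WAV header with Python's stdlib `wave` module (the filenames here are hypothetical). A minimal sketch:

```python
import wave

def wav_sample_rate(path):
    """Return the sample rate (Hz) declared in a WAV file's header."""
    with wave.open(path, "rb") as wf:
        return wf.getframerate()

# Quick self-check: write a short 16 kHz mono file and read the rate back.
with wave.open("check_16k.wav", "wb") as wf:
    wf.setnchannels(1)        # mono, as pocketsphinx expects
    wf.setsampwidth(2)        # 16-bit samples
    wf.setframerate(16000)    # the rate the default acoustic models assume
    wf.writeframes(b"\x00\x00" * 1600)  # 0.1 s of silence

print(wav_sample_rate("check_16k.wav"))  # should print 16000
```

If this reports 8000 for a file you thought you had resampled, the conversion step is the problem, not the decoder.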
Hi, I had previously installed the Indian English model; not much difference in recognition:
!pocketsphinx_continuous -infile file1.wav -hmm /Users/naka/anaconda3/lib/python3.6/site-packages/pocketsphinx/model/en_in.cd_cont_5000 -dict /Users/naka/anaconda3/lib/python3.6/site-packages/pocketsphinx/model/en_in.dic -samprate 16000.0 -kws_threshold 1e-20 -keyphrase "tea we" -time yes -logfn /dev/null
However, a sampling rate of 8 kHz results in no recognition. The command line, as always, works correctly with 16 kHz. Python didn't work with either.
I also tested the file with afinfo:
Last edit: kp ks 2019-10-24