I created my own model for "yes" and "no" in German. It actually works pretty well. The only thing I'm confused about is that when I say words other than yes or no, which aren't in my dictionary, I still get "yes" or "no" as a result. I think sphinx4 can mark unknown words with "<unk>", but as far as I can tell that doesn't exist in pocketsphinx. So I tried using kws and tuned the threshold. The result is much better now, since I no longer always get "yes" or "no" when saying other words. It still happens too often, though, and sometimes a "yes" or "no" isn't recognized at all.
I tried changing the thresholds, but I always run into one of two problems:
1. Changing the thresholds in one direction gives fewer false positives, but sadly also fewer recognitions of yes/no.
2. Changing the thresholds in the other direction gives 100% recognition of yes/no, but more false positives.
Can anyone give me some hints on improving my kws? Would it help to train my acoustic model with more data? I trained it with maybe 10 minutes of audio, and it currently works OK with different speakers. But as I said, I want to avoid false alarms, and that sadly isn't working well. I think the thresholds are already tuned as well as they can be.
```python
import sys, os
import time

import pyaudio
from pocketsphinx.pocketsphinx import *
from sphinxbase.sphinxbase import *

modeldir = "./"
datadir = "../../../test/data"

# Create a decoder with a certain model
config = Decoder.default_config()
config.set_string('-hmm', os.path.join(modeldir, 'germodel/model_parameters'))
config.set_string('-dict', os.path.join(modeldir, 'germodel/de-de/cmusphinx-voxforge-de.dic'))
config.set_string('-kws', os.path.join(modeldir, 'de-de/kws.txt'))
# Discard decoder log output
config.set_string('-logfn', '/dev/null')

# Open file to read the data
# stream = open(os.path.join(datadir, "goforward.raw"), "rb")

# Alternatively you can read from microphone
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000,
                input=True, frames_per_buffer=1024)
stream.start_stream()

# Process audio chunk by chunk. On keyphrase detected perform action and restart search
decoder = Decoder(config)
print('<<<<<<<<<<<<<<<<<<<<<<<<<<Start recognition>>>>>>>>>>>>>>>>>>>>>>>>>')
decoder.start_utt()
while True:
    buf = stream.read(1024)
    if buf:
        decoder.process_raw(buf, False, False)
    else:
        break
    hypothesis = decoder.hyp()
    if hypothesis is not None:
        print('Best hypothesis: ', time.time(), hypothesis.hypstr,
              " model score: ", hypothesis.best_score,
              " confidence: ", hypothesis.prob)
        # Restart the search after each detection
        decoder.end_utt()
        decoder.start_utt()
```
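The threshold trade-off described in the post (fewer false positives versus fewer missed yes/no answers) can be measured on recorded clips instead of being judged by ear. The following is only a sketch under assumptions: `detect` is a hypothetical wrapper that would run the keyword spotter on one clip with a given threshold and report whether a keyword fired, and `toy_detect` with its `TOY_SCORES` is made-up stand-in data, not pocketsphinx output.

```python
from typing import Callable, Iterable, List, Tuple


def sweep_thresholds(
    detect: Callable[[str, float], bool],
    keyword_clips: List[str],
    other_clips: List[str],
    thresholds: Iterable[float],
) -> List[Tuple[float, float, float]]:
    """Return (threshold, miss_rate, false_alarm_rate) for each threshold."""
    results = []
    for t in thresholds:
        # A miss is a keyword clip on which the detector did not fire.
        misses = sum(1 for clip in keyword_clips if not detect(clip, t))
        # A false alarm is a non-keyword clip on which the detector fired.
        false_alarms = sum(1 for clip in other_clips if detect(clip, t))
        results.append(
            (t, misses / len(keyword_clips), false_alarms / len(other_clips))
        )
    return results


# Hypothetical stand-in scores, only to make the sketch runnable.
TOY_SCORES = {"ja_1": 0.9, "ja_2": 0.6, "nein_1": 0.7, "hallo_1": 0.65, "noise_1": 0.2}


def toy_detect(clip: str, threshold: float) -> bool:
    """Toy detector: fires when the clip's made-up score reaches the threshold."""
    return TOY_SCORES[clip] >= threshold


results = sweep_thresholds(
    toy_detect,
    keyword_clips=["ja_1", "ja_2", "nein_1"],
    other_clips=["hallo_1", "noise_1"],
    thresholds=[0.5, 0.68, 0.95],
)
for threshold, miss_rate, fa_rate in results:
    print(f"threshold={threshold} miss_rate={miss_rate:.2f} false_alarm_rate={fa_rate:.2f}")
```

With real recordings, `toy_detect` would be replaced by a function that decodes each clip with the threshold under test. Plotting the two rates against each other shows whether any threshold gives an acceptable balance, or whether the keywords themselves are the limiting factor.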
kws.txt
Last edit: unrated 2018-09-10
The tutorial says that a keyword must have 3-5 syllables. It will never work with short words like "nein".
You need to rethink your application design. To get reasonable advice, you need to describe what your application is.
The application asks a person if they feel good. If the person answers with yes nothing will happen. If the person answers with no an alarm is raised.
You'd better use simple grammar recognition, but you need to include other variants of answers in the grammar. Many people will say something like "amazing" and so on.
Using speech recognition in emergency interfaces is not a good idea overall.
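As a rough illustration of the grammar suggestion above, the sketch below builds a small JSGF grammar from a table of answer variants and maps a recognized phrase back to the application's two outcomes. The variant lists, the labels `ok` and `alarm`, and the helper names `build_jsgf` and `classify` are assumptions made for the example, and every word used would also have to be present in the pronunciation dictionary.

```python
from typing import Dict, List, Optional

# Hypothetical answer variants; the real list depends on what users actually say.
ANSWERS: Dict[str, List[str]] = {
    "ok": ["ja", "gut", "sehr gut"],
    "alarm": ["nein", "schlecht", "hilfe"],
}


def build_jsgf(grammar_name: str, answers: Dict[str, List[str]]) -> str:
    """Build a JSGF grammar whose single public rule lists every accepted phrase."""
    phrases = [phrase for variants in answers.values() for phrase in variants]
    alternatives = " | ".join(phrases)
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <answer> = {alternatives};\n"
    )


def classify(hypstr: str, answers: Dict[str, List[str]]) -> Optional[str]:
    """Map a recognized phrase to its label, or None if it matches no variant."""
    normalized = hypstr.strip().lower()
    for label, variants in answers.items():
        if normalized in variants:
            return label
    return None


grammar_text = build_jsgf("answer", ANSWERS)
print(grammar_text)
print(classify("Nein", ANSWERS))

# The grammar text could be written to a file and passed to the decoder with
# config.set_string('-jsgf', path) in place of the '-kws' setting.
```

What the application does when `classify` returns `None` (for example, asking again) is a design decision the thread does not settle.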