
Pocketsphinx false positives

Help
unrated
2018-09-10
2018-09-11
  • unrated

    unrated - 2018-09-10

    I created my own model for "yes" and "no" in German. It actually works pretty well; the only thing I'm confused about is that when I speak words other than yes or no, which don't belong to my dictionary, I still get "yes" or "no" as a result. I guess sphinx4 could recognize unknown words with "<unk>", but I guess that doesn't exist in pocketsphinx. So I tried using kws and tuned the threshold. Now the result is much better, since I no longer always get "yes" or "no" when speaking other words. It still happens too often, though. Sometimes it doesn't even recognize a "yes" or "no" at all.

    I tried changing the thresholds, but I always face one of two problems:
    1. Changing the thresholds in one direction results in fewer false positives, but sadly also in fewer recognitions of yes/no.
    2. Changing the thresholds in the other direction results in more false positives, but in 100% recognition of yes/no.
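The trade-off described in the two points above can be measured instead of tuned by ear. The following is a minimal sketch, assuming you have logged one detection score per recorded utterance and labelled each utterance as a real keyword or not. The sample scores, the "higher score means more confident" convention, and the helper name `sweep_thresholds` are all hypothetical and not part of pocketsphinx.

```python
def sweep_thresholds(samples, thresholds):
    """Return (threshold, miss_rate, false_alarm_rate) for each threshold.

    samples: list of (score, is_keyword) pairs (hypothetical logged data).
    An utterance counts as detected when its score >= threshold.
    """
    positives = [score for score, is_keyword in samples if is_keyword]
    negatives = [score for score, is_keyword in samples if not is_keyword]
    if not positives or not negatives:
        raise ValueError("need at least one keyword and one non-keyword sample")

    rows = []
    for threshold in thresholds:
        # Real keywords that fall below the threshold are missed.
        misses = sum(1 for score in positives if score < threshold)
        # Other words that reach the threshold are false alarms.
        false_alarms = sum(1 for score in negatives if score >= threshold)
        rows.append((threshold,
                     misses / len(positives),
                     false_alarms / len(negatives)))
    return rows


# Made-up scores purely for illustration.
samples = [(0.9, True), (0.8, True), (0.4, True),
           (0.7, False), (0.3, False), (0.1, False)]
rows = sweep_thresholds(samples, [0.2, 0.5, 0.75])
for threshold, miss_rate, false_alarm_rate in rows:
    print(f"threshold={threshold:.2f} "
          f"miss_rate={miss_rate:.2f} "
          f"false_alarm_rate={false_alarm_rate:.2f}")
```

Running such a sweep on a held-out set of recordings shows whether any threshold gives an acceptable combination of both rates, or whether the scores of keywords and other words simply overlap too much for a threshold alone to separate them.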

    Can anyone give me some hints to improve my kws? Would it help to train my acoustic model with more data? I trained it with maybe 10 minutes of audio, and it currently works OK with different speakers. But I want to avoid false alarms, as I said, and that sadly isn't working very well. I think the thresholds are already tuned as well as they can be.

    import sys, os
    import time
    from pocketsphinx.pocketsphinx import *
    from sphinxbase.sphinxbase import *
    
    modeldir = "./"
    datadir = "../../../test/data"
    
    # Create a decoder with certain model
    config = Decoder.default_config()
    config.set_string('-hmm', os.path.join(modeldir, 'germodel/model_parameters'))
    config.set_string('-dict', os.path.join(modeldir, 'germodel/de-de/cmusphinx-voxforge-de.dic'))
    config.set_string('-kws', os.path.join(modeldir, 'de-de/kws.txt'))
    # Discard decoder log output
    config.set_string('-logfn', '/dev/null')
    
    # Open file to read the data
    #stream = open(os.path.join(datadir, "goforward.raw"), "rb")
    
    # Alternatively you can read from microphone
    import pyaudio
    # 
    p = pyaudio.PyAudio()
    stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=1024)
    stream.start_stream()
    
    # Process audio chunk by chunk. On keyphrase detected perform action and restart search
    decoder = Decoder(config)
    print('<<<<<<<<<<<<<<<<<<<<<<<<<<Start recognition>>>>>>>>>>>>>>>>>>>>>>>>>')
    decoder.start_utt()
    while True:
        buf = stream.read(1024)
        if buf:
            decoder.process_raw(buf, False, False)
        else:
            break
        hypothesis = decoder.hyp()
        if hypothesis is not None:
            print('Best hypothesis: ', time.time(), hypothesis.hypstr, " model score: ", hypothesis.best_score, " confidence: ", hypothesis.prob)
            # Restart the search after a detection
            decoder.end_utt()
            decoder.start_utt()
    

    kws.txt

    nein /750f/
    ja /120f/

     

    Last edit: unrated 2018-09-10
    • Nickolay V. Shmyrev

      The tutorial says that a keyword must have 3-5 syllables. It will never work with short words like "nein".

      You need to rethink your application design. To get reasonable advice, you need to describe what your application is.

       
  • unrated

    unrated - 2018-09-11

    The application asks a person if they feel good. If the person answers with yes nothing will happen. If the person answers with no an alarm is raised.

     
    • Nickolay V. Shmyrev

      You'd better use simple grammar recognition but you need to include other variants of answers into the grammar. Many people will say something like "amazing" and so on.
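A grammar with several answer variants, as suggested above, could be generated along these lines. This is only a sketch under assumptions: the German phrases, the `ANSWERS` table, and the helpers `build_jsgf` and `classify` are hypothetical examples, and every word used in the grammar would also have to be present in the pronunciation dictionary.

```python
# Hypothetical answer variants; extend with whatever people really say.
ANSWERS = {
    "yes": ["ja", "gut", "mir geht es gut"],
    "no": ["nein", "schlecht", "mir geht es schlecht"],
}


def build_jsgf(answers, name="answers"):
    """Build a JSGF grammar with one rule per answer label."""
    public = ("public <answer> = "
              + " | ".join(f"<{label}>" for label in answers) + ";")
    rules = [f"<{label}> = " + " | ".join(phrases) + ";"
             for label, phrases in answers.items()]
    return "\n".join(["#JSGF V1.0;", f"grammar {name};", public] + rules) + "\n"


def classify(hypstr, answers):
    """Map a recognized phrase back to its label, or None if it is not listed."""
    text = hypstr.strip().lower()
    for label, phrases in answers.items():
        if text in phrases:
            return label
    return None


grammar = build_jsgf(ANSWERS)
print(grammar)
print(classify("Nein", ANSWERS))
```

The generated text would be written to a file and passed to the decoder with the `-jsgf` option in place of `-kws`, and `classify` would then be applied to `hypothesis.hypstr`. Note that a grammar search can only return phrases the grammar contains, so the variant lists need to cover what people realistically answer.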

      Using speech recognition in emergency interfaces is not a good idea overall.

       
