Rejecting OOV words

  • Paulo Ferreira

    Paulo Ferreira - 2017-04-20

    Hello,
    I am using Python and pocketsphinx to build a simple program to recognize just a few command words.
    This is my dictionary:

    DOWN    D AW N
    GO  G OW
    LEFT    L EH F T
    RIGHT   R AY T
    TURN    T ER N
    UP  AH P
    

    When I say a word that is in the dictionary it works fine, but when I say a random word pocketsphinx finds the closest match, which I don't want. Searching the internet, I realized that I need to use a keyword list to achieve this.
    So I built this one:

    DOWN /1e-15/
    RIGHT /1e-15/
    LEFT /1e-15/
    UP /1e-15/
    GO UP /1e-15/
    GO DOWN /1e-15/
    TURN LEFT /1e-15/
    TURN RIGHT /1e-15/
    

    But it doesn't seem to make any difference.
    Here is the pocketsphinx Python code I have right now:

    import os
    from pocketsphinx import LiveSpeech, get_model_path

    model_path = get_model_path()

    speech = LiveSpeech(
        verbose=False,
        sampling_rate=16000,
        buffer_size=2048,
        no_search=False,
        full_utt=False,
        hmm=os.path.join(model_path, 'en-us'),
        lm=os.path.join(model_path, 'drone.lm.bin'),
        kws=os.path.join(model_path, 'drone.list'),
        dic=os.path.join(model_path, 'drone.dict'),
    )
    

    I used lmtool to build my language model and dictionary.
    What am I doing wrong?

     

    Last edit: Paulo Ferreira 2017-04-20
    • Nickolay V. Shmyrev

      Keyword spotting and LM search are exclusive; you should use either one or the other.
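To illustrate this point, here is a minimal sketch of a keyword-spotting-only configuration. It assumes the pocketsphinx-python `LiveSpeech` API from the original post; the model directory name is a placeholder, and the live-recognition loop is commented out because it needs a microphone and installed models.

```python
import os

# Hypothetical model directory; adjust to your installation.
model_path = 'model'

# Keyword-spotting configuration: note there is no 'lm' key.
# 'lm' and 'kws' select different search modes and cannot be combined,
# so the original code's keyword list was never actually used.
kws_config = {
    'verbose': False,
    'sampling_rate': 16000,
    'buffer_size': 2048,
    'hmm': os.path.join(model_path, 'en-us'),
    'kws': os.path.join(model_path, 'drone.list'),
    'dic': os.path.join(model_path, 'drone.dict'),
}

# With pocketsphinx-python this would be used as (commented out,
# since it requires a microphone and the model files above):
# from pocketsphinx import LiveSpeech
# for phrase in LiveSpeech(**kws_config):
#     print(phrase)
```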

      Words like "up" are too short for keyword spotting; you cannot listen for them continuously, you need a longer keyphrase.
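Following this advice, one way to rework the list from the first post is to keep only the two-word phrases and tune a per-phrase threshold. A small sketch of generating such a file in pocketsphinx's kws list format (the phrase set and threshold values are illustrative guesses, not tuned figures):

```python
# Illustrative keyphrase list: multi-word phrases only, since single
# words like "up" are too short to spot reliably.  The thresholds are
# placeholders; they need tuning on real recordings.
keyphrases = {
    'GO UP': 1e-20,
    'GO DOWN': 1e-20,
    'TURN LEFT': 1e-30,
    'TURN RIGHT': 1e-30,
}

def write_kwlist(path, phrases):
    """Write phrases in the kws list format: PHRASE /threshold/."""
    with open(path, 'w') as f:
        for phrase, threshold in phrases.items():
            f.write('%s /%g/\n' % (phrase, threshold))

write_kwlist('drone.list', keyphrases)
```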

      Voice directions for a drone are a pretty artificial idea. They are slow to recognize and hard to execute, and almost useless in real life. You need to design more meaningful commands: "go to the base" is much better UX than "up up up down up".

       
  • Farhan Ahmad

    Farhan Ahmad - 2018-10-25

    Hi Nickolay, I am trying to build simple speech-to-text for commands like YES/NO and the digits from zero to nine.
    I have trained my own acoustic model. The accuracy is 95%, but my problem is OOV words. Initially I was facing the problem of pocketsphinx mapping each OOV word to an in-vocabulary word.
    After reading many forums I tried keyword spotting in pocketsphinx. That didn't work for me; as you suggested elsewhere, keyword spotting doesn't work well for very short phrases, which is exactly my case.
    Then I moved to sphinx4 to reject out-of-grammar words, and used a simple grammar instead of a language model. It works fine for small audio files where only a single word is spoken. But when many words are spoken in an audio file, the code doesn't work well: it fails to differentiate between the words (it fails to recognize YES/NO correctly and to write <unk> for each OOV word). I am using the same code as given on the CMU Sphinx tutorial page (TranscriberDemo.java). Can you please help me?
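For reference, a grammar for this task might be sketched roughly as below. The JSGF file is written out with a short Python snippet to stay in the thread's language; the grammar and rule names are illustrative, not taken from TranscriberDemo.java. One detail worth checking for the multi-word case: the public rule must allow repetition, otherwise an utterance with several words cannot match the grammar at all.

```python
# A rough JSGF grammar for YES/NO and the digits.  The '+' lets the
# public rule match several words in one utterance; without it, an
# audio file containing many words will not fit the grammar.
grammar = """\
#JSGF V1.0;
grammar commands;
public <command> = <word>+;
<word> = YES | NO | ZERO | ONE | TWO | THREE | FOUR | FIVE
       | SIX | SEVEN | EIGHT | NINE;
"""

# Write the grammar file that the recognizer configuration would point to.
with open('commands.gram', 'w') as f:
    f.write(grammar)
```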

     
  • Farhan Ahmad

    Farhan Ahmad - 2018-10-29

    Thanks Nickolay, I already had this in mind as a second option.
    I have quite a lot of experience with CMU Sphinx, and theoretical knowledge of speech recognition using HMMs. Neural networks would be new to me. Can you suggest some places to start (to get some theoretical grounding as well)?

     
  • Farhan Ahmad

    Farhan Ahmad - 2018-10-30

    Thanks Nickolay for your kindness!

     
