Hypothesis returns non utf-8

Help
2018-02-19
2018-02-21
  • Steve Louie

    Steve Louie - 2018-02-19

    I'm pretty new to pocketsphinx, but I was able to get a simple web app working that sends audio data from the browser over a websocket to be translated by a backend using pocketsphinx. I recently containerized it, and since then the hypothesis string from the decoder has contained non-UTF-8 characters. I can take the same code and run the binary locally without fault, but once I run it in the container the translations go down the drain. I have not switched hardware, mics, or browsers. The only change is where the backend runs: containerized or not. I have also played back the audio data I receive in the container, and the WAV sounds just fine.

    I don't even know where to begin looking, or whether anyone has seen anything similar. My translations in the container are "close". If I say "Hi, how are you?" the containerized version might decode it to "<non-utf-8 garbage>&*~ are you". The non utf-8 characters are random as far as I can tell. I cannot share the entire code base but could share small snippets.

    Thanks!

     
    • Nickolay V. Shmyrev

      The non utf-8 characters are random as far as I can tell.

      Seems like corrupted memory, most likely an issue with the bindings.

      I cannot share the entire code base but could share small snippets.

      You'd better localize the problem on the server with a simple reproducible example.

       
      • Steve Louie

        Steve Louie - 2018-02-20

        It is a bindings problem. My laptop runs OSX, while my deployment target is Linux. I've tried both CentOS and Ubuntu, and on both I see the same non-UTF-8 characters. I'm currently going through the dependency and version differences between my dev environment and the deployment. I pulled the 5prealpha versions of sphinxbase and pocketsphinx and installed both by running:

        ./autogen.sh
        ./configure
        make
        make install
        

        My .pc files are nearly identical as well. I'll report back once I find out more. I now have confidence it's an install problem more than a code problem.

         

        Last edit: Steve Louie 2018-02-20
  • Steve Louie

    Steve Louie - 2018-02-20

    Was there a recent rebase on sphinxbase? My OSX laptop has sphinxbase/HEAD-1b33160, and I cannot find that Git SHA to install in my Linux environment.

    Found it: https://github.com/cmusphinx/sphinxbase/pull/45

     

    Last edit: Steve Louie 2018-02-20
  • Steve Louie

    Steve Louie - 2018-02-21

    It turns out that the Go binding uses unsafe pointers in its implementation. The Hyp value it passes back is not thread-safe: you must deep copy the values, or avoid using it in a multi-threaded environment. That is where my occasional corrupted memory was coming from.
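
    For anyone hitting the same thing, here is a minimal sketch of the failure mode and the deep-copy fix. It does not use pocketsphinx or the real binding: fakeDecoder, hypView and hypCopy are hypothetical names made up for illustration, and the assumption is that the binding hands back a string that still points at memory the decoder reuses. It needs Go 1.20 or later for unsafe.String.

```go
package main

import (
	"fmt"
	"strings"
	"unsafe"
)

// fakeDecoder is a hypothetical stand-in for a decoder whose hypothesis
// buffer is owned by native code and overwritten on every utterance.
type fakeDecoder struct {
	buf [64]byte
	n   int
}

// decode overwrites the internal buffer with a new hypothesis.
func (d *fakeDecoder) decode(text string) {
	d.n = copy(d.buf[:], text)
}

// hypView returns a string that aliases the decoder's buffer without copying,
// roughly what a binding built on unsafe pointers might hand back.
func (d *fakeDecoder) hypView() string {
	return unsafe.String(&d.buf[0], d.n)
}

// hypCopy returns an independent copy that stays valid after the next decode.
func (d *fakeDecoder) hypCopy() string {
	return strings.Clone(d.hypView())
}

func main() {
	d := &fakeDecoder{}
	d.decode("hi how are you")

	view := d.hypView() // still points into d.buf
	safe := d.hypCopy() // owns its own bytes

	// The next utterance reuses the same memory. In a real server this
	// could just as well happen from another goroutine.
	d.decode("XXXXXX")

	fmt.Printf("aliased: %q\n", view)
	fmt.Printf("copied:  %q\n", safe)
}
```

    The aliased string changes underneath you once the buffer is reused, while the copied one keeps the original text. In a multi-threaded backend the same overwrite can happen at any time, so copying the hypothesis before handing it to another goroutine (or guarding the decoder with a mutex) is the safer pattern.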

     

    Last edit: Steve Louie 2018-02-21
