I'm pretty new to pocketsphinx, but I was able to get a simple web app working that sends audio data from the browser over a websocket to a backend, which decodes it with pocketsphinx. I recently containerized the backend, and since then I have seen the hypothesis string returned by the decoder contain non-UTF-8 characters. I can take the same code and run the binary locally without fault, but once I run it in the container the translations go down the drain. I have not switched hardware, mics, or browsers; the only change is whether the backend runs in a container or not. I have also played back the audio data I receive in the container, and the WAV sounds just fine.
I don't even know where to begin looking, or whether anyone has seen anything similar. My translations in the container are "close": if I say "Hi, how are you?", the containerized version might decode it to "<non-UTF-8 garbage>&*~ are you". The non-UTF-8 characters are random as far as I can tell. I cannot share the entire code base, but I could share small snippets.
Thanks!
It is a bindings problem. My laptop is running OSX, while my deployment target is Linux. I've tried both CentOS and Ubuntu, and on both Linux OSes I see the same non-UTF-8 characters. I'm currently going through the dependencies and version differences between my dev and deploy environments. I've pulled the 5prealpha versions of sphinxbase and pocketsphinx and installed both by running:
./autogen.sh
./configure
make
make install
My .pc files are nearly identical as well. I'll report back once I find out more. I now have confidence it's an install problem more than a code problem.
Last edit: Steve Louie 2018-02-20
Seems like corrupted memory, most likely an issue with the bindings.
You'd better localize the problem on the server with a simple reproducible example.
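For a reproducible example along those lines, it helps to flag a corrupted hypothesis automatically rather than by eye. Below is a minimal sketch in Go using only the standard `unicode/utf8` package; the helper name `firstInvalidByte` and the sample strings are made up for illustration and are not part of pocketsphinx or of any binding.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstInvalidByte returns the byte offset of the first invalid UTF-8
// sequence in hyp, or -1 if the whole string is valid UTF-8.
// (Hypothetical helper, not part of any pocketsphinx binding.)
func firstInvalidByte(hyp string) int {
	for i := 0; i < len(hyp); {
		r, size := utf8.DecodeRuneInString(hyp[i:])
		// An invalid sequence decodes as RuneError with a width of 1;
		// a genuinely encoded U+FFFD has a width of 3.
		if r == utf8.RuneError && size == 1 {
			return i
		}
		i += size
	}
	return -1
}

func main() {
	// Made-up samples: one clean hypothesis, one with garbage bytes in front.
	samples := []string{"hi how are you", "\xff\xfe are you"}
	for _, s := range samples {
		fmt.Printf("%q valid=%t firstBad=%d\n", s, utf8.ValidString(s), firstInvalidByte(s))
	}
}
```

Running a check like this on every hypothesis in a small test loop on the server would at least show how often the corruption happens and where in the string it starts.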
Was there a recent rebase on sphinxbase? My OSX laptop has sphinxbase/HEAD-1b33160 and I cannot find that git SHA to install on my Linux environment.
Found it: https://github.com/cmusphinx/sphinxbase/pull/45
Last edit: Steve Louie 2018-02-20
It turns out that the Go binding uses unsafe pointers in its implementation. The Hyp value it passes back is not thread-safe: you must deep copy the values or just not use it in a multi-threaded environment. That was where I was getting my occasional corrupted memory.
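To illustrate the kind of aliasing involved, here is a minimal sketch in pure Go. It does not use the real binding: `fakeDecoder`, `hypAliased`, and `hypCopied` are hypothetical names, and the assumption is simply that the result string points into a buffer that the next decode overwrites. The real binding's internals may differ.

```go
package main

import (
	"fmt"
	"unsafe"
)

// fakeDecoder is a hypothetical stand-in for a binding whose result
// lives in a buffer that the next decode overwrites in place.
type fakeDecoder struct {
	buf []byte
}

func newFakeDecoder() *fakeDecoder {
	return &fakeDecoder{buf: make([]byte, 0, 64)}
}

// decode reuses the same backing array (inputs here are shorter than 64 bytes).
func (d *fakeDecoder) decode(text string) {
	d.buf = append(d.buf[:0], text...)
}

// hypAliased builds a string header that points at the shared buffer.
// No bytes are copied, so later decodes change what this string "contains".
func (d *fakeDecoder) hypAliased() string {
	return *(*string)(unsafe.Pointer(&d.buf))
}

// hypCopied returns a deep copy that is independent of the buffer.
func (d *fakeDecoder) hypCopied() string {
	return string(d.buf)
}

func main() {
	d := newFakeDecoder()
	d.decode("hi how are you")
	aliased := d.hypAliased()
	copied := d.hypCopied()

	// A second decode (in my case, from another goroutine) reuses the buffer.
	d.decode("XX")

	fmt.Printf("aliased: %q\n", aliased) // no longer matches the original text
	fmt.Printf("copied:  %q\n", copied)  // still the original text
}
```

With a real cgo binding the equivalent of the deep copy would be to copy the C string into Go-owned memory before the decoder is touched again, and to serialize access to the decoder with a mutex if several goroutines share it.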
Last edit: Steve Louie 2018-02-21