I have been using pocketsphinx for some time now and I am satisfied with it.
Formerly I was told that there was too much noise in my wav-files (see my "Old vs new API" topic), and I have to be prepared for uttering the words in a noisy environment, so I added filtering to my script.
vrft.py is capable of filtering the utterance ("filterit" variable) and writing the utterance to a wav-file ("towav" variable, cmd.wav) and the recognized text to recog.txt.
If I don't do any filtering (i.e. "filterit" is False), then "browser" gets recognized ("browser" is written to recog.txt).
By filtering I mean removing frequencies below 300 Hz and above 5000 Hz with a Fourier transform.
Unfortunately, if "filterit" is True, then the word "browser" doesn't get recognized.
So I set "filterit" to False, only create cmd.wav with vrft.py, and use ft.py to do the filtering (it writes the result to out.wav).
python ./vrft.py (filtered=False)
creates cmd.wav
I use new.py to print the recognized text to the screen with its values.
python ./new.py cmd.wav
('*Best hypothesis: ', 'browser', ' model score: ', -3267, ' confidence: ', -23245)
python ./ft.py cmd.wav
filters the wave-file (removes frequencies <300 Hz and >5000 Hz) and writes the result to out.wav. It also shows the waveforms and saves them to diag.png (I am a beginner in using matplotlib)
(1. the waveform in the time domain, 2. its Fourier transform, 3. the filtered Fourier transform, 4. the filtered waveform in the time domain).
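For readers following along, here is a minimal sketch of the kind of Fourier-transform band-pass described above. It is not the ft.py from this thread; the function name `bandpass_fft` and its parameters are made up for illustration, and it assumes NumPy is installed. It uses the one-sided `rfft` together with `rfftfreq`, so each bin is matched to its frequency in Hz explicitly.

```python
import numpy as np

def bandpass_fft(samples, rate, low_hz=300.0, high_hz=5000.0):
    """Zero every FFT bin outside [low_hz, high_hz] and return the filtered signal.

    Hypothetical helper for illustration, not the ft.py from the thread.
    """
    samples = np.asarray(samples, dtype=np.float64)
    spectrum = np.fft.rfft(samples)                        # one-sided spectrum, 0 .. rate/2
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)    # frequency in Hz of each bin
    keep = (freqs >= low_hz) & (freqs <= high_hz)
    spectrum[~keep] = 0.0                                  # brick-wall cut outside the band
    return np.fft.irfft(spectrum, n=len(samples))

if __name__ == "__main__":
    rate = 16000
    t = np.arange(rate) / rate                             # one second of samples
    # Synthetic test signal: 100 Hz hum + 1000 Hz tone + 6000 Hz hiss
    signal = (np.sin(2 * np.pi * 100 * t)
              + np.sin(2 * np.pi * 1000 * t)
              + np.sin(2 * np.pi * 6000 * t))
    filtered = bandpass_fft(signal, rate)
    # Only the 1000 Hz component should remain
    print(np.max(np.abs(filtered - np.sin(2 * np.pi * 1000 * t))))
```

A brick-wall cut like this is the simplest possible choice; it says nothing about whether such filtering helps recognition.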
The problem is that
python ./new.py out.wav gets recognized as "move":
('*Best hypothesis: ', 'move', ' model score: ', -4706, ' confidence: ', -11125)
instead of "browser"
As far as I know, a signal sampled at 16000 Hz can contain frequencies up to 8000 Hz at most (the Nyquist frequency), so filtering out the frequencies above 5000 Hz doesn't seem to be a bad idea.
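As a side note, the relation between sampling rate, Nyquist frequency and FFT bins can be checked with a few lines of NumPy. This is only a sketch: the one-second length `n` is an arbitrary choice for illustration. It also shows that in a full (two-sided) FFT the second half of the bins holds negative frequencies, so a mask for "above 5000 Hz" has to look at the magnitude of the frequency.

```python
import numpy as np

rate = 16000          # sampling rate in Hz, as in the post
n = 16000             # number of samples (one second), chosen for illustration

nyquist_hz = rate / 2
freqs = np.fft.fftfreq(n, d=1.0 / rate)   # frequency in Hz of each full-FFT bin

print(nyquist_hz)     # highest frequency the sampled signal can represent
print(freqs[:3])      # first bins: 0 Hz upward in steps of rate / n
print(freqs[-3:])     # last bins: small negative frequencies, not "high" ones

# Selecting everything above 5000 Hz must cover both the positive and the
# negative half of the spectrum, hence the absolute value.
above_5k = np.abs(freqs) > 5000.0
print(int(above_5k.sum()), "of", n, "bins lie above 5000 Hz")
```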
I listened to both cmd.wav and out.wav and the only difference in my opinion is that there is less noise in out.wav.
I know that this question is only remotely related to pocketsphinx, but I have no idea why the noise-free out.wav doesn't get recognized.
Scripts wav-files and diagram
If you look at the spectrogram, the problem is that you are filtering out only the band [3kHz 5.5kHz] instead of [5kHz 8kHz].
Check your filtering mechanism again.
You can view the spectrogram with a tool such as Audacity. You can also apply any filter manually there, using Effects -> Equalization, for testing purposes.
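Besides looking at the spectrogram in Audacity, the same check can be sketched in Python by measuring how much of the signal's energy falls into each band. This is an illustration only: `read_mono16` and `band_energy_fraction` are hypothetical helper names, the reader assumes a 16-bit mono WAV file, and NumPy is assumed to be installed.

```python
import wave
import numpy as np

def read_mono16(path):
    """Read a WAV file and return (samples, rate). Assumes 16-bit mono PCM."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    return np.frombuffer(frames, dtype="<i2"), rate

def band_energy_fraction(samples, rate, low_hz, high_hz):
    """Fraction of the total spectral energy that lies in [low_hz, high_hz)."""
    samples = np.asarray(samples, dtype=np.float64)
    power = np.abs(np.fft.rfft(samples)) ** 2
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    in_band = (freqs >= low_hz) & (freqs < high_hz)
    total = power.sum()
    return float(power[in_band].sum() / total) if total > 0 else 0.0

def report(path, bands=((0, 300), (300, 3000), (3000, 5500), (5500, 8000))):
    """Print the energy share of each band for one file."""
    samples, rate = read_mono16(path)
    for low_hz, high_hz in bands:
        share = band_energy_fraction(samples, rate, low_hz, high_hz)
        print("%5d-%5d Hz: %.4f" % (low_hz, high_hz, share))
```

Calling `report("cmd.wav")` and `report("out.wav")` and comparing the two listings would show which band the filter actually removed; a share near zero for 3000-5500 Hz in out.wav would match the diagnosis above.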
Thank you for your answer.
I have just managed to check the spectrogram of out.wav with Audacity.
You are right: band [3kHz-5.5kHz] is filtered out.
I will check my filtering-code and will use Audacity regularly.
EDIT: the interesting thing is that the filtering seems to be ok (or better) in diag.png.
Maybe I made a mistake in the plotting-code too. :-)
Last edit: Robert Nagy 2015-10-15
It is also worth noting that cmusphinx models use the band from 100 Hz to 6800 Hz; if you filter above 5000 Hz, accuracy will drop significantly. cmusphinx also does filtering internally, so there is no need to filter before processing: it brings no benefit and just reduces the accuracy.
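To make the overlap concrete, here is a small sketch that compares a pre-filter's pass band with the band a model uses. It assumes the band limits are given as `-lowerf` and `-upperf` entries in "-name value" lines, which is the layout of the feat.params file shipped with Sphinx acoustic models as far as I know. The sample text below just reuses the 100 Hz and 6800 Hz figures from this reply; it is not read from a real model, and both function names are made up for illustration.

```python
def parse_feat_params(text):
    """Parse '-name value' lines into a dict of strings."""
    params = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].startswith("-"):
            params[parts[0]] = parts[1].strip()
    return params

def lost_model_band(params, keep_low_hz, keep_high_hz):
    """Return the parts of the model band that a [keep_low_hz, keep_high_hz]
    band-pass would remove, as a list of (from_hz, to_hz) tuples."""
    lower = float(params["-lowerf"])
    upper = float(params["-upperf"])
    lost = []
    if keep_low_hz > lower:
        lost.append((lower, min(keep_low_hz, upper)))
    if keep_high_hz < upper:
        lost.append((max(keep_high_hz, lower), upper))
    return lost

# Illustrative values taken from the reply above, not from an actual model file
sample = "-lowerf 100\n-upperf 6800\n"
print(lost_model_band(parse_feat_params(sample), 300, 5000))
```

With the 300 Hz to 5000 Hz filter from the question, this reports that both the lowest and the highest part of the model band are cut away.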
Hi Nickolay,
I knew that Pocketsphinx does filtering, but I misunderstood your comment (topic: "Old vs new API"), where you said to reduce noise. I thought you meant to apply filtering.
Now it's obvious that you didn't mean that.
It's good to know that frequencies in [100Hz-6800Hz] shouldn't be filtered out.
Thank you again,
rob
By "reduce noise" I mean using better recording hardware, such as a better microphone. In software, cmusphinx already does everything possible; you cannot make it better.