Hi,
I am using the SpeechRecognition Python package, which uses pocketsphinx. I am trying to understand how the package works. I traced the code as follows:
In the __init__.py file of speechrecognition we set the paths to the acoustic model, language model and dictionary.
Later we create the decoder object using
decoder = pocketsphinx.Decoder(config)
Then it runs decoder.start_utt().
I checked pocketsphinx.py and went through the code.
The function returns _pocketsphinx.Decoder_start_utt(self).
and I found
_pocketsphinx = swig_import_helper()
But I am unable to find the _pocketsphinx file. There is a _pocketsphinx.a file, but I cannot open it.
Regards,
Pratik
Pocketsphinx wraps its API for Python using SWIG, so you need to study how SWIG works. Check http://swig.org/Doc3.0/Contents.html#Contents
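A SWIG wrapper such as pocketsphinx.py normally loads a compiled extension module, which is a binary file and not readable Python source. A .a file is a static library archive, so it cannot be opened as text either. The following is a minimal sketch, using only the standard library, for asking the interpreter where a module would be loaded from. The two candidate names at the bottom are assumptions; the actual layout depends on the installed pocketsphinx version.

```python
import importlib.machinery
import importlib.util


def locate_module(name):
    """Return the file path a module would be loaded from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised when a parent package is missing or the name cannot be resolved.
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # File suffixes this interpreter accepts for compiled extension modules.
    print("extension suffixes:", importlib.machinery.EXTENSION_SUFFIXES)
    # Candidate names are assumptions, not confirmed by the thread.
    for candidate in ("_pocketsphinx", "pocketsphinx._pocketsphinx"):
        print(candidate, "->", locate_module(candidate))
```

If one of the candidates resolves to a path ending in one of the printed suffixes, that file is the compiled module that swig_import_helper() loads.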
Hi Nickolay,
Thank you for answering Pratik's query
In the __init__.py file of the speech recognizer package for Python,
I want to print the phonetic transcription of each word that is detected from the .wav file.
I need this to understand how the detected words map to phonemes.
It would also be helpful if you could suggest a way to debug the __init__.py of the speech recognizer package step by step.
Can you explain what the goforward.raw file does (it is in the pocketsphinx folder inside Python site-packages)?
Regards,
Dattatreya
It is not possible; for performance reasons phonemes are omitted in the recognizer. You can only take them from the dictionary file.
Run under gdb,
put a breakpoint on any pocketsphinx function and proceed from there.
goforward.raw is an audio file containing a recording of "go forward ten meters". You can listen to it in Audacity.
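The dictionary lookup suggested above could be sketched as follows. This is only an illustration: the helper names load_dictionary and phones_for are hypothetical, and the sample entries are taken from the dictionary lines quoted later in this thread. Note that the dictionary lists the possible pronunciations of a word; it does not tell you which variant the decoder actually matched.

```python
import re


def load_dictionary(lines):
    """Map each word to a list of pronunciations (each a list of phones)."""
    pronunciations = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        # Alternate pronunciations are written as WORD(2), WORD(3), ...
        word = re.sub(r"\(\d+\)$", "", parts[0]).lower()
        pronunciations.setdefault(word, []).append(parts[1:])
    return pronunciations


def phones_for(hypothesis, pronunciations):
    """Return (word, first listed pronunciation or None) for each word."""
    result = []
    for word in hypothesis.lower().split():
        variants = pronunciations.get(word)
        result.append((word, variants[0] if variants else None))
    return result


if __name__ == "__main__":
    sample = ["2 T UW", "8 EY T", "9 N AY N"]
    lookup = load_dictionary(sample)
    for word, phones in phones_for("8 2 CHAPTER", lookup):
        print(word, "->", " ".join(phones) if phones else "(not in dictionary)")
```

With a real dictionary, the lines can come straight from the file, for example load_dictionary(open("cmudict-en-us.dict")), and the hypothesis string is whatever text the recognizer returned.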
Hi Nickolay,
We are trying to adapt the existing acoustic model.
We followed the CMUSphinx article on adapting the acoustic model.
For this we downloaded some videos from YouTube and converted them to audio in the required format (16 bit, 16 kHz) using the Python package moviepy.
We divided each audio file into 30-second intervals using Python (moviepy),
then used the Google API to get the transcriptions,
and used the output to train the model.
We had 3 1/2 hours of training videos.
After training, we tested the model with another recording from the same speaker, but the output still wasn't satisfactory.
What could be the issue? Could it be because of the speed of the speech or something else?
Or because of the way the speaker pronounces particular words?
E.g.:
Actual transcription (Google API):
banks are required to pay interest deposits with banks are known as liability what is liability what is asset you have a friend who always gives you money he is your message you have a friend who always gives you money so always helps you he is your assets (Banking Awareness Lecture - Module 1_12)
Output using adapted model:
then so they live update this
the message that the banks are known as the elites
what these lady
what the stashed in you have a friend whole lot las vegas you whiny
he is service that
you have a friend who was initially use your money
well antunes gets you he is you as a
Note: we are running this command to test the output
(pocketsphinx version is 5prealpha)
pocketsphinx_continuous -hmm en-us1 -lm en-us.lm.bin -dict cmudict-en-us.dict -mllr mllr_matrix -infile /home/eight/Videos/Banking_Awareness_Lecture_Module_1_12_converted.wav
We also tried running the above command with the default values of hmm, lm and dict
(the files that are stored in /usr/local/share/pocketsphinx), but the result was the same.
Regards,
Dattatreya
Last edit: Dattatreya 2016-10-21
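The preparation steps described in the post above (16-bit, 16 kHz audio cut into 30-second pieces) can be checked with a short standard-library sketch. This is not the moviepy code the poster used; the function names are hypothetical, and the mono (single channel) default is an assumption that the post does not state.

```python
import wave


def check_format(path, rate=16000, sample_width=2, channels=1):
    """Return True if the WAV file matches the expected rate, width and channels."""
    with wave.open(path, "rb") as wav:
        return (wav.getframerate() == rate
                and wav.getsampwidth() == sample_width  # 2 bytes = 16 bit
                and wav.getnchannels() == channels)     # mono is an assumption


def split_wav(path, out_prefix, seconds=30):
    """Write consecutive chunks of at most `seconds` seconds; return their paths."""
    outputs = []
    with wave.open(path, "rb") as src:
        frames_per_chunk = src.getframerate() * seconds
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # end of the input file
            out_path = f"{out_prefix}_{index:04d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy the audio format of the source file to each chunk.
                dst.setnchannels(src.getnchannels())
                dst.setsampwidth(src.getsampwidth())
                dst.setframerate(src.getframerate())
                dst.writeframes(frames)
            outputs.append(out_path)
            index += 1
    return outputs
```

A fixed cut every 30 seconds can land in the middle of a word, so the transcript of each chunk may not line up exactly with its audio; that is worth checking by ear on a few chunks before using them for adaptation.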
Hi Nickolay,
Here is the link to the videos we downloaded for testing:
https://www.youtube.com/watch?v=BZoIuv1kLh0&list=PLswCVWtC7kMT6Q-ZCjsJcHomJzgXQbCez
This video has music in the background, which is very bad for accuracy. And the accent, of course. You should remove the music first with some kind of NMF algorithm, then submit the audio for transcription.
Hi Nickolay,
Can you please clarify the accent problem?
How should we decide whether the speaker's accent is going to be a problem for training the acoustic model?
Regards,
Dattatreya
For training, accent is not a problem provided you have sufficient data. Adaptation is not going to work well for accents.
Thank you
Hi Nickolay,
I am following the CMUSphinx tutorial and I have added and replaced some words in the existing dictionary. For example,
1 W AH N
10 W AH N Z IY R OW
11 W AH N W AH N
11TH W AH N W AH N T IY EY CH
12 W AH N T UW
19 W AH N AY N
1939 W AH N AY N TH R IY N AY N
1946 W AH N AY N F OW R S IH K S
1984 W AH N AY N EY T F OW R
1985 W AH N AY N EY T F AY V
1988 W AH N AY N EY T EY T
1989 W AH N AY N EY T N AY N
1ST W AH N EH S T IY
2 T UW
25 T UW F AY V
3 TH R IY
39 TH R IY N AY N
4 F OW R
7000 S EH V AH N Z IY R OW Z IY R OW Z IY R OW
72 S EH V AH N T UW
7500 S EH V AH N F AY V Z IY R OW Z IY R OW
8 EY T
9 N AY N
I have replaced '8' with 'EIGHT'.
output before replacement : YOU TO ALL IS THAT IS THAT IS THE MOTOR USE AUTOS YOU TO AS BY THE LORD WHO VEHICLE ACT. THE PROVISIONS OF CHAPTER 8 THAT IS REGARDING THE TP THE THE SHORT IS WAS IN THE EFFECT DEAL WITH EFFECT FROM JULY AMENDMENT IN WHAT IS THE THAT STILL TALKING LORD THE MOTOR VEHICLE THAT 19 IN
output after replacement : YOU TO ALL IS THAT IS THAT IS THE MOTOR USE AUTOS YOU TO AS BY THE LORD WHO VEHICLE ACT. THE PROVISIONS OF CHAPTER THE THAT IS REGARDING THE TP THE THE SHORT IS WAS IN THE EFFECT DEAL WITH EFFECT FROM JULY AMENDMENT IN WHAT IS THE THAT STILL TALKING LORD THE MOTOR VEHICLE THAT 19 IN
In the second output, 'EIGHT' should appear instead of 'THE'. I am unable to add or replace words in the dictionary.
Thank you
You need to start a new thread to ask a new question.
Okay, thank you.