I have only basic familiarity with speech recognition. I'm using pocketsphinx 0.8 with the acoustic model "en-us-8khz", a custom language model and the cmudict.0.6d dictionary. I have to transcribe telephone recordings of speakers with a British English accent. I don't need live transcription: I simply have some audio files and want to transcribe them with pocketsphinx. My priority is accuracy rather than speed. Using a custom language model significantly improves transcription accuracy, but that is not enough.
Now I am unsure which approach to take to improve transcription accuracy on British English audio.
Is it a good idea to adapt the en-us-8khz acoustic model using the British-accented audio and transcriptions I already have (with cmudict.0.6d/0.7)? I have more than 30 minutes of transcribed audio.
Or should I use a British English acoustic model? The only one I could find is Keith's (http://keithv.com/software/sphinx/uk/). Is that the only one available? Any feedback about it?
Which option is likely to give the biggest improvement in accuracy (WER) when transcribing British English telephone speech?
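Which option wins on your recordings is ultimately an empirical question. A minimal sketch, assuming you keep a few transcribed recordings aside (not used for adaptation or training), decode them with each candidate setup, and compare word error rate (WER), i.e. (substitutions + deletions + insertions) divided by the number of reference words. The function names are illustrative, not part of pocketsphinx.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of ref[i-1]
                          curr[j - 1] + 1,     # insertion of hyp[j-1]
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def corpus_wer(pairs):
    """WER over (reference, hypothesis) string pairs: total errors / total reference words.

    Case is ignored, so uppercase cmudict.0.6d output can be compared with
    lowercase reference transcripts.
    """
    errors = 0
    words = 0
    for reference, hypothesis in pairs:
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        errors += edit_distance(ref, hyp)
        words += len(ref)
    if words == 0:
        raise ValueError("no reference words")
    return errors / words
```

Running this once for the adapted en-us-8khz model and once for an alternative model, on the same held-out files, gives a like-for-like comparison on your own data.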
Also, it has been suggested in this forum that the BEEP dictionary is best for British English. However, I noticed I can't simply download it and convert it to uppercase: as far as I understand, BEEP uses some additional phones, so it is not compatible with the en-us-8khz acoustic model. What steps should I take to use the BEEP dictionary in my pocketsphinx transcription program?
Thanks for any help, and let me know if anything is unclear.
Is it a good idea to adapt the en-us-8khz acoustic model using the British-accented audio and transcriptions I already have (with cmudict.0.6d/0.7)? I have more than 30 minutes of transcribed audio.
Usually adaptation is not enough; you have to train a new model.
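Whether you adapt or train, sphinxtrain needs the transcribed audio listed in a fileids file and a matching transcription file, where each transcription line looks like `<s> hello world </s> (file_001)`. A minimal sketch for generating both from (file id, transcript) pairs; the helper name is hypothetical. It also reports words missing from the pronunciation dictionary, since every transcript word needs a pronunciation.

```python
def sphinxtrain_lists(utterances, dictionary_words=None):
    """Build .fileids and .transcription contents for sphinxtrain.

    utterances: iterable of (fileid, transcript) pairs; fileid is the audio path
    relative to the wav directory, without extension, e.g. "speaker_1/file_001".
    dictionary_words: optional set of words present in the dictionary.
    Returns (fileids_text, transcription_text, missing_words).
    """
    fileids, transcriptions, missing = [], [], set()
    for fileid, transcript in utterances:
        words = transcript.split()  # also collapses repeated whitespace
        if dictionary_words is not None:
            missing.update(w for w in words if w not in dictionary_words)
        fileids.append(fileid)
        # The id in parentheses is the file's base name, without directories.
        basename = fileid.rsplit("/", 1)[-1]
        transcriptions.append(f"<s> {' '.join(words)} </s> ({basename})")
    return "\n".join(fileids) + "\n", "\n".join(transcriptions) + "\n", missing
```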
Or should I use a British English acoustic model? The only one I could find is Keith's (http://keithv.com/software/sphinx/uk/). Is that the only one available? Any feedback about it?
Unfortunately, we can't point you to a readily available free model. Keith's models are not a very good fit for telephone conversations.
What steps should I take to use the BEEP dictionary in my pocketsphinx transcription program?
You need to train your own acoustic model with the BEEP dictionary.
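Training with BEEP means the phone set comes from the dictionary rather than from en-us-8khz. Sphinxtrain expects a .phone file listing every phone used in the dictionary, one per line, plus SIL. A minimal sketch, assuming a whitespace-separated dictionary (word followed by its phones, with "#" comment lines) that you have already cleaned up: it builds the phone file and lists phones absent from the 39-phone CMU en-us set, which shows why simply uppercasing BEEP is not enough.

```python
# The 39 phones of the CMU en-us phone set (SIL is handled separately).
CMU_EN_US_PHONES = frozenset(
    "AA AE AH AO AW AY B CH D DH EH ER EY F G HH IH IY JH K L M N NG "
    "OW OY P R S SH T TH UH UW V W Y Z ZH".split()
)


def dictionary_phones(lines):
    """Collect the phones used in a dictionary given as an iterable of text lines."""
    phones = set()
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        phones.update(fields[1:])  # first field is the word, the rest are phones
    return phones


def phone_file_text(lines):
    """Contents of a sphinxtrain .phone file: every dictionary phone plus SIL."""
    return "\n".join(sorted(dictionary_phones(lines) | {"SIL"})) + "\n"


def phones_missing_from_cmu(lines):
    """Phones the dictionary uses (compared case-insensitively) that CMU en-us lacks."""
    return {p.upper() for p in dictionary_phones(lines)} - CMU_EN_US_PHONES
```

The phone file keeps the dictionary's own spelling of each phone, because phone names must match exactly between the dictionary and the .phone file.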
How do I configure audio file transcription in the Android demo app (https://github.com/cmusphinx/pocketsphinx-android-demo)?
My goal is to select an existing audio file on the Android device and convert it into text.
Please let me know if anything is unclear.
Last edit: Jeyamaal 2017-06-06
https://stackoverflow.com/questions/29008111/give-a-file-as-input-to-pocketsphinx-on-android
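pocketsphinx decoders consume raw audio samples, so transcribing a file mostly means reading its PCM data and passing it to the decoder piece by piece. A minimal desktop sketch of that idea in Python (the Android Java API differs), assuming a mono 16-bit PCM WAV file: it yields frame chunks that could be fed to a decoder. The decoder calls appear only in a comment and assume the pocketsphinx Python bindings.

```python
import io
import wave


def pcm_chunks(wav_data: bytes, frames_per_chunk: int = 1024):
    """Yield raw 16-bit mono PCM byte chunks from the contents of a WAV file.

    Assumed usage with the pocketsphinx Python bindings (not imported here):
        decoder.start_utt()
        for chunk in pcm_chunks(data):
            decoder.process_raw(chunk, False, False)
        decoder.end_utt()
    """
    with wave.open(io.BytesIO(wav_data), "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:  # readframes returns b"" at end of data
                break
            yield chunk
```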
I followed the instructions mentioned above and took sample audio from http://www.repository.voxforge1.org/downloads/SpeechCorpus/Trunk/Audio/Main/16kHz_16bit/
But I got this output (not the expected one):
I have attached my source file.
Could you please help me solve this?
Last edit: Nickolay V. Shmyrev 2017-06-30
You need to use the real word language model en-us.lm.bin, not the phonetic en-phone.dmp.
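On the desktop the same distinction shows up in the pocketsphinx_continuous options (as in pocketsphinx 5prealpha): -lm takes a word n-gram model such as en-us.lm.bin, while a phonetic model belongs with -allphone and yields phone strings instead of words. A minimal sketch that builds the word-level command; the helper names are hypothetical, the paths are placeholders, and it assumes the binary prints the hypothesis on stdout and its log on stderr.

```python
import subprocess


def word_transcription_command(wav_path, hmm_dir, lm_path, dict_path):
    """Argument list for transcribing one WAV file into words.

    lm_path must be a word language model such as en-us.lm.bin; a phonetic
    model such as en-phone.dmp is meant for -allphone, not -lm.
    """
    return [
        "pocketsphinx_continuous",
        "-infile", wav_path,
        "-hmm", hmm_dir,
        "-lm", lm_path,
        "-dict", dict_path,
    ]


def transcribe(wav_path, hmm_dir, lm_path, dict_path):
    """Run the decoder and return its stdout (assumed to hold the hypothesis)."""
    cmd = word_transcription_command(wav_path, hmm_dir, lm_path, dict_path)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```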
I got the following output from audio transcription.
What does the number "(2)" mean, as in "are(2)" and "weren't(2)"?
These are pronunciation variants, as described in https://cmusphinx.github.io/wiki/tutorialdict/
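When only the words matter, the variant suffix can be removed from the result. A minimal sketch, assuming you have the recognized words as a list of strings (for example from the decoder's word segments). Dropping tokens such as <s>, </s>, <sil> and bracketed fillers like [NOISE] is an assumption about which filler words your noise dictionary defines.

```python
import re

# A trailing "(2)", "(3)", ... marks an alternative pronunciation in the dictionary.
VARIANT_SUFFIX = re.compile(r"\(\d+\)$")


def clean_words(words):
    """Strip pronunciation-variant markers and drop sentence-marker/filler tokens."""
    cleaned = []
    for word in words:
        if word.startswith(("<", "[")):
            continue  # e.g. <s>, </s>, <sil>, [NOISE]
        cleaned.append(VARIANT_SUFFIX.sub("", word))
    return cleaned
```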
I tried to transcribe my own speech file, but I didn't get the expected output.
The expected output is "Hello how are you I'm fine how do you do".
But I got:
I have attached my audio file (WAV format, 16 kHz sample rate).
Could you please help me solve the problem?
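A claim like "WAV format, 16 kHz" can be checked from the file header. A minimal sketch using Python's standard wave module, assuming the default 16 kHz en-us model (an 8 kHz model would need expected_rate=8000): it flags non-mono, non-16-bit or wrong-rate files, and WAV containers that hold a non-PCM codec. It cannot detect audio that was lossy-compressed before being saved as WAV; that only shows up in the audio itself, for example in a spectrogram.

```python
import io
import wave


def check_wav_for_model(wav_data: bytes, expected_rate: int = 16000) -> list:
    """Return a list of header problems; an empty list means the header looks usable."""
    problems = []
    try:
        with wave.open(io.BytesIO(wav_data), "rb") as wav:
            if wav.getnchannels() != 1:
                problems.append(f"{wav.getnchannels()} channels, expected mono")
            if wav.getsampwidth() != 2:
                problems.append(f"{8 * wav.getsampwidth()}-bit samples, expected 16-bit")
            if wav.getframerate() != expected_rate:
                problems.append(f"{wav.getframerate()} Hz, expected {expected_rate} Hz")
    except wave.Error as err:
        # Raised for non-RIFF data and for WAV files holding a compressed codec.
        problems.append(f"not a plain PCM WAV file: {err}")
    except EOFError:
        problems.append("file is empty or truncated")
    return problems
```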
First of all, your audio is not really British English. Moreover, it is heavily corrupted by some MP4-like codec. I would first try a higher bitrate or avoid compression altogether; otherwise, you will have to train your own model.
How can you say it is "heavily corrupted by some mp4-like codec"? How did you check? Are there any tools?
My file is already in .wav format.
"Avoid compression first": what is compression, and how do I avoid it?
Last edit: Jeyamaal 2017-07-22
Wavesurfer
https://en.wikipedia.org/wiki/Data_compression#Audio
It depends on how you obtained your sample.
Hi,
Are there any tools or libraries other than pocketsphinx and ispikit for Android (https://github.com/ispikit/ispikit-android) to evaluate a speaker's pronunciation?
I want to rate how correctly the user pronounced each word, as a percentage, as a rating (good, bad or average), or with any other acceptable method of evaluating pronunciation. Please help me.
Thank you.
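One simple, non-validated approach is to compare the phones the user was expected to say (taken from the dictionary) with the phones a phonetic decoder actually recognized, for example with an allphone search. A minimal sketch under those assumptions: the score is 100 minus the phone error rate, and the good/average/bad thresholds are arbitrary placeholders that would need tuning against human judgements. It is much cruder than dedicated pronunciation-assessment methods.

```python
def phone_edit_distance(expected, recognized):
    """Minimum substitutions, deletions and insertions turning expected into recognized."""
    prev = list(range(len(recognized) + 1))
    for i in range(1, len(expected) + 1):
        curr = [i] + [0] * len(recognized)
        for j in range(1, len(recognized) + 1):
            cost = 0 if expected[i - 1] == recognized[j - 1] else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        prev = curr
    return prev[-1]


def pronunciation_score(expected, recognized):
    """Percentage score: 100 * (1 - phone error rate), floored at 0."""
    if not expected:
        raise ValueError("expected phone sequence is empty")
    errors = phone_edit_distance(expected, recognized)
    return max(0.0, 100.0 * (1 - errors / len(expected)))


def rating(score, good=80.0, average=50.0):
    """Map a score to a label; the thresholds are placeholders, not calibrated values."""
    if score >= good:
        return "good"
    if score >= average:
        return "average"
    return "bad"
```

For example, if "hello" is expected as HH AH L OW and the decoder hears HH EH L OW, one substitution out of four phones gives a score of 75.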