CMU Sphinx / Forums / Help: British English telephonic audio transcription

Orest - 2014-12-18

I have only basic familiarity with speech recognition. I'm using pocketsphinx 0.8 with the acoustic model "en-us-8khz", a custom language model and the cmudict.0.6d dictionary. I have to transcribe telephone recordings of speakers with a British English accent. I don't need live transcription: I have a set of audio files and I want to transcribe them with pocketsphinx. My priority is accuracy, not speed. Using a custom language model significantly improved transcription accuracy, but it is still not enough.

Now I am unsure which approach to take to improve transcription accuracy on British English audio.

Is it a good idea to adapt the en-us-8khz acoustic model using the British English recordings and transcriptions I already have (with cmudict 0.6d/0.7)? I have more than 30 minutes of transcribed audio.

Or should I use a British English acoustic model? The only one I could find is Keith's, http://keithv.com/software/sphinx/uk/ . Is that the only one available? Any feedback about it?

Which option is likely to give the bigger improvement in accuracy (WER) when transcribing British English telephone speech?

It has also been suggested in this forum that the BEEP dictionary is the best choice for British English. However, I can't just download it and convert it to UPPERCASE: as I understand it, BEEP uses some additional phones, so it isn't compatible with the "en-us-8khz" acoustic model. What steps should I take to use the BEEP dictionary in my pocketsphinx transcription program?

Thanks for any help, and let me know if anything is unclear.
- Nickolay V. Shmyrev - 2014-12-19
  
  Is it a good idea to adapt from the en-us-8khz acoustic model using the audio+transcriptions (With British English accent) that I already have (with cmudict.0.6d/0.7)? (I have more than 30 minutes of audio+transcription)
  
  Usually adaptation is not enough; you have to train a new model.
  
  or should I use some British English acoustic model? (the only one I could find is keith's, http://keithv.com/software/sphinx/uk/ , is that the only one available? any feedback about it? )
  
  Unfortunately, we can't point you to a free model that is readily available. Keith's models are not a good fit for telephone conversations.
  
  what are the steps I should take in order to use BEEP dictionary in my pocketsphinx audio transcriber program?
  
  You need to train your own acoustic model with the BEEP dictionary.
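Before training, it helps to know exactly which dictionary entries use phones the en-us-8khz model does not define. A minimal Java sketch, assuming a dictionary with one "WORD PHONE PHONE ..." entry per line, and assuming the model's phone set is the 39 CMUdict phones plus SIL (read the real list from your model rather than trusting this hard-coded set). `PhoneSetCheck` is a hypothetical helper, and the sample entries in the usage note are made-up BEEP-style lines, not real BEEP data.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

/** Reports dictionary phones that an acoustic model's phone set does not define. */
final class PhoneSetCheck {

    // ASSUMPTION: the 39 CMUdict phones plus SIL; replace with your model's own phone list.
    static final Set<String> EN_US_PHONES = new TreeSet<>(List.of(
        "AA", "AE", "AH", "AO", "AW", "AY", "B", "CH", "D", "DH", "EH", "ER", "EY",
        "F", "G", "HH", "IH", "IY", "JH", "K", "L", "M", "N", "NG", "OW", "OY", "P",
        "R", "S", "SH", "T", "TH", "UH", "UW", "V", "W", "Y", "Z", "ZH", "SIL"));

    /**
     * Returns the phones used in dictLines that are missing from modelPhones.
     * Each line is "WORD PH1 PH2 ..."; phones are upper-cased before comparison,
     * so lowercase phone symbols are matched against an uppercase phone set.
     */
    static Set<String> unknownPhones(List<String> dictLines, Set<String> modelPhones) {
        Set<String> unknown = new TreeSet<>();
        for (String line : dictLines) {
            String[] fields = line.trim().split("\\s+");
            // fields[0] is the word itself; everything after it is a phone.
            for (int i = 1; i < fields.length; i++) {
                String phone = fields[i].toUpperCase(Locale.ROOT);
                if (!modelPhones.contains(phone)) {
                    unknown.add(phone);
                }
            }
        }
        return unknown;
    }
}
```

For example, the hypothetical lines "HOT hh oh t", "CARE k ea" and "CAT k ae t" report {EA, OH}. Every phone reported this way needs either a mapping onto an existing en-us phone or a newly trained model that includes it.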
  

How do I configure audio transcription in the Android demo app https://github.com/cmusphinx/pocketsphinx-android-demo ?

My goal is to select an existing audio file on an Android device and convert it into text.

Please let me know if anything is unclear.

Last edit: Jeyamaal 2017-06-06

https://stackoverflow.com/questions/29008111/give-a-file-as-input-to-pocketsphinx-on-android
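The general idea for feeding a file instead of the microphone is to read the raw samples yourself and pass them to the decoder in chunks. A minimal sketch of the sample-reading part in plain Java, assuming a canonical WAV file with a 44-byte header followed by 16-bit little-endian mono PCM; files with extra header chunks would need a real WAV parser. `FileSamples` and `forEachChunk` are hypothetical names, not part of pocketsphinx.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.function.Consumer;

/** Reads 16-bit little-endian PCM samples that follow a canonical 44-byte WAV header. */
final class FileSamples {

    // ASSUMPTION: canonical WAV layout with exactly 44 header bytes before the samples.
    static final int WAV_HEADER_BYTES = 44;

    /** Converts the first n bytes of buf into signed 16-bit little-endian samples. */
    static short[] toShorts(byte[] buf, int n) {
        short[] samples = new short[n / 2];
        ByteBuffer.wrap(buf, 0, n).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(samples);
        return samples;
    }

    /** Skips the header, then passes each chunk of samples to sink. */
    static void forEachChunk(InputStream in, int chunkBytes, Consumer<short[]> sink)
            throws IOException {
        if (chunkBytes < 2) {
            throw new IllegalArgumentException("chunkBytes must be at least 2");
        }
        byte[] header = new byte[WAV_HEADER_BYTES];
        if (in.readNBytes(header, 0, header.length) < header.length) {
            throw new IOException("stream is shorter than a WAV header");
        }
        byte[] buf = new byte[chunkBytes & ~1]; // keep chunks an even number of bytes
        int n;
        // readNBytes blocks until the buffer is full or the stream ends (returning 0),
        // so only the final chunk can be short.
        while ((n = in.readNBytes(buf, 0, buf.length)) > 0) {
            int even = n & ~1; // drop a trailing odd byte, which cannot form a sample
            if (even > 0) {
                sink.accept(toShorts(buf, even));
            }
        }
    }
}
```

With the pocketsphinx Java bindings, the sink would typically call `decoder.processRaw(samples, samples.length, false, false)` between `decoder.startUtt()` and `decoder.endUtt()`; check these names against the version you use. The explicit little-endian byte order matters: reading the bytes in the platform's default (big-endian) `ByteBuffer` order turns speech into noise.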

I followed the instructions mentioned above, using sample audio from http://www.repository.voxforge1.org/downloads/SpeechCorpus/Trunk/Audio/Main/16kHz_16bit/

But I got this output instead of the expected transcription:

06-29 23:25:13.119 25006-25006/edu.cmu.sphinx.pocketsphinx I/cmusphinx: INFO: ngram_search_fwdflat.c(963): fwdflat 0.45 wall 0.043 xRT
06-29 23:25:13.119 804-990/? I/QCNEJ: |CORE:COM:RCVR| CNE creating socket
06-29 23:25:13.149 25006-25006/edu.cmu.sphinx.pocketsphinx I/cmusphinx: INFO: ngram_search.c(1250): lattice start node <s>.0 end node </s>.942
06-29 23:25:13.149 25006-25006/edu.cmu.sphinx.pocketsphinx I/cmusphinx: INFO: ngram_search.c(1276): Eliminated 1 nodes before end node
06-29 23:25:13.149 25006-25006/edu.cmu.sphinx.pocketsphinx I/cmusphinx: INFO: ngram_search.c(1381): Lattice has 1024 nodes, 378 links
06-29 23:25:13.159 25006-25006/edu.cmu.sphinx.pocketsphinx I/cmusphinx: INFO: ps_lattice.c(1380): Bestpath score: -4541
06-29 23:25:13.159 25006-25006/edu.cmu.sphinx.pocketsphinx I/cmusphinx: INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(</s>:942:1054) = -247318
06-29 23:25:13.159 25006-25006/edu.cmu.sphinx.pocketsphinx I/cmusphinx: INFO: ps_lattice.c(1441): Joint P(O,S) = -269531 P(S|O) = -22213
06-29 23:25:13.159 25006-25006/edu.cmu.sphinx.pocketsphinx I/cmusphinx: INFO: ngram_search.c(872): bestpath 0.04 CPU 0.004 xRT
06-29 23:25:13.159 25006-25006/edu.cmu.sphinx.pocketsphinx I/cmusphinx: INFO: ngram_search.c(875): bestpath 0.04 wall 0.003 xRT
06-29 23:25:13.159 25006-25006/edu.cmu.sphinx.pocketsphinx I/cmusphinx: INFO: ngram_search.c(1027): bestpath 0.00 CPU 0.000 xRT
06-29 23:25:13.159 25006-25006/edu.cmu.sphinx.pocketsphinx I/cmusphinx: INFO: ngram_search.c(1030): bestpath 0.00 wall 0.000 xRT
06-29 23:25:13.159 25006-25006/edu.cmu.sphinx.pocketsphinx I/System.out: <s>
06-29 23:25:13.159 25006-25006/edu.cmu.sphinx.pocketsphinx I/System.out: <sil>
06-29 23:25:13.159 25006-25006/edu.cmu.sphinx.pocketsphinx I/System.out: <sil>
06-29 23:25:13.159 25006-25006/edu.cmu.sphinx.pocketsphinx I/System.out: [SPEECH]
06-29 23:25:13.159 25006-25006/edu.cmu.sphinx.pocketsphinx I/System.out: </s>
06-29 23:25:14.680 24001-24001/? I/Finsky: [1] com.google.android.finsky.hygiene.DailyHygiene$DailyHygieneService.onStartCommand(28): Beginning daily

I have attached my source file.

Could you please help me solve this?

Last edit: Nickolay V. Shmyrev 2017-06-30

AudioTranscription.java

Nickolay V. Shmyrev - 2017-06-30

You need to use the real word language model en-us.lm.bin, not the phonetic en-phone.dmp.


Below is the output I got from audio transcription:

07-10 11:19:14.587 7217-7217/edu.cmu.sphinx.pocketsphinx I/System.out: <s>
07-10 11:19:14.597 7217-7217/edu.cmu.sphinx.pocketsphinx I/System.out: i
07-10 11:19:14.597 7217-7217/edu.cmu.sphinx.pocketsphinx I/System.out: figure
07-10 11:19:14.597 7217-7217/edu.cmu.sphinx.pocketsphinx I/System.out: those
07-10 11:19:14.597 7217-7217/edu.cmu.sphinx.pocketsphinx I/System.out: are(2)
07-10 11:19:14.597 7217-7217/edu.cmu.sphinx.pocketsphinx I/System.out: parameters
07-10 11:19:14.607 7217-7217/edu.cmu.sphinx.pocketsphinx I/System.out: that
07-10 11:19:14.607 7217-7217/edu.cmu.sphinx.pocketsphinx I/System.out: are(2)
07-10 11:19:14.607 7217-7217/edu.cmu.sphinx.pocketsphinx I/System.out: measured
07-10 11:19:14.607 7217-7217/edu.cmu.sphinx.pocketsphinx I/System.out: unknown
07-10 11:19:14.607 7217-7217/edu.cmu.sphinx.pocketsphinx I/System.out: but
07-10 11:19:14.617 7217-7217/edu.cmu.sphinx.pocketsphinx I/System.out: they
07-10 11:19:14.617 7217-7217/edu.cmu.sphinx.pocketsphinx I/System.out: weren't(2)
07-10 11:19:14.617 7217-7217/edu.cmu.sphinx.pocketsphinx I/System.out: <sil>
07-10 11:19:14.617 7217-7217/edu.cmu.sphinx.pocketsphinx I/System.out: on(2)

What does the number in parentheses mean, e.g. "are(2)" and "weren't(2)"?

Nickolay V. Shmyrev - 2017-07-10

They mark pronunciation variants, as described in https://cmusphinx.github.io/wiki/tutorialdict/
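When words are printed per segment, as in the log above, each one is the raw dictionary entry, so variant suffixes such as `(2)` and fillers such as `<s>`, `<sil>`, `</s>` and `[SPEECH]` show up as-is. A minimal sketch for turning such a list into plain text, assuming fillers are exactly the tokens wrapped in `<...>` or `[...]`; `TranscriptCleaner` is a hypothetical helper, not a pocketsphinx class. If you only need the text, the decoder's hypothesis string normally already omits both.

```java
import java.util.List;
import java.util.StringJoiner;
import java.util.regex.Pattern;

/** Turns raw per-segment words into a plain transcript. */
final class TranscriptCleaner {

    // A trailing pronunciation-variant marker such as "(2)".
    private static final Pattern VARIANT = Pattern.compile("\\(\\d+\\)$");

    /** ASSUMPTION: fillers are tokens wrapped in <...> or [...], e.g. <sil>, [SPEECH]. */
    static boolean isFiller(String word) {
        return (word.startsWith("<") && word.endsWith(">"))
            || (word.startsWith("[") && word.endsWith("]"));
    }

    /** Drops fillers, strips variant markers and joins the rest with spaces. */
    static String clean(List<String> segmentWords) {
        StringJoiner text = new StringJoiner(" ");
        for (String word : segmentWords) {
            if (!isFiller(word)) {
                text.add(VARIANT.matcher(word).replaceFirst(""));
            }
        }
        return text.toString();
    }
}
```

Applied to the words in the log above, this yields "i figure those are parameters that are measured unknown but they weren't on".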


Jeyamaal - 2017-07-10

I tried to transcribe my own speech file, but I didn't get the expected output.

The expected output is "Hello How are you I'm fine How do you do"

But I got:

07-10 22:43:20.980 18043-18043/edu.cmu.sphinx.pocketsphinx I/System.out: <s>
07-10 22:43:21.000 18043-18043/edu.cmu.sphinx.pocketsphinx I/System.out: though
07-10 22:43:21.010 18043-18043/edu.cmu.sphinx.pocketsphinx I/System.out: the
07-10 22:43:21.010 18043-18043/edu.cmu.sphinx.pocketsphinx I/System.out: whole
07-10 22:43:21.010 18043-18043/edu.cmu.sphinx.pocketsphinx I/System.out: like
07-10 22:43:21.010 18043-18043/edu.cmu.sphinx.pocketsphinx I/System.out: you
07-10 22:43:21.010 18043-18043/edu.cmu.sphinx.pocketsphinx I/System.out: <sil>
07-10 22:43:21.020 18043-18043/edu.cmu.sphinx.pocketsphinx I/System.out: <sil>
07-10 22:43:21.020 18043-18043/edu.cmu.sphinx.pocketsphinx I/System.out: i'm
07-10 22:43:21.020 18043-18043/edu.cmu.sphinx.pocketsphinx I/System.out: trying
07-10 22:43:21.020 18043-18043/edu.cmu.sphinx.pocketsphinx I/System.out: <sil>
07-10 22:43:21.030 18043-18043/edu.cmu.sphinx.pocketsphinx I/System.out: but
07-10 22:43:21.030 18043-18043/edu.cmu.sphinx.pocketsphinx I/System.out: a(2)
07-10 22:43:21.030 18043-18043/edu.cmu.sphinx.pocketsphinx I/System.out: dual(2)
07-10 22:43:21.030 18043-18043/edu.cmu.sphinx.pocketsphinx I/System.out: </s>
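To put a number on how far a result is from the expected sentence, the usual metric is word error rate: the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. A minimal sketch, assuming both strings are already free of fillers and variant markers, and that a case-insensitive comparison split on whitespace is enough (punctuation is not normalized); `Wer` is a hypothetical class name.

```java
import java.util.Locale;

/** Word error rate computed from a word-level Levenshtein distance. */
final class Wer {

    private static String[] words(String s) {
        String t = s.trim().toLowerCase(Locale.ROOT);
        return t.isEmpty() ? new String[0] : t.split("\\s+");
    }

    /** Minimum number of substitutions, deletions and insertions turning ref into hyp. */
    static int editDistance(String[] ref, String[] hyp) {
        int[] prev = new int[hyp.length + 1];
        int[] cur = new int[hyp.length + 1];
        for (int j = 0; j <= hyp.length; j++) {
            prev[j] = j; // empty reference: j insertions
        }
        for (int i = 1; i <= ref.length; i++) {
            cur[0] = i; // empty hypothesis: i deletions
            for (int j = 1; j <= hyp.length; j++) {
                int sub = prev[j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                int del = prev[j] + 1;
                int ins = cur[j - 1] + 1;
                cur[j] = Math.min(sub, Math.min(del, ins));
            }
            int[] tmp = prev; // the finished row becomes the previous row
            prev = cur;
            cur = tmp;
        }
        return prev[hyp.length];
    }

    /** WER = edit distance / number of reference words; it can exceed 1.0. */
    static double wer(String reference, String hypothesis) {
        String[] ref = words(reference);
        if (ref.length == 0) {
            throw new IllegalArgumentException("empty reference");
        }
        return (double) editDistance(ref, words(hypothesis)) / ref.length;
    }
}
```

For the recording above, the reference "Hello How are you I'm fine How do you do" against the hypothesis with `<s>`, `<sil>`, `</s>` and the `(2)` suffixes removed by hand ("though the whole like you i'm trying but a dual") gives 9 errors over 10 reference words, a WER of 0.9: only "you" and "i'm" line up.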

I have attached my audio file (WAV format, 16 kHz sample rate).

Could you please help me solve the problem?

Recording_1.wav
- Nickolay V. Shmyrev - 2017-07-10
  
  First of all, your audio is not really British English. Moreover, it is heavily corrupted by some mp4-like codec. I would first try a higher bitrate or avoid compression altogether. Otherwise you will have to train your own model.
  
  - Jeyamaal - 2017-07-22
    
    How can you tell it is "heavily corrupted by some mp4-like codec"? How did you check? Did you use any tools?
    
    My file is already in .wav format.
    
    "avoid compression first": what is compression? How do I avoid it?
    
    Last edit: Jeyamaal 2017-07-22
    
    - Nickolay V. Shmyrev - 2017-07-24
      
      How can you say "heavily corrupted by some mp4-like codec" ,and how did you check?any tools?
      
      Wavesurfer
      
      "avoid compression first" what is compression?
      
      https://en.wikipedia.org/wiki/Data_compression#Audio
      
      How to avoid that?
      
      It depends on how you obtained your sample.
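A quick way to see what a .wav file actually contains is to read its header: the WAV container can hold integer PCM or other encodings, and the fmt chunk records which, along with the sample rate, channel count and bit depth. A minimal sketch, assuming a RIFF/WAVE file whose fmt chunk lies within the bytes passed in; `WavInfo` is a hypothetical helper, and format code 0xFFFE (extensible) is deliberately not decoded. Note the limit: a header reporting 16 kHz, 16-bit mono PCM (what the default en-us model expects) cannot show whether the audio went through a lossy codec before it was saved as WAV; that kind of damage is visible in a spectrogram, for example in Wavesurfer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Reads the format fields of a RIFF/WAVE header. */
final class WavInfo {
    final int formatCode;    // 1 = integer PCM; 0xFFFE (extensible) is NOT decoded here
    final int channels;
    final int sampleRate;    // samples per second
    final int bitsPerSample;

    private WavInfo(int formatCode, int channels, int sampleRate, int bitsPerSample) {
        this.formatCode = formatCode;
        this.channels = channels;
        this.sampleRate = sampleRate;
        this.bitsPerSample = bitsPerSample;
    }

    /** Walks the chunks after "RIFF....WAVE" until it finds a complete "fmt " chunk. */
    static WavInfo parse(byte[] bytes) {
        if (bytes.length < 12 || !tag(bytes, 0).equals("RIFF") || !tag(bytes, 8).equals("WAVE")) {
            throw new IllegalArgumentException("not a RIFF/WAVE file");
        }
        ByteBuffer b = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        long pos = 12; // long, so a bogus chunk size cannot overflow the offset
        while (pos + 8 <= bytes.length) {
            int p = (int) pos;
            long size = Integer.toUnsignedLong(b.getInt(p + 4));
            if (tag(bytes, p).equals("fmt ") && p + 24 <= bytes.length) {
                return new WavInfo(
                    Short.toUnsignedInt(b.getShort(p + 8)),   // audio format code
                    Short.toUnsignedInt(b.getShort(p + 10)),  // channel count
                    b.getInt(p + 12),                         // sample rate
                    Short.toUnsignedInt(b.getShort(p + 22))); // bits per sample
            }
            pos += 8 + size + (size & 1); // chunk bodies are padded to an even length
        }
        throw new IllegalArgumentException("no complete fmt chunk found");
    }

    /** True for 16-bit mono integer PCM at the given sample rate. */
    boolean isMonoPcm16(int expectedRate) {
        return formatCode == 1 && channels == 1 && bitsPerSample == 16
            && sampleRate == expectedRate;
    }

    private static String tag(byte[] bytes, int off) {
        return new String(bytes, off, 4, StandardCharsets.US_ASCII);
    }
}
```

Passing the first few hundred bytes of a file to `parse` is usually enough. For the en-us-8khz model from the first question, the check would be `isMonoPcm16(8000)` rather than `isMonoPcm16(16000)`.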
      

Jeyamaal - 2017-08-19

Hi,

Are there any tools or libraries for Android, other than pocketsphinx and ispikit (https://github.com/ispikit/ispikit-android), to evaluate a speaker's pronunciation?
I want to check whether the user pronounced each word correctly and report a percentage, a rating (good, average or bad), or any other acceptable measure of pronunciation quality. Please help me.

Thank you.

