
British English telephonic audio transcription

Orest
2014-12-18
2017-06-30
  • Orest

    Orest - 2014-12-18

    I have only basic familiarity with speech recognition. I'm using pocketsphinx 0.8 with the "en-us-8khz" acoustic model, a custom language model, and the cmudict.0.6d dictionary. I need to transcribe telephone audio files recorded with a British English accent. I don't need live transcription: I have a set of audio files and want to transcribe them with pocketsphinx. My priority is accuracy, not speed. I noticed that using a custom language model significantly improves transcription accuracy, but it is not enough.

    Now I am unsure which approach to take to improve transcription accuracy for British English audio.

    • Is it a good idea to adapt the en-us-8khz acoustic model using the British English audio and transcriptions that I already have (with cmudict.0.6d/0.7)? I have more than 30 minutes of transcribed audio.

    • Or should I use a British English acoustic model? The only one I could find is Keith's, http://keithv.com/software/sphinx/uk/ . Is that the only one available? Any feedback about it?

    Which option is likely to give me the best improvement (in terms of accuracy and WER) in transcribing British English telephonic audio speech?

    Also, it has been suggested in this forum that for British English it's best to use the BEEP dictionary. However, I can't simply download it and convert the letters to uppercase, because BEEP uses some additional phones, so as far as I understand it is not compatible with the "en-us-8khz" acoustic model. What steps should I take to use the BEEP dictionary in my pocketsphinx transcription program?

    Thanks for any help, and let me know if anything wasn't clear.

     
    • Nickolay V. Shmyrev

      Is it a good idea to adapt the en-us-8khz acoustic model using the British English audio and transcriptions that I already have (with cmudict.0.6d/0.7)? I have more than 30 minutes of transcribed audio.

      Usually adaptation is not enough; you have to train a new model.

      Or should I use a British English acoustic model? The only one I could find is Keith's, http://keithv.com/software/sphinx/uk/ . Is that the only one available? Any feedback about it?

      Unfortunately, we cannot suggest a free, readily available model. Keith's models are not a very good fit for telephone conversations.

      What steps should I take to use the BEEP dictionary in my pocketsphinx transcription program?

      You need to train your own acoustic model with the beep dictionary.
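
For readers who want to see the incompatibility concretely, here is a minimal sketch that lists which phones a pronunciation dictionary uses that an acoustic model does not know. It assumes a plain-text dictionary with one `WORD phone phone ...` entry per line and `#` comment lines. The three sample entries and the function names are illustrative, not copied from BEEP or from any pocketsphinx tool, and the phone list is the 39-phone CMUdict set, which is assumed here to match the US English model.

```python
# 39-phone CMUdict set, assumed to match the US English acoustic model.
CMU_PHONES = set(
    "AA AE AH AO AW AY B CH D DH EH ER EY F G HH IH IY JH K L M N NG "
    "OW OY P R S SH T TH UH UW V W Y Z ZH".split()
)


def dictionary_phones(lines):
    """Return the set of phones used by a 'WORD phone phone ...' dictionary."""
    phones = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        _word, *pronunciation = line.split()
        phones.update(p.upper() for p in pronunciation)
    return phones


def missing_phones(dict_lines, model_phones):
    """Phones the dictionary needs but the acoustic model does not provide."""
    return sorted(dictionary_phones(dict_lines) - set(model_phones))


# Illustrative entries only (hypothetical sample, not real dictionary data).
sample_dict = [
    "# sample British-style entries",
    "HELLO hh ax l ow",
    "HERE hh ia",
    "HOT hh oh t",
]

print(missing_phones(sample_dict, CMU_PHONES))
```

Any phone reported here has no trained model behind it, which is why uppercasing the dictionary is not enough and a model trained with that dictionary is needed.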

       
  • Jeyamaal

    Jeyamaal - 2017-06-05

    How do I configure audio file transcription in the
    Android demo app https://github.com/cmusphinx/pocketsphinx-android-demo ?

    My goal is to select an existing audio file on an Android device and convert it to text.

    Please let me know if anything wasn't clear.

     

    Last edit: Jeyamaal 2017-06-06
    • Nickolay V. Shmyrev

       
      • Jeyamaal

        Jeyamaal - 2017-06-29

        I followed the instructions mentioned above. I took the sample audio from http://www.repository.voxforge1.org/downloads/SpeechCorpus/Trunk/Audio/Main/16kHz_16bit/

        But I got output like this (not the expected output):

        06-29 23:25:13.119 25006-25006/edu.cmu.sphinx.pocketsphinx I/cmusphinx: INFO: ngram_search_fwdflat.c(963): fwdflat 0.45 wall 0.043 xRT
        06-29 23:25:13.119 804-990/? I/QCNEJ: |CORE:COM:RCVR| CNE creating socket
        06-29 23:25:13.149 25006-25006/edu.cmu.sphinx.pocketsphinx I/cmusphinx: INFO: ngram_search.c(1250): lattice start node <s>.0 end node </s>.942
        06-29 23:25:13.149 25006-25006/edu.cmu.sphinx.pocketsphinx I/cmusphinx: INFO: ngram_search.c(1276): Eliminated 1 nodes before end node
        06-29 23:25:13.149 25006-25006/edu.cmu.sphinx.pocketsphinx I/cmusphinx: INFO: ngram_search.c(1381): Lattice has 1024 nodes, 378 links
        06-29 23:25:13.159 25006-25006/edu.cmu.sphinx.pocketsphinx I/cmusphinx: INFO: ps_lattice.c(1380): Bestpath score: -4541
        06-29 23:25:13.159 25006-25006/edu.cmu.sphinx.pocketsphinx I/cmusphinx: INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(</s>:942:1054) = -247318
        06-29 23:25:13.159 25006-25006/edu.cmu.sphinx.pocketsphinx I/cmusphinx: INFO: ps_lattice.c(1441): Joint P(O,S) = -269531 P(S|O) = -22213
        06-29 23:25:13.159 25006-25006/edu.cmu.sphinx.pocketsphinx I/cmusphinx: INFO: ngram_search.c(872): bestpath 0.04 CPU 0.004 xRT
        06-29 23:25:13.159 25006-25006/edu.cmu.sphinx.pocketsphinx I/cmusphinx: INFO: ngram_search.c(875): bestpath 0.04 wall 0.003 xRT
        06-29 23:25:13.159 25006-25006/edu.cmu.sphinx.pocketsphinx I/cmusphinx: INFO: ngram_search.c(1027): bestpath 0.00 CPU 0.000 xRT
        06-29 23:25:13.159 25006-25006/edu.cmu.sphinx.pocketsphinx I/cmusphinx: INFO: ngram_search.c(1030): bestpath 0.00 wall 0.000 xRT
        06-29 23:25:13.159 25006-25006/edu.cmu.sphinx.pocketsphinx I/System.out: <s>
        06-29 23:25:13.159 25006-25006/edu.cmu.sphinx.pocketsphinx I/System.out: <sil>
        06-29 23:25:13.159 25006-25006/edu.cmu.sphinx.pocketsphinx I/System.out: <sil>
        06-29 23:25:13.159 25006-25006/edu.cmu.sphinx.pocketsphinx I/System.out: [SPEECH]
        06-29 23:25:13.159 25006-25006/edu.cmu.sphinx.pocketsphinx I/System.out: </s>
        06-29 23:25:14.680 24001-24001/? I/Finsky: [1] com.google.android.finsky.hygiene.DailyHygiene$DailyHygieneService.onStartCommand(28): Beginning daily
        

        I have attached my source file.

        Could you please help me solve this?

         

        Last edit: Nickolay V. Shmyrev 2017-06-30
        • Nickolay V. Shmyrev

          You need to use the real language model en-us.lm.bin, not the phonetic en-phone.dmp.

           
  • Jeyamaal

    Jeyamaal - 2017-07-10

    The output below is from audio transcription.

    07-10 11:19:14.587 7217-7217/edu.cmu.sphinx.pocketsphinx I/System.out: <s>
    07-10 11:19:14.597 7217-7217/edu.cmu.sphinx.pocketsphinx I/System.out: i
    07-10 11:19:14.597 7217-7217/edu.cmu.sphinx.pocketsphinx I/System.out: figure
    07-10 11:19:14.597 7217-7217/edu.cmu.sphinx.pocketsphinx I/System.out: those
    07-10 11:19:14.597 7217-7217/edu.cmu.sphinx.pocketsphinx I/System.out: are(2)
    07-10 11:19:14.597 7217-7217/edu.cmu.sphinx.pocketsphinx I/System.out: parameters
    07-10 11:19:14.607 7217-7217/edu.cmu.sphinx.pocketsphinx I/System.out: that
    07-10 11:19:14.607 7217-7217/edu.cmu.sphinx.pocketsphinx I/System.out: are(2)
    07-10 11:19:14.607 7217-7217/edu.cmu.sphinx.pocketsphinx I/System.out: measured
    07-10 11:19:14.607 7217-7217/edu.cmu.sphinx.pocketsphinx I/System.out: unknown
    07-10 11:19:14.607 7217-7217/edu.cmu.sphinx.pocketsphinx I/System.out: but
    07-10 11:19:14.617 7217-7217/edu.cmu.sphinx.pocketsphinx I/System.out: they
    07-10 11:19:14.617 7217-7217/edu.cmu.sphinx.pocketsphinx I/System.out: weren't(2)
    07-10 11:19:14.617 7217-7217/edu.cmu.sphinx.pocketsphinx I/System.out: <sil>
    07-10 11:19:14.617 7217-7217/edu.cmu.sphinx.pocketsphinx I/System.out: on(2)
    

    What does the number (2) mean, as in "are(2)" and "weren't(2)"?
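
The suffix marks an alternate pronunciation: in CMUdict-style dictionaries a word with several pronunciations is listed as `are`, `are(2)`, and so on, and the decoder reports which variant it matched. A minimal sketch for turning such a token list into plain text follows. The function name is hypothetical, and treating every token wrapped in `<...>` or `[...]` as a filler is an assumption based on the markers visible in the logs in this thread.

```python
import re

# Trailing "(N)" marks the N-th pronunciation variant of a dictionary word.
_VARIANT_SUFFIX = re.compile(r"\(\d+\)$")


def clean_tokens(tokens):
    """Drop filler tokens and strip pronunciation-variant suffixes."""
    cleaned = []
    for token in tokens:
        # Assumption: <s>, </s>, <sil>, [SPEECH] and similar are not words.
        if (token.startswith("<") and token.endswith(">")) or (
            token.startswith("[") and token.endswith("]")
        ):
            continue
        cleaned.append(_VARIANT_SUFFIX.sub("", token))
    return cleaned


tokens = ["<s>", "i", "figure", "those", "are(2)", "parameters",
          "they", "weren't(2)", "<sil>", "on(2)"]
print(" ".join(clean_tokens(tokens)))
```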

     
  • Jeyamaal

    Jeyamaal - 2017-07-10

    I tried to transcribe my own speech file, but I didn't get the expected output.

    The expected output is "Hello How are you I'm fine How do you do"

    But I got

    07-10 22:43:20.980 18043-18043/edu.cmu.sphinx.pocketsphinx I/System.out: <s>
    07-10 22:43:21.000 18043-18043/edu.cmu.sphinx.pocketsphinx I/System.out: though
    07-10 22:43:21.010 18043-18043/edu.cmu.sphinx.pocketsphinx I/System.out: the
    07-10 22:43:21.010 18043-18043/edu.cmu.sphinx.pocketsphinx I/System.out: whole
    07-10 22:43:21.010 18043-18043/edu.cmu.sphinx.pocketsphinx I/System.out: like
    07-10 22:43:21.010 18043-18043/edu.cmu.sphinx.pocketsphinx I/System.out: you
    07-10 22:43:21.010 18043-18043/edu.cmu.sphinx.pocketsphinx I/System.out: <sil>
    07-10 22:43:21.020 18043-18043/edu.cmu.sphinx.pocketsphinx I/System.out: <sil>
    07-10 22:43:21.020 18043-18043/edu.cmu.sphinx.pocketsphinx I/System.out: i'm
    07-10 22:43:21.020 18043-18043/edu.cmu.sphinx.pocketsphinx I/System.out: trying
    07-10 22:43:21.020 18043-18043/edu.cmu.sphinx.pocketsphinx I/System.out: <sil>
    07-10 22:43:21.030 18043-18043/edu.cmu.sphinx.pocketsphinx I/System.out: but
    07-10 22:43:21.030 18043-18043/edu.cmu.sphinx.pocketsphinx I/System.out: a(2)
    07-10 22:43:21.030 18043-18043/edu.cmu.sphinx.pocketsphinx I/System.out: dual(2)
    07-10 22:43:21.030 18043-18043/edu.cmu.sphinx.pocketsphinx I/System.out: </s>
    

    I have attached my audio file (WAV format, 16 kHz sample rate).

    Could you please help me solve the problem?
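
To put a number on how far a result like this is from the expected text, word error rate (WER) is the usual measure: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The sketch below is a plain dynamic-programming implementation, not a pocketsphinx utility. The function name is hypothetical, and lowercasing and whitespace splitting are simplifying assumptions. The hypothesis string is the recognizer output quoted above with the filler tokens and "(2)" suffixes removed by hand.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edits needed to turn the first i-1 reference words
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = cur[j - 1] + 1   # extra word in hypothesis
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


expected = "Hello How are you I'm fine How do you do"
recognized = "though the whole like you i'm trying but a dual"
print(f"WER: {word_error_rate(expected, recognized):.2f}")
```

Running the same function before and after a change (a different language model, less compressed audio, a newly trained acoustic model) gives a consistent way to compare results.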

     
    • Nickolay V. Shmyrev

      First of all, your audio is not really British English. Moreover, it is heavily corrupted by some mp4-like codec. I would first try a higher bitrate or avoid compression. Otherwise you will have to train your own model.

       
      • Jeyamaal

        Jeyamaal - 2017-07-22

        How can you tell it is "heavily corrupted by some mp4-like codec", and how did you check? Are there any tools?

        My file is already in .wav format.

        "avoid compression first": what is compression? How do I avoid it?

         

        Last edit: Jeyamaal 2017-07-22
        • Nickolay V. Shmyrev

          How can you tell it is "heavily corrupted by some mp4-like codec", and how did you check? Are there any tools?

          Wavesurfer

          "avoid compression first" what is compression?

          https://en.wikipedia.org/wiki/Data_compression#Audio

          How to avoid that?

          Depends on how you obtained your sample.
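
A note on why ".wav" alone does not settle this: a WAV file can contain audio that was lossy-compressed earlier and then decoded back to PCM, and the header will not show it. That is the kind of damage a spectrogram view in a tool like Wavesurfer reveals. What the header can confirm is the sample rate, channel count, and sample width. A minimal sketch using Python's standard `wave` module follows. The function names are hypothetical, and the expected format (mono, 16-bit, 16 kHz by default) is an assumption about the acoustic model in use; an 8 kHz telephone model would need `expected_rate=8000`.

```python
import wave


def describe_wav(path):
    """Read basic PCM parameters from a WAV header."""
    with wave.open(path, "rb") as wav_file:
        frame_rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate": frame_rate,
            "sample_width_bytes": wav_file.getsampwidth(),
            "duration_s": wav_file.getnframes() / frame_rate,
        }


def matches_model(info, expected_rate=16000):
    """True if the file is mono, 16-bit, and at the rate the model expects."""
    return (
        info["channels"] == 1
        and info["sample_width_bytes"] == 2
        and info["sample_rate"] == expected_rate
    )
```

Usage would look like `matches_model(describe_wav("speech.wav"))`, where `speech.wav` stands in for your own file. A file that passes can still be degraded by earlier compression, so this only rules out format mismatches.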

           
  • Jeyamaal

    Jeyamaal - 2017-08-19

    Hi,

    Are there any tools or libraries other than pocketsphinx and ispikit for Android (https://github.com/ispikit/ispikit-android) to evaluate a speaker's pronunciation?
    I want to check that the user pronounced each word correctly and report a percentage, a rating (good, bad, or average), or any other acceptable measure of pronunciation quality. Please help me.

    Thank you.

     
