Need help getting started

  • SimonHGR

    SimonHGR - 2014-10-26

    Hi all,

    I apologize for asking something that I'm sure is written down somewhere, but I lack a basic map and am unclear where to even start looking.

    I'm hoping to use Sphinx to make transcriptions of pre-recorded audio.

    1) Is this possible?

    2) Where do I start? I'm running Linux.

    Thanks!
    Simon

     
    • Nickolay V. Shmyrev

      1) Is this possible?

      Yes, provided the audio is of good quality and in US English.

      2) Where do I start? I'm running Linux.

      Download and install the latest sphinxbase and pocketsphinx from http://github.com/cmusphinx

      Download the en-us generic acoustic model and the en-us generic language model from the downloads section on SourceForge

      Convert your audio file to a 16 kHz, 16-bit mono WAV file with ffmpeg

          ffmpeg -i file.mp3 -ar 16000 -ac 1 file.wav
      

      Transcribe it using pocketsphinx_continuous

           pocketsphinx_continuous -infile file.wav -hmm en-us -lm cmusphinx-5.0-en-us.lm.dmp > transcription.txt
      

      Do whatever you want with the results
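
The two commands above can be chained in a small shell function. This is only a minimal sketch, not something that ships with pocketsphinx: the names wav_name and transcribe, the -16k.wav suffix and the .txt output name are made up for illustration, and it assumes ffmpeg and pocketsphinx_continuous are already on the PATH.

```shell
#!/bin/sh
# Hypothetical helpers; neither function is part of ffmpeg or pocketsphinx.

# Derive the name of the converted file, e.g. talk.mp3 -> talk-16k.wav.
# The suffix avoids overwriting the input when it is already a .wav file.
wav_name() {
    printf '%s-16k.wav\n' "${1%.*}"
}

# transcribe INPUT HMM_DIR LM_FILE
# Converts INPUT to 16 kHz mono WAV, then writes the recognized text
# next to the input, with the extension replaced by .txt.
transcribe() {
    input=$1
    hmm=$2
    lm=$3
    wav=$(wav_name "$input")
    # ffmpeg writes 16-bit PCM by default when the output is a .wav file
    ffmpeg -i "$input" -ar 16000 -ac 1 "$wav" &&
    pocketsphinx_continuous -infile "$wav" -hmm "$hmm" -lm "$lm" > "${input%.*}.txt"
}

# Example call, using the file names from the commands above:
# transcribe file.mp3 en-us cmusphinx-5.0-en-us.lm.dmp
```

The recognizer only runs if the conversion succeeded, so a failed ffmpeg call does not leave an empty transcript behind.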

       
    • SimonHGR

      SimonHGR - 2014-10-26

      I realize that I should perhaps add to this: I've downloaded sphinxbase, and was able to run ./configure; make; make install without any obvious errors. However, I see no binaries, no manual pages, nothing that would seem to suggest what to do next.

      I see there are other packages on SF, but I haven't been able to work out which, if any, I might need.

      TIA!
      Simon

       
  • SimonHGR

    SimonHGR - 2014-10-26

    Whoops, overlapping post, sorry!

    Thanks Nickolay, I'll go try that out.

    Much appreciate the help,
    Cheers,
    Simon

     
  • SimonHGR

    SimonHGR - 2014-10-26

    OK, sorry, a couple more questions.

    1) I find four files available for acoustic model download. Do I need all of them? One of them? How should I pick?

    2) The language model archive contains a single file, en_us.lm, but the command line you quoted seems to suggest I should have a file cmusphinx-5.0-en-us.lm.dmp. Did I get the wrong thing? Am I making the wrong assumptions?

    3) Related, I suspect, to 1) and 2): where do I put the various model files? Dump them in the current directory? Install them somehow alongside the generated binaries?

    4) I tried cloning the project from github, but autoconf.sh complained that it was missing a bunch of tools for autoconf, then something called swig, then python.h. I had previously succeeded in building the version 0.8 from SourceForge, so I figured I'd proceed with that for the time being at least. They both appear to be version 0.8, so is there any reason to think this is a bad choice on my part?

    5) (for later...) I'm actually trying to create transcripts of my own presentations, and I'm British born. I'm guessing that my accent won't match the models provided, but I don't see any British English models. Is there a direction I should go in trying to reduce the impact of this, or is it unlikely to matter?

    Thanks again,
    Cheers,
    Simon

     
    • Nickolay V. Shmyrev

      1) I find four files available for acoustic model download. Do I need all of them? One of them? How should I pick?

      You need

      http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Acoustic%20Model/en-us.tar.gz/download

      2) The language model archive contains a single file, en_us.lm, but the command line you quoted seems to suggest I should have a file cmusphinx-5.0-en-us.lm.dmp. Did I get the wrong thing? Am I making the wrong assumptions?

      You need this one, the lm.dmp file:

      http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Language%20Model/cmusphinx-5.0-en-us.lm.dmp/download

      3) Related, I suspect, to 1) and 2): where do I put the various model files? Dump them in the current directory? Install them somehow alongside the generated binaries?

      It doesn't matter where you put them; what matters is that the command line arguments point to the right paths. If you put the models in the current folder, the arguments must reflect that.
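
To illustrate the point about paths, here is a hedged sketch that checks both model locations before they are handed to -hmm and -lm. MODEL_DIR and check_models are hypothetical names, and the sketch assumes that en-us.tar.gz unpacks to a directory called en-us, as the earlier command line suggests.

```shell
#!/bin/sh
# MODEL_DIR is a made-up location; point it at wherever the downloads were unpacked.
MODEL_DIR=${MODEL_DIR:-./models}

# check_models HMM_DIR LM_FILE
# Prints one line per problem found; prints nothing if both paths exist.
check_models() {
    hmm=$1
    lm=$2
    [ -d "$hmm" ] || echo "acoustic model directory not found: $hmm"
    [ -f "$lm" ] || echo "language model file not found: $lm"
}

# Example use (assumes pocketsphinx_continuous is installed):
# problems=$(check_models "$MODEL_DIR/en-us" "$MODEL_DIR/cmusphinx-5.0-en-us.lm.dmp")
# if [ -z "$problems" ]; then
#     pocketsphinx_continuous -infile file.wav \
#         -hmm "$MODEL_DIR/en-us" \
#         -lm "$MODEL_DIR/cmusphinx-5.0-en-us.lm.dmp" > transcription.txt
# else
#     echo "$problems" >&2
# fi
```

Keeping the directory in one variable means the models can live anywhere, as long as the two arguments are built from the same place.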

      4) I tried cloning the project from github, but autoconf.sh complained that it was missing a bunch of tools for autoconf, then something called swig, then python.h. I had previously succeeded in building the version 0.8 from SourceForge, so I figured I'd proceed with that for the time being at least. They both appear to be version 0.8, so is there any reason to think this is a bad choice on my part?

      You must use the latest version from GitHub. You can install the Python development package from your distribution to get python.h.
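
Before retrying the GitHub build, it can help to see which build tools are actually missing. The sketch below is one way to do that; missing_tools is a hypothetical helper, and the tool list in the example call is a guess based on the errors described in the question, not an official dependency list. It only checks commands on the PATH, so it cannot tell whether the Python header (installed as Python.h by the distribution's Python development package) is present.

```shell
#!/bin/sh
# missing_tools NAME...
# Prints each named command that cannot be found on the PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example call; the list is an assumption and may differ between
# releases and distributions:
# missing_tools autoconf automake libtoolize swig
```

Anything the function prints still needs to be installed through the distribution's package manager before the build scripts will run.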

      5) (for later...) I'm actually trying to create transcripts of my own presentations, and I'm British born. I'm guessing that my accent won't match the models provided, but I don't see any British English models. Is there a direction I should go in trying to reduce the impact of this, or is it unlikely to matter?

      There are no public UK English models yet. If you are interested in UK English, you can build a corresponding model yourself.

       
  • SimonHGR

    SimonHGR - 2014-10-26

    OK, I'm getting text. Not very accurate--maybe 30-50%, but clearly derived from what I had recorded. So I guess my next project will be to work out the training thing and hope that can get the recognition to a usable point for my accent.

    I will try to be a bit more self-supporting working out the training thing (I hope!)

    Many thanks for your help Nickolay, it's much appreciated.
    Cheers,
    Simon

     
    • Nickolay V. Shmyrev

      To get help with accuracy, it's best to share the audio you are trying to recognize. Accent might not be the issue with your audio; there could be other causes.

       
