CMU Sphinx / Forums / Help: Need help getting started

SimonHGR - 2014-10-26

Hi all,

I apologize for asking something that I'm sure is written down somewhere, but I lack a basic map and am unclear where to even start looking.

I'm hoping to use Sphinx to make transcriptions of pre-recorded audio.

1) Is this possible?

2) Where do I start? I'm running Linux.

Thanks!
Simon

- Nickolay V. Shmyrev - 2014-10-26
  
  1) Is this possible?
  
  Yes, provided the audio is of good quality and in US English.
  
  2) Where do I start? I'm running Linux.
  
  Download and install the latest sphinxbase and pocketsphinx from http://github.com/cmusphinx
  
  Download the en-us generic acoustic model and the en-us generic language model from the downloads section on SourceForge.
  
  Convert your audio file to a 16 kHz, 16-bit mono file with ffmpeg:
  
  ffmpeg -i file.mp3 -ar 16000 -ac 1 file.wav
  
  Transcribe it using pocketsphinx_continuous:
  
  pocketsphinx_continuous -infile file.wav -hmm en-us -lm cmusphinx-5.0-en-us.lm.dmp > transcription.txt
  
  Do whatever you want with the results
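
The two commands above can be wrapped in a small shell sketch. The helper names (run, convert_audio, transcribe), the DRY_RUN switch, and the HMM_DIR and LM_FILE variables are illustrative assumptions and not part of pocketsphinx; the ffmpeg and pocketsphinx_continuous arguments are the ones from the post. It also assumes the models were unpacked into the current directory.

```shell
#!/bin/sh
# Sketch of the convert-then-transcribe steps from the post above.
# Assumption: models live in the current directory unless overridden.
HMM_DIR="${HMM_DIR:-en-us}"                       # acoustic model directory
LM_FILE="${LM_FILE:-cmusphinx-5.0-en-us.lm.dmp}"  # language model file

# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Convert an input file that ffmpeg can read to a 16 kHz mono WAV file.
convert_audio() {
    run ffmpeg -i "$1" -ar 16000 -ac 1 "$2"
}

# Decode the WAV file; the transcription is written to standard output.
transcribe() {
    run pocketsphinx_continuous -infile "$1" -hmm "$HMM_DIR" -lm "$LM_FILE"
}
```

Once the functions are defined in the current shell, a typical call would be `convert_audio file.mp3 file.wav && transcribe file.wav > transcription.txt`. Setting DRY_RUN=1 first only prints the commands, which is a cheap way to check the paths before a long decode.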
  
- SimonHGR - 2014-10-26
  
  I realize that I should perhaps add to this: I've downloaded sphinxbase, and was able to run ./configure; make; make install without any obvious errors. However, I see no binaries, no manual pages, nothing that would suggest what to do next.
  
  I see there are other packages on SF, but I haven't been able to work out which, if any, I might need.
  
  TIA!
  Simon
  

SimonHGR - 2014-10-26

Whoops, overlapping post, sorry!

Thanks Nickolay, I'll go try that out.

Much appreciate the help,
Cheers,
Simon


SimonHGR - 2014-10-26

OK, sorry, a couple more questions.

1) I find four files available for acoustic model download. Do I need all of them? One of them? How should I pick?

2) The language model archive contains a single file, en_us.lm, but the command line you quoted seems to suggest I should have a file cmusphinx-5.0-en-us.lm.dmp. Did I get the wrong thing? Am I making the wrong assumptions?

3) Related, I suspect, to 1) and 2): where do I put the various model files? Dump them in the current directory? Install them somehow alongside the generated binaries?

4) I tried cloning the project from github, but autoconf.sh complained that it didn't have a bunch of tools for autoconf, then something called swig, then about a missing python.h. I had previously succeeded in building version 0.8 from SourceForge, so I figured I'd proceed with that for the time being at least. They both appear to be version 0.8, so is there any reason to think this is a bad choice on my part?

5) (for later...) I'm actually trying to create transcripts of my own presentations, and I'm British born. I'm guessing that my accent won't match the models provided, but I don't see any British English models. Is there a direction I should go in trying to reduce the impact of this, or is it unlikely to matter?

Thanks again,
Cheers,
Simon

- Nickolay V. Shmyrev - 2014-10-26
  
  1) I find four files available for acoustic model download. Do I need all of them? One of them? How should I pick?
  
  You need this one:
  
  http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Acoustic%20Model/en-us.tar.gz/download
  
  2) The language model archive contains a single file, en_us.lm, but the command line you quoted seems to suggest I should have a file cmusphinx-5.0-en-us.lm.dmp. Did I get the wrong thing? Am I making the wrong assumptions?
  
  You need the lm.dmp file:
  
  http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Language%20Model/cmusphinx-5.0-en-us.lm.dmp/download
  
  3) Related, I suspect, to 1) and 2): where do I put the various model files? Dump them in the current directory? Install them somehow alongside the generated binaries?
  
  It doesn't matter where you put them; what matters is that the command-line arguments point to the correct paths. If you put the models in the current folder, the arguments must reflect that.
  
  4) I tried cloning the project from github, but autoconf.sh complained that it didn't have a bunch of tools for autoconf, then something called swig, then about a missing python.h. I had previously succeeded in building version 0.8 from SourceForge, so I figured I'd proceed with that for the time being at least. They both appear to be version 0.8, so is there any reason to think this is a bad choice on my part?
  
  You must use the latest version from GitHub. You can install the Python development package from your distribution to get python.h.
  
  5) (for later...) I'm actually trying to create transcripts of my own presentations, and I'm British born. I'm guessing that my accent won't match the models provided, but I don't see any British English models. Is there a direction I should go in trying to reduce the impact of this, or is it unlikely to matter?
  
  There are no public UK English models yet. If you are interested in UK English, you can build a corresponding model yourself.
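
On question 3, since only the paths given to -hmm and -lm matter, one option is to check them before starting a decode. The following is a minimal sketch; check_models is a hypothetical helper and not part of pocketsphinx, and it only tests that the directory and file exist, not that they are valid models.

```shell
#!/bin/sh
# Hypothetical helper: verify the paths that will be passed to -hmm and -lm.
# It checks existence only and does not validate the model contents.
check_models() {
    hmm_dir=$1   # acoustic model directory, e.g. the unpacked en-us folder
    lm_file=$2   # language model file, e.g. cmusphinx-5.0-en-us.lm.dmp
    status=0
    if [ ! -d "$hmm_dir" ]; then
        echo "acoustic model directory not found: $hmm_dir" >&2
        status=1
    fi
    if [ ! -f "$lm_file" ]; then
        echo "language model file not found: $lm_file" >&2
        status=1
    fi
    return "$status"
}
```

With the models in the current folder, `check_models en-us cmusphinx-5.0-en-us.lm.dmp` returns 0 when both paths exist; with the models elsewhere, pass the same paths you intend to give to pocketsphinx_continuous.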
  

SimonHGR - 2014-10-26

OK, I'm getting text. Not very accurate, maybe 30-50%, but clearly derived from what I had recorded. So I guess my next project will be to work out the training thing and hope that can get the recognition to a usable point for my accent.

I will try to be a bit more self-supporting working out the training thing (I hope!)

Many thanks for your help, Nickolay, it's much appreciated.
Cheers,
Simon

- Nickolay V. Shmyrev - 2014-10-26
  
  It's recommended to share the audio you are trying to recognize to get help with the accuracy. Accent might not be the issue with your audio; there could be other causes.
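
One thing that is cheap to rule out before looking at accent is whether the WAV file really has the 16 kHz, 16-bit mono format asked for in the earlier post. The sketch below reads the format fields straight from the file header. wav_format is a hypothetical helper; it assumes a canonical RIFF/WAVE layout with the fmt chunk directly after the RIFF header, and a little-endian host, because od decodes integers in host byte order.

```shell
#!/bin/sh
# Hypothetical helper: print "<sample rate> <bits per sample> <channels>"
# for a WAV file whose fmt chunk starts at byte offset 12.
# Assumes a little-endian host (od reads integers in host byte order).
wav_format() {
    channels=$(od -An -t u2 -j 22 -N 2 "$1" | tr -d ' \n')  # bytes 22-23
    rate=$(od -An -t u4 -j 24 -N 4 "$1" | tr -d ' \n')      # bytes 24-27
    bits=$(od -An -t u2 -j 34 -N 2 "$1" | tr -d ' \n')      # bytes 34-35
    echo "$rate $bits $channels"
}
```

For a file that matches the requested format, `wav_format file.wav` prints `16000 16 1`. Any other result suggests re-running the ffmpeg conversion step.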
  
