Hi all,
I apologize for asking something that I'm sure is written down somewhere, but I lack a basic map and am unclear where to even start looking.
I'm hoping to use Sphinx to make transcriptions of pre-recorded audio.
1) Is this possible?
2) Where do I start? I'm running Linux.
Thanks!
Simon
Yes, provided the audio is of good quality and in US English.
Download and install the latest sphinxbase and pocketsphinx from http://github.com/cmusphinx
Download the en-us generic acoustic model and the en-us generic language model from the downloads section on SourceForge
Convert your audio file to a 16 kHz, 16-bit mono file with ffmpeg
Transcribe it using pocketsphinx_continuous
Do whatever you want with the results
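The conversion and transcription steps above can be sketched as two command lines. This is only a rough sketch under stated assumptions, not an official recipe: the helper functions and the file names (talk.mp3, talk.wav, en-us, cmu07a.dic) are hypothetical placeholders, and the dictionary file in particular is an assumption, since the thread does not name one. The script only builds and prints the commands; it does not run them.

```python
import shlex


def ffmpeg_convert_cmd(src, dst):
    """Build an ffmpeg command that writes 16 kHz, 16-bit, mono WAV."""
    return [
        "ffmpeg",
        "-i", src,               # input recording, any format ffmpeg can read
        "-ar", "16000",          # resample to 16 kHz
        "-ac", "1",              # downmix to a single channel (mono)
        "-acodec", "pcm_s16le",  # 16-bit signed little-endian PCM
        dst,
    ]


def transcribe_cmd(wav, hmm_dir, lm_file, dict_file):
    """Build a pocketsphinx_continuous command for a pre-recorded file."""
    return [
        "pocketsphinx_continuous",
        "-infile", wav,      # decode a file rather than the microphone
        "-hmm", hmm_dir,     # acoustic model directory
        "-lm", lm_file,      # language model file
        "-dict", dict_file,  # pronunciation dictionary (assumed, not named in the thread)
    ]


if __name__ == "__main__":
    # Placeholder names; substitute your own recording and model locations.
    print(shlex.join(ffmpeg_convert_cmd("talk.mp3", "talk.wav")))
    print(shlex.join(transcribe_cmd(
        "talk.wav", "en-us", "cmusphinx-5.0-en-us.lm.dmp", "cmu07a.dic")))
```

To actually execute a command, the returned list can be passed to `subprocess.run(cmd, check=True)`. It is worth checking the option names against the help output of your own pocketsphinx build.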
I realize that I should perhaps add to this: I've downloaded sphinxbase, and was able to run ./configure; make; make install without any obvious errors. However, I see no binaries, no manual pages, nothing that would seem to suggest what to do next.
I see there are other packages on SF, but I haven't been able to work out which, if any, I might need.
TIA!
Simon
Whoops, overlapping post, sorry!
Thanks Nickolay, I'll go try that out.
Much appreciate the help,
Cheers,
Simon
OK, sorry, a couple more questions.
1) I find four files available for acoustic model download. Do I need all of them? One of them? How should I pick?
2) The language model archive contains a single file, en_us.lm, but the command line you quoted seems to suggest I should have a file cmusphinx-5.0-en-us.lm.dmp. Did I get the wrong thing? Am I making the wrong assumptions?
3) Related, I suspect, to 1) and 2): where do I put the various model files? Dump them in the current directory? Install them somehow alongside the generated binaries?
4) I tried cloning the project from github, but autoconf.sh complained that it was missing a bunch of tools for autoconf, then something called swig, and then python.h. I had previously succeeded in building version 0.8 from SourceForge, so I figured I'd proceed with that for the time being at least. They both appear to be version 0.8, so is there any reason to think this is a bad choice on my part?
5) (for later...) I'm actually trying to create transcripts of my own presentations, and I'm British born. I'm guessing that my accent won't match the models provided, but I don't see any British English models. Is there a direction I should go in trying to reduce the impact of this, or is it unlikely to matter?
Thanks again,
Cheers,
Simon
1) You need this one:
http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Acoustic%20Model/en-us.tar.gz/download
2) You need this one; it's the lm.dmp file:
http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Language%20Model/cmusphinx-5.0-en-us.lm.dmp/download
3) It doesn't matter where you put them; what matters is that the command line arguments point to them correctly. If you put the models in the current folder, the arguments must reflect that.
4) You must use the latest version from GitHub. You can install the Python development package from your distribution to get python.h.
5) There are no public UK English models yet. If you are interested in UK English, you can build a corresponding model yourself.
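On the point about model paths: since the decoder only sees whatever paths appear on the command line, it can help to build those arguments from one directory and fail early if something is missing. The sketch below is an illustration only. The function name `model_args` is hypothetical, and the directory name `en-us` is an assumption about what en-us.tar.gz unpacks to.

```python
from pathlib import Path


def model_args(model_dir, hmm_name="en-us",
               lm_name="cmusphinx-5.0-en-us.lm.dmp"):
    """Return -hmm/-lm arguments pointing into model_dir.

    Raises FileNotFoundError if either path does not exist, so a wrong
    location is reported before the decoder is started.
    """
    base = Path(model_dir).resolve()
    hmm = base / hmm_name  # acoustic model directory (name assumed)
    lm = base / lm_name    # language model file from the download link
    missing = [str(p) for p in (hmm, lm) if not p.exists()]
    if missing:
        raise FileNotFoundError("model path(s) not found: " + ", ".join(missing))
    return ["-hmm", str(hmm), "-lm", str(lm)]
```

With the models unpacked in the current folder, `model_args(".")` would return absolute paths to pass along with the other decoder arguments.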
OK, I'm getting text. Not very accurate, maybe 30-50%, but clearly derived from what I had recorded. So I guess my next project will be to work out the training thing and hope that can get the recognition to a usable point for my accent.
I will try to be a bit more self-supporting working out the training thing (I hope!)
Many thanks for your help, Nickolay; it's much appreciated.
Cheers,
Simon
To get help with accuracy, it's best to share the audio you are trying to recognize. Accent might not be the issue with your audio; there could be other causes.
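One thing that can be ruled out locally before sharing the audio is a format mismatch, assuming the 16 kHz, 16-bit mono format recommended earlier in the thread. The following is a small sketch using Python's standard `wave` module; the function name `wav_format_problems` is hypothetical, and it checks only the file header, not the recording quality.

```python
import wave


def wav_format_problems(path):
    """List the ways a WAV file differs from 16 kHz, 16-bit, mono."""
    problems = []
    with wave.open(path, "rb") as w:
        if w.getframerate() != 16000:
            problems.append(f"sample rate is {w.getframerate()} Hz, expected 16000")
        if w.getsampwidth() != 2:  # sample width is reported in bytes
            problems.append(f"sample width is {8 * w.getsampwidth()} bits, expected 16")
        if w.getnchannels() != 1:
            problems.append(f"{w.getnchannels()} channels, expected 1 (mono)")
    return problems
```

An empty list means the header matches the expected format; it says nothing about noise, clipping, or vocabulary, which may still affect recognition.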