Hello,
I have been reading documentation for a few days. I am a complete newbie to speech recognition, and I now have some knowledge of how Sphinx 4 and speech recognition work.
I want to create a dictionary recognition program: you say one word and the system recognizes it.
The problem is that I am short of time. I have read a lot, but I am not sure where to start. I would really appreciate it if someone could guide me a little on how to start creating that program: which example I should use as a base, and which documentation I should read to customize it for that purpose.
Thanks in advance,
I am making some progress, but I am not sure if this is the best way to do this.
I use hellongram.config.xml as the base config, replaced SimpleNGramModel with large.LargeTrigramModel, and am using models/language/wsj/wsj5kc.Z.DMP
All other values are the defaults from hellongram. Could anyone please tell me how I can get better speech recognition? Normally the sound is going to be a single word; can I use this to make recognition better?
Thanks in advance,
Ruben Rubio Rey
How comprehensive is this dictionary intended to be?
Fundamentally, you need an acoustic model, a language model (or grammar for simple applications, hence my complexity question), and a dictionary file that maps each word represented in the language model/grammar to its set of acoustic phonemes.
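To make the "dictionary file" part concrete, here is a minimal sketch of reading CMUdict-style lines (a word followed by its phonemes, with alternate pronunciations written as WORD(2)) into a map. PronunciationDictionary is a hypothetical helper written for illustration, not a Sphinx-4 class, and the sample entries are only examples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Sphinx-4): parses CMUdict-style lines. */
final class PronunciationDictionary {

    /** Maps each word to one or more phoneme sequences. */
    static Map<String, List<List<String>>> parse(List<String> lines) {
        Map<String, List<List<String>>> entries = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith(";;;")) {
                continue; // skip blank lines and CMUdict comment lines
            }
            String[] tokens = line.split("\\s+");
            if (tokens.length < 2) {
                continue; // a word without phonemes cannot be matched
            }
            // Alternate pronunciations are written as WORD(2), WORD(3), ...
            String word = tokens[0].replaceAll("\\(\\d+\\)$", "");
            List<String> phonemes =
                new ArrayList<>(Arrays.asList(tokens).subList(1, tokens.length));
            entries.computeIfAbsent(word, w -> new ArrayList<>()).add(phonemes);
        }
        return entries;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "HELLO  HH AH L OW",
            "HELLO(2)  HH EH L OW",
            "WORLD  W ER L D");
        System.out.println(parse(sample));
    }
}
```

Every word that appears in the grammar or language model needs an entry like this, otherwise the recognizer has no phoneme sequence to match it against.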
Probably the best thing to do is to check out the <a href="http://cmusphinx.sourceforge.net/sphinx4/src/apps/edu/cmu/sphinx/demo/helloworld/README.html">Hello World!</a> and <a href="http://cmusphinx.sourceforge.net/sphinx4/src/apps/edu/cmu/sphinx/demo/hellongram/README.html">Hello NGram</a> demos in the /apps folder of your sphinx download. These demos recognize phrases contained in their grammars and echo them back. For any of the demo programs you play with, pay special attention to their config.xml files, which specify which models are used, the dictionary, etc.
Also make sure you read the <a href="http://cmusphinx.sourceforge.net/sphinx4/doc/ProgrammersGuide.html">Programmer's Guide</a> and <a href="http://cmusphinx.sourceforge.net/sphinx4/doc/Sphinx4-faq.html">FAQ</a>.
Hopefully this helps. I can't and shouldn't really provide anything more concrete since I also am a sphinx noob. :)
Apparently those HTML tags were not required. >.<
Thanks for your response! Finally I am making some progress.
I am working with voicedict http://personales.ya.com/javiercl/voicedict/index.html
I have two problems, both related.
In general, the tests do not work with my voice. The microphone seems to be working, but recognition still fails. At first I thought it was because I am Spanish, but after trying with a native English speaker I realized the problem is something else. Could you help me find it?
VoiceDict does not work at all. There are three recognizers: two of them work "more or less", and the third one, the one I am most interested in, does not work. It is called "wordListRecognizer".
The config.xml follows. Thanks in advance!
<?xml version="1.0" encoding="UTF-8"?>
<!--
Sphinx-4 Configuration file
-->
<!-- ******** -->
<!-- an4 configuration file -->
<!-- ******** -->
<config>
</config>
> In general, the tests do not work with my voice. The microphone seems to be working, but recognition still fails. At first I thought it was because I am Spanish, but after trying with a native English speaker I realized the problem is something else. Could you help me find it?
> VoiceDict does not work at all. There are three recognizers: two of them work "more or less", and the third one, the one I am most interested in, does not work. It is called "wordListRecognizer".
When you report problems, always try to provide:
1. Versions of the software you are using.
2. Ways to reproduce your problem, a test case for example. The description "it doesn't work for me" never allows us to help you.
3. Describe the expected results.
4. Describe the results you actually get.
In particular, sphinx4-beta2 doesn't work with a microphone. If you are using a nightly build it should be better.
> When you report about problems always try to provide:
Sorry! You are right. I'll try to explain myself better.
> In general, the tests do not work with my voice. The microphone seems to be working, but recognition still fails. At first I thought it was because I am Spanish, but after trying with a native English speaker I realized the problem is something else. Could you help me find it?
> 1. Versions of the software you are using:
sphinx4-1.0beta2
> 2. Ways to reproduce your problem, test case for example.
Run any test, for example "HelloNGram.jar", and speak using the microphone. I am using two: the laptop's built-in microphone and an external microphone.
> 3. Describe the expected results
When speaking sentences that are in the http://cmusphinx.sourceforge.net/sphinx4/src/apps/edu/cmu/sphinx/demo/hellongram/hellongram.test file, the system should recognize the sentences in a reasonable percentage of cases.
> 4. Describe the results you actually get.
The system recognizes the sentences in almost 0% of cases. I tried with three people on two microphones, one of them a native English speaker. In most cases it matches some of the words of the sentence (there is usually some output) but never the complete phrase.
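A report like "some words match but never the complete phrase" is easier to act on with numbers. The following is a rough sketch, not taken from the Sphinx-4 code base, that computes a word error rate from a reference sentence and a recognizer hypothesis using word-level edit distance. RecognitionScore and the sentence in the usage note are hypothetical.

```java
import java.util.Locale;

/** Hypothetical helper: scores a hypothesis against a reference transcript. */
final class RecognitionScore {

    /** Word-level Levenshtein distance (substitutions, insertions, deletions). */
    static int wordEditDistance(String reference, String hypothesis) {
        String[] ref = tokenize(reference);
        String[] hyp = tokenize(hypothesis);
        int[] previous = new int[hyp.length + 1];
        for (int j = 0; j <= hyp.length; j++) {
            previous[j] = j; // turning an empty reference into j words costs j insertions
        }
        for (int i = 1; i <= ref.length; i++) {
            int[] current = new int[hyp.length + 1];
            current[0] = i; // dropping i reference words costs i deletions
            for (int j = 1; j <= hyp.length; j++) {
                int substitution =
                    previous[j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                int deletion = previous[j] + 1;
                int insertion = current[j - 1] + 1;
                current[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            previous = current;
        }
        return previous[hyp.length];
    }

    /** Edit distance divided by the number of reference words. */
    static double wordErrorRate(String reference, String hypothesis) {
        int referenceWords = tokenize(reference).length;
        if (referenceWords == 0) {
            throw new IllegalArgumentException("reference must contain at least one word");
        }
        return (double) wordEditDistance(reference, hypothesis) / referenceWords;
    }

    private static String[] tokenize(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }
}
```

For example, RecognitionScore.wordErrorRate("the green one right in the middle", "the green one in the middle") counts one deleted word out of seven. An empty hypothesis scores a word error rate of 1.0, which distinguishes "nothing recognized" from "partly recognized".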
> VoiceDict does not work at all. There are three recognizers: two of them work "more or less", and the third one, the one I am most interested in, does not work. It is called "wordListRecognizer".
I am using voicedict as a code base because it is very similar to what we need to do.
> 1. Versions of the software you are using.
sphinx4-1.0beta2 and voicedict that you can download from here: http://personales.ya.com/javiercl/voicedict/index.html
I had to modify the Java source code a bit because it was not compiling, but the problems were very small. I also modified it to use only the wordListRecognizer recognizer. (As the system does not recognize me very well, I cannot use the original "what's the meaning of" phrase.)
> 2. Ways to reproduce your problem, test case for example. The description "it doesn't work for me" never allows us to help you.
The program has three recognizers. The interesting one is wordListRecognizer, which is the one that can recognize any word in the dictionary.
> 3. Describe the expected results
When you speak, the system should recognize your voice. If you say any word, that word must appear.
> 4. Describe the results you actually get.
Always an empty string.
I really don't know how to fix either of the problems. Any kind of help will be appreciated.
> sphinx4-1.0beta2
So start with the upgrade to the nightly build/svn trunk
> So start with the upgrade to the nightly build/svn trunk
So sweet! The demos work now!
Only the second problem left.
The main goal is to have a system that is able to understand any word in a big dictionary. To do so, I found a similar program called VoiceDict.
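For a word list of modest size, one option is a grammar in which every word is an alternative, similar in spirit to the JSGF grammar used by the Hello World demo. The sketch below only generates the JSGF text. SingleWordGrammar, the grammar name, and the words are hypothetical, and it assumes each word is a plain token with no JSGF special characters. For a very large vocabulary a language model may be a better fit than a flat grammar.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: builds a JSGF grammar that accepts exactly one word. */
final class SingleWordGrammar {

    static String toJsgf(String grammarName, List<String> words) {
        // Assumption: words are simple tokens; no escaping of JSGF syntax is done.
        String alternatives = words.stream()
            .map(String::trim)
            .filter(word -> !word.isEmpty())
            .distinct()
            .collect(Collectors.joining(" | "));
        if (alternatives.isEmpty()) {
            throw new IllegalArgumentException("at least one word is required");
        }
        return "#JSGF V1.0;\n\n"
            + "grammar " + grammarName + ";\n\n"
            + "public <word> = " + alternatives + ";\n";
    }

    public static void main(String[] args) {
        System.out.print(toJsgf("words", Arrays.asList("yes", "no", "maybe")));
    }
}
```

Each word in the generated grammar still needs a pronunciation in the dictionary file the recognizer is configured with.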
> 1. Versions of the software you are using:
svn code from a few hours ago
> 2. Ways to reproduce your problem
Download VoiceDict from http://personales.ya.com/javiercl/voicedict/index.html
If you execute the jar, you get this error:
java -jar VoiceDict.jar
main(); Couldn't start the recognizers.Property Exception component:'accuracyTracker' property:'null' - Can't instantiate class class edu.cmu.sphinx.instrumentation.AccuracyTracker
edu.cmu.sphinx.util.props.InternalConfigurationException: java.lang.InstantiationException
Exception in thread "main" java.lang.NullPointerException
at demo.sphinx.voicedict.VoiceDict.main(VoiceDict.java:155)
To get past it, comment out the accuracyTracker item (<!-- <item>accuracyTracker </item>-->) in the monitors of the recognizers in voicedict.config.xml
Execute the program and try to get the "wordListRecognizer" recognizer working (you have to say "what's the meaning of" and then, after a beep, the word).
> 3. Describe the expected results
The program using the "wordListRecognizer" recognizer should be able to understand any word from a big dictionary.
> 4. Describe the results you actually get.
What I get is always an empty string.
Thanks in advance!!!
To be honest, the way voice-dictionary works makes me think it will never produce reliable results. Is your goal to make it work or to implement some other thing?
My goal is to make dictionary recognition work. I need to input some sound files and output the text. Each sound file will be one word.
The accuracy will be 60% at most on a 60k-word vocabulary; is that OK?
Anything else I think I can manage with the demo examples and JavaDocs, but I don't yet know how to do the dictionary recognition.
That should be enough (the higher the better, but enough). Could you give me some details about how to do that?
This program will be for a telephone application. Is it possible to "transform" the original sound file before processing it to get better results?
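Before transforming anything, it may help to check what format the sound files actually have, since telephone audio is commonly sampled at 8 kHz while many acoustic models are trained on 16 kHz audio. Here is a minimal sketch using only the standard javax.sound.sampled API. WavFormatCheck is a hypothetical helper, and the expected rate is an assumption to replace with whatever your acoustic model requires.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

/** Hypothetical helper: inspects the format of a WAV stream. */
final class WavFormatCheck {

    /** Reads the audio format from a WAV stream (the stream must support mark/reset). */
    static AudioFormat readFormat(InputStream wav)
            throws IOException, UnsupportedAudioFileException {
        try (AudioInputStream audio = AudioSystem.getAudioInputStream(wav)) {
            return audio.getFormat();
        }
    }

    /** True for 16-bit signed mono PCM at the given sample rate. */
    static boolean matches(AudioFormat format, float expectedSampleRate) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(format.getEncoding())
            && format.getSampleRate() == expectedSampleRate
            && format.getSampleSizeInBits() == 16
            && format.getChannels() == 1;
    }

    /** Builds an in-memory silent WAV, only so the sketch can be tried without a file. */
    static byte[] silentWav(float sampleRate, int frames) throws IOException {
        AudioFormat format = new AudioFormat(sampleRate, 16, 1, true, false);
        byte[] pcm = new byte[frames * format.getFrameSize()];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (AudioInputStream audio =
                 new AudioInputStream(new ByteArrayInputStream(pcm), format, frames)) {
            AudioSystem.write(audio, AudioFileFormat.Type.WAVE, out);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] wav = silentWav(8000f, 800);
        AudioFormat format = readFormat(new ByteArrayInputStream(wav));
        System.out.println(format + " matches 16 kHz: " + matches(format, 16000f));
    }
}
```

If the check fails, the usual choices are to resample the audio or to use an acoustic model trained at the recording's sample rate. Which one gives better results is something to measure rather than assume.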
Thanks in advance!!
So you want to write a telephone app that just uses simple grammars. You do not need Sphinx. Google for VoiceXML or Twilio. Both do exactly what you want, from what I can tell.
If you still choose to use Sphinx, you will need a PBX. A simple Asterisk setup with Sphinx can be found by searching for scribblej sphinx. If you plan on larger-scale work, you will want some sort of MRCP server, such as Zanzibar, which uses sphinx4.
It is likely we may also try with the "Medium (1000 words - RM1)"; we'll get better results, but we have to study whether it is enough. Anyway, if you could give us any way to try this, we'll switch between both dictionaries and test.
It is not that complex: no PBX, no Asterisk. It is a custom application. It will only process WAV files and extract the content. It is going to be used in telephone applications, but the implementation is clear. We only need the dictionary recognition. I'll have a look at VoiceXML.
eliasmajic, I really do not think we need VoiceXML. The dictionary recognition is perfect for our purposes: feeding the application a sound file and returning the text.
I just need some guidance to achieve this goal.