Hi,
I guess something this basic has been asked here before, but I was not able to find it.
I want to use sphinx4 in a Java application, but I can't figure out how to use it.
I've downloaded the package and so on.
If I use the StreamSpeechRecognizer on my own recorded audio (instead of 10001-90210-01803), just like in the example, I get some random words.
Of course, I am speaking English there.
Thank you!
An idea to share the file should have come to your mind.
The recorded audio? I am saying "1 2 3" there.
The code is almost the same as in the example,
just the
recognizer.stopRecognition();
comes at the end.
Yes
You need to share the file.
Here is the file..
Your file is 44 kHz stereo; it must be a 16 kHz mono audio file. You can resample it with sox.
Also, your file is clipped; you need to lower the recording level.
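Both points in the reply above (format and clipping) can be checked before recognition is attempted. The following is a minimal sketch using only the standard javax.sound.sampled API. The class name WavPreflight and its method names are made up for illustration and are not part of sphinx4. It assumes Java 9 or later (for readAllBytes) and 16-bit little-endian PCM data.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.File;

/** Pre-flight checks for a recording (hypothetical helper, not part of sphinx4). */
final class WavPreflight {

    /** True if the format is 16 kHz, mono, 16-bit signed PCM. */
    static boolean isSixteenKhzMono(AudioFormat f) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())
                && Math.round(f.getSampleRate()) == 16000
                && f.getChannels() == 1
                && f.getSampleSizeInBits() == 16;
    }

    /** Fraction of 16-bit little-endian samples that sit exactly at full scale. */
    static double clippedFraction(byte[] pcm) {
        int total = pcm.length / 2;
        if (total == 0) {
            return 0.0;
        }
        int clipped = 0;
        for (int i = 0; i + 1 < pcm.length; i += 2) {
            // Low byte first, then the sign-carrying high byte.
            short s = (short) ((pcm[i] & 0xFF) | (pcm[i + 1] << 8));
            if (s == Short.MAX_VALUE || s == Short.MIN_VALUE) {
                clipped++;
            }
        }
        return (double) clipped / total;
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to the WAV file to inspect.
        try (AudioInputStream in = AudioSystem.getAudioInputStream(new File(args[0]))) {
            AudioFormat f = in.getFormat();
            System.out.println("format: " + f);
            System.out.println("16 kHz mono 16-bit PCM: " + isSixteenKhzMono(f));
            if (f.getSampleSizeInBits() == 16 && !f.isBigEndian()) {
                System.out.println("clipped fraction: " + clippedFraction(in.readAllBytes()));
            }
        }
    }
}
```

A clipped fraction well above zero suggests the recording level was too high. What counts as too much is a judgment call; this sketch does not pick a threshold.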
Thank you! But now I am getting nothing at all. Here is the code:
Make sure the recording is not clipped due to the high recording level. Share the file again.
I have made 3 wav files, all in the same format.
The first one is recognized perfectly (I am counting 1-10).
The second one (writing.wav) gives me random words.
The third (iwte_1.wav) gives no result at all.
Hello Kostya
Unfortunately you have uploaded only 2 files.
The format is not the only required factor. You can read our FAQ about audio bandwidth:
http://cmusphinx.sourceforge.net/wiki/faq#qwhat_is_sample_rate_and_how_does_it_affect_accuracy
The file writing.wav doesn't have the full 16 kHz bandwidth, so it is not recognized properly. There are other problems too, such as clipping, which is still present, and a zero region at the start, which should not appear in normal recordings.
To get the best results, make sure you record audio with a high-quality microphone and do not perform any lossy compression/decompression conversions.
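The bandwidth and zero-region remarks above can be probed roughly in code. The sketch below is an illustration only, not how the reply's author inspected the file. It uses the Goertzel algorithm to measure signal power at a single frequency, so a few probes at low and high frequencies show whether the upper band is empty. It also counts exactly-zero samples at the start. The class name BandwidthProbe and its methods are hypothetical, and the input is assumed to be already decoded into an array of samples.

```java
/** Rough spectral and silence checks on decoded samples (hypothetical helper). */
final class BandwidthProbe {

    /** Goertzel algorithm: signal power at one frequency, normalised by length squared. */
    static double powerAt(double[] x, double sampleRate, double freqHz) {
        double w = 2.0 * Math.PI * freqHz / sampleRate;
        double coeff = 2.0 * Math.cos(w);
        double s1 = 0.0;
        double s2 = 0.0;
        for (double v : x) {
            double s0 = v + coeff * s1 - s2;
            s2 = s1;
            s1 = s0;
        }
        // Squared magnitude of the spectrum at freqHz.
        double power = s1 * s1 + s2 * s2 - coeff * s1 * s2;
        return power / ((double) x.length * x.length);
    }

    /** Number of exactly-zero samples before the first non-zero one. */
    static int leadingZeros(double[] x) {
        int n = 0;
        while (n < x.length && x[n] == 0.0) {
            n++;
        }
        return n;
    }
}
```

For a 16 kHz recording, comparing powerAt at, say, 1000 Hz and 6000 Hz gives a first impression of whether the upper part of the band carries any energy. The probe frequencies are arbitrary choices for this sketch.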
Hi, I have a similar issue with my audio file, which is an automated telephone call. It is returned as the Java string "shh", whereas I expect the string to contain the sentences spoken in the wav file. Based on the previous posts, I assume it may have to do with the bandwidth of my file?
Below is the information I got from the command sox --i myFile.wav. Can you please advise what these values should be? (I assume they are incorrect.)
Also, is there anything wrong with the configuration in the Java code below? I am using Communicator_40.cd_cont_4000 because my goal is to recognize telephone audio.
Thanks,
Ron
sox --i myFile.wav:
Input File : 'enbridgeGas.wav'
Channels : 1
Sample Rate : 8000
Precision : 16-bit
Duration : 00:00:37.07 = 296567 samples ~ 2780.32 CDDA sectors
File Size : 593k
Bit Rate : 128k
Sample Encoding: 16-bit Signed Integer PCM
Code:
import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;
import java.io.File;
import java.io.FileInputStream;

// The class name is arbitrary; the original post showed only the main method.
public class Main {
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        // Set path to acoustic model.
        configuration.setAcousticModelPath("file:Communicator_40.cd_cont_4000/");
        // Set path to dictionary.
        configuration.setDictionaryPath("resource:/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/cmudict.0.6d");
        // Set language model.
        configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/language/en-us.lm.dmp");
        StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
        FileInputStream fis = new FileInputStream(new File("enbridgeGas.wav"));
        recognizer.startRecognition(fis);
        SpeechResult result = recognizer.getResult();
        //recognizer.stopRecognition();
        System.out.println("test extracted: " + result.getHypothesis());
    }
}
Last edit: ron 2014-12-07
Hello Ron
To recognize an 8 kHz file you need to add
Then it will work for audio of reasonable quality. Unfortunately, your audio quality is below expectations: its bandwidth is cut at 400 Hz and it is also heavily corrupted. I suggest training a specific acoustic model to recognize such audio. A specialized language model can also help to improve accuracy. You can find additional details in our tutorial:
http://cmusphinx.sourceforge.net/wiki/tutorial
OK, thank you, now it works better.
Why am I getting a java.lang.OutOfMemoryError: Java heap space error when my wav file is longer than 10 seconds?! Isn't -xms800 -xmx1500 enough?
The specifier is in bytes, so if you mean 1.5G, you should write -Xmx1500M.
On Tue, Dec 9, 2014 at 3:58 AM, Kostya doppelgangerov@users.sf.net wrote:
--
Sincerely, Alexander
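One way to confirm which heap limit the JVM actually picked up is to print it from inside the program. This is a minimal sketch using only the standard Runtime API. The class name HeapReport is made up for illustration. Note that the JVM flags are case-sensitive (-Xms, -Xmx) and that a value without a suffix is read as bytes.

```java
/** Prints the heap limits the running JVM is using (hypothetical helper). */
final class HeapReport {

    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects the -Xmx setting (or the JVM default).
        System.out.println("max heap (MiB):       " + toMiB(rt.maxMemory()));
        // totalMemory() is what the JVM has reserved so far.
        System.out.println("allocated heap (MiB): " + toMiB(rt.totalMemory()));
        System.out.println("free in allocated (MiB): " + toMiB(rt.freeMemory()));
    }
}
```

Running it as java -Xmx1500m HeapReport should report a maximum close to 1500 MiB; the exact figure can differ slightly between JVMs.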
Yeah, I wrote it with m; my mistake here. 1500m isn't enough, what am I missing?
Obviously, if 1500m is not enough you should try 2G.
On Tue, Dec 9, 2014 at 6:08 AM, Kostya doppelgangerov@users.sf.net wrote:
--
Sincerely, Alexander
Thank you!
The length of the recording isn't a problem now.
The last question:
I need to use sphinx4 on my system, and the recording must come from a MOBILE MICROPHONE.
I am recording mono at 16000 Hz, so no conversion is needed.
The results are still almost random. What should I do with the file now? Or is the phone mic simply not good enough?
Thank you!