CMU Sphinx / Forums / Sphinx4 Help: Streemspeechrecognizer

Kostya - 2014-12-05

Hi,

I guess something this basic has been asked here before, but I was not able to find it.
I want to use sphinx4 in a Java application, but I can't figure out how to use it.

I've downloaded the package and so on.
If I use the StreamSpeechRecognizer on my own recorded audio (instead of 10001-90210-01803), just like in the example, I get random words.

Of course, I am speaking English there.

Thank you!

- Nickolay V. Shmyrev - 2014-12-05
  
  An idea to share the file should have come to your mind.
  
  - Kostya - 2014-12-05
    
    The recorded audio? I am saying "1 2 3" there.
    The code is almost the same as in the example,
    except that
    recognizer.stopRecognition();
    comes at the end.
    
    - Nickolay V. Shmyrev - 2014-12-05
      
      The recorded audio?
      
      Yes.
      
      I am saying "1 2 3" there.
      
      You need to share the file.
      

Kostya - 2014-12-05

Here is the file:

Untitled.wav

- Nickolay V. Shmyrev - 2014-12-05
  
  Your file is 44 kHz stereo; it must be a 16 kHz mono audio file. You can resample it with sox.
  
  Also, your file is clipped; you need to lower the recording level.
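
A quick way to catch the format problem before running the recognizer is to inspect the file's format with the standard Java Sound API. The sketch below is only an illustration: the class and method names are made up, and the 16-bit signed little-endian part is an assumption on top of the "16 kHz mono" requirement stated above.

```java
import javax.sound.sampled.AudioFormat;

final class WavFormatCheck {
    /**
     * Returns true for 16 kHz mono PCM. The 16-bit signed little-endian
     * requirement is an assumption, not something stated in the thread.
     */
    static boolean isSixteenKhzMono(AudioFormat format) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(format.getEncoding())
                && Math.round(format.getSampleRate()) == 16000
                && format.getChannels() == 1
                && format.getSampleSizeInBits() == 16
                && !format.isBigEndian();
    }

    public static void main(String[] args) {
        // Formats built by hand: (sampleRate, bits, channels, signed, bigEndian).
        AudioFormat good = new AudioFormat(16000f, 16, 1, true, false);
        AudioFormat stereo44k = new AudioFormat(44100f, 16, 2, true, false);
        System.out.println("16 kHz mono ok: " + isSixteenKhzMono(good));
        System.out.println("44.1 kHz stereo ok: " + isSixteenKhzMono(stereo44k));
    }
}
```

For a real file, the format can be obtained with AudioSystem.getAudioFileFormat(new File(...)).getFormat() and passed to the same check.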
  
  - Kostya - 2014-12-05
    
    Thank you! But now I am getting nothing at all. Here is the code:
    
    Configuration configuration = new Configuration();
    configuration.setAcousticModelPath("models/acoustic/wsj");
    configuration.setDictionaryPath("models/acoustic/wsj/dict/cmudict.0.6d");
    configuration.setLanguageModelPath("models/language/en-us.lm.dmp");
    StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
    recognizer.startRecognition(new URL("file:src/apps/edu/cmu/sphinx/demo/aligner/mono123.wav").openStream());
    SpeechResult result = recognizer.getResult();
    while ((result = recognizer.getResult()) != null) {
        System.out.println(result.getHypothesis()+" - ");
    }
    recognizer.stopRecognition();
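
One thing that may be worth checking in the code above: the standalone recognizer.getResult() call before the loop reads the first result, and the loop condition then overwrites it before anything is printed, so the first utterance is never shown. For a short file with a single utterance, that alone could produce empty output, although audio quality may still be the real cause. A minimal sketch of the same program without the extra call, assuming the sphinx4 classes from edu.cmu.sphinx.api are on the classpath (the class name TranscribeFile is made up, and this has not been run against the poster's file):

```java
import java.io.InputStream;
import java.net.URL;

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;

public class TranscribeFile {
    public static void main(String[] args) throws Exception {
        // Same model paths as in the post above.
        Configuration configuration = new Configuration();
        configuration.setAcousticModelPath("models/acoustic/wsj");
        configuration.setDictionaryPath("models/acoustic/wsj/dict/cmudict.0.6d");
        configuration.setLanguageModelPath("models/language/en-us.lm.dmp");

        StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
        URL audioUrl = new URL("file:src/apps/edu/cmu/sphinx/demo/aligner/mono123.wav");

        try (InputStream audio = audioUrl.openStream()) {
            recognizer.startRecognition(audio);

            // Every result, including the first, is read inside the loop.
            SpeechResult result;
            while ((result = recognizer.getResult()) != null) {
                System.out.println(result.getHypothesis());
            }
            recognizer.stopRecognition();
        }
    }
}
```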
    
    - Nickolay V. Shmyrev - 2014-12-07
      
      Make sure the recording is not clipped due to the high recording level. Share the file again.
      
      - Kostya - 2014-12-07
        
        I have made 3 WAV files, all in the same format.
        The first one is recognized perfectly (I am counting 1-10),
        the second one (writing.wav) gives me random words,
        and the third (iwte_1.wav) gives no result at all.
        
        iwte_1.wav
        
        
        Kostya - 2014-12-07
        
        writing.wav
        
        
        Nickolay V. Shmyrev - 2014-12-07
        
        Hello Kostya
        
        Unfortunately you have uploaded only 2 files.
        
        The format is not the only required factor. You can read our FAQ about audio bandwidth:
        
        http://cmusphinx.sourceforge.net/wiki/faq#qwhat_is_sample_rate_and_how_does_it_affect_accuracy
        
        The file writing.wav doesn't have the full 16 kHz bandwidth, so it is not recognized properly. There are other problems too, such as clipping, which is still present, a zero region at the start, which should not appear in normal recordings, and so on.
        
        For best results, make sure you record audio with a high-quality microphone and do not perform any lossy compression/decompression conversions.
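
The two defects named above, clipping and a run of zeros at the start, can be roughly screened for once the samples are decoded. The following is a heuristic sketch only, assuming 16-bit signed samples already loaded into a short[]; treating "clipped" as "pinned at the 16-bit limits" is an assumption, and the names are made up.

```java
final class RecordingChecks {
    /** Fraction of samples sitting exactly at the 16-bit limits. */
    static double clippedFraction(short[] samples) {
        if (samples.length == 0) {
            return 0.0;
        }
        int clipped = 0;
        for (short s : samples) {
            if (s == Short.MAX_VALUE || s == Short.MIN_VALUE) {
                clipped++;
            }
        }
        return (double) clipped / samples.length;
    }

    /** Number of exactly-zero samples before the first non-zero sample. */
    static int leadingZeroSamples(short[] samples) {
        int n = 0;
        while (n < samples.length && samples[n] == 0) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        short[] demo = {0, 0, 100, Short.MAX_VALUE, Short.MIN_VALUE, 5};
        System.out.println("clipped fraction: " + clippedFraction(demo));
        System.out.println("leading zeros: " + leadingZeroSamples(demo));
    }
}
```

A clipped fraction well above zero, or a long leading run of zeros, would suggest re-recording at a lower level rather than trying to repair the file.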
        

ron - 2014-12-07

Hi, I have a similar issue with my audio file, a recording of an automated telephone call: the result comes back as the Java string "shh", whereas I expect the sentences spoken in the WAV file. Based on the previous posts, I assume it may have to do with the bandwidth of my file?

Below is the information I got from the command sox --i myFile.wav. Can you please advise what the values below should be? (I assume they are incorrect.)

Also, is there anything wrong with the configuration in the Java code below? I am using Communicator_40.cd_cont_4000 because my goal is to recognize telephone audio.

Thanks,
Ron

sox --i myFile.wav:

Input File : 'enbridgeGas.wav'
Channels : 1
Sample Rate : 8000
Precision : 16-bit
Duration : 00:00:37.07 = 296567 samples ~ 2780.32 CDDA sectors
File Size : 593k
Bit Rate : 128k
Sample Encoding: 16-bit Signed Integer PCM

Code:

public static void main(String[] args) throws Exception {

    Configuration configuration = new Configuration();

    // Set path to acoustic model.
    configuration.setAcousticModelPath("file:Communicator_40.cd_cont_4000/");
    // Set path to dictionary.
    configuration.setDictionaryPath("resource:/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/cmudict.0.6d");
    // Set language model.
    configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/language/en-us.lm.dmp");

    StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);

    FileInputStream fis = new FileInputStream(new File("enbridgeGas.wav"));
    recognizer.startRecognition(fis);
    SpeechResult result = recognizer.getResult();
    //recognizer.stopRecognition();

    System.out.println("test extracted: " + result.getHypothesis());

}

Last edit: ron 2014-12-07

enbridgeGas.wav

- Nickolay V. Shmyrev - 2014-12-07
  
  Hello Ron
  
  To recognize an 8 kHz file you need to add
  
  configuration.setSampleRate(8000);
  
  Then it will work for reasonable-quality audio. Unfortunately, your audio quality is below expectations: its bandwidth is cut at 400 Hz and it is also heavily corrupted. I suggest training a specific acoustic model to recognize such audio. A specialized language model can also help improve accuracy. You can find additional details in our tutorial:
  
  http://cmusphinx.sourceforge.net/wiki/tutorial
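
As a cross-check on the sox listing earlier in the thread, the reported duration and bit rate follow directly from the sample count, sample rate, bit depth, and channel count, and half the sample rate gives the theoretical upper limit on bandwidth. The sketch below just redoes that arithmetic; the class and method names are made up for illustration.

```java
final class PcmStats {
    /** Duration in seconds of uncompressed PCM audio. */
    static double durationSeconds(long samples, int sampleRate) {
        return (double) samples / sampleRate;
    }

    /** Bit rate in bits per second of uncompressed PCM audio. */
    static int bitRate(int sampleRate, int bitsPerSample, int channels) {
        return sampleRate * bitsPerSample * channels;
    }

    /** Highest frequency a given sample rate can represent (Nyquist limit). */
    static int nyquistHz(int sampleRate) {
        return sampleRate / 2;
    }

    public static void main(String[] args) {
        // Values taken from the sox listing for enbridgeGas.wav.
        System.out.println("duration (s): " + durationSeconds(296567L, 8000));
        System.out.println("bit rate (bit/s): " + bitRate(8000, 16, 1));
        System.out.println("bandwidth ceiling (Hz): " + nyquistHz(8000));
    }
}
```

With the values from the listing this gives about 37.07 s and 128000 bit/s, matching sox, and a ceiling of 4000 Hz. So the header values themselves are consistent; the point of the reply above is that the recognizer has to be told the file is 8 kHz, and that the actual content of this particular file is poorer than its format allows.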
  

Kostya - 2014-12-08

OK, thank you, now it works better.
Why am I getting a java.lang.OutOfMemoryError: Java heap space error when my WAV file is longer than 10 seconds? Isn't -xms800 -xmx1500 enough?

- Alexander Solovets - 2014-12-08
  
  The specifier is in bytes, so if you mean 1.5G, you should write -Xmx1500M.
  
  - Kostya - 2014-12-08
    
    Yeah, I wrote it with m; my mistake here. 1500m isn't enough, what am I missing?
    
    - Alexander Solovets - 2014-12-08
      
      Obviously, if 1500m is not enough you should try 2G.
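
When changing heap flags seems to have no effect, it can help to confirm that the setting actually reached the JVM, for example when the program is started from an IDE with its own run configuration. A small sketch using only the standard Runtime API (the class name HeapCheck is made up):

```java
final class HeapCheck {
    /** Maximum heap the running JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running it as java -Xmx2G HeapCheck should print a value in the neighbourhood of 2048; the reported figure can be somewhat lower than the flag, depending on the JVM.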
      

Kostya - 2014-12-09

Thank you!

The length of the recording isn't a problem now.

One last question:
I need to use sphinx4 on my system, and the recording must come from a MOBILE MICROPHONE.
I am recording mono at 16000 Hz, so no compression is needed.
The results are still almost random. What should I do with the file now? Or is the phone mic simply not good enough?

Thank you!

Record12.wav
