StreamSpeechRecognizer

Kostya
2014-12-05
2014-12-09
  • Kostya

    Kostya - 2014-12-05

    Hi,

    I guess something this basic has been asked here before, but I was not able to find it.
    I want to use sphinx4 in a Java application, but I can't figure out how to use it.

    I've downloaded the package and so on.
    If I use the StreamSpeechRecognizer on my own recorded audio (instead of 10001-90210-01803), just like in the example, I get random words.

    Of course, I am speaking English in the recording.

    Thank you!

     
    • Nickolay V. Shmyrev

      An idea to share the file should have come to your mind.

       
      • Kostya

        Kostya - 2014-12-05

        The recorded audio? I am saying "1 2 3" there.
        The code is almost the same as in the example,
        except that
        recognizer.stopRecognition();
        comes at the end.

         
        • Nickolay V. Shmyrev

          the recorded audio?

          Yes

          i am saying "1 2 3" there..

          You need to share the file.

           
  • Kostya

    Kostya - 2014-12-05

    Here is the file..

     
    • Nickolay V. Shmyrev

      Your file is 44 kHz stereo; it must be a 16 kHz mono audio file. You can resample it with sox.

      Also, your file is clipped; you need to lower the recording level.
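
A quick way to confirm the format before handing a file to the recognizer is to read its header with the standard `javax.sound.sampled` API. The following is only a minimal sketch: the class name `WavFormatCheck` and the helper `isSphinxReady` are hypothetical, and the 16-bit signed PCM requirement is an assumption added on top of the 16 kHz mono requirement stated above.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.File;

// Hypothetical helper, not part of sphinx4.
final class WavFormatCheck {

    /** True if the format is 16 kHz, mono, 16-bit signed PCM (the 16-bit part is an assumption). */
    static boolean isSphinxReady(AudioFormat f) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())
                && f.getSampleRate() == 16000f
                && f.getChannels() == 1
                && f.getSampleSizeInBits() == 16;
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to the wav file to inspect.
        try (AudioInputStream in = AudioSystem.getAudioInputStream(new File(args[0]))) {
            AudioFormat f = in.getFormat();
            System.out.println(f + " -> " + (isSphinxReady(f) ? "OK" : "needs conversion"));
        }
    }
}
```

A 44.1 kHz stereo file like the one discussed here would be reported as needing conversion.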

       
      • Kostya

        Kostya - 2014-12-05

        Thank you! But now I am getting nothing at all. Here is the code:

                Configuration configuration = new Configuration();

                configuration.setAcousticModelPath("models/acoustic/wsj");
                configuration.setDictionaryPath("models/acoustic/wsj/dict/cmudict.0.6d");
                configuration.setLanguageModelPath("models/language/en-us.lm.dmp");

                StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);

                recognizer.startRecognition(new URL("file:src/apps/edu/cmu/sphinx/demo/aligner/mono123.wav").openStream());
                // Note: this first result is overwritten by the loop below before it is printed.
                SpeechResult result = recognizer.getResult();

                while ((result = recognizer.getResult()) != null) {
                    System.out.println(result.getHypothesis() + " - ");
                }
                recognizer.stopRecognition();
        
         
        • Nickolay V. Shmyrev

          Make sure the recording is not clipped due to a high recording level. Share the file again.
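
Clipping can be spotted roughly by counting how many samples sit at the limits of the 16-bit range. This is a sketch only: `ClippingCheck` and its methods are hypothetical names, it assumes 16-bit little-endian PCM data (the layout normally found in wav files), and any threshold for "too much clipping" is left to the reader.

```java
// Hypothetical helper, not part of sphinx4.
final class ClippingCheck {

    /** Fraction of samples that sit exactly at the 16-bit minimum or maximum. */
    static double clippedFraction(short[] samples) {
        if (samples.length == 0) {
            return 0.0;
        }
        int clipped = 0;
        for (short s : samples) {
            if (s == Short.MAX_VALUE || s == Short.MIN_VALUE) {
                clipped++;
            }
        }
        return (double) clipped / samples.length;
    }

    /** Decodes raw little-endian 16-bit PCM bytes into samples. */
    static short[] decodeLittleEndian(byte[] pcm) {
        short[] out = new short[pcm.length / 2];
        for (int i = 0; i < out.length; i++) {
            // Low byte first, then the (signed) high byte shifted into place.
            out[i] = (short) ((pcm[2 * i] & 0xFF) | (pcm[2 * i + 1] << 8));
        }
        return out;
    }
}
```

A recording made at a sensible level should have a fraction at or very near zero; a noticeably larger value suggests the input gain was too high.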

           
          • Kostya

            Kostya - 2014-12-07

            I have made 3 wav files, all in the same format.
            The first one is recognized perfectly (I am counting 1-10),
            the second one (writing.wav) gives me random words,
            and the third (iwte_1.wav) gives no result at all.

             
            • Kostya

              Kostya - 2014-12-07
               
              • Nickolay V. Shmyrev

                Hello Kostya

                Unfortunately you have uploaded only 2 files.

                The format is not the only required factor. You can read our FAQ about audio bandwidth:

                http://cmusphinx.sourceforge.net/wiki/faq#qwhat_is_sample_rate_and_how_does_it_affect_accuracy

                The file writing.wav does not have the full bandwidth expected of 16 kHz audio, so it is not recognized properly. There are other problems too: clipping is still present, and there is a region of zeros at the start, which should not appear in normal recordings.

                For best results, make sure you record audio with a high-quality microphone and do not perform any lossy compression/decompression conversions.
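
One rough way to see whether a recording really uses the bandwidth its sample rate allows is to measure how much spectral energy lies above some cutoff frequency. The sketch below uses a naive DFT on a single frame; it is an illustration only, not the method used by sphinx4 or described in the FAQ, and the names `BandwidthCheck`, `energyAbove` and `sine` are hypothetical.

```java
// Hypothetical helper, not part of sphinx4.
final class BandwidthCheck {

    /** Fraction of the frame's spectral energy (DC excluded) that lies above cutoffHz. */
    static double energyAbove(double[] frame, double sampleRate, double cutoffHz) {
        int n = frame.length;
        double total = 0.0;
        double high = 0.0;
        // Bins 1..n/2 cover frequencies up to the Nyquist limit; bin 0 (DC) is skipped.
        for (int k = 1; k <= n / 2; k++) {
            double re = 0.0;
            double im = 0.0;
            for (int t = 0; t < n; t++) {
                double angle = 2.0 * Math.PI * k * t / n;
                re += frame[t] * Math.cos(angle);
                im -= frame[t] * Math.sin(angle);
            }
            double power = re * re + im * im;
            total += power;
            if (k * sampleRate / n > cutoffHz) {
                high += power;
            }
        }
        return total == 0.0 ? 0.0 : high / total;
    }

    /** Generates a test tone, useful for checking the function on known input. */
    static double[] sine(double freqHz, double sampleRate, int n) {
        double[] out = new double[n];
        for (int t = 0; t < n; t++) {
            out[t] = Math.sin(2.0 * Math.PI * freqHz * t / sampleRate);
        }
        return out;
    }
}
```

If a file sampled at 16 kHz shows almost no energy above, say, 4 kHz across its speech frames, it was probably upsampled from a narrower-band source or passed through lossy processing.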

                 
  • ron

    ron - 2014-12-07

    Hi, I have a similar issue with my audio file, which is an automated telephone call. The recognizer returns the Java string "shh", whereas I expect it to return the sentences spoken in the wav file. Based on the previous posts, I assume it may have to do with the bandwidth of my file?

    Below is the information I got from the command sox --i myFile.wav. Can you please advise me what these values should be? (I assume they are incorrect.)

    Also, is there anything wrong with the configuration in the Java code below? I am using Communicator_40.cd_cont_4000 because my goal is to recognize telephone audio.

    Thanks,
    Ron

    sox --i myFile.wav:

    Input File : 'enbridgeGas.wav'
    Channels : 1
    Sample Rate : 8000
    Precision : 16-bit
    Duration : 00:00:37.07 = 296567 samples ~ 2780.32 CDDA sectors
    File Size : 593k
    Bit Rate : 128k
    Sample Encoding: 16-bit Signed Integer PCM

    Code:

    public static void main(String[] args) throws Exception {

        Configuration configuration = new Configuration();

        // Set path to acoustic model.
        configuration.setAcousticModelPath("file:Communicator_40.cd_cont_4000/");
        // Set path to dictionary.
        configuration.setDictionaryPath("resource:/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/cmudict.0.6d");
        // Set language model.
        configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/language/en-us.lm.dmp");

        StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);

        FileInputStream fis = new FileInputStream(new File("enbridgeGas.wav"));
        recognizer.startRecognition(fis);
        SpeechResult result = recognizer.getResult();
        //recognizer.stopRecognition();

        System.out.println("test extracted: " + result.getHypothesis());

    }

     

    Last edit: ron 2014-12-07
    • Nickolay V. Shmyrev

      Hello Ron

      To recognize an 8 kHz file you need to add

         configuration.setSampleRate(8000);
      

      Then it will work for audio of reasonable quality. Unfortunately, your audio quality is below expectations: its bandwidth is cut at 400 Hz and it is also heavily corrupted. I suggest training a specific acoustic model to recognize such audio. A specialized language model can also help improve accuracy. You can find additional details in our tutorial:

      http://cmusphinx.sourceforge.net/wiki/tutorial

       
  • Kostya

    Kostya - 2014-12-08

    OK, thank you, now it works better.
    Why am I getting a java.lang.OutOfMemoryError: Java heap space error when my wav file is longer than 10 seconds?! Isn't -xms800 -xmx1500 enough?

     
    • Alexander Solovets

      The specifier is in bytes, so if you mean 1.5G, you should write -Xmx1500M.

      --
      Sincerely, Alexander
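
To illustrate the point about units, here is a small sketch showing how a heap size value is interpreted with and without a suffix, and how to print the maximum heap the running JVM actually received. The class `HeapInfo` and its methods are hypothetical; the parser only mirrors the documented k/m/g suffix convention of `-Xmx` and is not the JVM's own parsing code.

```java
// Hypothetical helper for illustration only.
final class HeapInfo {

    /** Converts a heap size value such as "1500" or "1500M" to bytes. */
    static long toBytes(String spec) {
        char last = Character.toLowerCase(spec.charAt(spec.length() - 1));
        long unit;
        switch (last) {
            case 'k': unit = 1024L; break;
            case 'm': unit = 1024L * 1024L; break;
            case 'g': unit = 1024L * 1024L * 1024L; break;
            default: return Long.parseLong(spec); // no suffix: plain bytes
        }
        return Long.parseLong(spec.substring(0, spec.length() - 1)) * unit;
    }

    /** Maximum heap the running JVM will attempt to use, in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("1500  = " + toBytes("1500") + " bytes");
        System.out.println("1500M = " + toBytes("1500M") + " bytes");
        System.out.println("Max heap of this JVM: " + maxHeapMegabytes() + " MB");
    }
}
```

Running the recognizer's JVM with a line like the last one in `main` is a simple way to confirm that the intended heap setting was actually picked up.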

       
      • Kostya

        Kostya - 2014-12-08

        Yeah, I wrote it with m; that was my mistake here. 1500m isn't enough. What am I missing?

         
  • Kostya

    Kostya - 2014-12-09

    Thank you!

    The length of the recording isn't a problem now.

    One last question:
    I need to use sphinx4 on my system, and the recording must come from a MOBILE MICROPHONE.
    I am recording in mono at 16000 Hz, so no compression is needed.
    The results are still almost random. What should I do with the file now? Or is the phone mic simply not good enough?

    Thank you!

     
