CMU Sphinx / Forums / Help: NullPointer Exception (at AbstractScorer)

Kelly Anderson - 2009-05-02

Hello, I'm working on my graduation project and an important part of it is to transcribe documentaries (avi). I'm using Sphinx4 beta2 and the HUB4 acoustic and language models.

I took an avi documentary to test things, extracted 3 seconds that had speech (with background music), and converted them to a 16 kHz mono wav. Now when I try that file on Sphinx, here's what I get:

Exception in thread "main" java.lang.NullPointerException
at edu.cmu.sphinx.decoder.scorer.AbstractScorer.startRecognition(AbstractScorer.java:116)
at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.startRecognition(WordPruningBreadthFirstSearchManager.java:234)
at edu.cmu.sphinx.decoder.Decoder.decode(Decoder.java:44)
at edu.cmu.sphinx.recognizer.Recognizer.recognize(Recognizer.java:98)
at edu.cmu.sphinx.recognizer.Recognizer.recognize(Recognizer.java:114)
at wavfile.WavFile.main(WavFile.java:55)

Can someone explain to me the meaning behind this error and what I should do to fix it?

Thanks.

- Nickolay V. Shmyrev - 2009-05-02
  
  The likely cause is that your frontend uses something like BatchCMN without a NonSpeechDataFilter, or a similar mismatch. Please try to reproduce the frontend pipeline from the transcriber demo.
  
- Nickolay V. Shmyrev - 2009-05-02
  
  Also just paste your files, I'll look.
  
- Kelly Anderson - 2009-05-02
  
  Aha, it worked. I must've accidentally removed it or something.
  
  But now, I have a different problem...the transcription seems to stop and does not continue till the end of the wav file. I thought it had something to do with the values set by the SpeechMarker, but changing them achieved nothing since the person speaking didn't actually pause or anything.
  
  Here are my files:
  
  http://rapidshare.com/files/228340908/Files.rar.html
  
  - Nickolay V. Shmyrev - 2009-05-02
    
    You just need to invoke Recognizer.recognize in a loop until it returns null to get the transcription for all chunks. See the Transcriber.java demo for an example.
    
    // Loop until the last utterance in the audio file has been decoded,
    // in which case the recognizer will return null.
    Result result;
    while ((result = recognizer.recognize()) != null) {
        String resultText = result.getBestResultNoFiller();
        System.out.println(resultText);
    }
    
- Kelly Anderson - 2009-05-02
  
  It did recognize a few more words, but still most of the words did not get recognized at all, not even incorrectly. Is there something wrong with my wav files maybe?
  
  - Nickolay V. Shmyrev - 2009-05-02
    
    Well, files with music require special treatment. Can you please try clean recordings first?
    
    On what particular sample does it fail? elephant, national or something else? Could you please provide a ready-to-run example that clearly reproduces the problem, not just a collection of the files you are using?
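On the earlier question of whether the WAV files themselves could be at fault, one quick thing to rule out is the audio format. The sketch below uses only the standard `javax.sound.sampled` API, not Sphinx4, and assumes the target is 16 kHz, 16-bit, mono, signed little-endian PCM (16 kHz mono is what the original post describes; the other values are assumptions). The class and method names are made up for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

// Hypothetical helper: reports how an audio format differs from the
// assumed target (16 kHz, 16-bit, mono, signed little-endian PCM).
final class WavFormatCheck {
    static final float EXPECTED_RATE_HZ = 16000f; // assumption, see text above

    // Returns one message per mismatch; an empty list means the format matches.
    static List<String> problems(AudioFormat format) {
        List<String> out = new ArrayList<>();
        if (Math.abs(format.getSampleRate() - EXPECTED_RATE_HZ) > 0.5f) {
            out.add("sample rate is " + format.getSampleRate() + " Hz, expected 16000 Hz");
        }
        if (format.getChannels() != 1) {
            out.add("channel count is " + format.getChannels() + ", expected 1 (mono)");
        }
        if (format.getSampleSizeInBits() != 16) {
            out.add("sample size is " + format.getSampleSizeInBits() + " bits, expected 16");
        }
        if (!AudioFormat.Encoding.PCM_SIGNED.equals(format.getEncoding())) {
            out.add("encoding is " + format.getEncoding() + ", expected PCM_SIGNED");
        }
        if (format.isBigEndian()) {
            out.add("byte order is big-endian, expected little-endian");
        }
        return out;
    }

    // Reads only the file header and checks the format it declares.
    static List<String> problemsInFile(File wav)
            throws IOException, UnsupportedAudioFileException {
        return problems(AudioSystem.getAudioFileFormat(wav).getFormat());
    }
}
```

Calling `WavFormatCheck.problemsInFile(new File("clip.wav"))` (the file name is a placeholder) returns an empty list when the header matches. A clean header says nothing about background music, so this only rules out a conversion mistake.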
    
- Kelly Anderson - 2009-05-02
  
  I also got this:
  
  WARNING threadedScorer Not enough data in frontend to start recognition
  
  Though there clearly was data still left to be recognized.
  
  - Nickolay V. Shmyrev - 2009-05-02
    
    Don't worry about this warning; the bug was fixed in svn trunk.
    
- Kelly Anderson - 2009-05-02
  
  Clean recordings work pretty well.
  
  All three samples fail. What can I do to improve recognition with minimal background music on?
  
  Sorry, I don't understand what you mean by a ready to run example...as in you need the WavFile.jar file? Yeah, I'll try to get it, since I'm working in NetBeans and don't know where/if it builds the jar files.
  
  - Nickolay V. Shmyrev - 2009-05-02
    
    > Clean recordings work pretty well.
    
    Hm, then music is indeed a huge problem. To be honest, there is no ready-to-use recipe to handle that.
    
    Btw, are you converting the audio from stereo mp3 files? It should be easier to clean up the music from them, although it will require some coding.
    
    - Kelly Anderson - 2009-05-03
      
      No, I directly extract wav files from avi videos using ffmpeg. Do you have any suggestions for what I could do to improve those wav files and improve the accuracy?
      
      - Nickolay V. Shmyrev - 2009-05-03
        
        I just meant that if the files are stereo, it's possible to build noise cancellation that will effectively reduce the music. With mono files it's much more complicated.
        
        I'll probably need some time to search for the music cancellation code. Also, can you please try the wsj model instead of hub4? In theory it should be more resistant to noise.
        
        
        Kelly Anderson - 2009-05-03
        
        Yes, they are stereo files, but I converted them to mono. I also canceled out one of the channels, and the results seemed to be a bit better actually.
        
        I will try the wsj, and hopefully it'll work out.
        
        
        Nickolay V. Shmyrev - 2009-05-06
        
        There are advanced techniques for source separation from stereo recordings; they can help too. Otherwise it will be quite hard to get acceptable performance.
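For the stereo originals, the simplest thing to try before any real source separation is a mid/side split. This is a rough sketch, not the noise cancellation or source separation code mentioned above: it assumes interleaved 16-bit stereo samples and assumes the narration is mixed equally into both channels, which may not hold for a given documentary. `MidSide` and its methods are hypothetical names.

```java
// Hypothetical helper: splits interleaved stereo samples (L, R, L, R, ...)
// into a mid signal (average of both channels) and a side signal
// (half the difference between the channels).
final class MidSide {
    // mid[i] = (L[i] + R[i]) / 2
    static short[] mid(short[] interleaved) {
        requireWholeFrames(interleaved);
        short[] out = new short[interleaved.length / 2];
        for (int i = 0; i < out.length; i++) {
            int left = interleaved[2 * i];
            int right = interleaved[2 * i + 1];
            out[i] = (short) ((left + right) / 2); // result always fits in 16 bits
        }
        return out;
    }

    // side[i] = (L[i] - R[i]) / 2
    static short[] side(short[] interleaved) {
        requireWholeFrames(interleaved);
        short[] out = new short[interleaved.length / 2];
        for (int i = 0; i < out.length; i++) {
            int left = interleaved[2 * i];
            int right = interleaved[2 * i + 1];
            out[i] = (short) ((left - right) / 2); // result always fits in 16 bits
        }
        return out;
    }

    private static void requireWholeFrames(short[] interleaved) {
        if (interleaved.length % 2 != 0) {
            throw new IllegalArgumentException("expected an even number of samples (L/R pairs)");
        }
    }
}
```

The mid signal keeps anything identical in both channels at full level and reduces whatever differs between them; the side signal contains only the difference. If the music is mixed wide and the voice is centred, decoding the mid signal may do somewhat better than a single channel, but that needs to be checked on the actual recordings.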
        