Hello, I'm working on my graduation project and an important part of it is to transcribe documentaries (avi). I'm using Sphinx4 beta2 and the HUB4 acoustic and language models.
I took an avi documentary to test things, cut out 3 seconds that had speech (with background music), and converted them to 16 kHz mono wav. Now when I try that file on Sphinx, here's what I get:
Exception in thread "main" java.lang.NullPointerException
    at edu.cmu.sphinx.decoder.scorer.AbstractScorer.startRecognition(AbstractScorer.java:116)
    at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.startRecognition(WordPruningBreadthFirstSearchManager.java:234)
    at edu.cmu.sphinx.decoder.Decoder.decode(Decoder.java:44)
    at edu.cmu.sphinx.recognizer.Recognizer.recognize(Recognizer.java:98)
    at edu.cmu.sphinx.recognizer.Recognizer.recognize(Recognizer.java:114)
    at wavfile.WavFile.main(WavFile.java:55)
Can someone explain to me the meaning behind this error and what I should do to fix it?
Thanks.
The issue is most likely that your frontend uses something like BatchCMN without NonSpeechDataFilter. Please try to reproduce the frontend pipeline from the transcriber demo.
Aha, it worked. I must've accidentally removed it or something.
But now I have a different problem: the transcription seems to stop partway and does not continue to the end of the wav file. I thought it had something to do with the values set by the SpeechMarker, but changing them achieved nothing, since the person speaking didn't actually pause or anything.
You just need to call Recognizer.recognize in a loop until the result is null to get the transcription for all chunks. See the Transcriber.java demo for an example:
    // Loop until the last utterance in the audio file has been decoded, in which case the recognizer will return null.
    Result result;
    while ((result = recognizer.recognize()) != null) {
        String resultText = result.getBestResultNoFiller();
        System.out.println(resultText);
    }
Also just paste your files, I'll look.
Here are my files:
http://rapidshare.com/files/228340908/Files.rar.html
It did recognize a few more words, but still most of the words did not get recognized at all, not even incorrectly. Is there something wrong with my wav files maybe?
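One thing that can be ruled out quickly is a format mismatch in the extracted wav files. Below is a minimal Java sketch, not part of Sphinx4, that checks a file against the 16 kHz mono format mentioned at the start of the thread. The class name WavFormatCheck and the method isSphinxFriendly are made up for illustration, and the 16-bit signed little-endian PCM requirement is an assumption about what the frontend expects, not something confirmed in this thread.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.io.File;
import java.io.IOException;

// Hypothetical helper (not a Sphinx4 class): reports whether a wav file
// matches the format assumed here for the 16 kHz models.
class WavFormatCheck {

    // Assumed target: 16 kHz, 16-bit, mono, signed little-endian PCM.
    static boolean isSphinxFriendly(AudioFormat f) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())
                && f.getSampleRate() == 16000f
                && f.getSampleSizeInBits() == 16
                && f.getChannels() == 1
                && !f.isBigEndian();
    }

    // Reads only the header of the file to obtain its format.
    static boolean isSphinxFriendly(File wav)
            throws IOException, UnsupportedAudioFileException {
        try (AudioInputStream in = AudioSystem.getAudioInputStream(wav)) {
            return isSphinxFriendly(in.getFormat());
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java WavFormatCheck <file.wav>");
            return;
        }
        File wav = new File(args[0]);
        try (AudioInputStream in = AudioSystem.getAudioInputStream(wav)) {
            AudioFormat f = in.getFormat();
            System.out.println("format: " + f);
            System.out.println("matches assumed target: " + isSphinxFriendly(f));
        }
    }
}
```

If the check passes, the format is probably not the culprit and the background music becomes the more likely explanation.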
Well, files with music require special treatment. Can you please try clean recordings first?
On what particular sample does it fail? elephant, national, or something else? Could you please provide a ready-to-run example that clearly reproduces the problem, not just a collection of the files you are using?
I also got this:
WARNING threadedScorer Not enough data in frontend to start recognition
Though there clearly was still data left to be recognized.
Don't worry about this warning; this bug has been fixed in svn trunk.
Clean recordings work pretty well.
All three samples fail. What can I do to improve recognition with minimal background music on?
Sorry, I don't understand what you mean by a ready-to-run example. As in, you need the WavFile.jar file? I'll try to get it, though I'm working in NetBeans and don't know where (or if) it builds the jar files.
> Clean recordings work pretty well.
Hm, then music is indeed a huge problem. To be honest, there is no ready-to-use recipe to handle that.
Btw, are you converting the audio from stereo mp3 files? It should be easier to clean up the music from them, although it will require some coding.
No, I directly extract wav files from avi videos using ffmpeg. Do you have any suggestions for what I could do to improve those wav files and improve the accuracy?
I just meant that if the files are stereo, it's possible to build noise cancellation that will effectively reduce the music. With mono files it's much more complicated.
I probably need some time to search for the music cancellation code. Also, can you please try the wsj model instead of hub4? In theory it should be more resistant to noise.
Yes, they are stereo files, but I converted them to mono. I also canceled out one of the channels, and the results seemed to be a bit better actually.
I will try the wsj, and hopefully it'll work out.
There are advanced techniques for source separation from stereo recordings; they can help too. Otherwise it will be quite hard to get acceptable performance.
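As a very rough illustration of why stereo input helps, here is a mid/side split in Java. This is not the music cancellation code mentioned above and it falls far short of real source separation. It assumes the narration is mixed equally into both channels while at least part of the music differs between left and right. The class name MidSide and the interleaved 16-bit sample layout are assumptions made for this sketch.

```java
import java.util.Arrays;

// Hypothetical helper: mid/side decomposition of interleaved 16-bit stereo
// samples laid out as L, R, L, R, ...
class MidSide {

    // mid = (L + R) / 2: keeps everything common to both channels
    // (under the assumption above, that includes the narration).
    static short[] mid(short[] interleaved) {
        short[] out = new short[interleaved.length / 2];
        for (int i = 0; i < out.length; i++) {
            int l = interleaved[2 * i];
            int r = interleaved[2 * i + 1];
            out[i] = (short) ((l + r) / 2);
        }
        return out;
    }

    // side = (L - R) / 2: anything identical in both channels cancels out,
    // leaving only the part of the signal that differs between them.
    static short[] side(short[] interleaved) {
        short[] out = new short[interleaved.length / 2];
        for (int i = 0; i < out.length; i++) {
            int l = interleaved[2 * i];
            int r = interleaved[2 * i + 1];
            out[i] = (short) ((l - r) / 2);
        }
        return out;
    }

    public static void main(String[] args) {
        // Three stereo frames: identical channels, opposite channels, mixed.
        short[] frames = {1000, 1000, 500, -500, -300, 100};
        System.out.println("mid:  " + Arrays.toString(mid(frames)));
        System.out.println("side: " + Arrays.toString(side(frames)));
    }
}
```

The mid signal only loses the music components that are opposite in the two channels, so the improvement may be small. The side signal contains no centered speech under the stated assumption, which is what would make it usable as a music-only reference for a proper cancellation step.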