I am trying to transcribe text from a .wav file of the voices of people with a speech disability.
The output of (StreamSpeechRecognizerOBJECT.getResult()).getHypothesis() is very different from the correct words, so I am trying to make the recognizer adapt to my test data.
Is there a way I could look at the phonemes recognized by the Sphinx4 engine? I want to see something like this: K AH M AE N D
for the word COMMAND.
I guess the recognizer first gets the phonemes and then constructs words from them. Am I right in my understanding? Please clarify.
Thank you.
Last edit: Balaji 2017-12-31
There is no API for phonemes; you have to modify the source code to return them.
No, it does not work this way. The recognizer looks for whole word sequences.
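As an aside, while there is no public API for the decoded phone sequence, a sketch like the one below can at least print the dictionary pronunciation of each word in the hypothesis (e.g. COMMAND: K AH M AE N D). It assumes the Sphinx4 5prealpha result classes (SpeechResult.getWords(), WordResult.getWord(), Word.getPronunciations(), Pronunciation.getUnits()); please verify these against your Sphinx4 version.

    import edu.cmu.sphinx.api.SpeechResult;
    import edu.cmu.sphinx.linguist.acoustic.Unit;
    import edu.cmu.sphinx.linguist.dictionary.Pronunciation;
    import edu.cmu.sphinx.linguist.dictionary.Word;
    import edu.cmu.sphinx.result.WordResult;

    public class PhonePrinter {
        // Prints the dictionary pronunciation(s) of each hypothesized word,
        // e.g. "COMMAND: K AH M AE N D". Note these are the phones of the
        // dictionary entry the decoder matched, not an independent
        // phone-level decode of the audio.
        public static void printPhones(SpeechResult result) {
            for (WordResult wr : result.getWords()) {
                Word word = wr.getWord();
                if (word.isFiller()) {
                    continue; // skip <sil>, noise entries, etc.
                }
                for (Pronunciation p : word.getPronunciations()) {
                    StringBuilder sb = new StringBuilder(word.getSpelling()).append(':');
                    for (Unit unit : p.getUnits()) {
                        sb.append(' ').append(unit.getName());
                    }
                    System.out.println(sb);
                }
            }
        }
    }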
Thank you Mr. Nickolay.
I have tried using the en-us model as in TranscriberDemo.java and the adaptation method described in the CMUSphinx tutorial. Is it possible to create an acoustic model from scratch? I have read the theory of the forward algorithm and the forward-backward (Baum-Welch) algorithm. Could you please give some pointers on this?
Thank you.
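As a toy illustration of the forward algorithm mentioned above (a minimal sketch of the textbook recursion for a discrete-observation HMM; this is not Sphinx4 or SphinxTrain code, and every name in it is hypothetical):

    public class ForwardAlgorithm {
        // Forward algorithm for a discrete HMM: returns P(observations | model).
        //   a[i][j] = transition probability from state i to state j
        //   b[i][k] = probability of emitting symbol k in state i
        //   pi[i]   = initial probability of state i
        //   obs[t]  = index of the symbol observed at time t
        public static double forward(double[][] a, double[][] b, double[] pi, int[] obs) {
            int n = pi.length;
            double[] alpha = new double[n];

            // Initialization: alpha_1(i) = pi_i * b_i(o_1)
            for (int i = 0; i < n; i++) {
                alpha[i] = pi[i] * b[i][obs[0]];
            }

            // Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a[i][j]) * b_j(o_{t+1})
            for (int t = 1; t < obs.length; t++) {
                double[] next = new double[n];
                for (int j = 0; j < n; j++) {
                    double sum = 0.0;
                    for (int i = 0; i < n; i++) {
                        sum += alpha[i] * a[i][j];
                    }
                    next[j] = sum * b[j][obs[t]];
                }
                alpha = next;
            }

            // Termination: P(O | model) = sum_i alpha_T(i)
            double total = 0.0;
            for (double v : alpha) {
                total += v;
            }
            return total;
        }
    }

Baum-Welch re-estimates a and b by combining these forward probabilities with the analogous backward probabilities. In practice, though, a CMUSphinx acoustic model is trained from scratch with the sphinxtrain tools rather than hand-written code, and it requires a large amount of accurately transcribed audio.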
Hello,
In my setup, I transcribe many .wav files in one shot. I am using code from Transcriber.java as given in the tutorial. I have placed the recognizer code inside a loop which reads one .wav file at a time and passes it as a parameter to the recognizer. Something like this:
    for (String FileName : FileList) {
        InputStream stream = new FileInputStream(new File(FileName));
        if (stream != null) {
            // Simple recognition with generic model
            recognizer.startRecognition(stream);
            ...
        }
    }
I have 2 questions.
1. Can I execute recognizer.startRecognition once outside the loop and pass each wave file to the recognizer (using some function, I guess)? My idea is to start the recognizer only once, so that all the overhead is paid up front and the recognizer is continuously listening to a stream; I would then send the .wav files one by one to that stream. Is any performance improvement possible this way? (A sketch of this pattern follows at the end of this post.)
2. Part of the output is given below. In the timer output, why is the dictionary loaded again and again? The "Load AM" count is only one, but the "Load Dictionary" count keeps increasing.
15:51:55.510 INFO speedTracker # ----------------------------- Timers -------------------------------
15:51:55.510 INFO speedTracker # Name             Count   CurTime   MinTime   MaxTime   AvgTime   TotTime
15:51:55.510 INFO speedTracker   Load Dictionary    765   0.0040s   0.0000s   0.5360s   0.0093s   7.0820s
15:51:55.510 INFO speedTracker   Load AM              1   3.5080s   3.5080s   3.5080s   3.5080s   3.5080s
The filler dictionary / noisedict is also loaded again and again. Is there a specific reason why?
In short, I am trying to make my recognizer decode many files faster, and I feel there may be some speed to gain by changing this behavior. Is that feasible?
Thanks for your help.
Balaji. (I have attached the output trace as a text file).
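Regarding question 1, here is a sketch of the reuse pattern, assuming the Sphinx4 5prealpha API (Configuration, StreamSpeechRecognizer) and the generic en-us model paths from the tutorial. Constructing and configuring the recognizer happens once outside the loop; startRecognition()/stopRecognition() are then called per file. They still allocate and deallocate some components on every call, which appears consistent with the repeated "Load Dictionary" entries in the timer output above.

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.InputStream;

    import edu.cmu.sphinx.api.Configuration;
    import edu.cmu.sphinx.api.SpeechResult;
    import edu.cmu.sphinx.api.StreamSpeechRecognizer;

    public class BatchTranscriber {
        public static void main(String[] args) throws Exception {
            Configuration configuration = new Configuration();
            configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
            configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
            configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

            // Build the recognizer once, outside the per-file loop.
            StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);

            for (String fileName : args) {
                try (InputStream stream = new FileInputStream(new File(fileName))) {
                    recognizer.startRecognition(stream);
                    SpeechResult result;
                    while ((result = recognizer.getResult()) != null) {
                        System.out.println(fileName + ": " + result.getHypothesis());
                    }
                    recognizer.stopRecognition();
                }
            }
        }
    }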
Nvidia recently announced audio recognition at a speed 3500 times faster than real time. You should take a look at their work instead:
https://devblogs.nvidia.com/nvidia-accelerates-speech-text-transcription-3500x-kaldi/