
Help with Transcriber

Balaji
2017-12-31
2019-04-16
  • Balaji

    Balaji - 2017-12-31

I am trying to transcribe speech from a .wav file containing the voices of people with a speech disability.
The output of (StreamSpeechRecognizerOBJECT.getResult()).getHypothesis() is very different from the correct words. I am trying to make the recognizer adapt to my test data.

    Is there a way that I could look at the phonemes that are recognized by the Sphinx4 engine? I want to see something like this:
    K AH M AE N D
    for the word COMMAND

    I guess the recognizer will first get the phonemes and construct words from them. Am I right in my understanding? Please clarify.

    Thank you.


    Last edit: Balaji 2017-12-31
    • Nickolay V. Shmyrev

      Is there a way that I could look at the phonemes that are recognized by the Sphinx4 engine?

      There is no API for phonemes, you have to modify the source code to return them.

      I guess the recognizer will first get the phonemes and construct words from them. Am I right in my understanding? Please clarify.

      No, it does not work this way. The recognizer looks for the whole word sequences.
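
      For intuition only: the decoder's search graph is built by expanding each word in the vocabulary into its phone sequence from the pronunciation dictionary, so phones exist internally but hypotheses are scored and reported at the word level. A toy sketch of that lexicon expansion (the dictionary entries here are illustrative, in the spirit of cmudict; this is not the sphinx4 API):

      ```java
      import java.util.List;
      import java.util.Map;
      import java.util.stream.Collectors;

      public class LexiconSketch {
          // Toy pronunciation dictionary, cmudict-style (illustrative entries only).
          static final Map<String, List<String>> LEXICON = Map.of(
                  "COMMAND", List.of("K", "AH", "M", "AE", "N", "D"),
                  "GO",      List.of("G", "OW"));

          // Expand a word sequence into the phone sequence that the decoder
          // scores against the acoustic model internally.
          static List<String> expand(List<String> words) {
              return words.stream()
                      .flatMap(w -> LEXICON.get(w).stream())
                      .collect(Collectors.toList());
          }

          public static void main(String[] args) {
              System.out.println(String.join(" ", expand(List.of("GO", "COMMAND"))));
              // prints: G OW K AH M AE N D
          }
      }
      ```

      The search then compares whole word sequences against the audio, which is why word hypotheses, not phone strings, come back from getHypothesis().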

  • Balaji

    Balaji - 2018-01-03

    Thank you, Mr. Nickolay.
    I have tried using the en_us model as in TranscriberDemo.java and the adaptation method described in the CMU tutorial. Is it possible to create the acoustic model from scratch? I have read the theory of the forward algorithm and the forward-backward (Baum-Welch) algorithm. Could you please give some pointers on this?
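
    The forward algorithm itself fits in a few lines. A minimal sketch for a discrete-observation HMM (the probabilities below are made up for illustration; real acoustic model training uses Gaussian mixture emissions and full Baum-Welch re-estimation on top of this):

    ```java
    public class ForwardSketch {
        // P(observations | model) for a discrete HMM via the forward algorithm.
        // pi: initial state probs, a: transition probs, b: emission probs.
        static double forward(double[] pi, double[][] a, double[][] b, int[] obs) {
            int n = pi.length;
            double[] alpha = new double[n];
            for (int i = 0; i < n; i++)               // initialization
                alpha[i] = pi[i] * b[i][obs[0]];
            for (int t = 1; t < obs.length; t++) {    // induction over time
                double[] next = new double[n];
                for (int j = 0; j < n; j++) {
                    double sum = 0;
                    for (int i = 0; i < n; i++)
                        sum += alpha[i] * a[i][j];
                    next[j] = sum * b[j][obs[t]];
                }
                alpha = next;
            }
            double p = 0;                             // termination
            for (double v : alpha) p += v;
            return p;
        }

        public static void main(String[] args) {
            double[] pi = {0.6, 0.4};
            double[][] a = {{0.7, 0.3}, {0.4, 0.6}};
            double[][] b = {{0.9, 0.1}, {0.2, 0.8}};
            System.out.println(forward(pi, a, b, new int[]{0, 1, 0})); // ~0.10893
        }
    }
    ```

    Baum-Welch uses these forward quantities (plus the symmetric backward pass) to re-estimate a and b. Bear in mind that training a usable speech model from scratch also needs many hours of transcribed audio, which is why adapting the existing en_us model is usually the recommended first step.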

    Thank you.

  • Balaji

    Balaji - 2019-04-16

    Hello,

    In my setup, I transcribe many .wav files in one run. I am using code from Transcriber.java as given in the tutorial. I have placed the recognizer code inside a loop which reads one .wav file at a time and passes it as a parameter to the recognizer. Something like this:

    for (String fileName : fileList) {
        // FileInputStream throws FileNotFoundException rather than returning
        // null, so a null check is unnecessary; try-with-resources also
        // closes the stream after each file.
        try (InputStream stream = new FileInputStream(new File(fileName))) {
            // Simple recognition with generic model
            recognizer.startRecognition(stream);
            ...
            recognizer.stopRecognition();
        }
    }
    

    I have 2 questions.
    1. Can I execute recognizer.startRecognition once, outside the loop, and pass each .wav file to the recognizer (through some function, I guess)? My idea is to start the recognizer only once, so that all the start-up overhead is paid up front and the recognizer is continuously listening to a stream, to which I send the .wav files one by one. Is any performance improvement possible this way?
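
    (On the "send files one by one to a single stream" idea: I do not know whether sphinx4 supports swapping files behind one live session, but plain Java can present many streams as one via java.io.SequenceInputStream. A hypothetical sketch, demonstrated on byte arrays instead of files; note that each .wav file carries its own header, which would reach the decoder as garbage audio unless stripped, so chaining like this is only safe for headerless raw PCM:)

    ```java
    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.SequenceInputStream;
    import java.util.Collections;
    import java.util.List;

    public class ChainStreams {
        // Chain several input streams into one continuous stream; each
        // underlying stream is read to exhaustion before the next begins.
        static InputStream chain(List<InputStream> streams) {
            return new SequenceInputStream(Collections.enumeration(streams));
        }

        public static void main(String[] args) throws IOException {
            InputStream combined = chain(List.of(
                    new ByteArrayInputStream("abc".getBytes()),
                    new ByteArrayInputStream("def".getBytes())));
            System.out.println(new String(combined.readAllBytes())); // prints: abcdef
        }
    }
    ```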

    2. Part of the output is given below. In the Timer output, why is the dictionary loaded again and again? The "Load AM" count is only one, but the "Load Dictionary" count keeps increasing.

    15:51:55.510 INFO speedTracker # ----------------------------- Timers -------------------------------
    15:51:55.510 INFO speedTracker # Name            Count  CurTime  MinTime  MaxTime  AvgTime  TotTime
    15:51:55.510 INFO speedTracker Load Dictionary   765    0.0040s  0.0000s  0.5360s  0.0093s  7.0820s
    15:51:55.510 INFO speedTracker Load AM           1      3.5080s  3.5080s  3.5080s  3.5080s  3.5080s

    The filler dictionary / noisedict is also loaded again and again. Is there a specific reason why?

    In short, I am trying to make my recognizer decode many files faster, and I feel I may gain some speed by changing this behaviour. Is that feasible?

    Thanks for your help.

    Balaji. (I have attached the output trace as a text file).

