
Errors while normalizing during sphinx train

Started by Kishore on 2009-03-30. Last updated 2012-09-22.
  • Kishore

    Kishore - 2009-03-30

    Hi,

    I am running a training session using SphinxTrain. The process seems to complete successfully, since I get the following message at the end of the appropriate log file: "Likelihoods have converged! Baum Welch training completed!"

    However, when I look through the log files, I notice errors like the following during training:
    ERROR: "........\src\libs\libmodinv\gauden.c", line 1700: var (mgau= 852, feat= 0, density=3, component=0) < 0

    I am not sure whether I can ignore these errors. Can anyone shed some light on this?

    Thanks,
    Kishore

     
    • Kishore

      Kishore - 2009-04-04

      Thanks a lot, Nickolay, for your response. Please find my replies below.

      >> binlm2arpa should also work. What particular problems do you have with this application?

      I am getting the following error if I run the tool, for example, on wsj5kc.Z.DMP:

      D:\lmsphinx>binlm2arpa -binary wsj5kc.Z.DMP -arpa wsj5kc.lm
      Reading binary language model from wsj5kc.Z.DMP...Error : Language model file wsj5kc.Z.DMP appears to be corrupted.

       
      • Nickolay V. Shmyrev

        Hm, the cmuclmtk binary reading/writing code is in a broken state. Please use sphinx3_lm_convert instead.
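
        For example, with the file names from above, something along these lines should work (a sketch; check the tool's usage output for the exact options on your build):

        # binary DMP -> ARPA text
        sphinx3_lm_convert -i wsj5kc.Z.DMP -ifmt dmp -o wsj5kc.lm -ofmt txt

        # ARPA text -> binary DMP
        sphinx3_lm_convert -i wsj5kc.lm -o wsj5kc.lm.DMP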

         
    • Nickolay V. Shmyrev

       
      • Kishore

        Kishore - 2009-03-31

        Thanks, Nickolay. Despite the errors, the acoustic model generated by SphinxTrain is working well for speech recognition.

         
    • Kishore

      Kishore - 2009-04-01

      Hi Nickolay,

      I am evaluating Sphinx4 as a replacement for a SAPI-based codebase. Sphinx4 has been very impressive so far. I am currently conducting a small experiment with SphinxTrain and Sphinx4:
      1. Use a TTS application with AT&T Natural Voices (16 kHz, 16-bit) to read a text file and encode it as a 16 kHz, 16-bit wav file.
      2. Build an acoustic model based on the wave file generated in step 1.
      3. Package the acoustic model into a jar file and use it for decoding in Sphinx4.
      4. So far so good: no problems through step 3.

      However, if I generate another TTS wave file using just a subset of the words with which the acoustic model was trained and try to decode it using the same acoustic model, the recognition is inaccurate. (In fact, the recognized words are a completely different sequence from what was encoded, although they are still a subset of those defined in the grammar/dictionary.)

      I also tried generating a wav file with the same content (i.e., the same speech) as the one generated by TTS in step 1 and using it with the acoustic model packaged in step 3; that recognition is also totally inaccurate.

      Thanks in advance for your reply,
      Kishore

       
      • Nickolay V. Shmyrev

        > currently conducting a small experiment with SphinxTrain and Sphinx4

        As stated before, you do not have enough data to train a model. Also, to recognize English I suggest you use the existing models instead of reinventing the wheel.

         
    • Kishore

      Kishore - 2009-04-01

      Nickolay:

      As we are trying to use speech recognition in the engineering field, our intention is to extend the WSJ or another suitable existing model by adding technical vocabulary. Hence we are planning to do the following; please let me know if it makes sense.

      1) Take the WSJ dictionary and add the needed technical vocabulary.
      2) Create wave files from text based on the comprehensive dictionary created in step 1. For this step we are planning to use a text-to-speech converter.
      3) With the above inputs, generate an extended version of the WSJ acoustic model, which we hope will support our technical vocabulary.

      We are aware of the possibility of extending WSJ by using the addenda functionality; however, we need a process that will enable us to add thousands of engineering terms.

      I hope this explains our intentions clearly, and we look forward to your valuable input on this approach.

      Thanks,
      Kishore

       
      • Nickolay V. Shmyrev

        I don't see why you want to train a new acoustic model; moreover, I don't think it's a good idea to use TTS output for training. To extend the dictionary to your target domain, you need the following (a command sketch follows the list):

        1) Extend cmudict (please note that there is no such thing as a "WSJ dictionary") with the pronunciations of the missing words. You could use Sequitur g2p, for example, to do this automatically, though the result will require manual review by a speech expert.

        2) Collect a lot of text from your target domain and train a language model with cmuclmtk.

        That's all.
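
        Roughly, the two steps look like this (a sketch; the model and file names are placeholders, and the exact options are described in each tool's documentation):

        # 1) train a Sequitur g2p model on cmudict, then generate
        #    pronunciations for the missing technical words
        #    (in practice you ramp the model up several times for accuracy)
        g2p.py --train cmudict.dict --devel 5% --write-model model-1
        g2p.py --model model-1 --apply missing_words.txt > extra.dic

        # 2) train a language model from domain text with cmuclmtk
        text2wfreq < corpus.txt > corpus.wfreq
        wfreq2vocab < corpus.wfreq > corpus.vocab
        text2idngram -vocab corpus.vocab -idngram corpus.idngram < corpus.txt
        idngram2lm -vocab_type 0 -idngram corpus.idngram -vocab corpus.vocab -arpa corpus.arpa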

         
    • Kishore

      Kishore - 2009-04-02

      Nickolay:

      Thanks for your suggestion. I have some questions:

      1. There will be engineering words that are not in cmudict, which we will add to it. Is it enough if we just train the language model? What about training the acoustic model for these new words?

      2. The distributions in the WSJ/Hub4 acoustic model jar files contain a .dmp file for the language model. To merge this .dmp with our custom lm file (either in ASCII or binary), we need the ASCII version of the wsj/hub4 lm file. Where can we get it?

      3. There is a tool in cmuclmtk named binlm2arpa. Is this for converting a .dmp file to ASCII text? If so, it is not able to read the .dmp file supplied with the acoustic model jar. Is there an alternative for converting a dmp file to an ascii lm?

      4. While building a language model using the cmuclmtk tools, I am assuming that we need to provide a transcription in the form of a text file as input. Where can we get this transcription file for, say, the cmudict words? Is it necessary to give well-formed sentences in the transcription file, or is it OK if we give just the word set as input?

      Looking forward to your reply,

      thanks,
      Kishore

       
      • Nickolay V. Shmyrev

        > What about training the acoustic model for these new words?

        An acoustic model captures the properties of phones, not words. There is no need to retrain the model.
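
        In other words, a new technical word usually just needs a dictionary entry mapping it to existing phones, along these lines (a hypothetical entry in cmudict format; the pronunciation should be verified):

        VISCOMETER  V IH S K AA M AH T ER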

        > the ASCII version of the wsj/hub4 lm file. Where can we get it?

        I suggest you use the lm_giga language model instead of wsj/hub4; it has an ASCII version. Conversion is not a problem either; for example, sphinx3_lm_convert from sphinx3 can do it.

        > If so, it is not able to read the .dmp file supplied with the acoustic model jar.

        binlm2arpa should also work. What particular problems do you have with this application?

        > Is it necessary to give well-formed sentences in the transcription file, or is it OK if we give just the word set as input?

        I'm not sure what transcription you are talking about here. The input source for any language model is just text on the topic you are going to recognize; engineering, in your case. It can be a collection of newspaper articles and so on. The text needs to be preprocessed, of course: you need to remove punctuation, special characters, and so on.
        Your question about a word list suggests you don't yet understand what a language model is. I suggest you google it or read a textbook chapter about it.
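
        As a rough illustration, the preprocessing could be as simple as this (a sketch; raw_text.txt and corpus.txt are placeholder names, and a real corpus needs more careful cleanup):

        # replace punctuation with spaces and uppercase to match
        # cmudict-style entries, keeping one sentence per line
        sed 's/[[:punct:]]/ /g' raw_text.txt | tr '[:lower:]' '[:upper:]' > corpus.txt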

         
