
Need some help with new dictionary

Help
kyriakos
2006-04-25
2012-09-22
  • kyriakos

    kyriakos - 2006-04-25

    Hey to all,
    I'm a newbie in the sphinx world and I would like some help.
    I'm trying to develop an application that can recognize 30 words at most. The first thing I did was to produce my own dictionary and language model, using the lmtool, which can be found here:
    http://www.speech.cs.cmu.edu/tools/lmtool
    I use the WSJ acoustic model and the sphinx4 decoder, and I run my tests by modifying the HelloNgram demo that comes with sphinx4.

    When I replaced the file cmudict.0.6d with my own dictionary and the file hellongram.trigram.lm with my own language model and rebuilt everything, I managed to recognize some of my words, but with low success, meaning that I have to say a word 10 times before it is recognized. And some words are not recognized at all.

    I also tried to use SphinxTrain to build my own mdef files and replace the WSJ ones, with no success.

    I think the problem is with the pronunciation and the dictionary, and I wonder if there is a way to get a good chance of recognizing these 30 words I want.

    I have read some other posts on a similar subject, and the suggestion was to use the whole cmudict.0.6d with a different language model. In that case, though, I would be using a dictionary of 125K words to recognize just 30, and the memory requirements would grow for no reason.

    The next thing I will try is to make my dictionary and language model with the CMU Statistical Language Modeling (SLM) Toolkit, and maybe the results will be better.
    I would like to hear from anyone who has had the same problem and found a solution, or who can suggest a different methodology to follow.
    Thanks in advance

     
    • Robbie

      Robbie - 2006-04-25

      If possible, could you explain a little more about the purpose of your application? Without more information, I would assume that your best bet is a grammar (specifically JSGF) rather than an LM. I would also use something like Perl to extract the 30 words from the CMUdict to reduce the memory requirement (although it really isn't that big of a deal).
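      The extraction could equally be sketched in Java; the class name and the handling of the cmudict.0.6d line format (word first, optional "(n)" variant marker, then phones) are my own assumptions, not anything from Sphinx itself:

```java
import java.util.*;

// Sketch: keep only the dictionary lines whose head word is in a small
// word list. Assumes each cmudict line starts with the word (possibly
// carrying a variant marker like "(2)") followed by its phones.
public class DictFilter {
    public static List<String> filter(List<String> dictLines, Set<String> words) {
        List<String> kept = new ArrayList<>();
        for (String line : dictLines) {
            // First whitespace-separated token, minus a trailing "(n)" marker.
            String head = line.trim().split("\\s+")[0].replaceAll("\\(\\d+\\)$", "");
            if (words.contains(head)) {
                kept.add(line);
            }
        }
        return kept;
    }
}
```

      Writing the kept lines out to a file gives a 30-word dictionary you can point the decoder at instead of the full 125K-word one.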

      Regards,
      Robbie

       
      • kyriakos

        kyriakos - 2006-04-26

        First of all, thanks for the quick reply.
        I want to make an application that can recognize a number of words (I believe it will be around 30 common words). It will recognize the word spoken into a microphone and then, according to the recognition, execute the appropriate commands. These commands will probably be apps started with the use of Java sockets or Java agents. I don't want whole sentences, just words. It will be like a menu that can be controlled by voice. What I want from Sphinx is to recognize the words that I speak into the microphone with a good chance of success.
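        The word-to-command part of such a menu is plain Java, independent of the recognizer. A minimal sketch (the class and the registered words are hypothetical, and the recognizer integration is left out):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal voice-menu dispatcher: maps a recognized word to an action.
public class VoiceMenu {
    private final Map<String, Runnable> commands = new HashMap<>();

    public void register(String word, Runnable action) {
        commands.put(word.toLowerCase(), action);
    }

    // Returns true if the recognized word mapped to a registered command.
    public boolean dispatch(String recognizedWord) {
        Runnable action = commands.get(recognizedWord.toLowerCase());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }
}
```

        Each registered action could then open a socket or launch an agent, with the recognizer's best result text passed to dispatch().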
        I have tried many things, but none produces the result I want. At the moment I'm trying to build new models, dictionaries, and all the needed files and to train them, but I am confused, since the SphinxTrain documentation is too complex.
        About your suggestion of JSGF: I haven't done anything with grammars and I don't know where to start. I would appreciate it if you had something to suggest.
        I will try the Perl scripts from SphinxTrain today, and I hope they will give somewhat better results.

        Regards,
        Jack

         
        • Robbie

          Robbie - 2006-04-26

          You will probably be wasting your time using SphinxTrain. Remember, there are two models that a speech recognizer uses: an acoustic model and a language model (or, alternatively, a grammar). The acoustic model provides probabilities of different phonemes, and the language model provides a way for the recognizer to combine these phonemes into the most probable word sequences.

          That said, SphinxTrain is for the acoustic model. The default acoustic models (for instance, download the HUB4 acoustic model from SourceForge; we won't need the language model) should be sufficient for your needs.

          I recommend that you use either a SimpleWordListGrammar or a JSGFGrammar, the latter being easier for beginners because you can just cut and paste from the demos. Start with demos/jsapi/jsgf and demos/jsapi/dialog, as they seem to be very similar to what you are trying to do. The following demos also all use JSGF grammars: HelloWorld, HelloDigits, Transcriber, WavFile, ZipCity.
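          For a 30-word voice menu, the grammar can stay very small. A sketch of what such a JSGF file might look like (the grammar name and the words are placeholders, not taken from any demo):

```jsgf
#JSGF V1.0;

grammar menu;

public <command> = open | close | save | exit;
```

          The JSGFGrammar component in your config.xml would then be pointed at this file's location and grammar name.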

           
          • kyriakos

            kyriakos - 2006-04-27

            I have spent many hours with SphinxTrain, and I realized that I got no better results.
            I will try the JSGFGrammar with the WSJ acoustic model, the way it is used in HelloWorld demo.
            Thanks for the support.
            If I have a problem I may bother you a little more.

             
    • kyriakos

      kyriakos - 2006-04-28

      I followed your advice with the JSGFGrammar and I had the results I wanted.
      Now I have a new problem. Everything looks good with the words that the WSJ acoustic model supports. I want to add some new words that are not in cmudict.0.6d, and these words are not recognized at all.
      I think that I have to train a new acoustic model so that I can recognize all my words.
      Is there an alternative way to recognize untrained words by just modifying some files (like the dictionary)? The whole process of training is too complex, and I have tried it with no success.
      I had many problems with the raw files that had to be converted to MFCCs, matching the transcript file, and so on.

      Thanks in advance

       
      • Robbie

        Robbie - 2006-04-28

        A lot of people think that the dictionary and the acoustic model are inherently related, which is a slight misunderstanding probably brought about because the acoustic models package a dictionary within the same .jar.

        To be sure, they are related in one important aspect: the entries in the dictionary must use only the phones available from the acoustic model. In most cases, this affects only the filler dictionary and not necessarily the word dictionary. For instance, I believe the WSJ and the HUB4 dictionaries are totally compatible (but not the filler dictionaries).

        Anyway, to answer your question more directly, you're welcome to use any dictionary you want, including building your own, just so long as you use only phones present in the acoustic model you use. In other words, there is absolutely no need to train a new acoustic model to add new words. The acoustic model just gives a statistical representation of each phoneme of a language; the dictionary tells Sphinx how to combine these phonemes into words.

        What I would do is copy the dictionary out to an external location and take a look at it so that you get a feel for the format (it is very simple), then check out http://www.speech.cs.cmu.edu/cgi-bin/cmudict for a list of phonemes (you can also type words in and it will provide you with a pronunciation that you can add to your dictionary). Just remember that if you are adding more than one pronunciation per word, the first entry is written plain and the second one needs a (2) suffix, and so on, like so:
        the  dh ah
        the(2)  dh iy

        Be sure to change your config.xml file: under the dictionary component there is a "location" (or similarly named) property that you will need to set to point to your new dictionary (right now it reads from the acoustic model .jar).
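        For reference, the relevant fragment of a Sphinx-4 config.xml might look roughly like this; the exact component type and property names vary between versions, and the file paths here are placeholders:

```xml
<component name="dictionary"
           type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
    <!-- Point these at your own word and filler dictionaries. -->
    <property name="dictionaryPath" value="file:my.dict"/>
    <property name="fillerPath" value="file:my.filler"/>
</component>
```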

        Regards,
        Robbie

         
    • kyriakos

      kyriakos - 2006-05-02

      Thanks for the reply,
      it really helped me a lot.
      I will try to make a suitable dictionary with the existing acoustic model.
      The problem with the words that my application did not recognize was that their pronunciations weren't compatible with the phonemes of the WSJ acoustic model.
      I will fix this and I hope to get the expected outcome.
      Thanks once more for the help.

      Regards,
      Jack

       
