CMU Sphinx / Forums / Help: Using ITSM spanish model with sphinx4

Héctor Delgado Flores - 2008-05-09

Hello,

I'm trying to use sphinx4 with the ITESM H4 model for Spanish. I'm modifying the "wavfile" demo to recognize three keywords in WAV files.

The models are here: http://www.speech.cs.cmu.edu/sphinx/models/hub4spanish_itesm/

These models are in SphinxTrain format, but sphinx4 needs the model packaged as a .jar file. I did this by following this guide: http://cmusphinx.sourceforge.net/sphinx4/doc/UsingSphinxTrainModels.html

I have changed some parameters in config.xml, but I don't know if I'm doing it right.

When I run the program, it keeps running for a long time and returns no result.

What am I doing wrong?

These are my files: http://www.megaupload.com/?d=BP3M2CLG

Thanks a lot

- Claudia Ocampo - 2008-11-01
  
  Hi, I'm working with Sphinx-4 and need to configure it for Spanish. When I run my program, I get this error:
  
  Loading Recognizer...
  
  Exception in thread "main" java.lang.NullPointerException
  at edu.cmu.sphinx.util.props.SaxLoader.load(SaxLoader.java:64)
  at edu.cmu.sphinx.util.props.ConfigurationManager.loader(ConfigurationManager.java:383)
  at edu.cmu.sphinx.util.props.ConfigurationManager.<init>(ConfigurationManager.java:115)
  at demo.sphinx.wavfile.WavFile.main(WavFile.java:60)
  
  Could someone help me?
  
  Thank you very much.
  
- Nickolay V. Shmyrev - 2008-05-09
  
  I was unable to download the files. Could you use another host instead, say mediafire.com?
  
  The biggest problem is that the Spanish models use the s3_1x39 feature set, so you have to use a different feature extraction class in the frontend (S3FeatureExtractor). The rest should be fairly standard.
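  As a rough illustration of what "s3_1x39" means: it is a 39-dimensional vector, but its components are ordered differently from a plain cepstra/deltas/delta-deltas layout. The Java sketch below assumes the Sphinx3 layout as we understand it (12 cepstra without C0, their 12 deltas, then C0 with its delta and delta-delta, then the 12 delta-deltas), with delta = c[t+2] - c[t-2] and delta-delta = (c[t+3] - c[t-1]) - (c[t+1] - c[t-3]). The class and method names are hypothetical; this is not the S3FeatureExtractor source, only a model of the vector it is expected to produce.

```java
/**
 * Hypothetical sketch of an s3_1x39-style feature vector, assuming the Sphinx3 layout
 * [C1..C12 | dC1..dC12 | C0, dC0, ddC0 | ddC1..ddC12].
 */
final class S3Features {
    static final int CEP = 13; // cepstra per frame, C0..C12
    static final int DIM = 39; // 12 + 12 + 3 + 12

    /** frames[t] is the centre frame; three frames of context are needed on each side. */
    static float[] build1x39(float[][] frames, int t) {
        if (t < 3 || t + 3 >= frames.length) {
            throw new IllegalArgumentException("need 3 frames of context on each side");
        }
        float[] f = new float[DIM];
        int k = 0;
        // Static cepstra, skipping C0
        for (int i = 1; i < CEP; i++) f[k++] = frames[t][i];
        // Deltas c[t+2] - c[t-2], skipping C0
        for (int i = 1; i < CEP; i++) f[k++] = frames[t + 2][i] - frames[t - 2][i];
        // Power block: C0, its delta and its delta-delta
        f[k++] = frames[t][0];
        f[k++] = frames[t + 2][0] - frames[t - 2][0];
        f[k++] = deltaDelta(frames, t, 0);
        // Delta-deltas, skipping C0
        for (int i = 1; i < CEP; i++) f[k++] = deltaDelta(frames, t, i);
        return f;
    }

    // (c[t+3] - c[t-1]) - (c[t+1] - c[t-3])
    private static float deltaDelta(float[][] c, int t, int i) {
        return (c[t + 3][i] - c[t - 1][i]) - (c[t + 1][i] - c[t - 3][i]);
    }

    public static void main(String[] args) {
        // Synthetic frames: component i of frame n is n * (i + 1)
        float[][] frames = new float[7][CEP];
        for (int n = 0; n < frames.length; n++) {
            for (int i = 0; i < CEP; i++) frames[n][i] = n * (i + 1);
        }
        float[] f = build1x39(frames, 3);
        System.out.println("dimensions: " + f.length + ", C0 at index 24: " + f[24]);
    }
}
```

  The point is that a model trained on this ordering cannot be decoded with a frontend that emits a different ordering, even though both vectors have 39 dimensions.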
  
  As for your question about 256M: that's fairly standard. Remember that there is always a swap file, and you can even pass -Xmx512m; that doesn't mean Java will actually use 512 MB. After all, it's Java.
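  A quick way to see the difference between the -Xmx ceiling and what the JVM really uses is to print both from the standard Runtime API. The class name below is arbitrary; run it once plainly and once as java -Xmx512m HeapReport and compare the numbers.

```java
/** Prints the heap ceiling (-Xmx) next to the heap the JVM has actually committed and used. */
final class HeapReport {
    static long mib(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();           // upper bound, e.g. set by -Xmx512m
        long total = rt.totalMemory();       // heap currently committed by the JVM
        long used = total - rt.freeMemory(); // part of that heap holding objects
        System.out.println("max heap:       " + mib(max) + " MiB");
        System.out.println("committed heap: " + mib(total) + " MiB");
        System.out.println("used heap:      " + mib(used) + " MiB");
    }
}
```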
  
  As for your task, I'm not sure why you want to set up sphinx4; I don't think it will give you anything new.
  
  - Santiago Brandi - 2008-06-27
    
    > The biggest problem is that spanish models use s3_1x39 feature set so you have to use another feature extraction class in the frontend (S3FeatureExtractor). The rest must be quite standard
    
    Hi, I'm sorry to keep asking for help...
    
    I have my application running with the ITESM Spanish models, but recognition is totally null. First, I couldn't find any information about how to use the H4.arpa.Z.DMP file, and I also had no idea about the s3_1x39 feature set...
    
    How do I get or create that different feature extraction class in the frontend?
    
    Sorry for my ignorance.
    
    Thanks for your help!
    Santiago
    
    - Nickolay V. Shmyrev - 2008-06-27
      
      > I have my application running with the ITESM Spanish models, but recognition is totally null. First, I couldn't find any information about how to use the H4.arpa.Z.DMP file, and I also had no idea about the s3_1x39 feature set...
      
      There must be a different problem. First of all, don't use H4.arpa.Z.DMP; it is most probably not suitable for your task. Second, to use s3_1x39, choose S3FeatureExtractor in the frontend. If you still have trouble, please post a link to your file and its transcription.
      
- Héctor Delgado Flores - 2008-05-09
  
  Thank you for your answer!
  
  I expect the results will be similar. The reason is that I think it's easier for me to write an application with a simple user interface this way than in C. I don't have much time for my project, and I should have something working even if the results aren't perfect.
  
  My files: http://www.mediafire.com/?hkxkatyjbty
  
  Thank you again.
  
- Héctor Delgado Flores - 2008-05-10
  
  Nickolay,
  
  I keep trying and get nothing. Could you provide a config.xml file that works for my test?
  
  Thank you very much.
  
  - Nickolay V. Shmyrev - 2008-05-10
    
    Well, I managed to make it work. Along the way I had to fix a bug in sphinx4. Check my files here:
    
    http://www.mediafire.com/?nizfvxxesg9
    
    You have to check out the latest sphinx4 from SVN and apply the attached patch. It's still very slow and not very good at keyword spotting; as I said, we ought to try another search algorithm.
    
- Héctor Delgado Flores - 2008-05-12
  
  Sorry
  
  Which svn subcommand should I use to apply the patch? Is the patch file sphinx4_noloop.diff?
  
  Thanks
  
  - Nickolay V. Shmyrev - 2008-05-12
    
    cp sphinx4_noloop.diff sphinx4
    cd sphinx4
    patch -p0 < sphinx4_noloop.diff
    
    Alternatively, you can open the patch in a text editor and apply the changes by hand. Running man patch can also be helpful.
    
- Santiago Brandi - 2008-06-12
  
  Hi, I downloaded the acoustic models from the same link and also trained the models following the steps in the other link you mentioned. When I try to run the application, an error pops up about a bad URL in the dictionary configuration of config.xml. It seems it doesn't recognize the JAR I created, or something like that; I really don't know.
  If someone has an idea of what may be happening, or has managed to make sphinx4 run with Spanish words, I would really appreciate a hand.
  
  Thanks a lot!
  Excuse my English...
  
  Santiago
  
  - Nickolay V. Shmyrev - 2008-06-13
    
    > When I try to run the application, an error pops up about a bad URL in the dictionary configuration of config.xml. It seems it doesn't recognize the JAR I created, or something like that; I really don't know.
    
    Learn to paste the actual errors when you report them. It's a basic thing you must understand first. We'll translate them for you if you can't do it yourself.
    
    - Santiago Brandi - 2008-06-17
      
      Hi, this is what I get when I try to run the application:
      
      Problem configuring HelloDigits: Property Exception component:'dictionary' property:'dictionaryPath' - Bad URL resource:/edu.cmu.sphinx.model.acoustic.ESPAÑOL_H4.Model!/edu/cmu/sphinx/model/acoustic/ESPAÑOL_H4/dict/cmudict.0.6dunknown protocol: resource
      Property Exception component:'dictionary' property:'dictionaryPath' - Bad URL resource:/edu.cmu.sphinx.model.acoustic.ESPAÑOL_H4.Model!/edu/cmu/sphinx/model/acoustic/ESPAÑOL_H4/dict/cmudict.0.6dunknown protocol: resource
      
      These lines refer to the config.xml file. When I use the WSJ model instead of this Spanish acoustic model, it runs perfectly...
      
      I trained the model following the steps at http://cmusphinx.sourceforge.net/sphinx4/doc/UsingSphinxTrainModels.html
      
      Any ideas?
      
      Thanks,
      Santiago
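      For reference, "unknown protocol: resource" is the message of the MalformedURLException that the plain java.net.URL class throws for a scheme it has no handler for. The "resource:" prefix is a sphinx4 convention for finding files inside a model jar on the classpath, so one plausible reading of this log (an assumption, not a diagnosis confirmed in this thread) is that sphinx4 could not resolve the model class named before the "!" and the string ended up in the plain URL constructor. The hypothetical helper below reproduces the message with JDK classes only.

```java
import java.net.MalformedURLException;
import java.net.URL;

/** Shows that the plain JDK URL class has no handler for the "resource:" scheme. */
final class ResourceUrlCheck {
    /** Returns null if the string parses as a URL, otherwise the exception message. */
    static String tryParse(String spec) {
        try {
            new URL(spec);
            return null;
        } catch (MalformedURLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Path copied from the log above; \u00D1 is the letter N with tilde
        String fromLog = "resource:/edu.cmu.sphinx.model.acoustic.ESPA\u00D1OL_H4.Model!"
                + "/edu/cmu/sphinx/model/acoustic/ESPA\u00D1OL_H4/dict/cmudict.0.6d";
        System.out.println(tryParse(fromLog));                   // unknown protocol: resource
        System.out.println(tryParse("file:/tmp/cmudict.0.6d")); // null: "file" is a known scheme
    }
}
```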
      
      - Santiago Brandi - 2008-06-17
        
        Well, I repeated the whole process and now it works; obviously I was doing something wrong...
        
        Now my problem is that recognition accuracy is really poor. I read something about some parameters that need to be changed...
        
        If someone has worked that out, I would appreciate a hint!
        
        Thanks a lot!
        Santiago
        
- Santiago Brandi - 2008-06-28
  
  Hi! It works, thanks a lot! Now the application recognizes Spanish words with remarkable accuracy!
  
  Now I've encountered a new kind of problem: using this s3FeatureExtractor, the application recognizes only one word at a time. For example, if I say "abrir puerta" ("open door"), it only returns "abrir", and if I say the same word twice, it only returns it once...
  
  I've been reading the code of deltaFeatureExtractor and s3FeatureExtractor, guessing the problem was in the time window size, but I extended it as much as I could and the results are the same. Moreover, when I try to impose grammar rules like the ones in the HelloWorld demo, where words must follow a fixed order, the program keeps loading and never starts...
  
  Do you know anything about this?
  
  thanks again!
  Santiago
  
  - Nickolay V. Shmyrev - 2008-06-29
    
    It's a restriction of your grammar or language model. It's not related to the features at all.
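    To illustrate the point about the grammar: if the public rule is just a list of alternatives, each utterance can match only one word, so "abrir puerta" comes back as "abrir". JSGF's + operator (one or more repetitions) lets a sequence through. The Java helper below only builds the grammar text; the grammar name, the rule name <command> and the word "cerrar" are hypothetical, and it assumes the resulting .gram file is loaded through config.xml the same way as in the HelloWorld demo.

```java
import java.util.List;

/** Builds JSGF grammar text for a small command vocabulary (hypothetical names). */
final class CommandGrammar {
    /**
     * If sequences is false, the public rule accepts exactly one word per utterance;
     * if true, it accepts one or more words in any order.
     */
    static String build(String name, List<String> words, boolean sequences) {
        String alternatives = String.join(" | ", words);
        String body = sequences ? "( " + alternatives + " )+" : alternatives;
        return "#JSGF V1.0;\n"
                + "grammar " + name + ";\n"
                + "public <command> = " + body + ";\n";
    }

    public static void main(String[] args) {
        List<String> words = List.of("abrir", "cerrar", "puerta");
        System.out.print(build("comandos", words, false)); // single word only
        System.out.print(build("comandos", words, true));  // also "abrir puerta"
    }
}
```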
    