
Voice recognition over the mic works well, but from a WAV file it is poor

2014-12-22
2015-01-08
  • Diego Fernando Murillo Valencia

    I use Sphinx4 configured with the VoxForge Spanish acoustic model. I trained a language model and created a dictionary of product names that are not present in the Spanish dictionary. When I test with my mic, the accuracy is around 80% for me. Next I need to recognize from a WAV file that I create using the Java AudioSystem API. I read in the documentation how the file must be created, so I create it at 16000 Hz, mono, little-endian, with a 256 kbit/s bitrate, the same format as the example file in the Transcriber demo. But with the same configuration I use for the mic, plus the settings needed to read from a file, the recognition is very poor or nonexistent. What could be wrong?

    I need to create the file so I can send it over a network and get the recognition result back. How can I get better results reading from the file, or how could I send voice data from a station to a server and do the recognition there?
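For reference, the format described above (16 kHz, 16-bit, mono, signed PCM, little-endian, which works out to 16000 × 16 × 1 = 256 kbit/s) can be produced with the standard javax.sound.sampled API roughly like this; the one-second silence buffer and the file name are just placeholders:

```java
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;

public class WavWriter {
    public static void main(String[] args) throws IOException {
        // 16000 Hz, 16-bit samples, 1 channel, signed PCM, little-endian
        AudioFormat format = new AudioFormat(16000f, 16, 1, true, false);

        // One second of silence as placeholder audio: 16000 frames * 2 bytes/frame
        byte[] pcm = new byte[16000 * 2];
        AudioInputStream stream = new AudioInputStream(
                new ByteArrayInputStream(pcm), format,
                pcm.length / format.getFrameSize());

        // Write a RIFF/WAVE file in the format sphinx4 expects
        AudioSystem.write(stream, AudioFileFormat.Type.WAVE, new File("test.wav"));

        // Bit rate check: 16000 samples/s * 16 bits * 1 channel = 256 kbit/s
        System.out.println((int) (format.getSampleRate()
                * format.getSampleSizeInBits() * format.getChannels() / 1000)
                + " kbit/s");  // prints "256 kbit/s"
    }
}
```

In a real station the byte array would come from a TargetDataLine opened with this same AudioFormat rather than a silence buffer.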

     
  • Nickolay V. Shmyrev

    But with the same configuration I use for the mic, plus the settings needed to read from a file, the recognition is very poor or nonexistent. What could be wrong?

    You are using an outdated sphinx4. You can check out the latest sphinx4 from http://github.com/cmusphinx/sphinx4. It does not require any config file or config updates.

    I need to create the file so I can send it over a network and get the recognition result back. How can I get better results reading from the file, or how could I send voice data from a station to a server and do the recognition there?

    Sorry, could you elaborate? It's hard to understand what is going on there.

     
  • Diego Fernando Murillo Valencia

    Hello, the link you posted is not working; I get a 404 from GitHub.
    About the last part, regarding the file, what I'm trying to do is:

    I have a server configured with Sphinx4, and several stations that communicate with the server over a LAN. Each station has a mic and a Java app that reads from the mic, creates a WAVE file, and sends it to the server using Java RMI (I pass an array of bytes). On the server I pass the file to the recognizer (the file is written to a temp folder to be read), and finally I return the recognized text to the station. I want to send the voice data from the station to the server (Sphinx4) by whatever means, and have the recognition be as accurate as if the mic were plugged into the server. Is my explanation better now? Thanks for replying.
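A minimal sketch of the station-to-server flow described above. The interface and method names are hypothetical, the RMI registry/export plumbing is omitted, and the step that hands the temp file to sphinx4 is left as a comment, since it depends on the recognizer setup:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.rmi.Remote;
import java.rmi.RemoteException;

// Remote interface the stations call over Java RMI (names are illustrative)
interface SpeechService extends Remote {
    String recognize(byte[] wavBytes) throws RemoteException, IOException;
}

// Server side: write the received WAV bytes to a temp file, then hand
// that file to the sphinx4 recognizer and return the hypothesis
public class SpeechServer implements SpeechService {
    @Override
    public String recognize(byte[] wavBytes) throws IOException {
        File temp = File.createTempFile("utterance", ".wav");
        temp.deleteOnExit();
        Files.write(temp.toPath(), wavBytes);
        // Here the server would decode the file, e.g.
        // recognizer.startRecognition(new FileInputStream(temp));
        // and return result.getHypothesis() instead of this placeholder.
        return "<hypothesis for " + temp.getName() + ">";
    }

    public static void main(String[] args) throws IOException {
        // Local smoke test without an RMI registry: 1 KB of dummy WAV bytes
        System.out.println(new SpeechServer().recognize(new byte[1024]));
    }
}
```

In a real deployment the server would extend UnicastRemoteObject (or be exported explicitly), and the station would obtain the stub from the registry and call recognize with the bytes it recorded.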

     
  • Alexander Solovets

    Hi Diego, the link was badly parsed; you need to remove the dot from the end.

     
  • Diego Fernando Murillo Valencia

    Hello, thanks to everyone, it is working fine now. The only problem is that recognition of certain words is a little poor. How can I increase the accuracy for these words? I'm building my language model to use only a group of words, in this case only products like milk, fruits, etc.

    For example, Manzana (apple), Naranja (orange) and others are recognized perfectly, but others like Paquete (package) and Papa (potato) are recognized poorly. Why is that happening, and how can I increase the accuracy? I used g2p to create my dictionary for these words.

     
    • Nickolay V. Shmyrev

      It's hard to say what the reason for the failure is; there could be many. To debug decoding issues you need to provide audio recordings of the words you are trying to recognize and describe exactly how you use the recognizer.

       
  • Diego Fernando Murillo Valencia

    Fine, I'm creating a custom dictionary and a language model in this way:

    I have a TXT file called PRODUCTOS.txt with all my product names and the words I want to recognize, like this:

    AGUACATE
    AJO
    AMARILLA
    APIO
    ARBOL
    ARCHUCHA
    ARRACACHA
    ARRAYANA
    ARVEJA
    ATADO
    BABY
    BADEA
    BANANO
    BATAVIA
    BERENGENA
    BLANCA
    BLANCO
    BOLSA
    BOROJO
    BROCOLI
    BULTO
    CABEZONA
    CALABACIN
    CALENDULA
    ...
    

    Then I created my dictionary file using g2p.py (I trained up to 'model-5', as the README for this tool suggests; I used spanish.dic from the voxforge/etc folder, renamed to train.lex). I ran it like this:

    g2p.py --model model-5 --apply PRODUCTOS.txt | sed -r 's/stack usage.*//g' > products.dic
    

    This generates a products.dic containing only the words from PRODUCTOS.txt.

    Later I created a language model using a PRODUCTOS.txt like the one above, but with each line delimited by <s> and </s>, using these commands:

    text2wfreq < PRODUCTOS.txt | wfreq2vocab > productos.tmp.vocab
    
    text2idngram -vocab productos.tmp.vocab < PRODUCTOS.txt > productos.idngram
    
    idngram2lm -vocab_type 1 -idngram productos.idngram -vocab productos.tmp.vocab -arpa productos.arpa
    
    sphinx_lm_convert -i productos.arpa -o productos.lm.dmp
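Assuming the standard cmuclmtk conventions, the delimited PRODUCTOS.txt fed to text2wfreq and text2idngram above would contain one sentence-marked word per line, like:

```
<s> AGUACATE </s>
<s> AJO </s>
<s> ARVEJA </s>
```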
    

    Finally I load my dictionary and language model, and point the acoustic model to the voxforge_es/model folder, like this:

        Configuration configuration = new Configuration();
    
        // Load the acoustic model from the jar
        configuration.setAcousticModelPath("resource:/voicerecognition/voxforge_es/model");
    
        // You can also load the model from a folder
        // configuration.setAcousticModelPath("file:en-us");
    
        configuration.setDictionaryPath("resource:/voicerecognition/training/dictionary/products.dic");
        configuration.setLanguageModelPath("resource:/voicerecognition/training/languagemodel/productos.lm.dmp");
    
        try {
            recognizer = new StreamSpeechRecognizer(configuration);
        } catch (IOException e) {
            e.printStackTrace();
        }
    
    
     
    • Nickolay V. Shmyrev

      You need to provide audio recordings of the words you are trying to recognize and describe exactly how you use the recognizer. You also need to provide the data files you are using.

       
      • Diego Fernando Murillo Valencia

        Hello, sorry for the late reply. I tried to train and create a custom acoustic model from my voice using SphinxTrain. This improves the recognition accuracy, but I still have problems with some words (at least 6 words now, which is a great improvement). I don't understand what you are asking for: do you need the file I'm sending to the recognizer with the words that are not being recognized, plus the files I'm using (the dictionary, language model and acoustic model) and my code? If so, can I attach those in this forum?

         
        • Diego Fernando Murillo Valencia

          Another question: the mic I'm using is a Genius, the kind commonly used in cyber cafes, so I suspect its quality is a little low. Does this affect the recognition? I'm using Audacity to record my voice for the acoustic model adaptation.

           
  • Nickolay V. Shmyrev

    If so, can I attach those in this forum?

    You can share them on Dropbox/Google Drive and post a link here.

    the quality is a little low, does this affect the recognition?

    Yes

     
    • Diego Fernando Murillo Valencia

      Here are the files; inside there is a README file with details. I appreciate all your help :D

      https://www.dropbox.com/s/nvn44s7tc5mzir0/VOICERECOGNITION_FILES.zip?dl=0

      Another problem I'm noticing is that a person has to speak a little loudly for the recognition to work well. How can I configure the sound level? The mic on my Ubuntu machine is already at 100% volume. Thanks.

       
      • Nickolay V. Shmyrev

        Hello Diego

        I reviewed your setup. You need to make the following changes:

        1) Use SRILM, the quick_lm.pl script, or any other good language modeling toolkit (not cmuclmtk) to create a trigram language model from your list. Your LM is not properly created.

        2) Since the Spanish model is continuous, you need to use MLLR adaptation, not MAP adaptation. That will give you better accuracy and robustness.

        Once you create a proper language model, the recognition accuracy should be good.

         
        • Nickolay V. Shmyrev

          I'm attaching the proper LM to use with your model.

          For a fixed list of words it's also recommended to use a JSGF grammar.
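For a closed vocabulary like this product list, a JSGF grammar could look roughly as follows; the grammar name and the word subset shown are illustrative:

```
#JSGF V1.0;

grammar productos;

public <producto> = AGUACATE | AJO | APIO | ARVEJA | BANANO | BROCOLI;
```

In sphinx4 such a grammar would be loaded instead of the statistical LM, constraining the recognizer to exactly the listed words.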

           

