
Voice recognition over the mic works well, but from a WAV file it is poor

2014-12-22
2015-01-08
  • Diego Fernando Murillo Valencia

    I use Sphinx4 configured with the VoxForge Spanish acoustic model. I trained a language model and created a dictionary of product names that are not present in the Spanish dictionary. When I test with my mic, the accuracy is around 80% for me. Next I need to recognize from a WAV file that I create using the Java AudioSystem API. I read in the documentation how the file must be created, so I create it at 16000 Hz, mono, little-endian, with a 256 kbit/s bitrate, the same format as the example file in the Transcriber demo. But with the same configuration I use for the mic, plus the settings needed to read from a file, the recognition is very poor or nonexistent. What could be wrong?

    I need to create the file so I can send it over a network and get the recognition result back. How can I get better results reading from the file, or how could I send voice data from a station to a server and do the recognition there?
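For reference, the format described above (16 kHz, 16-bit, mono, signed PCM, little-endian, which works out to 16000 × 16 × 1 = 256 kbit/s) can be produced with the standard javax.sound.sampled API roughly like this; the one-second silence buffer and the file name are just placeholders:

```java
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;

public class WavWriter {
    public static void main(String[] args) throws IOException {
        // 16000 Hz, 16-bit samples, 1 channel, signed PCM, little-endian
        AudioFormat format = new AudioFormat(16000f, 16, 1, true, false);

        // One second of silence as placeholder audio: 16000 frames * 2 bytes/frame
        byte[] pcm = new byte[16000 * 2];
        AudioInputStream stream = new AudioInputStream(
                new ByteArrayInputStream(pcm), format,
                pcm.length / format.getFrameSize());

        // Write a RIFF/WAVE file in the format sphinx4 expects
        AudioSystem.write(stream, AudioFileFormat.Type.WAVE, new File("test.wav"));

        // Bit rate check: 16000 samples/s * 16 bits * 1 channel = 256 kbit/s
        System.out.println((int) (format.getSampleRate()
                * format.getSampleSizeInBits() * format.getChannels() / 1000)
                + " kbit/s");  // prints "256 kbit/s"
    }
}
```

In a real station the byte array would come from a TargetDataLine opened with this same AudioFormat rather than a silence buffer.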

     
  • Nickolay V. Shmyrev

    But with the same configuration I use for the mic, plus the settings needed to read from a file, the recognition is very poor or nonexistent. What could be wrong?

    You are using an outdated sphinx4. You can check out the latest sphinx4 from http://github.com/cmusphinx/sphinx4. It does not require any config file or config updates.

    I need to create the file so I can send it over a network and get the recognition result back. How can I get better results reading from the file, or how could I send voice data from a station to a server and do the recognition there?

    Sorry, could you elaborate? It's hard to understand what is going on there.

     
  • Diego Fernando Murillo Valencia

    Hello, the link you posted is not working; I get a 404 from GitHub.
    About the last part, regarding the file, what I'm trying to do is:

    I have a server configured with Sphinx4, and several stations that communicate with the server over a LAN. Each station has a mic and a Java app that reads from the mic, creates a WAVE file, and sends it to the server using Java RMI (I pass an array of bytes). On the server I pass the file to the recognizer (the file is written to a temp folder to be read), and finally I return the recognized text to the station. I want to send the voice data from the station to the server (Sphinx4) by whatever means, and have the recognition be as accurate as if the mic were plugged into the server. Is my explanation better now? Thanks for replying.
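A minimal sketch of the station-to-server flow described above. The interface and method names are hypothetical, the RMI registry/export plumbing is omitted, and the step that hands the temp file to sphinx4 is left as a comment, since it depends on the recognizer setup:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.rmi.Remote;
import java.rmi.RemoteException;

// Remote interface the stations call over Java RMI (names are illustrative)
interface SpeechService extends Remote {
    String recognize(byte[] wavBytes) throws RemoteException, IOException;
}

// Server side: write the received WAV bytes to a temp file, then hand
// that file to the sphinx4 recognizer and return the hypothesis
public class SpeechServer implements SpeechService {
    @Override
    public String recognize(byte[] wavBytes) throws IOException {
        File temp = File.createTempFile("utterance", ".wav");
        temp.deleteOnExit();
        Files.write(temp.toPath(), wavBytes);
        // Here the server would decode the file, e.g.
        // recognizer.startRecognition(new FileInputStream(temp));
        // and return result.getHypothesis() instead of this placeholder.
        return "<hypothesis for " + temp.getName() + ">";
    }

    public static void main(String[] args) throws IOException {
        // Local smoke test without an RMI registry: 1 KB of dummy WAV bytes
        System.out.println(new SpeechServer().recognize(new byte[1024]));
    }
}
```

In a real deployment the server would extend UnicastRemoteObject (or be exported explicitly), and the station would obtain the stub from the registry and call recognize with the bytes it recorded.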

     
  • Alexander Solovets

    Hi Diego, the link was badly parsed; you need to remove the dot from the end.

     
  • Diego Fernando Murillo Valencia

    Hello, thanks to everyone, it is working fine now. The only problem is that recognition of certain words is a little poor. How can I increase the accuracy for these words? I'm building my language model to use only a group of words, in this case only products like milk, fruits, etc.

    For example, Manzana (apple), Naranja (orange) and others are recognized perfectly, but others like Paquete (package) and Papa (potato) are recognized poorly. Why is that happening, and how can I increase the accuracy? I used g2p to create my dictionary for these words.

     
    • Nickolay V. Shmyrev

      It's hard to say what the reason for the failure is; there could be many. To debug decoding issues you need to provide audio recordings of the words you are trying to recognize and describe exactly how you use the recognizer.

       
  • Diego Fernando Murillo Valencia

    Fine, I'm creating a custom dictionary and a language model in this way:

    I have a TXT file called PRODUCTOS.txt with all my product names and the words I want to recognize, like this:

    AGUACATE
    AJO
    AMARILLA
    APIO
    ARBOL
    ARCHUCHA
    ARRACACHA
    ARRAYANA
    ARVEJA
    ATADO
    BABY
    BADEA
    BANANO
    BATAVIA
    BERENGENA
    BLANCA
    BLANCO
    BOLSA
    BOROJO
    BROCOLI
    BULTO
    CABEZONA
    CALABACIN
    CALENDULA
    ...
    

    Then I created my dictionary file using g2p.py (I trained up to 'model-5', as the README for this tool suggests; I used spanish.dic from the voxforge/etc folder, renamed to train.lex). I ran it like this:

    g2p.py --model model-5 --apply PRODUCTOS.txt | sed -r 's/stack usage.*//g' > products.dic
    

    This generates a products.dic containing only the words from PRODUCTOS.txt.

    Later I created a language model using a PRODUCTOS.txt like the one above, but with each line delimited by <s> and </s>, using these commands:

    text2wfreq < PRODUCTOS.txt | wfreq2vocab > productos.tmp.vocab
    
    text2idngram -vocab productos.tmp.vocab < PRODUCTOS.txt > productos.idngram
    
    idngram2lm -vocab_type 1 -idngram productos.idngram -vocab productos.tmp.vocab -arpa productos.arpa
    
    sphinx_lm_convert -i productos.arpa -o productos.lm.dmp
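Assuming the standard cmuclmtk conventions, the delimited PRODUCTOS.txt fed to text2wfreq and text2idngram above would contain one sentence-marked word per line, like:

```
<s> AGUACATE </s>
<s> AJO </s>
<s> ARVEJA </s>
```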
    

    Finally I load my dictionary and language model, and point the acoustic model to the voxforge_es/model folder, like this:

        Configuration configuration = new Configuration();
    
        // Load the acoustic model from the jar
        configuration.setAcousticModelPath("resource:/voicerecognition/voxforge_es/model");
    
        // You can also load the model from a folder
        // configuration.setAcousticModelPath("file:en-us");
    
        configuration.setDictionaryPath("resource:/voicerecognition/training/dictionary/products.dic");
        configuration.setLanguageModelPath("resource:/voicerecognition/training/languagemodel/productos.lm.dmp");
    
        try {
            recognizer = new StreamSpeechRecognizer(configuration);
        } catch (IOException e) {
            e.printStackTrace();
        }
    
    
     
    • Nickolay V. Shmyrev

      You need to provide audio recordings of the words you are trying to recognize and describe exactly how you use the recognizer. You also need to provide the data files you are using.

       
      • Diego Fernando Murillo Valencia

        Hello, sorry for the late reply. I tried to train and create a custom acoustic model from my voice using SphinxTrain. This improves the recognition accuracy, but I still have problems with some words (at least 6 words now, which is a great improvement). I don't understand what you are asking for: do you need the file I'm sending to the recognizer with the words that are not being recognized, plus the files I'm using (the dictionary, language model and acoustic model) and my code? If so, can I attach those in this forum?

         
        • Diego Fernando Murillo Valencia

          Another question: the mic I'm using is a Genius, the kind commonly used in cyber cafes, so I suspect its quality is a little low. Does this affect the recognition? I'm using Audacity to record my voice for the acoustic model adaptation.

           
  • Nickolay V. Shmyrev

    If so, can I attach those in this forum?

    You can share them on Dropbox/Google Drive and post a link here.

    the quality is a little low, does this affect the recognition?

    Yes

     
    • Diego Fernando Murillo Valencia

      Here are the files; inside there is a README file with details. I appreciate all your help :D

      https://www.dropbox.com/s/nvn44s7tc5mzir0/VOICERECOGNITION_FILES.zip?dl=0

      Another problem I'm noticing is that a person has to speak a little loudly for the recognition to work well. How can I configure the sound level? The mic on my Ubuntu machine is already at 100% volume. Thanks.

       
      • Nickolay V. Shmyrev

        Hello Diego

        I reviewed your setup. You need to make the following changes:

        1) Use SRILM, the quick_lm.pl script, or any other good language modeling toolkit (not cmuclmtk) to create a trigram language model from your list. Your LM is not properly created.

        2) Since the Spanish model is continuous, you need to use MLLR adaptation, not MAP adaptation. That will give you better accuracy and robustness.

        Once you create a proper language model, the recognition accuracy should be good.

         
        • Nickolay V. Shmyrev

          I'm attaching the proper LM to use with your model.

          For a fixed list of words it's also recommended to use a JSGF grammar.
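For a closed vocabulary like this product list, a JSGF grammar could look roughly as follows; the grammar name and the word subset shown are illustrative:

```
#JSGF V1.0;

grammar productos;

public <producto> = AGUACATE | AJO | APIO | ARVEJA | BANANO | BROCOLI;
```

In sphinx4 such a grammar would be loaded instead of the statistical LM, constraining the recognizer to exactly the listed words.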

           

