I use Sphinx, configured to use the VoxForge Spanish model. I trained it to use a language model and created a dictionary of product names that are not present in the Spanish dictionary. When I test it with my mic, it is almost 80% accurate for me. Now I need to recognize from a WAV file that I create with the Java AudioSystem API. I read in the documentation how the file must be created, so I create it at 16000 Hz, mono, little-endian, 256 kbit/s, the same format as the example file in the transcriber example. But using the same configuration that I use for the mic, plus the configuration needed to read from a file, recognition is very poor or nonexistent. What could be wrong? I need to create the file to send it over a network and get the recognition back, so how can I read from the file better, or how could I send voice data from a station to a server and do the recognition there?
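For reference, a minimal sketch of producing a WAV file in exactly that format with the standard javax.sound.sampled API. The buffer here is synthetic silence and the file name capture.wav is just an example; a real app would fill the buffer from a TargetDataLine reading the microphone:

```java
import javax.sound.sampled.*;
import java.io.*;

public class WavWriter {
    public static void main(String[] args) throws Exception {
        // 16000 Hz, 16-bit samples, mono, signed PCM, little-endian
        AudioFormat format = new AudioFormat(16000f, 16, 1, true, false);

        // One second of silence as placeholder data; replace with bytes
        // captured from a TargetDataLine in a real application.
        byte[] data = new byte[16000 * 2];

        AudioInputStream stream = new AudioInputStream(
                new ByteArrayInputStream(data), format,
                data.length / format.getFrameSize());
        AudioSystem.write(stream, AudioFileFormat.Type.WAVE,
                new File("capture.wav"));
    }
}
```

Writing through AudioSystem this way guarantees the RIFF/WAVE header matches the PCM parameters, which is what the decoder front end expects.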
You are using an outdated Sphinx4. You can check out the latest Sphinx4 from http://github.com/cmusphinx/sphinx4. It does not require any config file or config updates.
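For illustration, with the current sphinx4 API the recognizer is configured in code rather than through an XML config file. The model and file paths below are placeholders for your own VoxForge Spanish model, dictionary, and language model:

```java
import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;
import java.io.FileInputStream;

public class FileTranscriber {
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        // Placeholder paths: point these at your own model files
        configuration.setAcousticModelPath("voxforge_es/model");
        configuration.setDictionaryPath("products.dic");
        configuration.setLanguageModelPath("products.lm");

        StreamSpeechRecognizer recognizer =
                new StreamSpeechRecognizer(configuration);
        recognizer.startRecognition(new FileInputStream("capture.wav"));
        SpeechResult result;
        while ((result = recognizer.getResult()) != null) {
            System.out.println(result.getHypothesis());
        }
        recognizer.stopRecognition();
    }
}
```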
Sorry, could you elaborate? It's hard to understand what is going on there.
Hello, the link that you posted is not working; I get a 404 on GitHub.
About the last part, regarding the file, what I'm trying to do is this:
I have a server configured with Sphinx4, and several stations that communicate with the server over a LAN. Each station has a mic, and on each station a Java app reads from the mic, creates a WAVE file, and sends it to the server using Java RMI (I pass an array of bytes). On the server I pass the file to the recognizer (the file is written to a temp folder so it can be read), and finally I return the recognized text to the station. I want to send the voice data from the station to the server (Sphinx4) by whatever means, with recognition as accurate as if the mic were plugged into the server. Is my explanation better now? Thanks for replying.
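A minimal sketch of that RMI contract; the interface and method names here are illustrative, not taken from the original code:

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical remote interface: the station sends the raw WAV bytes,
// the server runs the recognizer and returns the transcript.
public interface RecognitionService extends Remote {
    String recognize(byte[] wavBytes) throws RemoteException;
}
```

The server implementation would write wavBytes to a temp file (or wrap them in a ByteArrayInputStream directly) and feed them to the recognizer, returning the hypothesis string to the station.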
Hi Diego, the link was badly parsed; you need to remove the dot from the end.
Hello, thanks to everyone, it is working fine now. The only problem is that recognition of certain words is a little poor. How can I increase the accuracy for those words? I'm creating my language model to use only a group of words, in this case only products like milk, fruit, etc.
For example, Manzana (apple), Naranja (orange) and others are recognized perfectly, but others like Paquete (package) and Papa (potato) are recognized poorly. Why is this happening, and how can I increase the accuracy? I used g2p to create my dictionary for these words.
It's hard to say what the reason for the failure is; there could be many. To debug decoding issues, you need to provide audio recordings of the words you are trying to recognize and describe exactly how you use the recognizer.
Fine, I'm creating a custom dictionary and a language model in this way:
I have a TXT file called PRODUCTOS.txt where I have all my product names and the words that I want to recognize, like this:
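An illustrative list in that spirit, using words mentioned elsewhere in the thread (the actual file contents are hypothetical here):

```
MANZANA
NARANJA
PAQUETE
PAPA
```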
Then I created my dictionary file using g2p.py (I trained up to at least 'model-5', as the README for this tool suggests; I used spanish.dic from the voxforge/etc folder and renamed it to train.lex). I used it like:
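For reference, the usual Sequitur G2P workflow looks roughly like the following; the file names match those described above, but check your Sequitur version's README for the exact flags:

```sh
# Train an initial model on the existing pronunciations, then ramp up
g2p.py --train train.lex --devel 5% --write-model model-1
g2p.py --model model-1 --ramp-up --train train.lex --devel 5% --write-model model-2
# ... repeat the ramp-up step until model-5 ...

# Apply the final model to the new word list to get pronunciations
g2p.py --model model-5 --apply PRODUCTOS.txt > products.dic
```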
This generates a products.dic with only the words contained in PRODUCTOS.txt.
Later I created a language model using a PRODUCTS.txt like the one above, but with each word delimited with <s> and </s>, and used these commands:
Finally, I load my dictionary and language model, and point the acoustic model at the voxforge_es/model folder, like this:
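For context, a typical cmuclmtk command sequence over such a sentence file looks like the following; this is a guess at the kind of pipeline used here, and the file names are illustrative:

```sh
# Word frequencies -> vocabulary
text2wfreq < products.txt | wfreq2vocab > products.vocab
# Count n-grams over the vocabulary
text2idngram -vocab products.vocab -idngram products.idngram < products.txt
# Convert counts to an ARPA-format language model
idngram2lm -vocab_type 0 -idngram products.idngram \
           -vocab products.vocab -arpa products.lm
```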
You need to provide audio recordings of the words you are trying to recognize, describe exactly how you use the recognizer, and provide the data files you are using.
Hello, sorry for the late reply. I tried to train and create a custom acoustic model using my voice with SphinxTrain; this improves recognition accuracy, but I still have problems with some words (at least 6 words, which is a great improvement). I don't fully understand what you are asking for: do you need the audio file that I'm sending to the recognizer with the words that are not being recognized, and also the files that I'm using, like the dictionary, language model, and acoustic model? And my code? If yes, can I attach those in this forum?
Another question: the mic that I'm using is a Genius model commonly used in cyber cafes, so I suspect its quality is a little low. Does this affect recognition? I'm using Audacity to record my voice for the adaptation of the acoustic model.
You can share them on Dropbox/Google Drive and post a link here.
Yes
Here are the files; inside there is a README file with details. I appreciate all your help :D
https://www.dropbox.com/s/nvn44s7tc5mzir0/VOICERECOGNITION_FILES.zip?dl=0
Another problem I'm noticing is that a person has to speak a little loudly for recognition to work well. How can I configure the sound level? The mic on my Ubuntu machine is already at 100% volume. Thanks.
Hello Diego
I reviewed your setup. You need to make the following changes:
1) Use SRILM, the quick_lm.pl script, or any other good language modeling toolkit (not cmuclmtk) to create a trigram language model from your list. Your LM is not properly created.
2) Since the es model is continuous, you need to use MLLR adaptation, not MAP adaptation. That will give you more accuracy and robustness.
Once you create a proper language model, the recognition accuracy will be good.
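For example, with the CMU quick_lm.pl script an LM can be built from a file of sentences, one per line; the file name is illustrative, and the exact options should be checked against the script's usage message:

```sh
# Build a trigram LM from a sentence list (one phrase per line)
perl quick_lm.pl -s sentences.txt
# output is an ARPA-format LM file written next to the input
```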
I'm attaching the proper LM to use with your model.
For a plain list of words, it's also recommended to use a JSGF grammar.
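A JSGF grammar for a small product list might look like this; the words are taken from earlier in the thread, and the grammar and rule names are illustrative:

```
#JSGF V1.0;
grammar productos;
public <producto> = manzana | naranja | paquete | papa;
```

With a grammar like this, the decoder only has to choose among the listed words, which is usually more accurate for fixed small vocabularies than a statistical language model.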