I am trying to test transcription with sphinx4 on an audio file where the speaker is giving a speech in English.
I began by adapting the default Transcriber demo that comes with the sphinx4 download.
The Transcriber demo works for audio files where the speaker is speaking digits only.
I noticed a digits.gram file which this example uses.
My first step was to change the acoustic model in the config file from TIDIGITS_8gau_13dCep_16k_40mel_130Hz_6800Hz to WSJ_8gau_13dCep_8kHz_31mel_200Hz_3500Hz.
I then looked for an equivalent of the digits.gram file containing English words, but had no luck.
Does anyone know where I can find a generic english.gram file, which I presume would be written in Java Speech Grammar Format? Or any other suggestions?
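In case it helps to see the format: a JSGF grammar is just a text file listing the phrases you expect to hear. A minimal sketch (the grammar name and rules below are made up for illustration) shows why a fixed english.gram can't exist — only phrases derivable from the rules are recognizable, so no finite .gram file covers general English:

```jsgf
#JSGF V1.0;

grammar commands;

public <command> = <action> <object>;
<action> = open | close | delete;
<object> = file | window | document;
```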
The grammar should depend on the type of text you are trying to recognize. For a specific topic you need to create a trigram language model from similar texts with cmuclmtk.
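For reference, a typical cmuclmtk pipeline looks roughly like this (a sketch, assuming the toolkit is installed and corpus.txt is your collection of similar texts; check the toolkit documentation for the exact options):

```shell
# Count word frequencies and build a vocabulary from the corpus
text2wfreq < corpus.txt | wfreq2vocab > corpus.vocab

# Convert the corpus to id n-grams using that vocabulary
text2idngram -vocab corpus.vocab -idngram corpus.idngram < corpus.txt

# Build the trigram model in ARPA text format
idngram2lm -vocab_type 0 -idngram corpus.idngram \
           -vocab corpus.vocab -arpa corpus.arpa
```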
Isn't there a default trigram language model that I can use? Or an example of how to create one?
http://www.inference.phy.cam.ac.uk/kv227/lm_giga/
Thanks, Nickolay.
Still having trouble understanding, though.
I've just been doing some reading on the net, and am wondering: do I really need a grammar?
Logically, all I need is a dictionary that can translate sound to text. A grammar would be more appropriate for applications where a particular event is fired based on the sound input. I am not firing any events; I simply want a conversion of sound to text.
I downloaded the lm_giga_5k_nvp.hvite.dic file from the link above, but I don't know where to place it. The sphinx4-1.0beta2-src\sphinx4-1.0beta2\src\sphinx4\edu\cmu\sphinx\linguist\dictionary folder contains only Java source files, so it doesn't seem like the right place for this dictionary file.
Any ideas?
> Logically, all I need is a dictionary that can translate sound to text.
No, that's not correct. To use ASR you need three basic things: an acoustic model, a dictionary, and a grammar. As the acoustic model you can use WSJ. For large-vocabulary recognition you need a large trigram model like lm_giga, but it's much better to use domain-oriented grammars instead, as I said in my first post.
For an example of using a grammar for recognition, look at tests/performance/wsj5k.config.xml. You need to configure the lexTreeLinguist and a trigram model of type LargeTrigramModel. You also need to convert lm_giga's ARPA text model to the compressed DMP format with lm3g2dmp.
You can create your own trigram model from sample texts with cmuclmtk, for example.
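For what it's worth, the relevant part of such a config would look something like the fragment below (a sketch based on the beta-era XML config format; the component names and DMP path are arbitrary, and the exact property set may differ, so compare against wsj5k.config.xml before using it):

```xml
<!-- Trigram language model loaded from the DMP file produced by lm3g2dmp -->
<component name="trigramModel"
           type="edu.cmu.sphinx.linguist.language.ngram.large.LargeTrigramModel">
    <property name="location" value="models/lm_giga_5k_nvp.DMP"/>
    <property name="logMath" value="logMath"/>
    <property name="dictionary" value="dictionary"/>
</component>

<!-- Linguist wired to the acoustic model, dictionary, and trigram model -->
<component name="lexTreeLinguist"
           type="edu.cmu.sphinx.linguist.lextree.LexTreeLinguist">
    <property name="acousticModel" value="wsj"/>
    <property name="languageModel" value="trigramModel"/>
    <property name="dictionary" value="dictionary"/>
    <property name="logMath" value="logMath"/>
</component>
```

Note that the dictionary file from your earlier question is referenced from a component like this in the config, not placed into the source tree.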