Hi folks,
My goal is to do offline speech-to-text conversion with Sphinx4 for speech containing English words, i.e. I have a WAV file which can contain English words (I think in SR lingo these are called non-digit utterances) and I need to convert it into text.
I am exploring the Sphinx4 demos (HelloDigits, WavFile, Transcriber, etc.), but all of them only work with digits data.
I have the following queries:
1. What will it take to make these demos work for non-digit data (i.e. English-language words)?
The README for the Transcriber demo says that non-digit STT can be accomplished by suitably modifying the config.xml file. I went through the Sphinx4 configuration management doc,
but can somebody help me figure out exactly which components need to be modified, and roughly what changes are required, if I want it to work for normal English words instead of only digits?
To use the Lattice demo we need to get it from CVS and then build and run it.
I am behind a firewall and have tried accessing the CVS tree using WinCVS and the normal command-line cvs, but it hasn't worked for me.
The alternative suggested in the Sphinx documentation (https://sourceforge.net/docman/display_doc.php?docid=14033&group_id=1#firewall):
D:\share_for_spiff\sphinx related\sphinx4-1.0beta\bin>cvs -d :pserver:anonymous@cvs-pserver.sourceforge.net:80/cvsroot/cmusphinx co sphinx4
Unknown host cvs-pserver.sourceforge.net.
has not worked for me either (I have tried port 443 as well).
I also came across the HelloNGram demo and tried it out. For me the recognition accuracy was abysmally low.
Two questions here:
i. How can I improve the accuracy for this demo?
ii. Is it possible to add to the list of sentences which can be recognized?
If so, how exactly do I go about it?
Some major irritants:
a. 'the' at the beginning of a sentence is almost never recognized.
b. 'purple' is rarely recognized correctly; it is almost always recognized as 'front'.
Can Sphinx be used for enterprise-grade SR as well?
I am contemplating a scenario where the SR is done on a central server rather than on individual machines, since individual devices often have limited computation and memory capabilities.
So the idea is to have very high-grade SR done centrally on a server, which would do SR for several devices that submit their individual "SR jobs" (for lack of a better word) to it ("the SR server").
Can you suggest some other options (other than Sphinx) which could meet this requirement?
I am aware only of MSS (Microsoft Speech Server).
Would Sphinx be a good choice for such a scenario?
Would changes be required to Sphinx in its current form to do this, i.e. to make it enterprise grade?
Awaiting an early reply.
Thanks a ton,
ashish
Hi,
For question (1): you should use a different acoustic model, for example WSJ_8gau_13dCep_8kHz_31mel_200Hz_3500Hz.jar, which you can find in the lib directory of Sphinx4. You also need a different language model (I don't know whether one comes with Sphinx4) and a dictionary (the dictionary is inside the jar file). Read the Sphinx4 documentation for information on how to use a new acoustic model. For the parameters to use, you can refer to the hub4 regression test (sphinx4/tests/performance/hub4/hub4.config.xml).
TP
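To make the pointers above a little more concrete, here is a rough sketch of the kind of config.xml components that would typically change when moving from digits to general English: the acoustic model, the language model, and the dictionary. It is not copied from the Transcriber or hub4 configs. The component names (wsj, trigramModel, dictionary), the class names, and the property names are assumptions patterned on the Sphinx4 demo configurations, and every path is a placeholder. Please check each of them against hub4.config.xml and the Javadoc of your Sphinx4 version before relying on it.

```xml
<!-- Fragment to merge into the existing config.xml; all names below are assumptions. -->

<!-- Acoustic model: assumed class name, derived from the jar name mentioned above. -->
<component name="wsj"
           type="edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_8kHz_31mel_200Hz_3500Hz.Model">
    <property name="loader" value="wsjLoader"/>
    <property name="unitManager" value="unitManager"/>
</component>

<!-- Language model: an N-gram model in place of the digits grammar.
     The location is a placeholder for a language model file you supply. -->
<component name="trigramModel"
           type="edu.cmu.sphinx.linguist.language.ngram.SimpleNGramModel">
    <property name="location" value="file:/path/to/your/english.trigram.lm"/>
    <property name="logMath" value="logMath"/>
    <property name="dictionary" value="dictionary"/>
    <property name="maxDepth" value="3"/>
</component>

<!-- Dictionary: placeholder paths; the reply above notes that a dictionary
     ships inside the acoustic model jar. -->
<component name="dictionary"
           type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
    <property name="dictionaryPath" value="PLACEHOLDER: dictionary inside the model jar"/>
    <property name="fillerPath" value="PLACEHOLDER: filler dictionary inside the model jar"/>
    <property name="unitManager" value="unitManager"/>
</component>
```

The linguist and front-end components in the demo config would most likely need to reference these new components as well, and the front-end settings should match the 8 kHz model (the 31 mel filters and the 200 Hz to 3500 Hz range are encoded in the jar name), which is why taking the actual parameter values from hub4.config.xml is the safer route.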