I am looking into building a speech-to-text (STT) app for medical professionals. As you already know, medical terms can be difficult to pronounce and are often misrecognized by speech systems. I have been looking at Sphinx4 and PocketSphinx.
I need real-time processing, and ideally it will run on a server.
Should I modify the dictionary, the acoustic model, the language model, or all of them?
How do I get started, and where can I find basic help?
Thanks!
All of them.
http://cmusphinx.sourceforge.net/wiki/tutorial
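As a rough illustration of the dictionary part of that answer, here is a minimal sketch of merging extra medical terms into a CMU-style pronunciation dictionary, where each line is a word followed by its phones. It is plain Python and does not call any Sphinx API. The helper names (parse_dict, merge_dicts, format_dict) and the phone sequence given for "tachycardia" are assumptions made up for the example, so check real pronunciations against the phone set of the acoustic model you use.

```python
from typing import Dict, List


def parse_dict(text: str) -> Dict[str, List[str]]:
    """Parse CMU-style dictionary text: one 'word PHONE PHONE ...' entry per line."""
    entries: Dict[str, List[str]] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines and entries without phones
        entries[parts[0].lower()] = parts[1:]
    return entries


def merge_dicts(base: Dict[str, List[str]],
                extra: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return a new dictionary; entries in 'extra' override those in 'base'."""
    merged = dict(base)
    merged.update(extra)
    return merged


def format_dict(entries: Dict[str, List[str]]) -> str:
    """Serialize back to 'word PHONE PHONE ...' lines, sorted by word."""
    return "\n".join(f"{word} {' '.join(phones)}"
                     for word, phones in sorted(entries.items()))


# Tiny stand-in for a general-purpose dictionary.
BASE = "heart HH AA R T\nrate R EY T\n"
# Hypothetical medical entry; the phone sequence is an assumption.
MEDICAL = "tachycardia T AE K IH K AA R D IY AH\n"

merged = merge_dicts(parse_dict(BASE), parse_dict(MEDICAL))
print(format_dict(merged))
```

The same words would also need to appear in the language model's training text, otherwise the decoder has no reason to prefer them. Alternate pronunciations written as word(2) are treated here as separate keys.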
Thanks. Which one would you recommend, Sphinx4 or PocketSphinx?
I do not know enough about your project to give you a recommendation.
Server-side ASR, with an emphasis on speed.
For server-side processing it is better to use Kaldi.
What are the differences between Sphinx and Kaldi? I was under the impression that Sphinx is better.