CMU Sphinx / Forums / Help: Sphinx and Telephony Servers

Speech Recognition Toolkit

Sphinx and Telephony Servers

Forum: Help

Creator: Roy

Created: 2006-04-27

Updated: 2012-09-22

Roy - 2006-04-27

Hi,

I am investigating the possibility of using speech recognition in a distributed multi-tier system. Here is a brief summary of what is required:

Multiple clients will use a telephone as the primary input device for this system. The speech recognition must interpret arbitrary dictation, not just the context-specific voice commands used in areas such as telephone booking systems. The speech recognition software must also have a suitable API through which the recognized text can be captured in real time and processed for any particular need.

Further to this, the initial cost of purchasing hardware to support the speech recognition software should be minimal. Simulated telephone calls should be available for development and testing.

Ideally, the software products would be open source and available for use in the .NET framework, either as native .NET components or as wrappers around other compiled code. If the software has no compatibility with .NET, then it must be interfaced with .NET through middleware (CORBA etc.).

First of all, is it feasible to achieve successful speech recognition with a telephone as the primary input device? The telephone has the obvious disadvantage that its audio quality is lower than that of a PC microphone. If this is possible, could Sphinx be used to make this happen in a multi-tier, distributed system?

An example could be where it is used to handle multiple speech recognition sessions from a telephony server and transmit the textual output through a CORBA remoting server to a .NET server. If this is a scenario that could work, some pointers on where to start would be helpful. Is there a way of controlling Sphinx through .NET? My skills are mainly in .NET and some Java.

I would appreciate any feedback on whether this is a viable solution and how it could be achieved. Please forgive me if my ideas seem outlandish or ridiculous; I am new to all of this!

Many Thanks

Roy

- Robbie - 2006-04-28
  
  Sounds like a big (and fun) project. You are right that phone-quality speech is less than ideal for speech recognition, but this disadvantage is likely independent of the recognizer. Unfortunately, you may be hard-pressed to find an acoustic model that was trained on telephone speech, so you will have to spend the time to train one (I've been told this takes about two weeks). Training an acoustic model requires a sufficient amount of data (telephone speech in your case) that resembles the target data as closely as possible. The training corpus must also have a word-level transcription. Again, I don't think you'll find any pre-built acoustic models, so you will probably have to purchase such a corpus, but you will likely have to do this no matter which recognizer you use.
  
  I believe that Sphinx can be used in a multi-tier distributed system, as I've seen similar posts before. A while back I implemented a very simple sockets-based Sphinx server, so I'm optimistic that a multi-tier distributed system could be built.
  
  As for interfacing with .NET, I imagine you would end up using middleware as you proposed. But if you are intent on using .NET (I like C# better than Java myself, except that it is not as portable), you may consider looking into Microsoft's Speech SDK, which also provides a speech recognizer. I imagine (though I'm not sure) that you could use your own acoustic models and language models within that framework. However, I could be wrong about this, because most of what I've seen from my limited exposure to the SDK is speaker-dependent, and you will want speaker-independent recognition.
  
  Well, I'm not the most qualified to answer your questions, so hopefully others will post, but I hope this was at least a start!
  
  Regards,
  Robbie
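
To make the sockets-based idea in the reply above a little more concrete, here is a minimal sketch in Python. It is not Robbie's server and it does not call Sphinx: `recognize` is a hypothetical placeholder where a real decoder would be invoked, and the wire format (a 4-byte big-endian length followed by the payload, in both directions) is an assumption chosen only for illustration. Each connection is handled in its own thread, which is one simple way several telephony sessions could be served at once.

```python
import socket
import socketserver
import struct
import threading


def recognize(audio: bytes) -> str:
    """Hypothetical placeholder: a real system would pass the audio to a decoder."""
    return f"<{len(audio)} bytes of audio received>"


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, or raise if the peer closes the connection early."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed connection")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


class RecognitionHandler(socketserver.BaseRequestHandler):
    def handle(self) -> None:
        # Assumed framing: 4-byte big-endian length, then the raw audio bytes.
        (length,) = struct.unpack("!I", recv_exact(self.request, 4))
        audio = recv_exact(self.request, length)
        text = recognize(audio).encode("utf-8")
        # The reply uses the same framing: length prefix, then UTF-8 text.
        self.request.sendall(struct.pack("!I", len(text)) + text)


class RecognitionServer(socketserver.ThreadingTCPServer):
    # One thread per connection, so sessions do not block each other.
    allow_reuse_address = True
    daemon_threads = True


def transcribe(host: str, port: int, audio: bytes) -> str:
    """Client side: send one audio buffer and return the recognized text."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(struct.pack("!I", len(audio)) + audio)
        (length,) = struct.unpack("!I", recv_exact(sock, 4))
        return recv_exact(sock, length).decode("utf-8")
```

To try it, one could create `RecognitionServer(("127.0.0.1", 0), RecognitionHandler)`, run its `serve_forever` method in a background thread, and call `transcribe` with the host and port from `server_address`. A client on another tier, whether .NET or Java, would only need to speak the same length-prefixed framing, which is the kind of thin interface that middleware such as CORBA would otherwise provide.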
  