Hello all. I'm looking to simply do speech-to-text processing in real time (or within a second of real time) and am wondering how much work would be involved in implementing that. I'm an experienced Java app programmer and have done some audio apps, but never speech.
Is this an appropriate project to use, or is there another one that focuses just on getting text from speech rather than, as this one appears to, on all the other aspects of speech?
Posting this in hopes someone can answer this rather than finding out the hard way I've got the wrong speech library!
Thanks
Anonymous - 2006-05-11
Colin -- sorry, but if you expect any kind of answer, you'll need to describe your intended task more completely than "simply do speech to text processing in real-time (or within a second of real-time)". One could write volumes on this question, but for starters:
- what are the conditions (noise, microphone placement, ...) under which the speech is recorded?
- single speaker, or many? Adult? Child? Male? Female?
- what language? Dialect?
- vocabulary size? Constrained? Unlimited?
- how is the language constrained -- spontaneous speech? reading? constrained command language?
All those things (and more) are necessary to define the problem. There are many sets of conditions for which speech recognizers such as one of the Sphinxen can do a creditable job of recognizing the words spoken, but under other conditions, the task could be hopeless. You haven't yet told us nearly enough for anyone to give you an answer.
cheers,
jerry