I'm interested in developing a speech-recognition system based on Lojban (http://www.lojban.org). How feasible is this with Sphinx? Lojban has the advantage of audio-visual isomorphism, so a pronunciation dictionary shouldn't be necessary. However, it does use pauses to disambiguate certain constructs. I don't have a lot of experience with systems like this -- can someone give me pointers as to where I could start? Or is the whole idea crazy? :)
It can be done!

The example programs, including sphinx2-continuous and sphinx2-ptt (push-to-talk), use the libsphinx2 library. If you look at the C files in examples/, you can see how they're used. The examples just print out _some_ of the information available -- you can also get detailed timing information out of it, with a little more code.
The recognizer still needs a pronunciation dictionary and a language model to run, though I suppose you could generate one automatically. First off, I'd try making up some sentences of the type you'll want to recognize, and use the web-based lmtool linked off http://www.speech.cs.cmu.edu/sphinx to build a language model and see if it works for you. The CMU-Cambridge SLM (statistical language model toolkit) could be used, too -- it's for bigger models. I'm not familiar enough with Lojban or I'd try cooking something up to help :)
Pointer: look in src/examples/tty-continuous.c to see how the library is used. I'll try to get another example set up that exposes more of the hypothesis information.
kevin