I've been experimenting with pocketsphinx (specifically, the GStreamer
plugin). My use case is for a simple spoken remote-control for my media
player, where I enter a specific set of sentences that can be recognised, and
the recogniser chooses one of them. I've been doing this by creating a text
grammar and then compiling it with the online lmtool application. However,
pocketsphinx treats the grammar as a list of words, not a list of sentences:
for example, if I compile the grammar
HELLO THERE
GOODBYE NOW
the recogniser will recognise the sentence "HELLO NOW", and I don't want it
to; I want it to only recognise the sentences I enter and not assemble others
out of the words. I've been working around this by creating each sentence as a
single hyphen-separated "word":
HELLO-THERE
GOODBYE-NOW
which works, but if I get into mildly complicated sentences then I hit the
35-character limit on a single token and my grammar won't compile. Also, this
is, let us be honest, a bit of a hack. So, how should I be doing this?
sil
Besides language models, pocketsphinx also supports JSGF grammars, which
define a more restricted search space. You can try the -jsgf option of the
decoder.
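
As a rough illustration of what such a grammar could look like for the two
sentences in the question, here is a small Python sketch that writes out a JSGF
grammar with one alternative per sentence. The function name, the grammar name
"remote" and the rule name "command" are made up for this example; only the
JSGF header/rule syntax itself is standard.

```python
def sentences_to_jsgf(sentences, grammar_name="remote", rule_name="command"):
    """Build a JSGF grammar whose single public rule matches exactly one
    of the given sentences (names here are illustrative assumptions)."""
    # Normalise whitespace and drop empty lines.
    alternatives = [" ".join(s.split()) for s in sentences if s.strip()]
    if not alternatives:
        raise ValueError("at least one sentence is required")
    # In JSGF a word sequence binds tighter than "|", so each alternative
    # is a whole sentence and words are not mixed across sentences.
    body = " | ".join(alternatives)
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <{rule_name}> = {body};\n"
    )


if __name__ == "__main__":
    print(sentences_to_jsgf(["HELLO THERE", "GOODBYE NOW"]), end="")
```

Saving the output to a file and passing that file to the decoder with -jsgf
should restrict recognition to "HELLO THERE" and "GOODBYE NOW", so "HELLO NOW"
is no longer a valid path. Every word still has to be present in the
pronunciation dictionary you give the decoder.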
In general, though, it's naive to expect your users to say only a predefined
set of sentences. Language is never that restricted. The proper way would be
to implement a semantic parser on top of the recognizer output. For an
example, see
http://wiki.speech.cs.cmu.edu/olympus/index.php/Olympus
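
To make the "semantic parser on top of the recognizer output" idea a bit more
concrete, here is a deliberately tiny keyword-spotting sketch in Python. It is
not how Olympus works; the intent names and trigger words are hypothetical and
only stand in for whatever commands your media player needs.

```python
# Hypothetical intents and trigger words for a media player remote.
INTENTS = {
    "play": {"PLAY", "START", "RESUME"},
    "pause": {"PAUSE", "STOP", "HOLD"},
    "next": {"NEXT", "SKIP"},
}


def parse_intent(hypothesis):
    """Map a recognised word string to the intent with the most matching
    trigger words, or None if no trigger word is present."""
    words = set(hypothesis.upper().split())
    best, best_hits = None, 0
    for intent, triggers in INTENTS.items():
        hits = len(words & triggers)
        # Strictly greater: on a tie the earlier intent in INTENTS wins.
        if hits > best_hits:
            best, best_hits = intent, hits
    return best


if __name__ == "__main__":
    print(parse_intent("PLEASE SKIP THIS SONG"))
    print(parse_intent("HELLO NOW"))
```

The point of this layering is that the recognizer can stay fairly open (a
language model rather than a fixed sentence list), while the application only
acts on hypotheses that map to a known intent and ignores the rest.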