We had previously been doing a lot of work with one of the L&H speaker independent engines. It uses a BNF grammar to specify legal sentences in the language.
As you may have noticed, L&H is not going to be able to provide us with the engine we needed, so I am researching other options. Sphinx appears to be our next best bet. I have converted our grammar into a list of sample sentences and submitted that to the web-based language model builder. The resulting language model seems to be exceedingly fond of the numeral "4" as a word, and appears to use it any time it is at all confused.
But much more importantly, is there any way to make subphrases of sentences that will not be broken apart by the language modelling tool? For instance:
Call Tom Johnson at Priority Level 4.
The important, unbreakable subphrases there are:
"Call"
"Tom Johnson"
"at Priority Level"
"4"
Basically, we have 10 or fewer commands, including the optional pieces (like "at Priority Level"), and then we have anywhere from 50 to 10,000 names. I would like the language model to treat each command phrase as a single word and each name as a single word.
How can I do that? Without this extra context to provide trigram support, the recognizer doesn't stand a chance of recognizing my commands.
Thanks,
Mac Reiter
I believe at the bottom of the docs you can find a paragraph on this that talks about using an _ between words when you submit them to the web tool.
CALL_TOM_JOHNSON_AT_PRIORITY_LEVEL_4
So, breaking it up, your list should probably look like this (a small sketch for generating such a corpus follows the list):
CALL
TOM_JOHNSON
AT_PRIORITY_LEVEL
4
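For what it's worth, a minimal Python sketch of generating such an underscored corpus before submitting it to the web tool might look like the following; the command set, name list, and priority levels are made up purely for illustration:

# Hypothetical illustration: expand commands and names into the underscored
# "single word" sentences that get submitted to the web language model tool.
commands = ["CALL", "PAGE"]                # made-up command set
names = ["TOM_JOHNSON", "JANE_SMITH"]      # made-up name list
priority_phrase = "AT_PRIORITY_LEVEL"
priority_levels = ["1", "2", "3", "4"]

with open("corpus.txt", "w") as out:
    for cmd in commands:
        for name in names:
            # Plain form, e.g. "CALL TOM_JOHNSON"
            out.write(cmd + " " + name + "\n")
            # Optional priority suffix, e.g. "CALL TOM_JOHNSON AT_PRIORITY_LEVEL 4"
            for level in priority_levels:
                out.write(cmd + " " + name + " " + priority_phrase + " " + level + "\n")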
-max
I'd also be interested in applying a BNF grammar to Sphinx. Is it possible to replace the 3rd pass, where the word lattice is searched for the best path, with a search that recognizes the BNF grammar?
There's also the idea of post-processing the output of Sphinx with a BNF grammar, "massaging" the output until it parses. This method doesn't seem particularly robust, however.
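As a rough, non-robust illustration of that post-processing idea, the sketch below checks an n-best list from the recognizer against a tiny hand-written pattern and keeps the first hypothesis that parses; the pattern and the hypotheses are invented for illustration, and a real BNF would need a proper parser rather than a regular expression:

import re

# Toy stand-in for a BNF grammar, written as a regular expression over words.
NAMES = "TOM JOHNSON|JANE SMITH"
GRAMMAR = re.compile(r"^(CALL|PAGE) (" + NAMES + r")( AT PRIORITY LEVEL [1-9])?$")

def first_parse(nbest):
    """Return the first hypothesis the grammar accepts, or None if none do."""
    for hyp in nbest:
        if GRAMMAR.match(hyp):
            return hyp
    return None

# Made-up n-best output as the recognizer might produce it.
nbest = [
    "CALL TOM JOHNSON AT PRIORITY 4",        # does not parse
    "CALL TOM JOHNSON AT PRIORITY LEVEL 4",  # parses, so it is returned
]
print(first_parse(nbest))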
Well, you could turn the BNF grammar into the arpabo-style n-gram models that Sphinx uses by approximating the weights. That would get your grammar into the recognizer. Otherwise you'd need a large-vocabulary language model with which you would rescore the n-best list, which is also possible, though we don't have any good large public language models yet.
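As a sketch of the first half of that step (the rules, the equal weighting, and the printed output are simplified assumptions, not the actual arpabo format), one could expand a small grammar into every legal sentence and collect the n-gram counts from which a standard toolkit can then estimate the arpabo-style model:

from collections import Counter
from itertools import product

# Made-up stand-in for a BNF: each sentence is a sequence of slots, and each
# slot lists its alternatives (the empty string marks an optional slot).
SLOTS = [
    ["CALL", "PAGE"],
    ["TOM_JOHNSON", "JANE_SMITH"],
    ["", "AT_PRIORITY_LEVEL 1", "AT_PRIORITY_LEVEL 4"],
]

unigrams, bigrams = Counter(), Counter()
for combo in product(*SLOTS):
    words = ["<s>"] + " ".join(filter(None, combo)).split() + ["</s>"]
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

# Every legal sentence is weighted equally here; these counts would be fed to
# an n-gram toolkit to produce the arpabo-format model the recognizer loads.
for (w1, w2), count in sorted(bigrams.items()):
    print(w1, w2, count)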
Eventually, we will add BNF support, either by providing tools to convert an EBNF flavor to n-gram models or by explicitly adding engine support for it. Interesting that VoiceXML is now adding arpabo model support as an extension; the trivial grammars in most VoiceXML nodes are a degenerate case of the full power of a language model, and we're seeing that need percolate as people do real work with recognizers.
The easiest way to do this is to make lexicon entries with underscores in them as word entries.
Thus, you can get WHERE_CAN_I and WHERE_DO_I as single word entries. The lmtool (language model tool) on the web will behave correctly with these.
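If it helps, here is a rough sketch of building the pronunciation for such a compound entry by concatenating the pronunciations of its parts from a cmudict-style dictionary; the file name, the first-variant-only handling, and the example output are assumptions, not the exact Sphinx lexicon format:

def load_dict(path):
    """Load a cmudict-style file of 'WORD PH1 PH2 ...' lines, first variant only."""
    pron = {}
    with open(path) as f:
        for line in f:
            if not line.strip() or line.startswith(";;;"):
                continue
            word, *phones = line.split()
            pron.setdefault(word, phones)
    return pron

def compound_entry(compound, pron):
    """Concatenate the part pronunciations of e.g. WHERE_CAN_I into one entry."""
    phones = []
    for part in compound.split("_"):
        phones.extend(pron[part])
    return compound + "\t" + " ".join(phones)

pron = load_dict("cmudict.dict")             # assumed dictionary file name
print(compound_entry("WHERE_CAN_I", pron))   # roughly: WHERE_CAN_I  W EH R K AE N AY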