unbreakable phrases

Anonymous
2001-06-07
2012-09-22
  • Anonymous

    Anonymous - 2001-06-07

    Hello,

    We had previously been doing a lot of work with one of the L&H speaker-independent engines.  It uses a BNF grammar to specify legal sentences in the language.

    As you may have noticed, L&H is not going to be able to provide us with the engine we needed, so I am researching other options.  Sphinx appears to be our next best bet.  I have converted our grammar into a list of sample sentences and submitted that to the web-based language model builder.  The resulting language model seems to be exceedingly fond of the numeral "4" as a word, and appears to use it any time it is at all confused.

    But much more importantly, is there any way to make subphrases of sentences that will not be broken apart by the language modelling tool?  For instance:

    Call Tom Johnson at Priority Level 4.

    The important, unbreakable subphrases there are:
    "Call"
    "Tom Johnson"
    "at Priority Level"
    "4"

    Basically, we have 10 or fewer commands, including the optional pieces (like "at Priority Level"), and then we have anywhere from 50-10,000 names.  I would like it to treat the command phrases as single words and each name as a single word.

    How can I do that?  Without this extra context to provide trigram support, the recognizer doesn't stand a chance of being able to successfully recognize my commands.

    Thanks,
    Mac Reiter

     
    • Ken M

      Ken M - 2001-06-10

      I believe at the bottom of the docs you can find a paragraph on this that talks about using an _ between words when you submit them to the webtool:
      CALL_TOM_JOHNSON_AT_PRIORITY_LEVEL_4
      So, breaking it up, your list should probably look like this:
      CALL
      TOM_JOHNSON
      AT_PRIORITY_LEVEL
      4

      -max
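
      To apply this idea across a whole corpus of sample sentences, a small preprocessing script can join the known multi-word phrases before the text goes to the web tool. The sketch below is not from the thread; the phrase list and function name are illustrative, and it assumes phrases match exactly after uppercasing.

      ```python
      # Known multi-word phrases to treat as single "words" in the
      # language model.  This list is illustrative.
      PHRASES = ["TOM JOHNSON", "AT PRIORITY LEVEL"]

      def join_phrases(sentence, phrases=PHRASES):
          """Replace each known phrase with its underscore-joined form."""
          out = sentence.upper()
          # Replace longer phrases first so a long phrase is not
          # partially consumed by a shorter overlapping one.
          for p in sorted(phrases, key=len, reverse=True):
              out = out.replace(p, p.replace(" ", "_"))
          return out

      print(join_phrases("Call Tom Johnson at Priority Level 4"))
      # CALL TOM_JOHNSON AT_PRIORITY_LEVEL 4
      ```

      Run over every sample sentence, this produces a corpus in which each command phrase and each name is a single token, so the trigram model sees the context Mac is asking for.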

       
    • Steven E Hugg

      Steven E Hugg - 2001-06-12

      I'd also be interested in applying a BNF grammar to Sphinx.  Is it possible to replace the 3rd pass, where the word lattice is searched for the best path, with a search that recognizes the BNF grammar?

      There's also the idea of post-processing the output of Sphinx with a BNF grammar, "massaging" the output until it parses.  This method doesn't seem particularly robust, however.

       
      • Kevin A. Lenzo

        Kevin A. Lenzo - 2001-07-10

        Well, you could turn the BNF grammar into the arpabo-style n-gram models that sphinx uses by approximating the weights.  That would get your grammar into the recognizer.  Otherwise you'd need a large-vocabulary language model that you would rescore the n-best from, which is also possible, though we don't have any good large public language models yet.

        Eventually, we will add BNF support, either by providing tools to convert an EBNF flavor to n-gram models or by explicitly adding engine support for it.  Interesting that VoiceXML is now adding arpabo model support as an extension; the trivial grammars in most VoiceXML nodes are a degenerate case of the full power of a language model, and we're seeing that need percolate as people do real work with recognizers.
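
        The conversion Lenzo describes, going from a grammar to n-gram training data, can be done crudely by enumerating every sentence the grammar generates and feeding that list to the model builder (which is essentially what Mac did by hand). A minimal sketch, with a hypothetical toy grammar, assuming the grammar is finite and small enough to expand fully:

        ```python
        # Toy BNF-like grammar as a dict: nonterminal -> list of
        # productions, each production a list of symbols.  Rule names
        # and vocabulary are illustrative, not from the thread.
        GRAMMAR = {
            "<command>": [["CALL", "<name>", "<priority>"],
                          ["CALL", "<name>"]],
            "<name>": [["TOM_JOHNSON"], ["JANE_DOE"]],
            "<priority>": [["AT_PRIORITY_LEVEL", "<digit>"]],
            "<digit>": [["1"], ["2"], ["3"], ["4"]],
        }

        def expand(symbol):
            """Yield every terminal string derivable from `symbol`."""
            if symbol not in GRAMMAR:      # terminal symbol
                yield symbol
                return
            for production in GRAMMAR[symbol]:
                yield from expand_seq(production)

        def expand_seq(symbols):
            """Yield every expansion of a sequence of symbols."""
            if not symbols:
                yield ""
                return
            for head in expand(symbols[0]):
                for tail in expand_seq(symbols[1:]):
                    yield (head + " " + tail).strip()

        sentences = list(expand("<command>"))
        # 10 sentences, e.g. "CALL TOM_JOHNSON AT_PRIORITY_LEVEL 4"
        ```

        Counting the expansions in an n-gram estimator then approximates the grammar's weights, as Lenzo suggests; a recursive grammar would need a depth cutoff instead of full enumeration.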

         
    • Kevin A. Lenzo

      Kevin A. Lenzo - 2001-07-10

      The easiest way to do this is to make lexicon entries with underscores in them as word entries.
      Thus, you can get WHERE_CAN_I and WHERE_DO_I as single word entries.  The lmtool (language model tool) on the web will behave correctly with these.
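
      For those compound entries to be recognizable, the lexicon also needs a pronunciation, which can be built by concatenating the phone strings of the component words. A sketch assuming a CMU-style dictionary mapping each word to a space-separated phone string; the transcriptions below are illustrative:

      ```python
      # Illustrative base pronunciations (CMU-dict style phone strings).
      BASE_DICT = {
          "WHERE": "W EH R",
          "CAN": "K AE N",
          "DO": "D UW",
          "I": "AY",
      }

      def compound_entry(words, base=BASE_DICT):
          """Build one lexicon entry for an underscore-joined phrase by
          concatenating the phone strings of its component words."""
          name = "_".join(words)
          phones = " ".join(base[w] for w in words)
          return name, phones

      print(compound_entry(["WHERE", "CAN", "I"]))
      # ('WHERE_CAN_I', 'W EH R K AE N AY')
      ```

      A real dictionary would likely also carry alternate pronunciations per word; this sketch takes only the first.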

       
