
Recognising All Sub-Phrases in a JSGF Grammar

Help
A S
2019-05-27
2019-05-31
  • A S

    A S - 2019-05-27

    I am using a JSGF-format grammar with pocketsphinx. I want to be able to match/recognise only a finite number of possible input sentences (i.e. a command-and-control type interface).

    However, due to the nature of these “valid” sentences, any sentence fragment (sub-sentence) with three or more words is also a valid input.

    For example, if the valid sentences are:

    • This is a valid sentence
    • So is this one

    then the grammar should contain:

    • This is a
    • This is a valid
    • This is a valid sentence
    • is a valid
    • is a valid sentence
    • a valid sentence
    • So is this
    • So is this one
    • is this one

    [Obviously, in this example, this grammar seems a bit bizarre, but it does make sense in the language/context I am using.]

    Is there any way this can be done automatically using JSGF grammar syntax or any pocketsphinx / CMU Sphinx tool? Of course, I could programmatically generate a grammar file from a set of valid full sentences, but this seems long-winded and makes the grammar file long and difficult to edit directly (or even just to read off what the valid full sentences are).

    The best I can come up with, using only the JSGF rules I know of, is the following (for the example above):

    <a> = this is a;
    <valid> = (<a> | is a) valid;
    <sentence> = (<valid> | a valid) sentence;
    
    <this> = so is this;
    <one> = (<this> | is this) one;
    

    (E.g. the rule <valid> matches any sub-sentence of length three or more which ends with the word “valid”.)

    This grammar contains exactly the valid strings listed above. Nevertheless, this grammar still needs to be generated programmatically from the set of valid full sentences, so I may as well list all sub-sentences explicitly. This format does however make it slightly easier to see directly from the jsgf file what the valid full sentences are.
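
    If the chained form is generated rather than written by hand, the construction can be sketched as follows. This is only an illustration under stated assumptions: the rule names (`<s0_2>`, `<s0_3>`, ...) are generated from sentence and word indices instead of the ending word, so that a word appearing in several sentences does not produce clashing rule names; the `public <phrase>` rule and the grammar name are additions that the hand-written rules above do not contain; and `min_words` is assumed to be at least 2.

```python
def chained_rules(sentences, min_words=3, name="subphrases"):
    """Build a JSGF grammar in which rule <sI_K> matches every sub-phrase of
    sentence I that ends at word K and has at least min_words words."""
    lines = ["#JSGF V1.0;", f"grammar {name};"]
    public = []
    for s_idx, sentence in enumerate(sentences):
        words = sentence.lower().split()
        prev = None  # rule for sub-phrases ending at the previous word
        for k in range(min_words - 1, len(words)):
            rule = f"<s{s_idx}_{k}>"
            # The min_words - 1 words immediately before word k.
            start = " ".join(words[k - min_words + 1:k])
            if prev is None:
                body = f"{start} {words[k]}"
            else:
                # Either extend a longer sub-phrase or start a minimal one.
                body = f"({prev} | {start}) {words[k]}"
            lines.append(f"{rule} = {body};")
            public.append(rule)
            prev = rule
    # Hypothetical top-level rule tying all alternatives together.
    lines.append(f"public <phrase> = {' | '.join(public)};")
    return "\n".join(lines)


if __name__ == "__main__":
    print(chained_rules(["This is a valid sentence", "So is this one"]))
```

    The number of rules grows linearly with the total number of words, whereas an explicit list of sub-sentences grows quadratically with sentence length.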

    Essentially, my question is: Is there a better way to achieve the above grammar with pocketsphinx?

     
    • Nickolay V. Shmyrev

      This should work:

       <sentence> = this | is  | a | valid | sentence | one | so;
      

      If you need something more advanced, you can build a bigram language model and convert it to FSG, as in Figure 1 of https://www.danielpovey.com/files/2015_icassp_librispeech.pdf.

      Unigram like above will usually work fine.
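
      A small sketch of how such a unigram grammar could be generated from the full sentences. Note one assumption: the rule is wrapped in the JSGF `+` operator (one or more repetitions), on the reading that the word list is meant to be repeated to form a phrase; the rule as posted above does not include it. The function name `word_loop_grammar` and the grammar name are made up for illustration.

```python
def word_loop_grammar(sentences, name="wordloop"):
    """Build a JSGF grammar accepting any non-empty sequence of vocabulary words."""
    vocab = []
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in vocab:  # keep first-seen order, drop duplicates
                vocab.append(word)
    return "\n".join([
        "#JSGF V1.0;",
        f"grammar {name};",
        # "+" (one or more repetitions) is an assumption, see the note above.
        f"public <sentence> = ({' | '.join(vocab)})+;",
    ])


if __name__ == "__main__":
    print(word_loop_grammar(["This is a valid sentence", "So is this one"]))
```

      A grammar like this accepts far more than the valid sub-sentences, so the output still has to be checked afterwards.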

       

      Last edit: Nickolay V. Shmyrev 2019-05-27
      • Nickolay V. Shmyrev

        And it is not the task of the grammar to enforce complex constraints on recognized speech. You need to build a post-processor for that.
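
        As a rough illustration of such a post-processor, the sketch below checks whether a recognised hypothesis string is a contiguous run of at least three words from one of the valid full sentences. The function name `is_valid_phrase` is hypothetical, and how the hypothesis string is obtained from the decoder is left out.

```python
def is_valid_phrase(hypothesis, sentences, min_words=3):
    """Return True if hypothesis is a contiguous sub-phrase (>= min_words words)
    of at least one sentence in sentences."""
    hyp = hypothesis.lower().split()
    if len(hyp) < min_words:
        return False
    for sentence in sentences:
        words = sentence.lower().split()
        # Slide a window of len(hyp) words over the sentence.
        for i in range(len(words) - len(hyp) + 1):
            if words[i:i + len(hyp)] == hyp:
                return True
    return False


if __name__ == "__main__":
    valid = ["This is a valid sentence", "So is this one"]
    for text in ["is a valid", "valid sentence", "this is this one"]:
        print(text, "->", is_valid_phrase(text, valid))
```

        Hypotheses that fail the check can then be rejected or passed to whatever fallback the application uses.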

         
