Writing a general grammar

Help
2007-09-19
2012-09-22
  • SmillingMan

    SmillingMan - 2007-09-19

    Hi,

    I want to write a grammar that accepts all the words in a given dictionary. Can you please help me with that? I think it should be a one-line grammar, something like

    public <newGrammar> = *;

    Thanks,
    Foad

     
    • Nickolay V. Shmyrev

      It can be a unigram language model, I suppose.

       
    • SmillingMan

      SmillingMan - 2007-09-19

      Yes, it's a unigram (a list of single words) but how can I write the grammar file so that all the words are accepted?

      Thanks,
      Foad

       
    • Nickolay V. Shmyrev

      Something like:

      \data\
      ngram 1=8

      \1-grams:
      -1.1244 <UNK> 0.0000
      -1.4254 5.75 -0.4605
      -1.7264 </s> 0.0000
      -1.4254 <s> -0.4605
      -1.4254 Fed -0.4605
      -1.4254 Le -0.4605
      -1.4254 a -0.4605
      -1.4254 abaissé -0.4605

      This is the ARPA format. You can use equal weights, or use the cmuclmtk tools to create an ARPA-format model from text with proper probabilities.
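The equal-weights option could be sketched roughly as follows. This is only an illustration, not a Sphinx tool: the function name `uniform_unigram_arpa` is hypothetical, the probabilities are base-10 logs of 1/N over the whole vocabulary (including `<UNK>`, `<s>` and `</s>`), and backoff weights are left out because a unigram model has no higher-order n-grams to back off from. A particular decoder may expect other conventions, for example a different weight for `<s>`.

```python
import math

# Special tokens that also appear in the sample model above.
SPECIALS = ["<UNK>", "</s>", "<s>"]


def uniform_unigram_arpa(words):
    """Return ARPA-format text for a unigram model with equal word weights.

    Hypothetical helper for illustration; not part of any Sphinx package.
    """
    # De-duplicate the word list and keep the special tokens first.
    vocab = SPECIALS + sorted(set(words) - set(SPECIALS))
    # Every entry gets the same log10 probability, log10(1/N).
    logp = math.log10(1.0 / len(vocab))

    lines = ["\\data\\", f"ngram 1={len(vocab)}", "", "\\1-grams:"]
    for word in vocab:
        lines.append(f"{logp:.4f} {word}")
    lines += ["", "\\end\\", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    print(uniform_unigram_arpa(["5.75", "Fed", "Le", "a", "abaissé"]))
```

With the five words from the sample, the header comes out as `ngram 1=8`, matching the count in the model above. For realistic probabilities, a model estimated from text is still the better choice.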

       
    • SmillingMan

      SmillingMan - 2007-09-20

      Thanks for the sample. Is there a way to write a JSGF grammar to accept the general list?

      To recognize two words (one, two), I can write a JSGF rule:

      public <newGrammar> = one | two;

      I am looking for a similar rule to accept all words in the dictionary.

      Best,
      Foad
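Lacking a wildcard, one workaround is to generate the rule from the dictionary itself, listing every word as an alternative. The sketch below is a rough illustration under stated assumptions: the helper names are hypothetical, the dictionary is assumed to have one entry per line with the word as the first whitespace-separated field, and alternate pronunciations are assumed to be marked with a suffix such as `(2)`. Words containing characters that are special in JSGF would need quoting, which is not handled here.

```python
import re


def words_from_dictionary(lines):
    """Collect the distinct words from pronunciation-dictionary lines.

    Assumes the word is the first field on each line and that alternate
    pronunciations carry a numeric suffix like "(2)".
    """
    words = set()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Drop the alternate-pronunciation suffix, if any.
        words.add(re.sub(r"\(\d+\)$", "", fields[0]))
    return sorted(words)


def jsgf_word_list(words, grammar_name="words", rule_name="newGrammar"):
    """Build a JSGF grammar whose public rule accepts any one listed word."""
    alternatives = " | ".join(words)
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <{rule_name}> = {alternatives};\n"
    )


if __name__ == "__main__":
    sample = ["one W AH N", "one(2) HH W AH N", "two T UW"]
    print(jsgf_word_list(words_from_dictionary(sample)))
```

For the two-word sample this yields the same `one | two` rule as above. With a large dictionary the rule grows very long and constrains the recognizer very little.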

       
      • Nickolay V. Shmyrev

        Hm, * denotes repetition, so what you want would have to be an addition to the JSGF standard, I suppose. I think it could even be implemented with a special tag like <WORD>. A tag like <VOID> is similar, but I'm not sure how it works. Once you have too many words in the dictionary, such a thing will not be useful, in my opinion, since it will not constrain the search at all.

         
        • David Huggins-Daines

          This is an interesting idea; I think it should be implemented in the FSG search module. It would not be useful to have a grammar that simply consists of <WORD>, but it would be nice to be able to allow any word in the dictionary to be recognized at certain points (perhaps as a "backoff" node with low probability).

           
