Is there an english.gram file for sphinx4?

Help
prolog1980
2009-05-01
2012-09-22
  • prolog1980

    prolog1980 - 2009-05-01

    I am trying to test transcription with sphinx4 on an audio file in which the speaker gives a speech in English.
    I began by adapting the default Transcriber demo that comes with the sphinx4 download.
    The Transcriber demo works for audio files where the speaker says digits only.
    I noticed that this example uses a digits.gram file.

    My first step was to change the acoustic model in the config file from TIDIGITS_8gau_13dCep_16k_40mel_130Hz_6800Hz to WSJ_8gau_13dCep_8kHz_31mel_200Hz_3500Hz.

    I then looked for an equivalent of the digits.gram file containing English words, but had no luck.

    Does anyone know where I can find a generic english.gram file, which I presume would be written in Java Speech Grammar Format? Or are there any other suggestions?

     
    • Nickolay V. Shmyrev

      The grammar should depend on the type of text you are trying to recognize. For a specific topic you need to create a trigram language model with cmuclmtk from similar texts.

       
    • prolog1980

      prolog1980 - 2009-05-01

      Isn't there a default trigram language model that I can use? Or an example of how to create one?

       
    • prolog1980

      prolog1980 - 2009-05-01

      Thanks, Nickolay.

      I'm still having trouble understanding, though.
      I've just been doing some reading on the net, and am wondering whether I really need a grammar.
      Logically, all I need is a dictionary that can translate sound to text. A grammar would be more appropriate in applications where a particular event is fired based on the sound input. I am not firing any events. I simply want to convert sound to text.

      I downloaded the lm_giga_5k_nvp.hvite.dic file from the link above, but I don't know where to place it. The sphinx4-1.0beta2-src\sphinx4-1.0beta2\src\sphinx4\edu\cmu\sphinx\linguist\dictionary folder contains only Java source files, so it doesn't seem like the right place for this dictionary file.

      Any ideas?

       
      • Nickolay V. Shmyrev

        > Logically, all I need is a dictionary that can translate sound to text.

        No, that's not correct. To use ASR you need three basic things: an acoustic model, a dictionary and a grammar. As an acoustic model you can use wsj. For large-vocabulary recognition you need a large trigram model like lm_giga, but it's much better to use domain-oriented grammars instead, as I said in my first post.
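To illustrate why a dictionary alone cannot turn sound into text, here is a minimal Python sketch of what a pronunciation dictionary holds: a mapping from words to phone sequences, nothing more. The entries are written in a CMUdict-style layout, which is an assumption (the lm_giga .dic file may be laid out differently), and `SAMPLE_DICT` and `parse_pronunciations` are hypothetical names made up for this example.

```python
import re

# Hypothetical CMUdict-style entries (word followed by its phones), for illustration only.
SAMPLE_DICT = """\
ONE W AH N
TWO T UW
READ R EH D
READ(2) R IY D
"""


def parse_pronunciations(text):
    """Map each word to a list of phone sequences.

    A suffix such as (2) marks an alternate pronunciation of the same word.
    """
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        word = re.sub(r"\(\d+\)$", "", parts[0])  # READ(2) -> READ
        entries.setdefault(word, []).append(parts[1:])
    return entries


pronunciations = parse_pronunciations(SAMPLE_DICT)
print(pronunciations["READ"])
```

The dictionary only says how each word is pronounced. The acoustic model scores how well the audio matches those phones, and the grammar or language model decides which word sequences are plausible, which is why all three are needed.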

        As an example of using a grammar for recognition, look at tests/performance/wsj5k.config.xml. You need to configure lexTreeLinguist and a trigram model of type LargeTrigramModel. You also need to convert lm_giga's ARPA text model to the compressed DMP format with lm3g2dmp.

        You can create your own trigram model from sample texts with cmuclmtk for example.
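As a rough illustration of what "a trigram model from sample texts" means, the Python sketch below counts word triples in a tiny made-up corpus and estimates the probability of a word given the two words before it. It is not cmuclmtk and does not produce ARPA or DMP files; a real toolkit also applies smoothing and back-off, which this sketch omits. The function names and the corpus are hypothetical.

```python
from collections import Counter


def tokenize(sentence):
    # Pad with boundary markers so the first words also get a two-word context.
    return ["<s>", "<s>"] + sentence.lower().split() + ["</s>"]


def train_trigrams(sentences):
    """Count trigrams and the two-word contexts they start with."""
    trigrams, contexts = Counter(), Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i in range(len(tokens) - 2):
            contexts[(tokens[i], tokens[i + 1])] += 1
            trigrams[(tokens[i], tokens[i + 1], tokens[i + 2])] += 1
    return trigrams, contexts


def trigram_probability(trigrams, contexts, w1, w2, w3):
    """Unsmoothed maximum-likelihood estimate of P(w3 | w1, w2)."""
    seen = contexts[(w1, w2)]
    return trigrams[(w1, w2, w3)] / seen if seen else 0.0


# Hypothetical in-domain sample text.
corpus = ["open the file", "open the door", "close the file"]
trigrams, contexts = train_trigrams(corpus)
p = trigram_probability(trigrams, contexts, "open", "the", "file")
print(p)
```

Because the counts come directly from the sample text, a model trained on sentences from your own domain favors the word sequences that actually occur there, which is the point of using domain-oriented texts.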

         
