small language model tuning

2015-10-02
2015-10-03
  • K R Srinidhi

    K R Srinidhi - 2015-10-02

    Hi,

    I have a list of movie names, song names and singers (up to 1.5 lakh entries).
    I have built a 3-gram language model (with improved Kneser-Ney smoothing) for speech recognition on movie/song/singer names. Currently I get around 70% recognition accuracy with this model. Some words in the list are frequent and some are rare, for example:
    Dhil deewana
    Dhil dhadakne do
    Dhil to pagal he
    "Dhil" in the above list appears in multiple movie names.
    Some movie names, like
    "fanaa"
    appear very few times in the list and consist of a single word.
    Now when I have an utterance with "fanaa", it is recognized as "ko naam".
    "ko" and "naam" both have very high frequency in the corpus used to build the language model.

    How can I artificially tune the corpus so that less frequent names (single-word or multi-word) get recognized?
    Other suggestions for tuning the language model to improve recognition accuracy are welcome.

    Thanks
    Srinidhi

     
    • bic-user

      bic-user - 2015-10-02

      Did you try playing with the language weight (the "-lw" option)? Try several

       
  • K R Srinidhi

    K R Srinidhi - 2015-10-02

    I have tried different weights and found an optimal one, at which I get 70% recognition accuracy. Now I would like to tweak the language model so that even the less frequent words in the corpus get recognized (maybe by manually tweaking the corpus). Every name (single-word or multi-word) is present only once in the corpus.

     
    • Nickolay V. Shmyrev

      Language model tweaks do little for accuracy. They can give you a couple of percent if the vocabulary is small, but overall you need to get a good acoustic model first.

      Since you are trying to recognize Hindi names, and I doubt you have a good Hindi acoustic model, you should focus on that rather than on the language model.

       
  • K R Srinidhi

    K R Srinidhi - 2015-10-02

    I have trained the acoustic model using 4 million transcriptions.
    I have a test set of 3000 utterances. When I build a language model from the transcriptions of those 3000 utterances only, I get an accuracy of 97-98%. But when I build a language model from a list of around 1 lakh movie names (which also contains the text "fanaa"), the utterance containing "fanaa" is not recognized.

     
    • Nickolay V. Shmyrev

      There could be many reasons why a particular phrase is not recognized, from simple mistakes in the phonetic dictionary to more complex interactions between the acoustic model, the language model and the search beams. You have to double-check everything and consider all possible reasons, not just the language model.

      To verify the language model, compute its perplexity on the test set. It should be reasonably small, below 200.
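
      As a rough illustration of that check, here is a minimal sketch of how perplexity follows from per-token log probabilities. It assumes you already have a log10 probability for each scored token from your language model toolkit; the function name and the sample numbers are hypothetical, not taken from any particular tool.

```python
import math


def perplexity(log10_probs):
    """Perplexity from per-token log10 probabilities.

    Pass one entry per scored token (including end-of-sentence tokens
    if your model predicts them).
    """
    if not log10_probs:
        raise ValueError("need at least one scored token")
    average = sum(log10_probs) / len(log10_probs)
    return 10 ** (-average)


# Hypothetical data: every token is assigned probability 1/200,
# so the perplexity lands right at the threshold mentioned above.
example = [math.log10(1 / 200)] * 5
print(perplexity(example))
```

      Tokens the model finds more surprising pull the average log probability down and the perplexity up, so a high value on the test set points at a mismatch between the corpus and what people actually say.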

       
      • Nickolay V. Shmyrev

        And if you care about the probabilities in the language model, you can weight the counts by popularity when building it. "Fanaa" is way more popular than "Jab we met" or something. So you can create counts that prefer the former over the latter. Such a language model might be more reasonable.

         
  • K R Srinidhi

    K R Srinidhi - 2015-10-03

    How do I create counts for "fanaa" or similar less frequent names so that they are recognized more consistently? What do I have to do while building the language model to achieve this?

     
    • Nickolay V. Shmyrev

      You can find a description of counts file here:

      http://www.speech.sri.com/projects/srilm/manpages/ngram-count.1.html

      You can use ngram-count to estimate the model from a counts file. The counts do not have to be actual corpus counts; they could be based on movie popularity. You can obtain movie popularity ratings from the web or something.
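
      A minimal sketch of building such a counts file follows. It assumes the layout described in the man page linked above (the N-gram words, then a tab, then the count). The popularity weights are made up for illustration, and `weighted_counts` and `format_counts` are hypothetical helper names, not part of SRILM.

```python
from collections import Counter


def weighted_counts(names_with_weight, order=3):
    """Accumulate n-gram counts where each name contributes its weight."""
    counts = Counter()
    for name, weight in names_with_weight:
        tokens = ["<s>"] + name.lower().split() + ["</s>"]
        for n in range(1, order + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += weight
    return counts


def format_counts(counts):
    """Render counts as 'w1 w2 ... wN<TAB>count' lines."""
    return ["{}\t{}".format(" ".join(ngram), count)
            for ngram, count in sorted(counts.items())]


# Hypothetical popularity weights, NOT real ratings.
names = [("fanaa", 50), ("jab we met", 5), ("dhil to pagal he", 20)]
counts = weighted_counts(names)

with open("counts.txt", "w", encoding="utf-8") as handle:
    handle.write("\n".join(format_counts(counts)) + "\n")

# The file could then be passed to SRILM, for example:
#   ngram-count -order 3 -read counts.txt -lm names.lm
```

      Smoothing methods that depend on count-of-count statistics may behave differently on artificial counts like these, so check the resulting model's perplexity and recognition accuracy rather than assuming an improvement.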

       