Theory behind unigram and language weight

  • Pranav Jawale - 2011-06-18

    Hi,

    I was debugging sphinx3_decode and I found that a function

    static void
    lm_uw(lm_t * lm, float64 uw)
    

    is used to change the unigram probabilities of the words. For example,
    if there are 5 equiprobable words (Pr(each word) = 0.2), lm_uw is
    called on them; uw (the unigram weight) is set to 0.7 by default.

    The comment in this function says "/* Interpolate unigram probs with
    uniform PDF, with weight uw */".

    What the function does is: if the unigram probability (as defined in
    the LM) is 0.2, it modifies it to

    0.2*uw + (1-uw)/(#words - 1)
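
    For concreteness, here is a minimal sketch of that step in plain C, in
    linear probability space (interp_unigram is a made-up name, and the
    real lm_uw works on the fixed-point log values stored in the LM, so
    this is only an illustration):

    #include <stdio.h>

    /* Hypothetical sketch: interpolate an LM unigram probability with a
       uniform PDF, giving weight uw to the LM estimate. */
    static double
    interp_unigram(double p, double uw, int n_words)
    {
        /* uniform mass spread over (#words - 1), matching the formula above */
        double uniform = 1.0 / (double) (n_words - 1);
        return uw * p + (1.0 - uw) * uniform;
    }

    int
    main(void)
    {
        /* 5 equiprobable words, default uw = 0.7: 0.7*0.2 + 0.3/4 = 0.215 */
        printf("%f\n", interp_unigram(0.2, 0.7, 5));
        return 0;
    }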

    Why is this interpolation done? Could someone point me to a reference?

    Later on, in lm_set_param(lm, lw, wip), this updated unigram
    probability is modified in the following way:

    log(unigram_prob)*language_weight + log(word_insertion_penalty)

            lm->ug[i].prob.l =
                (int32) ((lm->ug[i].prob.l - lm->wip) * f) + iwip;
    

    lm->wip is zero, so it can be neglected; iwip corresponds to the -wip
    parameter given to the decoder.
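
    To make that concrete, here is a minimal floating-point sketch of the
    same update (apply_lm_params is a made-up name; the actual code works
    on int32 log values, as in the snippet above, and the lw/wip values
    below are only illustrative):

    #include <math.h>
    #include <stdio.h>

    /* Hypothetical sketch: scale the log LM probability by the language
       weight and add the log word insertion penalty. */
    static double
    apply_lm_params(double unigram_prob, double lw, double wip)
    {
        return log(unigram_prob) * lw + log(wip);
    }

    int
    main(void)
    {
        /* interpolated unigram prob from above, with illustrative lw/wip */
        printf("%f\n", apply_lm_params(0.215, 9.5, 0.7));
        return 0;
    }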

    Here, what is the 'theory' behind the language weight?

    Thanks.

  • Nickolay V. Shmyrev

    Hello,

    Smoothing with uniform probability is a common thing in language model
    estimation; see, for example, the description of smoothing here:

    An Empirical Study of Smoothing Techniques for Language Modeling
    Stanley F. Chen and Joshua Goodman
    http://research.microsoft.com/en-us/um/people/joshuago/tr-10-98.pdf

    The unigram weight is just a way to adjust this smoothing at runtime.

  • Vassil Panayotov

    The language model weight is usually used to balance/tune the relative
    influence of the acoustic and language models on the search. My
    understanding is that if you don't use lw, the LM will have a
    negligible impact and the outcome of the search will be decided by the
    acoustic model alone.
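
    A small numeric illustration of that imbalance (all numbers below are
    made up): acoustic log-likelihoods accumulate over hundreds of frames,
    while the LM contributes one log probability per word, so without lw
    the LM term is comparatively negligible:

    #include <stdio.h>

    int
    main(void)
    {
        double acoustic_logprob = -4500.0; /* illustrative: summed over many frames */
        double lm_logprob = -25.0;         /* illustrative: summed over ~10 words */
        double lw = 9.5;                   /* illustrative language weight */

        printf("total without lw: %f\n", acoustic_logprob + lm_logprob);
        printf("total with lw:    %f\n", acoustic_logprob + lw * lm_logprob);
        return 0;
    }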

  • Pranav Jawale - 2011-06-19

    Thanks.

    So "uw" is used to make all the unigram probabilities more uniform,
    since the text from which the LM is created might be biased towards
    some words.

    "lw" seems more like an empirically determined parameter. I also found
    a paper where they try to vary lw dynamically (no better results,
    though):

    Towards a Dynamic Adjustment of the Language Weight (2001)
    by Georg Stemmer, Viktor Zeissler, Elmar Nöth, Heinrich Niemann
    http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.16.856
