
Using ARPA N-grams both in n-best sentence rescoring and const-arpa lattice rescoring

  • Akira Miasato

    Akira Miasato - 2015-05-15

    Hello everyone,

    I've been using modified versions of the 'rnnlmrescore'/'rnnlm_compute_scores' scripts for rescoring via n-best probability re-estimation. I made those modifications so that I could handle generic language models in the rescoring step.

    I discovered that the lattice-to-nbest tool has parameters that greatly affect this task, namely the AM/LM weights and the n-best list length. To check the validity of the implementation, I used an SRILM-trained LM both as a const-arpa for the 'lattice-lmrescore-const-arpa' tool and as a generic language model in the modified rescoring scripts. Then I compared the results of both rescoring recipes in terms of WER.

    However, on some validation datasets, there is a big difference between the WER reduction from full-lattice const-arpa rescoring and from nbest-to-lattice rescoring, even with optimal parameters (a large enough n-best length, and the best AM/LM weight as estimated in the first-pass evaluation). In online decoding, for instance, nbest-to-lattice rescoring does not reduce the WER at all; it increases it instead.

    Is this expected behavior? Is it in any way similar to the RNNLM rescoring behavior seen by those who used the original scripts?



    Last edit: Akira Miasato 2015-05-19
    • Daniel Povey

      Daniel Povey - 2015-05-15

      Doing language model rescoring by rescoring n-best lists is always
      going to be approximate because the n-best list (for any reasonable n)
      can only represent a tiny portion of the variety in the lattice. It's
      important to generate it using a lmwt/acwt ratio similar to the
      optimal one.

      At some point Guoguo Chen (cc'd) is going to work on an RNNLM
      rescoring strategy based on lattice rescoring rather than n-best
      lists, which will be much closer to exact.
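
      As a rough illustration of why n-best rescoring is only approximate,
      here is a minimal Python sketch. It is not taken from the Kaldi
      scripts: the `Hyp` type, the function names, and all the numbers are
      hypothetical, and the "lattice" is just a list of four complete paths.
      Costs are treated as negative log-probabilities, and only the ratio of
      LM weight to acoustic weight is modelled (`lmwt`, with the acoustic
      weight fixed at 1).

```python
from typing import List, NamedTuple


class Hyp(NamedTuple):
    words: str
    ac_cost: float       # acoustic cost (negative log-likelihood)
    old_lm_cost: float   # cost under the LM used for decoding
    new_lm_cost: float   # cost under the LM used for rescoring


def total_cost(h: Hyp, lmwt: float, use_new_lm: bool) -> float:
    """Combined cost of one path: acoustic cost plus weighted LM cost."""
    lm_cost = h.new_lm_cost if use_new_lm else h.old_lm_cost
    return h.ac_cost + lmwt * lm_cost


def nbest(paths: List[Hyp], n: int, lmwt: float) -> List[Hyp]:
    """Keep the n lowest-cost paths under the *old* LM."""
    return sorted(paths, key=lambda h: total_cost(h, lmwt, False))[:n]


def rescore(candidates: List[Hyp], lmwt: float) -> Hyp:
    """Pick the lowest-cost candidate under the *new* LM."""
    return min(candidates, key=lambda h: total_cost(h, lmwt, True))


# Toy "lattice" with four paths; every number here is invented.
PATHS = [
    Hyp("A", 100.0, 5.0, 6.0),
    Hyp("B", 102.0, 5.0, 5.5),
    Hyp("C", 101.0, 5.5, 4.0),
    Hyp("D", 110.0, 6.0, 6.0),
]
LMWT = 10.0

two_best_winner = rescore(nbest(PATHS, 2, LMWT), LMWT)
full_winner = rescore(PATHS, LMWT)
print(two_best_winner.words, full_winner.words)
```

      With these invented numbers the 2-best list contains only A and B, so
      rescoring it selects B, whereas rescoring all four paths selects C.
      The path the new LM prefers was pruned before the new LM was ever
      consulted, which is the effect a short n-best list can have.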



  • Akira Miasato

    Akira Miasato - 2015-05-18

    Thanks for the informative answer.

    On another note, are there any big differences between lattices generated by online and offline recipes? It seems to me that it is very difficult to beat the 1st-pass in online decoding using n-best rescoring.

    • Daniel Povey

      Daniel Povey - 2015-05-18

      The format of the lattices is the same. Make sure you are using the
      correct LM-scale (normally the inverse of the acoustic scale you
      decoded with) to get the n-best list. In this case the 1-best from
      the lattice, before LM rescoring, should be the same as the decoded
      output; it's a good idea to verify this. And because it will be
      difficult to replicate the acoustic computations that go on in the
      online decoding, make sure you don't use any of the modes of language
      model rescoring that touch the acoustic scores. (If you don't know
      what this means, it means basically don't use any program with 'gmm'
      or 'nnet' in its name; probably you're not anyway so just ignore this
      if you find it confusing).
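
      The sanity check described above could be sketched in Python as
      follows. This is an assumption-laden illustration, not part of any
      Kaldi recipe: it assumes both the decoder output and the lattice
      1-best have already been written out as text lines of the form
      "utt-id word1 word2 ...", and the helper names `parse_transcripts`,
      `mismatched_utts` and `lm_scale_for` are hypothetical.

```python
from typing import Dict, Iterable, List


def parse_transcripts(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each utterance id to its word sequence; blank lines are skipped."""
    out: Dict[str, List[str]] = {}
    for line in lines:
        fields = line.split()
        if fields:
            out[fields[0]] = fields[1:]
    return out


def mismatched_utts(decoded: Dict[str, List[str]],
                    lattice_best: Dict[str, List[str]]) -> List[str]:
    """Utterance ids whose two transcripts differ or exist on one side only."""
    utts = sorted(set(decoded) | set(lattice_best))
    return [u for u in utts if decoded.get(u) != lattice_best.get(u)]


def lm_scale_for(acoustic_scale: float) -> float:
    """LM scale taken as the inverse of the acoustic scale used in decoding."""
    return 1.0 / acoustic_scale


# Invented example data.
decoded = parse_transcripts(["utt1 hello world", "utt2 good morning"])
lattice_best = parse_transcripts(["utt1 hello world", "utt2 good mourning"])
print(mismatched_utts(decoded, lattice_best))
```

      An empty list would suggest that the scale used to extract the 1-best
      is consistent with the one used in decoding; any ids it does return
      are the utterances worth inspecting first.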

      There are a lot of things that can go wrong in LM rescoring, and there
      are different strategies of LM rescoring demonstrated in the scripts.
      Kaldi lattices have two costs per arc: the "acoustic" cost, and the
      "graph" cost. The graph cost contains transition probabilities,
      lexicon costs and language model scores.
      One strategy is to completely remove the "graph" part of the score,
      and reconstruct it by adding transition-model scores, and composing
      with the lexicon and then the LM. Another strategy is to subtract the
      score from the "old" LM (the one that you decoded with) and then add
      in the score from the "new" LM (the one that you want to use). Of
      course if you want interpolation you can just subtract a constant
      times the "old" LM score and add in another constant times the new LM score.
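
      The second strategy, including the interpolation variant, amounts to
      simple arithmetic on the graph cost. The sketch below is only an
      illustration under stated assumptions: the function name and the
      numbers are hypothetical, costs are negative log-probabilities, and
      the calculation is shown for a single path rather than for the arcs
      of a real lattice.

```python
def rescored_graph_cost(graph_cost: float,
                        old_lm_cost: float,
                        new_lm_cost: float,
                        old_scale: float = 1.0,
                        new_scale: float = 1.0) -> float:
    """Subtract a scaled old-LM cost and add a scaled new-LM cost.

    The transition and lexicon parts of the graph cost are left untouched.
    """
    return graph_cost - old_scale * old_lm_cost + new_scale * new_lm_cost


# Invented numbers: graph cost 12.0, of which 8.0 came from the old LM.
graph_cost, old_lm_cost, new_lm_cost = 12.0, 8.0, 6.0

# Full replacement of the old LM by the new one.
replaced = rescored_graph_cost(graph_cost, old_lm_cost, new_lm_cost)

# Interpolation: remove half of the old LM cost, add half of the new one.
interpolated = rescored_graph_cost(graph_cost, old_lm_cost, new_lm_cost,
                                   old_scale=0.5, new_scale=0.5)
print(replaced, interpolated)
```

      Note that the interpolation here happens in the cost (log) domain,
      since that is what subtracting and adding scaled scores produces.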

