## Indri Document Scoring

Indri uses the language modeling approach to information retrieval. Language modeling assigns a probability to each document, so every score starts out as a value between 0 and 1. For numerical accuracy, Indri returns the log of that probability rather than the probability itself. Because log(0) is negative infinity and log(1) is zero, Indri document scores are always negative.
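As a general illustration of why log probabilities are easier to compute with (this is not Indri's code, and the term probabilities below are made up), multiplying many small probabilities underflows to zero in floating point, while summing their logs stays finite:

```python
import math

# Hypothetical per-term probabilities for a long query; the values are invented.
term_probs = [1e-5] * 80

# Multiplying the raw probabilities: the true product (1e-400) is smaller than
# the smallest positive double, so the running product collapses to 0.0.
product = 1.0
for p in term_probs:
    product *= p

# Summing the logs of the same probabilities stays well within range.
log_sum = sum(math.log(p) for p in term_probs)

print(product)   # the underflowed product
print(log_sum)   # a finite negative number
```

Once the product has collapsed to zero, documents can no longer be told apart, whereas the log-space sum still ranks them.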

Without diving into a lot of math, it's safest to assume that these values are not comparable across queries. In particular, you'll notice that as you add words to a query, the average document score tends to drop, even though the system probably gets better at finding good documents.

By default, Indri uses a query likelihood function with Dirichlet prior smoothing to weight terms. For a word `w`, a document `D`, and the collection `C`, the formulation is given by:

`c(w;D)` = count of the word in the document

`c(w;C)` = count of the word in the collection

`|D|` = number of words in the document

`|C|` = number of words in the collection

`numerator = c(w;D) + mu * c(w;C) / |C|`

`denominator = |D| + mu`

`score = log( numerator / denominator )`

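By default, mu is equal to 2500. For the very small documents you're using, `|D|` is much smaller than mu, so the `mu * c(w;C) / |C|` term and mu itself dominate the numerator and denominator, and the score differences between documents will be very small.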
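The formula above can be sketched in a few lines of Python. This is only an illustration of the per-term score as written here, not Indri's implementation; the function name `dirichlet_score` and all collection statistics are hypothetical, and how term scores are combined for a multi-word query is not covered.

```python
import math

def dirichlet_score(count_in_doc, doc_length, count_in_collection,
                    collection_length, mu=2500.0):
    """Log of the Dirichlet-smoothed probability of one word in one document."""
    numerator = count_in_doc + mu * count_in_collection / collection_length
    denominator = doc_length + mu
    return math.log(numerator / denominator)

# Hypothetical collection: the word makes up 1% of 1,000,000 words.
WORD_IN_COLLECTION = 10_000
COLLECTION_LENGTH = 1_000_000

def score_gap(mu):
    """Score difference between two 10-word documents, one containing
    the word once and one not containing it at all."""
    with_word = dirichlet_score(1, 10, WORD_IN_COLLECTION, COLLECTION_LENGTH, mu)
    without_word = dirichlet_score(0, 10, WORD_IN_COLLECTION, COLLECTION_LENGTH, mu)
    return with_word - without_word

gap_default_mu = score_gap(2500.0)  # the default mu described above
gap_small_mu = score_gap(10.0)      # a much smaller mu, for comparison only

print(gap_default_mu)
print(gap_small_mu)
```

With these made-up numbers, the gap between the two short documents is far smaller at mu = 2500 than at mu = 10, which is the effect described above: when documents are tiny relative to mu, the smoothing term swamps the document's own word counts.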