The Lemur Project Wiki

Search engine and data mining applications and ClueWeb datasets.

Brought to you by: cammiemw, david_fisher, gregorybrooks, jamiecallan, sm-harding

Gathering Statistics

Next, the system applies a ContextCountGraphExtractor to extract the counts needed to compute the statistics for the scoring nodes.

Gather Context Nodes to Count

Using the ContextCountGraphExtractor copier, the system creates a new parse tree that holds a vector of indri::lang::ContextCounterNode nodes. These nodes are subclasses of indri::lang::AccumulatorNode and gather the internal contexts and counts of each scoring node.

The copier stores a vector of indri::lang::AccumulatorNode objects for later extraction.
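
The collection step can be pictured with a minimal sketch. Every type below (QueryNode, CounterNode, CountExtractor) is a hypothetical stand-in invented for illustration, not one of Indri's real indri::lang classes, and the sketch only collects counters rather than copying the tree:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified query tree; NOT Indri's real indri::lang classes.
enum class Kind { Combine, Scorer };

struct QueryNode {
    Kind kind;
    std::string term;                 // used only by Scorer nodes
    std::vector<QueryNode> children;  // used only by Combine nodes
};

// Stand-in for a context counter: remembers which term it must count.
struct CounterNode {
    std::string term;
};

// Stand-in for the copier: walks the tree and keeps one counter per scorer.
struct CountExtractor {
    std::vector<CounterNode> accumulators;

    void visit(const QueryNode& node) {
        if (node.kind == Kind::Scorer) {
            assert(node.children.empty());  // scorers are leaves in this sketch
            accumulators.push_back(CounterNode{node.term});
        }
        for (const QueryNode& child : node.children) {
            visit(child);
        }
    }
};
```

After visiting a tree that combines two scorers, the accumulators vector of this sketch holds two counters, in visitation order, ready to be read back once the query has run.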

Gather and Sum the Scores

Next, _sumServerQuery() is called. It first runs the query via _runServerQuery(), which evaluates the query (see [Scored Query Evaluation] for the runQuery loop) without any smoothing parameters. While the query runs, counts are gathered for each scoring node: its term frequency and its context frequencies.
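
To make the two kinds of counts concrete, here is a rough sketch of counting a single term over a tiny in-memory collection. The Counts struct, the Document alias, and countTerm() are hypothetical names, and treating each whole document as the context is an assumption made for brevity; Indri's own counting runs inside its inference network and is not shown here:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical count record; the field names only mirror the wiki's wording.
struct Counts {
    std::uint64_t occurrences;  // how often the term appeared
    std::uint64_t contextSize;  // how many tokens the contexts contained
};

using Document = std::vector<std::string>;

// Count one term over a collection. Assumption: the context of the term is
// the whole document, so contextSize is the total token count.
Counts countTerm(const std::vector<Document>& collection, const std::string& term) {
    assert(!term.empty());
    Counts counts{0, 0};
    for (const Document& doc : collection) {
        counts.contextSize += doc.size();
        for (const std::string& token : doc) {
            if (token == term) {
                ++counts.occurrences;
            }
        }
    }
    return counts;
}
```

No smoothing is involved at this point: the function returns raw counts only, matching the idea that this pass exists to gather statistics rather than to score documents.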

The flow then loops over the results and sums the scores of the scored results; these sums serve as the statistics that are fed back into the model. The _copyStatistics() method then takes every scoring node and sets its statistics: the number of occurrences and the context size.
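
The sum-then-copy flow might look roughly like the sketch below. It assumes that the loop iterates over one response per server (inferred from the method name only) and that counts line up by scoring-node index. NodeCounts, ScoringNode, sumServerCounts(), and copyStatistics() are illustrative names, not Indri's actual signatures:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-node counts as returned by one server.
struct NodeCounts {
    std::uint64_t occurrences;
    std::uint64_t contextSize;
};

// Hypothetical scoring node that receives the summed statistics.
struct ScoringNode {
    std::uint64_t occurrences;
    std::uint64_t contextSize;

    void setStatistics(std::uint64_t occ, std::uint64_t ctx) {
        occurrences = occ;
        contextSize = ctx;
    }
};

// Sum counts element-wise; perServer[s][i] belongs to scoring node i on server s.
std::vector<NodeCounts> sumServerCounts(
        const std::vector<std::vector<NodeCounts>>& perServer) {
    std::vector<NodeCounts> totals;
    for (const std::vector<NodeCounts>& response : perServer) {
        if (response.size() > totals.size()) {
            totals.resize(response.size());  // new entries are zero-initialized
        }
        for (std::size_t i = 0; i < response.size(); ++i) {
            totals[i].occurrences += response[i].occurrences;
            totals[i].contextSize += response[i].contextSize;
        }
    }
    return totals;
}

// Copy the summed counts into the scoring nodes, one entry per node.
void copyStatistics(const std::vector<NodeCounts>& totals,
                    std::vector<ScoringNode>& scorers) {
    assert(totals.size() == scorers.size());
    for (std::size_t i = 0; i < scorers.size(); ++i) {
        scorers[i].setStatistics(totals[i].occurrences, totals[i].contextSize);
    }
}
```

Keeping the summation separate from the copy step mirrors the description above: the totals are complete before any scoring node is updated, so every node sees collection-wide statistics.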

Next: [Application of Smoothing Parameters]