The Lemur Project Wiki

Search engine and data mining applications and ClueWeb datasets.

Brought to you by: cammiemw, david_fisher, gregorybrooks, jamiecallan, sm-harding

Query Parsing

After the QueryParserWrapper object is initialized and set up, the QueryEnvironment calls the parser's query() method. This uses the ANTLR-generated methods to build a parse tree and returns its root node to the QueryEnvironment.

Create Root Node / Combine Node

The main method in the ANTLR indrilang.g file is the query() method. The first thing it does is create the root node, which is essentially an indri::lang::CombineNode.

For our example ("#combine( dog )"), the parser adds children to this node for the root #combine and its children (the "dog" portion of the #combine node).

Create Index Term

From the creation of the scoredExtentNode, an indri::lang::RawExtentNode is created as a child of the raw #combine node. The inner portion is processed as an unweightedList type, and finally the inner term can be transformed into an indri::lang::ScoredExtentNode that is interpreted as a raw term.

The raw term ("dog") is processed as an unqualifiedTerm with a type of rawText, and an indri::lang::IndexTerm is created to hold the query term "dog".

Create Parse Tree

What does all that give us? From the above processes, a compact query tree is created that holds our parsed query. A representation of the final parsed query tree for our query "#combine( dog )" is below:

  Root Node (indri::lang::CombineNode)
    \
    Raw Scoring Node (indri::lang::RawScorerNode)
      \
      Index Term ("dog" indri::lang::IndexTerm)
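
To make the shape of this tree concrete, the sketch below builds and prints the same three-level structure for "#combine( dog )". It is only an illustration under stated assumptions: ToyNode, makeNode, parseToyQuery and render are hypothetical stand-ins invented for this example, not the real indri::lang classes or the ANTLR-generated parser, and the toy parser understands nothing beyond a single flat #combine( ... ) of bare terms.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a parse-tree node. This is NOT indri::lang::Node;
// it only mirrors the tree shape described on this page.
struct ToyNode {
    std::string label;                               // e.g. "CombineNode"
    std::string text;                                // term text; empty for operators
    std::vector<std::unique_ptr<ToyNode>> children;  // owned child nodes
};

std::unique_ptr<ToyNode> makeNode(const std::string& label,
                                  const std::string& text = "") {
    auto node = std::make_unique<ToyNode>();
    node->label = label;
    node->text = text;
    return node;
}

static bool isSpace(char c) {
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Toy parser (an assumption for illustration): accepts only
// "#combine( term term ... )" and wraps each term in a scorer node.
std::unique_ptr<ToyNode> parseToyQuery(const std::string& q) {
    const std::string op = "#combine(";
    std::size_t i = 0;
    while (i < q.size() && isSpace(q[i])) ++i;
    if (q.compare(i, op.size(), op) != 0)
        throw std::invalid_argument("expected #combine(");
    i += op.size();

    auto root = makeNode("CombineNode");
    for (;;) {
        while (i < q.size() && isSpace(q[i])) ++i;
        if (i >= q.size()) throw std::invalid_argument("missing ')'");
        if (q[i] == ')') break;

        // Read one bare term up to whitespace or the closing parenthesis.
        const std::size_t start = i;
        while (i < q.size() && !isSpace(q[i]) && q[i] != ')') ++i;

        auto scorer = makeNode("RawScorerNode");
        scorer->children.push_back(makeNode("IndexTerm", q.substr(start, i - start)));
        root->children.push_back(std::move(scorer));
    }
    return root;
}

// Renders the tree one node per line, indenting two spaces per level.
std::string render(const ToyNode& node, int depth = 0) {
    assert(depth >= 0);
    std::string out(static_cast<std::size_t>(depth) * 2, ' ');
    out += node.label;
    if (!node.text.empty()) out += " \"" + node.text + "\"";
    out += "\n";
    for (const auto& child : node.children) out += render(*child, depth + 1);
    return out;
}
```

Calling render(*parseToyQuery("#combine( dog )")) returns three lines: CombineNode, then an indented RawScorerNode, then a further indented IndexTerm "dog", matching the diagram above. The real parser builds its nodes through the grammar rules in indrilang.g rather than by scanning characters as this sketch does.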

Next: [Extent Restriction]