Galago Advanced Retrieval Configuration

Mostafa Keikha

Galago lets users control almost every aspect of the system by setting the corresponding parameters. Non-developer users can choose among the existing functionality to change the system's behavior. Developers can further modify the existing classes or implement their own and plug them into the system. In this document, we describe the most important classes that control the retrieval process. We explain how to select those classes through parameters, and how to implement new classes and integrate them into the system.

Processing Models

Processing models define the overall behavior of the retrieval process. Any processing model should extend the org.lemurproject.galago.core.retrieval.processing.ProcessingModel interface, which defines how to process a query through the execute function. Several models are already implemented in Galago and can be selected with the processingModel parameter. The value of this parameter names the class that will be used for retrieval. The following example selects a passage retrieval model for executing the query:

{
  "casefold" : true,
  "requested" : 10,
  "processingModel" : "org.lemurproject.galago.core.retrieval.processing.RankedPassageModel",
  "passageQuery" : true,
  "passageSize" : 50,
  "passageShift" : 25,
  "queries" : [
    {
      "number" : " 301",
      "text" : " international organized crime"
    }
  ]
}
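To make the passageSize and passageShift parameters concrete, here is a minimal Python sketch of how fixed-size, overlapping passages could be laid out over a document. It is an illustration only, not Galago's code: the function name passage_windows is hypothetical, and it assumes that a window of passageSize terms slides forward by passageShift terms, with the final window clipped at the end of the document.

```python
def passage_windows(doc_length, passage_size=50, passage_shift=25):
    """Return (begin, end) term offsets of fixed-size, overlapping passages.

    Assumption (not taken from Galago's source): a window of `passage_size`
    terms slides forward by `passage_shift` terms, and the final window is
    clipped to the end of the document.
    """
    if passage_size <= 0 or passage_shift <= 0:
        raise ValueError("passage_size and passage_shift must be positive")
    windows = []
    begin = 0
    while begin < doc_length:
        end = min(begin + passage_size, doc_length)
        windows.append((begin, end))
        if end == doc_length:
            break  # the last window already reaches the end of the document
        begin += passage_shift
    return windows


# A 120-term document with the settings from the example above.
print(passage_windows(120))
# [(0, 50), (25, 75), (50, 100), (75, 120)]
```

With a shift of half the passage size, as in the example, every term away from the document boundaries falls into two passages.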

The most important existing processing models are the following:

  • RankedDocumentModel : Performs straightforward document-at-a-time (DAAT) processing.
  • RankedPassageModel : Performs passage-level retrieval scoring.
  • MaxScoreDocumentModel : Assumes the use of delta functions for scoring, then prunes using MaxScore, which speeds up processing.
  • TwoPassDocumentPassageModel : Performs two-stage retrieval, using document-level retrieval as the first stage and passage-level retrieval as the second stage.
  • WorkingSetDocumentModel : Performs document retrieval over a given set of documents (the working set).
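Because the processingModel value is just a class name, switching between these models is a matter of changing one string in the parameter file. The following Python sketch generates such a file. The helper build_parameters is hypothetical, and the sketch assumes that every model listed above lives in the same package as the two classes used in this document's examples (only RankedPassageModel and RankedDocumentModel are shown with their full names here).

```python
import json

# Package taken from the class names used in the examples in this document.
# Assumption: the other listed models are in the same package.
PACKAGE = "org.lemurproject.galago.core.retrieval.processing"


def build_parameters(model, queries, requested=10, **extra):
    """Build a parameter dictionary that selects `model` for all queries.

    `queries` is a list of (number, text) pairs; `extra` carries any
    model-specific settings such as passageSize or passageShift.
    """
    params = {
        "casefold": True,
        "requested": requested,
        "processingModel": f"{PACKAGE}.{model}",
        "queries": [{"number": n, "text": t} for n, t in queries],
    }
    params.update(extra)
    return params


params = build_parameters(
    "RankedDocumentModel",
    [("301", "international organized crime")],
)
print(json.dumps(params, indent=2))
```

Passing model-specific settings as keyword arguments, for example build_parameters("RankedPassageModel", queries, passageQuery=True, passageSize=50, passageShift=25), reproduces the shape of the passage retrieval example.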

If the existing processing models do not provide the functionality you need, extend the ProcessingModel interface and implement the execute function. Integrating the new model is then as simple as setting the processingModel parameter to the name of the new class.

Processing models can also be set at the query level. In the following example, we use passage retrieval for the first query and document retrieval for the second. This lets developers implement different retrieval models and apply them selectively based on query properties.

{
  "casefold" : true,
  "requested" : 10,
  "queries" : [
    {
      "number" : " 301",
      "text" : " international organized crime",
      "processingModel" : "org.lemurproject.galago.core.retrieval.processing.RankedPassageModel",
      "passageSize" : 50,
      "passageShift" : 25,
      "passageQuery" : true
    },
    {
      "number" : " 302",
      "text" : " poliomyelitis and post polio",
      "processingModel" : "org.lemurproject.galago.core.retrieval.processing.RankedDocumentModel"
    }
  ]
}
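One way to picture query-level settings is as a merge of the top-level parameters with each query's own entries. The Python sketch below illustrates that idea. It is not Galago's implementation: the function effective_parameters is hypothetical, and it assumes that a key set inside a query takes precedence over the same key set at the top level.

```python
def effective_parameters(params):
    """Yield one merged dict per query.

    Assumption: query-level keys override top-level keys of the same name.
    """
    global_settings = {k: v for k, v in params.items() if k != "queries"}
    for query in params.get("queries", []):
        yield {**global_settings, **query}


PACKAGE = "org.lemurproject.galago.core.retrieval.processing"
params = {
    "casefold": True,
    "requested": 10,
    "queries": [
        {
            "number": "301",
            "text": "international organized crime",
            "processingModel": PACKAGE + ".RankedPassageModel",
            "passageSize": 50,
            "passageShift": 25,
            "passageQuery": True,
        },
        {
            "number": "302",
            "text": "poliomyelitis and post polio",
            "processingModel": PACKAGE + ".RankedDocumentModel",
        },
    ],
}

merged = list(effective_parameters(params))
for query in merged:
    # Show which model each query would run under.
    print(query["number"], query["processingModel"].rsplit(".", 1)[-1])
```

Under this assumption, both queries inherit casefold and requested from the top level, while only query 301 carries the passage settings.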

This document is not complete and will be updated with more detail very soon.

