Hi,
I am trying to examine whether some new features I've implemented are effective for retrieval. To do this, I am comparing a standard retrieval algorithm against a learnt one that uses the new features (in addition to the score produced by the standard retrieval algorithm).
If I understand the process correctly, for the learnt approach I need to first run a standard retrieval algorithm, save the returned documents, calculate the values of my new features for those documents, cross-reference those documents with the dataset's qrels to produce the RankLib data file, and then run some of the LETOR algorithms provided.
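To show what I mean, here is a rough sketch of how I'm building the RankLib data file (the variable names are just placeholders for my own data structures; the first feature is the standard retrieval score and the rest are my new features):

```python
# Rough sketch: write a RankLib/LETOR-format training file from a first-stage run.
# Assumed inputs (placeholders):
#   retrieved:    query_id -> list of (doc_id, base_score) from the standard run
#   new_features: (query_id, doc_id) -> list of my new feature values
#   qrels:        (query_id, doc_id) -> relevance label (missing = unjudged)

def write_ranklib_file(retrieved, new_features, qrels, path):
    with open(path, "w") as out:
        for qid, docs in retrieved.items():
            for doc_id, base_score in docs:
                label = qrels.get((qid, doc_id), 0)          # unjudged treated as 0
                feats = [base_score] + new_features[(qid, doc_id)]
                feat_str = " ".join(f"{i}:{v}" for i, v in enumerate(feats, 1))
                out.write(f"{label} qid:{qid} {feat_str} # {doc_id}\n")
```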
The problem with this process is that the learnt approach will be evaluated on only a subset of the available relevant documents, since any relevant document that wasn't retrieved by the standard retrieval algorithm never enters the LETOR process.
Am I missing something? Is there a step that alleviates this problem? For precision-based metrics this may not matter much, but for MAP or other recall-oriented metrics the comparison no longer seems valid.
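To make the concern concrete, here is a toy calculation (the doc ids and relevance counts are made up) showing how average precision for a single query changes depending on whether the denominator counts only the relevant documents in the first-stage pool or all relevant documents in the qrels:

```python
# Toy illustration: AP over the re-ranked pool with two different denominators.

def average_precision(ranked_docs, relevant, total_relevant):
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranked_docs, 1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_relevant if total_relevant else 0.0

ranked = ["d1", "d2", "d3", "d4"]        # re-ranked first-stage results
relevant_retrieved = {"d1", "d3"}        # relevant docs that made it into the pool
all_relevant = 5                         # qrels say 5 relevant docs exist in total

print(average_precision(ranked, relevant_retrieved, len(relevant_retrieved)))  # ~0.83
print(average_precision(ranked, relevant_retrieved, all_relevant))             # ~0.33
```

The learnt run would look much better under the first calculation than under the second, which is why I'm unsure the comparison is fair.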
Any advice would be appreciated. Thanks for the help.
Kind regards,
George