Understanding the File Format for Testing Data

Search engine and data mining applications and ClueWeb datasets.

Brought to you by: cammiemw, david_fisher, gregorybrooks, jamiecallan, sm-harding


Forum: RankLib

Creator: Kartik

Created: 2016-05-28

Updated: 2016-05-28

Kartik - 2016-05-28

Hi

I have recently started using RankLib for my datasets and have some questions about how to represent my data. My dataset is a standard document-category dataset: each training document has an unordered list of categories, and each test document has a list of candidate categories (extracted from the top k most similar training documents). I want to rank these candidate categories using Learning to Rank algorithms.
I was looking at the required file format and had some doubts that I could not resolve by reading the documentation.
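For reference, RankLib reads one row per query-document pair in the form "<target> qid:<qid> <feature>:<value> ... # <info>", with all rows of one query sharing the same qid. The Python below is a minimal sketch, under stated assumptions, of turning the setup described above into such rows: candidate categories are taken from the k most similar training documents, and each test document becomes one qid. The names candidate_features and to_ranklib_line, the two features (fraction of the top-k neighbors carrying the category, and the highest similarity among those neighbors) and the toy data are all hypothetical. Test rows get a constant placeholder target of 0, assuming RankLib only needs real targets when it computes evaluation metrics on that file.

```python
# Hypothetical helpers: build candidate categories for one test document and
# write them as RankLib rows. Names, features, and data are illustrative only.


def candidate_features(similarities, train_categories, k):
    """Return {category: {feature_id: value}} for one test document.

    similarities: list of (train_doc_id, similarity) pairs for the test document.
    train_categories: dict mapping train_doc_id -> list of category labels.
    Candidates are the categories of the k most similar training documents.
    Feature 1: fraction of those neighbors that carry the category.
    Feature 2: highest similarity among those neighbors that carry it.
    """
    top_k = sorted(similarities, key=lambda pair: pair[1], reverse=True)[:k]
    votes, best_sim = {}, {}
    for doc_id, sim in top_k:
        for cat in set(train_categories[doc_id]):
            votes[cat] = votes.get(cat, 0) + 1
            best_sim[cat] = max(best_sim.get(cat, sim), sim)
    n = len(top_k)
    return {cat: {1: votes[cat] / n, 2: best_sim[cat]} for cat in votes}


def to_ranklib_line(target, qid, features, comment=""):
    """Format one row as '<target> qid:<qid> <fid>:<value> ... # <comment>'."""
    feats = " ".join(f"{fid}:{features[fid]:g}" for fid in sorted(features))
    line = f"{target} qid:{qid} {feats}"
    return f"{line} # {comment}" if comment else line


# Toy data (hypothetical): four training documents and one test document.
train_categories = {
    "t1": ["sports", "politics"],
    "t2": ["sports"],
    "t3": ["finance"],
    "t4": ["weather"],
}
sims = [("t1", 0.9), ("t2", 0.8), ("t3", 0.4), ("t4", 0.1)]

cands = candidate_features(sims, train_categories, k=3)
for cat in sorted(cands):
    # The test document has no labels, so every row gets the same placeholder 0.
    print(to_ranklib_line(0, 7, cands[cat], comment=cat))
```

With k=3 the "weather" document falls outside the neighborhood, and the loop prints:
0 qid:7 1:0.333333 2:0.4 # finance
0 qid:7 1:0.333333 2:0.9 # politics
0 qid:7 1:0.666667 2:0.9 # sports
Since a ranker orders rows only by the score it computes from the features, a shared placeholder target should not change that order; it would only make any metric reported on the test file meaningless.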

1) The target value in the file format represents the relevance score of a document-category pair (by analogy, a query-document pair). But what happens when the test documents do not have any target value, only a list of candidates that we have to rank according to the provided features? Does putting the same target value on every row have any significance in that case?

2) The queries in training and testing will be different. Is there a way to give preference to certain training examples for a given test example? For example, for a given test query, the results of certain related training queries would be given more weight.
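As far as we know RankLib has no per-query weighting option (treat this as an assumption). One crude workaround for question 2, sketched below in Python, is to replicate related training queries so that they count more in the training objective. Each training query carries a vector describing it (hypothetical, e.g. a centroid of its feature rows or a representation of the underlying document); queries whose cosine similarity with the test query's vector reaches a threshold are written several times under fresh qid values, and the rest are written once. The names weighted_training_lines, cosine, threshold, and copies are illustrative, and replication is a swapped-in technique, not a RankLib feature.

```python
import math

# Hypothetical workaround: with no per-query weight option assumed in RankLib,
# training queries judged related to a test query are written more than once.


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def weighted_training_lines(test_vector, train_queries, threshold=0.5, copies=3):
    """Return RankLib training lines in which related queries are replicated.

    train_queries: dict qid -> {"vector": [...], "rows": [(target, {fid: value}), ...]}.
    A query whose vector has cosine >= threshold with test_vector is written
    `copies` times, each copy under a fresh qid; every other query is written once.
    """
    lines = []
    next_qid = 1
    for qid in sorted(train_queries):
        query = train_queries[qid]
        related = cosine(test_vector, query["vector"]) >= threshold
        for _ in range(copies if related else 1):
            for target, feats in query["rows"]:
                body = " ".join(f"{fid}:{feats[fid]:g}" for fid in sorted(feats))
                lines.append(f"{target} qid:{next_qid} {body} # orig_qid={qid}")
            next_qid += 1  # rows of one copy stay contiguous under one qid
    return lines


# Toy data (hypothetical): two training queries with two candidate rows each.
train = {
    1: {"vector": [1.0, 0.0], "rows": [(1, {1: 0.9}), (0, {1: 0.2})]},
    2: {"vector": [0.0, 1.0], "rows": [(1, {1: 0.4}), (0, {1: 0.3})]},
}
for line in weighted_training_lines([0.9, 0.1], train, threshold=0.5, copies=2):
    print(line)
```

Here training query 1 is close to the test vector and is written twice (as qid 1 and qid 2), while query 2 is written once (as qid 3). Because the training file depends on the test query, this implies training one model per test query (or per group of similar test queries), which can be expensive; a lighter variant of the same idea is to drop unrelated training queries instead of replicating related ones.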

Thank you
