jjfoley / Profile

User Activity

Posted a comment on discussion Retrieval on The Lemur Project
You need a phrase operator to get those statistics, and then the text operators might need to be extents, e.g.:

    Node n = new Node("od", 1);
    n.addChild(new Node("extents", "natural"));
    n.addChild(new Node("extents", "language"));

In the StructuredQuery/Galago QL language, this would be:

    #od:1(#extents:natural() #extents:language())

This is basically pseudocode, but the thing you want to count is an "ordered window" or phrase of the "positions" called "extents" of your words. The generic "text" operator sometimes...
6 years ago
Posted a comment on discussion RankLib on The Lemur Project
It's a redefinition of the metric under a single swap of ordering. This is used to accelerate some of the learning algorithms (to quickly compare the benefits of different rankings based on which documents they are able to swap). It is a difficult thing to implement and test but is critical to LambdaMART, I believe.
7 years ago
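To make the "metric under a single swap" comment above concrete, here is a minimal sketch that assumes the metric is DCG with the common 2^rel - 1 gain and log2 discount. The function names are illustrative and are not taken from RankLib. It shows why the swap form is fast: only the two swapped positions contribute to the change.

```python
import math

def dcg(labels):
    """Discounted cumulative gain of relevance labels given in ranked order."""
    return sum((2 ** rel - 1) / math.log2(rank + 2) for rank, rel in enumerate(labels))

def swap_delta_dcg(labels, i, j):
    """Change in DCG if the documents at ranks i and j trade places.

    Only positions i and j change, so this is O(1) instead of a full
    re-computation of the metric over the whole ranking.
    """
    gain_i = 2 ** labels[i] - 1
    gain_j = 2 ** labels[j] - 1
    disc_i = 1.0 / math.log2(i + 2)
    disc_j = 1.0 / math.log2(j + 2)
    return (gain_j - gain_i) * (disc_i - disc_j)
```

A pairwise learner can call `swap_delta_dcg` for every candidate pair of documents without re-scoring the full list each time.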
Posted a comment on discussion RankLib on The Lemur Project
Yep, that's what I get for answering without trying.
7 years ago
Posted a comment on discussion RankLib on The Lemur Project
Just by query type. RankLib's loading is naive about that: it assumes adjacent lines with the same qid belong to the same query.
7 years ago
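The adjacency assumption described above can be illustrated with a small sketch. This is not RankLib's loader; the sample lines and the helper names `qid_of` and `group_adjacent` are made up for illustration. It groups only neighbouring lines that share a qid, so a qid that reappears later starts a new group.

```python
from itertools import groupby

def qid_of(line):
    """Return the value of the qid token found in one feature line."""
    for token in line.split():
        if token.startswith("qid:"):
            return token[len("qid:"):]
    raise ValueError("no qid token in line: " + line)

def group_adjacent(lines):
    """Group only *adjacent* lines sharing a qid, as a naive loader would."""
    return [(qid, list(group)) for qid, group in groupby(lines, key=qid_of)]

# Hypothetical sample data: qid 1 appears twice, but not on adjacent lines.
lines = [
    "1 qid:1 1:0.5 2:0.7 # docA",
    "0 qid:2 1:0.1 2:0.3 # docB",
    "0 qid:1 1:0.2 2:0.9 # docC",
]
groups = group_adjacent(lines)
```

Here `groups` has three entries rather than two. Sorting the lines by qid before loading (for example with `sorted(lines, key=qid_of)`, which is stable) would keep each query's lines together.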
Posted a comment on discussion RankLib on The Lemur Project
Doesn't matter, because RankLib ignores it. I usually put a zero, e.g.: qid:001 0 1:0.5 2:0.7 #docid
7 years ago
Posted a comment on discussion RankLib on The Lemur Project
Threshold candidates are within a feature: the setting is how many times a feature is allowed to be split. A value of -1 says that any difference in floating-point values may be used; if you have fewer than 256 distinct values for a feature, -1 is equivalent.
7 years ago
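As a rough illustration of the threshold-candidate comment above, the sketch below picks split candidates for one feature. It is an assumption-laden toy, not RankLib's code: the function name, the evenly spaced fallback, and the use of 256 as the default limit are all illustrative choices based only on the numbers mentioned in the comment.

```python
def candidate_thresholds(values, max_candidates=256):
    """Pick candidate split thresholds for a single feature.

    With max_candidates == -1, or when the feature has fewer distinct
    values than the limit, every distinct value is a candidate.
    Otherwise fall back to evenly spaced points (an assumed strategy).
    """
    distinct = sorted(set(values))
    if max_candidates == -1 or len(distinct) < max_candidates:
        return distinct
    low, high = distinct[0], distinct[-1]
    step = (high - low) / max_candidates
    return [low + k * step for k in range(max_candidates)]
```

For a feature with only a handful of distinct values, the limited and unlimited settings return the same candidates, which is the equivalence the comment points out.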
Posted a comment on discussion RankLib on The Lemur Project
Correct. The key (e.g., #1A) is how RankLib stores the document names.
7 years ago
Posted a comment on discussion RankLib on The Lemur Project
qrel is in trec_eval format, e.g., from the answer below: https://stackoverflow.com/questions/4275825/how-to-evaluate-a-search-retrieval-engine-using-trec-eval
7 years ago
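For readers unfamiliar with the trec_eval qrel format mentioned above, each line has four whitespace-separated fields: topic, iteration, document number, and relevance. The following is a minimal parsing sketch; the function name and the sample judgments are invented for illustration.

```python
from collections import defaultdict

def parse_qrels(text):
    """Parse trec_eval-style qrels: 'topic iteration docno relevance' per line."""
    judgments = defaultdict(dict)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        topic, _iteration, docno, rel = parts
        judgments[topic][docno] = int(rel)
    return dict(judgments)

# Hypothetical sample judgments for two topics.
sample = """\
301 0 doc-a 1
301 0 doc-b 0
302 0 doc-c 2
"""
qrels = parse_qrels(sample)
```

The iteration column is conventionally written as 0 and is not used in the parsed result here.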


Personal Data

Username: jjfoley
Joined: 2013-06-14 17:41:26

Projects

The Lemur Project: Search engine and data mining applications and ClueWeb datasets. Last Updated: 2025-04-12

John Foley
