
#259 Galago #reject operator failing


Galago's #reject operator does not appear to be fully implemented. I believe it should support requests such as "show me all documents about java that do not contain the term 'island'". Such a query would look something like: #combine(java #reject(island))

which throws this:

q=#combine(java #reject(island))
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(
at java.util.ArrayList.get(
at org.lemurproject.galago.core.retrieval.query.Node.getChild(
at org.lemurproject.galago.core.retrieval.traversal.ImplicitFeatureCastTraversal.isCountNode(
at org.lemurproject.galago.core.retrieval.traversal.ImplicitFeatureCastTraversal.addScorers(
at org.lemurproject.galago.core.retrieval.traversal.ImplicitFeatureCastTraversal.afterNode(
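
The message "Index: 1, Size: 1" in the trace above says that a second child was requested from a node that has only one. As a minimal sketch of a guarded lookup (SketchNode and childOrNull are hypothetical names for illustration, not Galago's actual Node API), the access could be bounds-checked like this:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a query tree node; not Galago's Node class.
final class SketchNode {
    final String operator;
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String operator) {
        this.operator = operator;
    }

    // Returns the child at the given index, or null when the index is out of
    // range, instead of letting ArrayList.get throw IndexOutOfBoundsException.
    SketchNode childOrNull(int index) {
        return (index >= 0 && index < children.size()) ? children.get(index) : null;
    }
}
```

With a one-child node such as #reject(island), childOrNull(1) returns null, so the caller can decide what a missing second child means rather than crashing.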

If you catch this exception in ImplicitFeatureCastTraversal.isCountNode() and return true (or false), you get the following error instead:

q=#combine(java #reject(island))
java.lang.IllegalArgumentException: No valid constructor for node #reject( #extents:island:part=postings() ).
Allowable Iterator constructors allow for leading optional Parameters, followed by optional NodeParameters, and finally the list of child iterators.
FAILED AT: Argument 1 is:
Constructor expected:
at org.lemurproject.galago.core.retrieval.FeatureFactory.getIterator(
at org.lemurproject.galago.core.retrieval.LocalRetrieval.createNodeMergedIterator(
at org.lemurproject.galago.core.retrieval.LocalRetrieval.createIterator(
at org.lemurproject.galago.core.retrieval.LocalRetrieval.getNodeStatistics(

I believe that fixing this may also fix the #require, #any, and #all operators, since they all use the IndicatorIterator.


  • John Foley

    John Foley - 2015-04-21

    Did a boatload of work on this; #reject/#require still not patched, but:

    New boolean operators #band, #bor

    All count iterators are now indicator iterators; I just added the interface rather than having another traversal insert a conversion iterator. The condition is that indicator(c) is true iff count(c) > 0.
    All iterators can now return a minimal set; since hasMatch takes a scoring context, it can compute score(c), count(c), or indicator(c) as necessary and actually remove results from the ranking. A FilterScore iterator can actually delete items by returning false from hasMatch() if the score is too low.
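
The two rules in this comment can be sketched in a few lines. This is only an illustration under stated assumptions: IndicatorSketch, indicator, and filterByScore are hypothetical names, the score function and the inclusive threshold are made up, and none of this is Galago's actual iterator code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntToDoubleFunction;

// Hypothetical sketch; these are not Galago classes.
final class IndicatorSketch {
    // A count behaves as an indicator: true iff count > 0.
    static boolean indicator(int count) {
        return count > 0;
    }

    // Mimics a score filter whose match test fails when the score is too low,
    // so low-scoring documents are dropped from the result list entirely.
    // Assumption: "too low" means strictly below the threshold.
    static List<Integer> filterByScore(int[] docIds, IntToDoubleFunction score, double threshold) {
        List<Integer> kept = new ArrayList<>();
        for (int doc : docIds) {
            boolean hasMatch = score.applyAsDouble(doc) >= threshold;
            if (hasMatch) {
                kept.add(doc);
            }
        }
        return kept;
    }
}
```

For example, filtering documents 1, 2, 3 with a score of half the document id and a threshold of 1.0 keeps documents 2 and 3.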

    I believe that an existing problem with #reject() is that it works on an already existing document set:

    #reject(condition original query)

    If you want it to work in isolation, i.e. #reject(query), you need some way of returning a "match" for all documents and then taking the complement of this set. I cheated and used the LengthsIterator for this in the new BooleanNotIterator.
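
The complement idea can be sketched as follows. This is a hypothetical illustration, not the BooleanNotIterator itself: ComplementSketch and complement are made-up names, and the allDocs list stands in for an iterator that matches every document (the role the comment gives to the lengths iterator).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; these are not Galago classes.
final class ComplementSketch {
    // Walks every document id and keeps those the inner query did NOT match,
    // i.e. the complement of the matching set within the whole collection.
    static List<Integer> complement(List<Integer> allDocs, Set<Integer> matching) {
        List<Integer> out = new ArrayList<>();
        for (int doc : allDocs) {
            if (!matching.contains(doc)) {
                out.add(doc);
            }
        }
        return out;
    }
}
```

With documents 1 through 5 and an inner query matching 2 and 4, the complement is 1, 3, 5. Without a source that enumerates every document, there is nothing to take the complement against, which is why a standalone negation needs one.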

    You should be able to run something like the following now:

    #reject(#bnot(#band(terms you hate)) query to score)
  • John Foley

    John Foley - 2015-04-21
    • assigned_to: John Foley
  • John Foley

    John Foley - 2015-06-16
    • status: open --> accepted
