
#259 Galago #reject operator failing

Milestone: v3.x
Status: accepted
Labels: galago (57)
Priority: 1
Updated: 2015-06-16
Created: 2015-04-17
Private: No

Galago's #reject operator does not appear to be fully implemented. I believe it should support requests such as "show me all documents about java that don't include the term 'island'". Such a query would look something like: #combine(java #reject(island))

which throws this:


q=#combine(java #reject(island))
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at org.lemurproject.galago.core.retrieval.query.Node.getChild(Node.java:164)
at org.lemurproject.galago.core.retrieval.traversal.ImplicitFeatureCastTraversal.isCountNode(ImplicitFeatureCastTraversal.java:144)
at org.lemurproject.galago.core.retrieval.traversal.ImplicitFeatureCastTraversal.addScorers(ImplicitFeatureCastTraversal.java:114)
at org.lemurproject.galago.core.retrieval.traversal.ImplicitFeatureCastTraversal.afterNode(ImplicitFeatureCastTraversal.java:91)


If you catch this exception in ImplicitFeatureCastTraversal.isCountNode() and return true (or false) you get the following error:


q=#combine(java #reject(island))
java.lang.IllegalArgumentException: No valid constructor for node #reject( #extents:island:part=postings() ).
Allowable Iterator constructors allow for leading optional Parameters, followed by optional NodeParameters, and finally the list of child iterators.FAILED AT: Argument 1 is:
org.lemurproject.galago.core.retrieval.iterator.disk.DiskExtentIterator
Constructor expected:
org.lemurproject.galago.core.retrieval.iterator.IndicatorIterator
at org.lemurproject.galago.core.retrieval.FeatureFactory.getIterator(FeatureFactory.java:328)
at org.lemurproject.galago.core.retrieval.LocalRetrieval.createNodeMergedIterator(LocalRetrieval.java:244)
at org.lemurproject.galago.core.retrieval.LocalRetrieval.createIterator(LocalRetrieval.java:213)
at org.lemurproject.galago.core.retrieval.LocalRetrieval.getNodeStatistics(LocalRetrieval.java:355)


I believe that fixing this may also fix the #require, #any, and #all operators, since they all use the IndicatorIterator.
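
To illustrate the second error, here is a minimal sketch of how a constructor lookup by argument type can fail when an extent iterator is handed to an operator that expects an indicator iterator. It assumes the factory matches child iterators against constructor parameter types via reflection; every type below (BaseIter, ExtentIter, IndicatorIter, DiskExtentIter, ExtentAsIndicator, RejectOp, ConstructorMatcher) is a hypothetical stand-in, not Galago's actual code.

```java
import java.lang.reflect.Constructor;
import java.util.Optional;

// Hypothetical, simplified stand-ins for the iterator hierarchy.
interface BaseIter {}
interface ExtentIter extends BaseIter {}
interface IndicatorIter extends BaseIter {}

// Plays the role of the extent iterator that a bare term resolves to.
class DiskExtentIter implements ExtentIter {}

// A conversion wrapper: exposes an extent iterator as an indicator iterator.
class ExtentAsIndicator implements IndicatorIter {
    final ExtentIter inner;
    ExtentAsIndicator(ExtentIter inner) { this.inner = inner; }
}

// An operator whose only constructor demands an indicator iterator.
class RejectOp {
    final IndicatorIter condition;
    RejectOp(IndicatorIter condition) { this.condition = condition; }
}

class ConstructorMatcher {
    // Returns the first constructor whose parameter types accept the given
    // children, or empty if none does (the "No valid constructor" case).
    static Optional<Constructor<?>> find(Class<?> operator, Object... children) {
        for (Constructor<?> c : operator.getDeclaredConstructors()) {
            Class<?>[] params = c.getParameterTypes();
            if (params.length != children.length) {
                continue;
            }
            boolean allMatch = true;
            for (int i = 0; i < params.length; i++) {
                if (!params[i].isInstance(children[i])) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                return Optional.of(c);
            }
        }
        return Optional.empty();
    }
}
```

In this sketch, looking up RejectOp with a bare DiskExtentIter finds nothing, while wrapping it in ExtentAsIndicator first succeeds. That is the kind of cast or conversion the traversal would need to apply before the iterator is constructed.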

Discussion

  • John Foley

    John Foley - 2015-04-21

    Did a boatload of work on this; #reject/#require still not patched, but:

    New boolean operators #band, #bor

    All count iterators are now indicator iterators; I just added the interface rather than having another traversal insert a conversion iterator. The condition is that indicator(c) is true iff count(c) > 0.
    All iterators can now return a minimal set; since hasMatch takes a scoring context, it can compute score(c), count(c), or indicator(c) as necessary and actually remove results from the ranking. A FilterScore iterator can actually delete items by returning false from hasMatch() if the score is too low.
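
The two ideas above can be sketched roughly as follows. This is an illustration under assumptions, not the committed Galago code: the interfaces and classes (IndicatorSketch, CountSketch, TermCountsSketch, FilterScoreSketch) are hypothetical, and a plain int document id stands in for the scoring context.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins; not Galago's actual interfaces.
interface IndicatorSketch {
    boolean indicator(int doc);
}

interface CountSketch extends IndicatorSketch {
    int count(int doc);

    // Every count iterator is also an indicator iterator:
    // indicator(c) is true iff count(c) > 0.
    @Override
    default boolean indicator(int doc) {
        return count(doc) > 0;
    }
}

// A toy count iterator backed by an in-memory map of doc id -> term count.
class TermCountsSketch implements CountSketch {
    private final Map<Integer, Integer> counts = new HashMap<>();

    TermCountsSketch add(int doc, int count) {
        counts.put(doc, count);
        return this;
    }

    @Override
    public int count(int doc) {
        return counts.getOrDefault(doc, 0);
    }
}

// A toy score filter: hasMatch() is false when the score is missing or too
// low, which removes the document from the results.
class FilterScoreSketch {
    private final Map<Integer, Double> scores = new HashMap<>();
    private final double threshold;

    FilterScoreSketch(double threshold) {
        this.threshold = threshold;
    }

    FilterScoreSketch add(int doc, double score) {
        scores.put(doc, score);
        return this;
    }

    boolean hasMatch(int doc) {
        Double score = scores.get(doc);
        return score != null && score >= threshold;
    }

    // Keeps only the candidates for which hasMatch() is true, in order.
    List<Integer> retain(List<Integer> candidates) {
        List<Integer> kept = new ArrayList<>();
        for (int doc : candidates) {
            if (hasMatch(doc)) {
                kept.add(doc);
            }
        }
        return kept;
    }
}
```

Using a default method means no separate conversion iterator is needed: anything that can count can already answer the indicator question.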

    I believe that an existing problem with #reject() is that it works on an already existing document set:

    #reject(condition original query)
    

    If you want it to work in isolation, i.e. #reject(query), you need some way of returning a "match" for all documents and then taking the complement relative to that set. I cheated and used the LengthsIterator for this in the new BooleanNotIterator.

    You should be able to run something like the following now:

    #reject(#bnot(#band(terms you hate)) query to score)
    
     
  • John Foley

    John Foley - 2015-04-21
    • assigned_to: John Foley
     
  • John Foley

    John Foley - 2015-06-16
    • status: open --> accepted
     
