Galago's #reject operator does not appear to be fully implemented. I believe it should support queries such as "show me all documents about java, but exclude any that contain the term 'island'". Such a query would look something like: #combine(java #reject(island))
which throws this:
q=#combine(java #reject(island))
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at org.lemurproject.galago.core.retrieval.query.Node.getChild(Node.java:164)
at org.lemurproject.galago.core.retrieval.traversal.ImplicitFeatureCastTraversal.isCountNode(ImplicitFeatureCastTraversal.java:144)
at org.lemurproject.galago.core.retrieval.traversal.ImplicitFeatureCastTraversal.addScorers(ImplicitFeatureCastTraversal.java:114)
at org.lemurproject.galago.core.retrieval.traversal.ImplicitFeatureCastTraversal.afterNode(ImplicitFeatureCastTraversal.java:91)
If you catch this exception in ImplicitFeatureCastTraversal.isCountNode() and return true (or false) you get the following error:
q=#combine(java #reject(island))
java.lang.IllegalArgumentException: No valid constructor for node #reject( #extents:island:part=postings() ).
Allowable Iterator constructors allow for leading optional Parameters, followed by optional NodeParameters, and finally the list of child iterators.
FAILED AT: Argument 1 is:
org.lemurproject.galago.core.retrieval.iterator.disk.DiskExtentIterator
Constructor expected:
org.lemurproject.galago.core.retrieval.iterator.IndicatorIterator
at org.lemurproject.galago.core.retrieval.FeatureFactory.getIterator(FeatureFactory.java:328)
at org.lemurproject.galago.core.retrieval.LocalRetrieval.createNodeMergedIterator(LocalRetrieval.java:244)
at org.lemurproject.galago.core.retrieval.LocalRetrieval.createIterator(LocalRetrieval.java:213)
at org.lemurproject.galago.core.retrieval.LocalRetrieval.getNodeStatistics(LocalRetrieval.java:355)
I believe that fixing this may also fix the #require, #any, and #all operators, since they all use the IndicatorIterator.
Did a boatload of work on this; #reject/#require are still not patched, but:
New boolean operators: #band, #bor
All count iterators are now indicator iterators; I just added the interface rather than having another traversal insert a conversion iterator. The condition is: indicator(c) is true iff count(c) > 0.
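To illustrate the count-to-indicator rule, here is a minimal sketch. The names CountSource, IndicatorSource, and TermCounts are hypothetical and are not Galago's actual interfaces; the point is only that one class can implement both interfaces and derive the indicator from the count, with no conversion iterator in between.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interfaces for illustration; not Galago's real API.
interface CountSource {
    int count(int docId);
}

interface IndicatorSource {
    boolean indicator(int docId);
}

// A count source that is also an indicator source.
final class TermCounts implements CountSource, IndicatorSource {
    private final Map<Integer, Integer> counts;

    TermCounts(Map<Integer, Integer> counts) {
        // Copy defensively so later changes to the caller's map have no effect.
        this.counts = new HashMap<>(counts);
    }

    @Override
    public int count(int docId) {
        // Documents that do not contain the term have a count of zero.
        Integer c = counts.get(docId);
        return c == null ? 0 : c;
    }

    @Override
    public boolean indicator(int docId) {
        // indicator(c) is true iff count(c) > 0
        return count(docId) > 0;
    }
}
```

With this shape, an operator that expects an indicator can be handed the count object directly.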
All iterators can now return a minimal set; since hasMatch takes a scoring context, it can compute score(c), count(c), or indicator(c) as necessary and actually remove results from the ranking. A FilterScore iterator can actually delete items, by returning hasMatch() false if the score is too low.
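As a rough sketch of the filtering idea, the class below drops candidates by answering hasMatch() with false when the score falls under a threshold. ThresholdFilter, its scorer argument, and the retain() helper are all hypothetical names used for illustration; the real hasMatch takes a scoring context, which is simplified here to a bare document id.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntToDoubleFunction;

// Hypothetical sketch; these names do not correspond to Galago classes.
final class ThresholdFilter {
    private final IntToDoubleFunction scorer; // stand-in for score(c)
    private final double minScore;

    ThresholdFilter(IntToDoubleFunction scorer, double minScore) {
        this.scorer = scorer;
        this.minScore = minScore;
    }

    // A document "matches" only if its score reaches the threshold.
    boolean hasMatch(int docId) {
        return scorer.applyAsDouble(docId) >= minScore;
    }

    // Keep only the candidates that match, preserving their order.
    List<Integer> retain(int[] candidates) {
        List<Integer> kept = new ArrayList<>();
        for (int docId : candidates) {
            if (hasMatch(docId)) {
                kept.add(docId);
            }
        }
        return kept;
    }
}
```

Because the decision lives in hasMatch(), low-scoring documents never reach the ranking at all, rather than being ranked and trimmed afterwards.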
I believe that an existing problem with #reject() is that it works on an already existing document set: if you want it to work in isolation, i.e. #reject(query), you need to have some way of returning a "match" for all documents, and taking the complement of this set. I cheated and used the LengthsIterator for this in the new BooleanNotIterator. You should be able to run something like the following now: