#120 ranked search returns fewer results

Status: closed-wont-fix
Owner: nobody
Labels: Indexer (49)
Priority: 5
Updated: 2011-11-25
Created: 2010-08-18
Creator: fRiSi
Private: No

TextIndexNG3 does not return all search results when ranking is enabled.

Our customer observed weird search results: the portal search returned different persons for a first name than a custom search form did.

By default, TextIndexNG3 is installed with ranking set to True so this potentially affects many users.

The problem is reproducible on one of my testing servers (Data.fs ~4GB, ~8700 records in the catalog) but not locally with a smaller database, so it likely depends on the number of indexed objects and/or indexed words.

The SearchableText index (TextIndexNG3 with default settings from TextIndexNG3/skins/textindexng3/txng_convert_indexes.py) does not return the person, although the person's SearchableText() contains the search terms:

>>> person.SearchableText()
'heil-clemens-christoph Heil, Heil, Clemens Christoph Hausmeister im LKH Bludenz '

>>> [brain.id for brain in cat(portal_type='KKVAddress', SearchableText='bludenz')]
[]
>>> [brain.id for brain in cat(portal_type='KKVAddress', SearchableText='heil')]
[]

Reindexing does not change anything.

I added additional indexes with different settings:

SearchableTextFrequNoRank
storage: txng.storages.term_frequencies
ranking: False

SearchableTextDefaultStorage
storage: txng.storages.default
ranking: False

Both of them return the object correctly:

>>> [brain.id for brain in cat(portal_type='KKVAddress', SearchableTextDefaultStorage='heil')]
['heil-clemens-christoph']
>>> [brain.id for brain in cat(portal_type='KKVAddress', SearchableTextFrequNoRank='heil')]
['heil-clemens-christoph']

Turning off ranking in the query makes our SearchableText index return the object as well:

>>> [brain.id for brain in cat(portal_type='KKVAddress', SearchableText={'query':'heil', 'ranking':False})]
['heil-clemens-christoph']

Since this seems to depend on the number of indexed objects, report #1373401 might be related:
https://sourceforge.net/tracker/?func=detail&aid=1373401&group_id=50052&atid=458418

I'll happily apply patches or try out some tips on my server if you have any ideas how to fix this.

regards, fRiSi

Discussion

  • fRiSi
    2010-08-23

    This has to do with the ranking_maxhits setting of the index. By default, the index returns only the 50 best-ranked results when ranking is set to True (which is the default when replacing the ZCTextIndex with txng3 via the control panel).

    >>> len([brain.id for brain in cat(SearchableText={'query':'heil'})])
    50

    When querying for a certain portal_type and SearchableText, both indexes are searched and the result sets are intersected. With the default limit of 50 results for SearchableText, this can easily lead to empty result sets for combined searches.
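The effect can be sketched without Zope at all. The following is only a toy model under stated assumptions: plain dicts and sets stand in for the catalog indexes, and `ranked_search` and `combined_search` are hypothetical names, not TextIndexNG3 or ZCatalog API. It is meant to show why a cap of 50 plus an intersection can drop a matching object, and why a small database does not show the problem.

```python
# Toy model only: dicts and sets stand in for the real catalog indexes.

def ranked_search(scores, maxhits=50):
    """Return the ids of the `maxhits` best-scored documents.

    Hypothetical stand-in for a ranked text index that caps its result.
    """
    best = sorted(scores, key=scores.get, reverse=True)[:maxhits]
    return set(best)


def combined_search(text_hits, type_hits):
    """Intersect the per-index result sets, as a combined query does."""
    return text_hits & type_hits


# Large database: 200 documents match the word, higher score = better rank.
scores = {"doc-%d" % i: 1000 - i for i in range(200)}
scores["heil-clemens-christoph"] = 1  # matches the word, but ranks last

# Result of the second index: the only document of the wanted type.
addresses = {"heil-clemens-christoph"}

# Capped at 50 hits: the person is cut off before the intersection.
capped = combined_search(ranked_search(scores, maxhits=50), addresses)

# Without a cap (ranking off): the person survives the intersection.
uncapped = combined_search(set(scores), addresses)

# Small database: fewer than 50 matches, so the cap never cuts anything.
small_scores = {"doc-%d" % i: 1000 - i for i in range(10)}
small_scores["heil-clemens-christoph"] = 1
small = combined_search(ranked_search(small_scores, maxhits=50), addresses)

print(capped, uncapped, small)
```

In this model the capped query comes back empty while the uncapped and the small-database queries both find the person, which matches what was observed on the test server versus locally.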

    possible solutions / workarounds:

    a) change ranking settings within our query,

    >>> len([brain.id for brain in cat(SearchableText={'query':'heil','ranking_maxhits':100})])
    100

    b) change ranking_maxhits to a more sane default value

    c) install TextIndexNG without ranking support in your add-on package, or

    d) do so by default in Products.TextIndexNG3.

    IMHO the drawback of not getting any results for combined searches (e.g. via http://plone.org/search_form) outweighs the benefit of returning the best-ranked results first.
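For workaround a), a small helper can keep the query options in one place. This is a minimal sketch: `searchable_text_query` is a hypothetical name, and the only keys it emits are the ones already used in the queries in this ticket (`query`, `ranking`, `ranking_maxhits`). The catalog call in the trailing comment assumes the same `cat` object as in the interpreter sessions and is not executed here.

```python
def searchable_text_query(term, ranking=None, maxhits=None):
    """Build the dict form of a SearchableText query (hypothetical helper).

    Options left as None are omitted, so the index defaults apply.
    """
    query = {"query": term}
    if ranking is not None:
        query["ranking"] = ranking
    if maxhits is not None:
        query["ranking_maxhits"] = maxhits
    return query


# Ranking switched off for this query only.
no_rank = searchable_text_query("heil", ranking=False)

# Ranking kept, but with a higher hit limit.
more_hits = searchable_text_query("heil", maxhits=100)

print(no_rank)
print(more_hits)

# Intended use inside Zope (assumes `cat` is the portal catalog):
# cat(portal_type='KKVAddress', SearchableText=no_rank)
```

Raising the limit only moves the threshold, so for combined searches switching ranking off per query looks like the safer of the two variants.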

     
  • Andreas Jung
    2011-11-25

    • status: open --> closed-wont-fix
     
  • Andreas Jung
    2011-11-25

    Not sure if there is a real solution for this. Since the ranking implementation is expensive (and insanely implemented), there should be an explicit low value such as 50. This issue can't be easily fixed.