
#18 prefix/suffix lists and question-marks don't play nicely together

New
nobody
None
Medium
Defect
2012-03-07
2012-03-07
Anonymous
No

Originally created by: dgryski

One place I got stuck while debugging the trigram-question-mark issue, which might actually be a fundamental design limitation (or feature), is that moving to prefix/suffix lists can cause the list of trigrams in the query to shrink considerably.

bash$ ./csearch -verbose 'foo_(bar)?zot' >/dev/null
2012/03/07 22:18:48 query: "foo" "oo_" "zot" ("_zo" "o_z")|("arz" "rzo")
2012/03/07 22:18:48 post query identified 0 possible files
bash$ ./csearch -verbose 'foo_(bar_)?zot' >/dev/null
2012/03/07 22:18:53 query: "foo" "oo_" "zot"
2012/03/07 22:18:53 post query identified 0 possible files

In the first case, "bar" is only three characters, so it stays as an exact trigram and is used to construct the arz/rzo entries. In the second case, adding the underscore brings it to four characters, so it becomes a prefix/suffix list and no longer provides any trigram info: the optional group also contributes the empty string, and every other entry in the prefix and suffix lists is then discarded as "redundant" with it ("" is a prefix of "ba").

I'm not sure whether this is a bug. I.e., _should_ we be able to transform prefix/suffix lists into AND/OR sets of trigrams in this case?
