#119 exact match doesn't work

Milestone: resolved
Status: closed-fixed
Labels: htdig (103)
Priority: 5
Updated: 2002-05-10
Created: 2002-05-09
Creator: Anonymous
Private: No

I am trying to index a large Java codebase (thousands
of files). In my htdig.conf I only provide weighting
for exact matches (since that is all I care about),
but searching for long strings provides LOTS of false
positives that contain substrings.

I would like to be able to do case-sensitive EXACT matches
and suppress everything else. However, this seems to be
impossible.

Here is htdig.conf:

database_dir: /opt/htdig/db
start_url: http://my.site
exclude_urls: /cgi-bin/ .cgi
bad_extensions: .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif \
    .jpg .jpeg .aiff .map .ram .tgz .bin .rpm .mpg .mov .avi .css
maintainer: my.email@company.com
max_head_length: 10000000
max_doc_size: 20000000
no_excerpt_show_top: true
search_algorithm: exact:1

Discussion

  • Gilles Detillieux

    Logged In: YES
    user_id=149687

    This isn't a bug, but a configuration problem, and as such belongs on the htdig-general mailing list. It's
    pretty hard to offer help to an anonymous bug report. If you see this, then have a look at
    http://www.htdig.org/attrs.html#maximum_word_length and increase the size of this attribute in
    htdig.conf to suit the maximum string length you want.

     
  • Gilles Detillieux

    • milestone: 102988 --> resolved
    • assigned_to: nobody --> grdetil
    • status: open --> closed-fixed
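
The suggested fix can be sketched as a small addition to the htdig.conf shown in the report. ht://Dig truncates words longer than maximum_word_length both when indexing and when searching, so long identifiers that share a prefix end up matching each other; the 3.1-series documentation lists a default of 12 characters. The value 64 below is only an assumption for illustration; pick whatever covers the longest string you expect to search for.

```
# Assumed value for illustration: raise the limit so long Java
# identifiers are indexed and searched whole, not truncated.
maximum_word_length: 64

# Unchanged from the report: only the exact-match algorithm is used.
search_algorithm: exact:1
```

Because the attribute also affects how words are stored, the database most likely has to be rebuilt (for example by rerunning rundig) before the new length takes effect.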
     
