It would be nice to have a feature that allows htdig to
crawl a portion of an html page but not index the
content of that particular section.
Most of my pages have a header (with links) and a left
navigation (with links). When performing a search from
my search page I often get excessive results because I
searched for a word that was a link in the left nav. So I
will get 5 or 6 pages that have actual content
containing the word I searched for and about 30 pages
that just have the word in the left navigation.
I tried using noindex_start and noindex_end, placing
them around the left nav section of each page, but now
htdig will not crawl the links contained in that section.
A function similar to noindex_start and noindex_end that
still allows htdig to crawl any links found between the
markers would be great. In other words, this *new*
function would crawl the content between the noindex
tags but would not store that content in the htdig db
files.
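
To illustrate the behavior I am asking for, here is a minimal Python sketch. It is not htdig code and not a proposed patch; the class name FollowButSkipParser, the helper parse_page, and the marker comments are assumptions made up for this example (the markers should match whatever noindex_start and noindex_end are set to). Links are collected everywhere on the page, while words are only kept when they fall outside the marked section.

```python
from html.parser import HTMLParser

# Assumed marker comments for this sketch; adjust them to match the
# noindex_start / noindex_end values used in your own configuration.
NOINDEX_START = "htdig_noindex"
NOINDEX_END = "/htdig_noindex"


class FollowButSkipParser(HTMLParser):
    """Hypothetical parser: follow every link, index only unmarked text."""

    def __init__(self):
        super().__init__()
        self.links = []        # every href found, marked section or not
        self.words = []        # words found outside the marked section
        self.skipping = False  # True while inside the marked section

    def handle_comment(self, data):
        # The markers are HTML comments, so they arrive here.
        marker = data.strip()
        if marker == NOINDEX_START:
            self.skipping = True
        elif marker == NOINDEX_END:
            self.skipping = False

    def handle_starttag(self, tag, attrs):
        # Links are recorded regardless of the skipping state.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Text is only kept when we are not inside the marked section.
        if not self.skipping:
            self.words.extend(data.split())


def parse_page(html):
    """Return (links_to_crawl, words_to_index) for one page."""
    parser = FollowButSkipParser()
    parser.feed(html)
    parser.close()
    return parser.links, parser.words
```

With a page whose left nav is wrapped in the markers, the nav link would still be queued for crawling, but the nav text would not end up in the word list.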