#43 rewrite URLs in htsearch

htsearch (29)
George Moody

My htdig-indexed site is mirrored. So far, I've asked
the mirror maintainers to run rundig on each of their
mirrors nightly after retrieving the latest updates --
but some of the mirrors can take hours to execute
rundig. It would be trivial to have the mirrors pick
up the indices generated by rundig on the master site,
but then users of the mirrors would get search results
that point back to the master site (defeating the
purpose of the mirrors, to distribute the load and to
give better service to those with poor connections to
the master site).

So what would help is to be able to rewrite the URLs
retrieved by htsearch on-the-fly, replacing the
master hostname in each one with the mirror's hostname.
I'm thinking of doing this with sed (or similar)
postprocessing of the htsearch output, but there must
be a cleaner way to do this.
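
For what it's worth, the sed-style postprocessing could be sketched roughly as below. This is only an illustration, not part of htsearch: the function name rewrite_host and the hostnames master.example.org and mirror.example.net are made up for the example, and it assumes the result URLs appear in the htsearch output as ordinary http:// or https:// links.

```python
import re


def rewrite_host(html: str, master: str, mirror: str) -> str:
    """Replace the master hostname with the mirror hostname inside URLs.

    Hypothetical helper for illustration; htsearch itself offers no such call.
    """
    # Match the scheme followed by the master host, and require a URL
    # delimiter (or end of input) right after it, so that a longer hostname
    # that merely starts with the master's name is left alone.
    pattern = re.compile(
        r"(https?://)" + re.escape(master) + r"(?=[:/\"'\s<>]|$)",
        re.IGNORECASE,
    )
    # A function is used as the replacement so that any backslashes in the
    # mirror name are inserted literally.
    return pattern.sub(lambda m: m.group(1) + mirror, html)
```

A small CGI wrapper on each mirror could run htsearch, pass its output through a filter like this, and print the result. It works, but it depends on the layout of the result templates, which is why a setting inside htsearch would be cleaner.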

Here's another way to accomplish this end: when
generating indices, recognize and encode the current
hostname as <HOST> (or the distinctive token of your
choice); then allow a setting in htdig.conf to control
how <HOST> is expanded by htsearch. Not as general as
a full-fledged URL rewriter, but perhaps easier to


  • Logged In: NO

    There is a config attribute, something like "URL_common_strings",
    whose entries are replaced by short tokens in the search database.

    If the config file used during the dig specifies your.host.edu as
    one of these, and the config file used during searching specifies
    other.host.com in its place, then all URLs that referred to
    your.host.edu will automatically be returned as if they had
    contained other.host.com.

    I hope this helps.
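
To make the token idea concrete, here is a minimal sketch of the mechanism described above: the host is swapped for a token when URLs are stored, and the token is expanded to a different host when they are read back. It is a conceptual model only, not ht://Dig's actual code or database format; HOST_TOKEN, encode_url and expand_url are hypothetical names, and it assumes plain http:// URLs.

```python
HOST_TOKEN = "<HOST>"  # hypothetical placeholder stored in place of the host


def encode_url(url: str, dig_host: str) -> str:
    """Dig time: store the URL with the digging host replaced by the token."""
    prefix = "http://" + dig_host + "/"
    if url.startswith(prefix):
        return "http://" + HOST_TOKEN + "/" + url[len(prefix):]
    # URLs on other hosts are stored unchanged.
    return url


def expand_url(stored: str, search_host: str) -> str:
    """Search time: expand the token to the host configured for this site."""
    return stored.replace(HOST_TOKEN, search_host, 1)


# The master digs as your.host.edu; a mirror searches as other.host.com.
stored = encode_url("http://your.host.edu/docs/a.html", "your.host.edu")
result = expand_url(stored, "other.host.com")
```

With this scheme the mirrors can copy the master's databases as they are, and each one only needs its own search-time setting for the host.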