#43 rewrite URLs in htsearch

open
nobody
htsearch (29)
5
2002-03-26
2002-03-26
George Moody
No

My htdig-indexed site is mirrored. So far, I've asked
the mirror maintainers to run rundig on each of their
mirrors nightly after retrieving the latest updates --
but some of the mirrors can take hours to execute
rundig. It would be trivial to have the mirrors pick
up the indices generated by rundig on the master site,
but then users of the mirrors would get search results
that point back to the master site (defeating the
purpose of the mirrors, to distribute the load and to
give better service to those with poor connections to
the master site).

What would help is the ability to rewrite the URLs
returned by htsearch on the fly, replacing the
master hostname in each one with the mirror's hostname.
I'm thinking of doing this with sed (or similar)
postprocessing of the htsearch output, but there must
be a cleaner way to do this.
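
For what it's worth, the postprocessing idea could be sketched roughly as
follows, in Python rather than sed. This is only an illustration: the
function name rewrite_host and the hostnames in the usage note are made up,
and nothing here is part of ht://Dig. It assumes the htsearch output has
already been captured as a string, and that only http/https URLs whose host
is exactly the master hostname should be touched.

```python
import re


def rewrite_host(text, master_host, mirror_host):
    """Replace master_host with mirror_host where it is the host of a URL.

    Hypothetical helper for postprocessing captured htsearch output.
    """
    # Match the scheme, then the master hostname, but only when the hostname
    # ends there (next char is /, :, ?, #, whitespace, a quote, an angle
    # bracket, or end of text). This avoids rewriting longer hostnames that
    # merely start with the master hostname.
    pattern = re.compile(
        r'(https?://)' + re.escape(master_host) + r'(?=[/:?#\s"\'<>]|$)'
    )
    # A function replacement keeps any backslashes in mirror_host literal.
    return pattern.sub(lambda m: m.group(1) + mirror_host, text)
```

A mirror's search CGI wrapper could then pass the captured output through
rewrite_host(output, "master.example.org", "mirror.example.net") before
sending it to the browser; both hostnames are placeholders.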

Here's another way to accomplish this end: when
generating indices, recognize and encode the current
hostname as <HOST> (or the distinctive token of your
choice); then allow a setting in htdig.conf to control
how <HOST> is expanded by htsearch. Not as general as
a full-fledged URL rewriter, but perhaps easier to
implement.
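
To make the token idea concrete, here is a minimal sketch of the two halves,
assuming the index stores plain URL strings. The names HOST_TOKEN,
encode_url, and expand_url are invented for this example and do not
correspond to any existing ht://Dig function or setting.

```python
HOST_TOKEN = "<HOST>"  # hypothetical placeholder stored in place of the hostname


def encode_url(url, index_host):
    """At indexing time: swap the indexing site's hostname for the token."""
    prefix = "http://" + index_host
    # Only rewrite URLs whose host is exactly index_host.
    if url == prefix or url.startswith(prefix + "/"):
        return "http://" + HOST_TOKEN + url[len(prefix):]
    return url  # URLs on other hosts are stored unchanged


def expand_url(stored_url, search_host):
    """At search time: expand the token to the locally configured hostname."""
    return stored_url.replace(HOST_TOKEN, search_host, 1)
```

With this scheme the master would store "http://<HOST>/doc/a.html", and each
mirror would expand it using its own configured hostname, so the same index
files could be copied to every mirror unchanged.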

Discussion

  • Logged In: NO

    There is a field like "URL_common_strings" whose values are
    replaced by short tokens in the search database.

    If the config file used during the dig specifies your.host.edu as
    one of these, and the config file used during searching replaces
    it with other.host.com, then all URLs that referred to
    your.host.edu will automatically be returned as if they had
    contained other.host.com.

    I hope this helps.