From: Budd, S. <s....@im...> - 2003-01-29 10:32:47
Another way to capture user clicks on results-page documents would be to set up an htdig web server and have all the links in the results pages pass through it on the way to the target page, perhaps via a cgi-bin script or a redirect.

How to adjust the weight of a hit based on this information is rather problematic, because the weight ascribed to a particular URL in the database would depend on the search string (boolean? phrase?) that produced the result -- a large matrix! Perhaps the mere fact that the URL was clicked on would be enough information; in that case a usage database would be updated. Another difficulty: if the site is reindexed, what happens to the URL IDs vs. frequency of use? Perhaps use the MD5 of the URLs as a key?

-----Original Message-----
From: Kev Shepherd [mailto:K.S...@bo...]
Sent: Tuesday, January 28, 2003 11:20 PM
To: htd...@li...
Subject: [htdig] Knowledge re-use suggestion

Hi,

I'm new to the list (but not to HtDig), so forgive me if this has been suggested or discussed previously; I did have a quick search through the archives.

I have been thinking about how a search tool could adopt a knowledge-management approach, having just completed a thesis on that topic. I have been running HtDig for years, and a number of times I have thought about harnessing what people search for as symptoms of desirable knowledge in an organisation. Looking through the logs, there seemed to be valuable information buried away.

I recently modified my HtDig configuration so that I could use PHP4 search and results forms. I began to think that I could now change the results URLs into PHP links and capture which links were followed into a log. The log would simply contain the URL, the keyword(s) used, and today's date.
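[Editor's note: the click-through logger described above could be sketched roughly as follows. The thread proposes PHP4 or a cgi-bin redirect; this illustration uses Python instead, and every name in it (`log_click`, the log layout, the query parameters) is a hypothetical assumption, not part of htdig.]

```python
# Sketch: results links point at a redirector such as
#   /cgi-bin/click?url=<target>&words=<search terms>
# which logs the click and then sends the browser on to the target.
# The log is keyed by the MD5 of the URL, which stays stable across
# reindexing, unlike htdig's internal document IDs.

import hashlib
import time

def log_click(log, url, words, now=None):
    """Record one click in `log`, a dict keyed by MD5(url)."""
    key = hashlib.md5(url.encode("utf-8")).hexdigest()
    entry = log.setdefault(key, {"url": url, "hits": 0, "last": 0.0, "words": []})
    entry["hits"] += 1                                  # frequency of use
    entry["last"] = now if now is not None else time.time()  # recency
    entry["words"].append(words)                        # search terms used
    # A real CGI script would now emit:  "Location: %s\r\n\r\n" % url
    return key

usage = {}
log_click(usage, "http://example.org/doc.html", "boolean phrase")
log_click(usage, "http://example.org/doc.html", "htdig weighting")
```

Keying on MD5(url) rather than the per-search-string matrix keeps the usage database small, at the cost of losing which query led to each click (though the `words` list retains it for later analysis).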
I believe it might be useful to merge information from these "viewed URL" logs back into the HtDig database, to raise the priority of documents that have been viewed previously. Moreover, the raised priority should be based on recency, since the value of information becomes dated. For example, documents viewed recently might be given a higher weighting, with that weighting diminishing over (say) 100 days. I know there will be users who click on every link until they find what they wanted, but I suspect that on average the weightings would lead to higher initial win rates.

While I am competent at PHP, I don't think I could tackle a patch for HtDig, so I'll throw this idea open for discussion.

Regards,
Kev.

_______________________________________________
htdig-general mailing list <htd...@li...>
To unsubscribe, send a message to <htd...@li...> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
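[Editor's note: the recency weighting suggested above, a per-view boost that diminishes over a 100-day window, could look like the following Python sketch. The linear decay, the constants, and all function names are illustrative assumptions; htdig itself has no such mechanism.]

```python
# Sketch: each past click contributes a boost to a document's score,
# decaying linearly from BASE_BOOST to zero over WINDOW_DAYS days.
# A just-viewed document ranks higher; one viewed 100+ days ago gets
# no boost at all, so stale popularity fades out on its own.

WINDOW_DAYS = 100.0  # the "(say) 100 days" from the suggestion above
BASE_BOOST = 1.0     # maximum extra weight for a just-viewed document

def view_boost(days_since_view):
    """Extra weight contributed by one past click, aged the given number of days."""
    if days_since_view >= WINDOW_DAYS:
        return 0.0
    return BASE_BOOST * (1.0 - days_since_view / WINDOW_DAYS)

def adjusted_score(base_score, view_ages_days):
    """Combine the engine's own relevance score with decayed boosts from past views."""
    return base_score + sum(view_boost(a) for a in view_ages_days)
```

For example, a document with base score 2.0 that was viewed today, 50 days ago, and 200 days ago would score 2.0 + 1.0 + 0.5 + 0.0 = 3.5. An exponential decay would work just as well; the point is only that the boost must go to zero so "users who click every link" cannot permanently distort the ranking.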