#60 page ranking through link structure analysis

open
nobody
None
5
2003-02-13
2003-02-13
Anonymous
No

ht://Dig already includes a rudimentary page ranking
feature based on link structure: the back link count.
However, this is only the crudest of the many
link-structure-based page ranking algorithms
discussed in the literature.

Feature requests:

1. Produce a link graph representation during
collection. This should ideally have a form that
identifies for each document which other documents it
references, and (optionally) which ones it is
referenced from.

2. Provide hooks for algorithms that take this link
graph as input and produce global 'quality'
estimate weightings for all the documents in the
collection as output. [This is similar to what
Google does.] Default setting: extract in- and
out-link counts.

3. Provide hooks to run the same type of algorithm on
the result set of a query within htsearch. [This is
similar to the Teoma search engine.] Default setting:
return out-link counts.

4. A small collection of good and/or fast link
structure analysis algorithms to hook into the above.
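
As a rough illustration of requests 1 and 2, the following Python sketch
builds a forward and reverse link graph and runs interchangeable ranking
functions over it. All names here (LinkGraph, link_counts, pagerank) are
hypothetical and not part of ht://Dig, and the PageRank-style iteration is
only one possible algorithm that could sit behind such a hook, not a
statement of how any search engine implements it.

```python
from collections import defaultdict


class LinkGraph:
    """Hypothetical link graph; not an existing ht://Dig data structure."""

    def __init__(self):
        self.out_links = defaultdict(set)  # doc -> docs it references
        self.in_links = defaultdict(set)   # doc -> docs that reference it
        self.docs = set()

    def add_link(self, src, dst):
        """Record one link found while indexing the document `src`."""
        self.docs.update((src, dst))
        if src != dst:  # self-links carry no ranking information here
            self.out_links[src].add(dst)
            self.in_links[dst].add(src)


def link_counts(graph):
    """Default hook: (in-link count, out-link count) for every document."""
    return {
        d: (len(graph.in_links.get(d, ())), len(graph.out_links.get(d, ())))
        for d in graph.docs
    }


def pagerank(graph, damping=0.85, iterations=50):
    """Alternative hook: PageRank-style power iteration (assumed parameters)."""
    n = len(graph.docs)
    if n == 0:
        return {}
    rank = {d: 1.0 / n for d in graph.docs}
    for _ in range(iterations):
        # Rank held by documents without out-links is spread over all documents.
        dangling = sum(rank[d] for d in graph.docs if not graph.out_links.get(d))
        new_rank = {}
        for d in graph.docs:
            incoming = sum(
                rank[s] / len(graph.out_links[s])
                for s in graph.in_links.get(d, ())
            )
            new_rank[d] = (1.0 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank
```

Because both hooks take the graph and return a per-document mapping, the
indexer could swap one for the other through a configuration setting
without changing the collection step.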

For an example of what can be done with this approach, and
of the kinds of algorithms for higher-quality ranking
available in the literature, see:
http://www.cs.cornell.edu/home/kleinber/auth.ps
or
http://www.almaden.ibm.com/cs/k53/clever.html
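
The first reference describes hub and authority scores computed on a
query-specific subgraph, which is the kind of per-query analysis meant in
request 3. A minimal Python sketch of that iteration follows. The function
name, the input format (a dict mapping each document to the documents it
references), and the iteration count are assumptions made for illustration.
The sketch also restricts the graph to the result set only and skips the
step of expanding it with neighbouring pages.

```python
import math


def hits(links, result_set, iterations=50):
    """Hub/authority iteration over the links inside a query result set.

    links: dict doc -> iterable of referenced docs (hypothetical format).
    Returns (hub_scores, authority_scores), each normalised to unit length.
    """
    nodes = set(result_set)
    # Keep only links whose source and target are both in the result set.
    out = {
        d: {t for t in links.get(d, ()) if t in nodes and t != d}
        for d in nodes
    }
    hub = {d: 1.0 for d in nodes}
    auth = {d: 1.0 for d in nodes}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages linking in.
        auth = {d: 0.0 for d in nodes}
        for s in nodes:
            for t in out[s]:
                auth[t] += hub[s]
        # Hub score: sum of the authority scores of the pages linked to.
        hub = {s: sum(auth[t] for t in out[s]) for s in nodes}
        # Normalise both vectors so the scores stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for d in scores:
                scores[d] /= norm
    return hub, auth
```

Since the result set of a single query is usually small, an iteration of
this kind could plausibly run at search time, with the authority scores
used to reorder the matches.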

Andreas Strotmann
Universität zu Köln
Zentrum für Angewandte Informatik
Rechenzentrum

strotmann@rrz.uni-koeln.de

Discussion