ht://dig already includes a rudimentary link-structure
page-ranking feature: the back-link count. However,
back-link counting is only the crudest of the many
link-structure-analysis ranking algorithms discussed in
the literature.
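For illustration, a back-link count of the kind described above can be computed from a document-to-out-links map in a few lines. The dict-of-lists graph below is a toy example, not ht://dig's actual internal data structure:

```python
from collections import defaultdict

# Toy (doc -> outgoing links) map; purely illustrative, not
# ht://dig's actual internal representation.
links = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
    "d.html": ["c.html"],
}

def back_link_counts(links):
    """Count, for each document, how many documents link to it."""
    counts = defaultdict(int)
    for src, targets in links.items():
        for dst in targets:
            counts[dst] += 1
    return dict(counts)

print(back_link_counts(links))  # c.html has the most back links
```

The weakness is plain even in this toy: every back link counts the same, regardless of how important the linking document itself is.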
1. Produce a link-graph representation during
collection. Ideally this should have a form that
identifies, for each document, which other documents it
references and (optionally) which documents it is
referenced by.
2. Provide hooks for algorithms that take this link
graph as input and produce global 'quality' estimate
weightings for all the documents in the collection as
output. [This is similar to what Google does.] Default
setting: extract in- and out-link counts.
3. Provide hooks to run the same type of algorithm on
the result set of a query within htsearch. [This is
similar to the Teoma search engine.] Default setting:
return out-link counts.
4. A small collection of good and/or fast
link-structure-analysis algorithms to hook into the above.
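As one concrete instance of the step-2 hook, a global quality weighting could be produced by a simple PageRank-style iteration over the step-1 link graph. This is a sketch only: the dict-of-lists graph shape, the function name, and the parameters are assumptions for illustration, not existing ht://dig interfaces:

```python
def pagerank(link_graph, damping=0.85, iterations=50):
    """Iterative PageRank over a doc -> [referenced docs] map."""
    docs = set(link_graph)
    for targets in link_graph.values():
        docs.update(targets)
    n = len(docs)
    rank = {d: 1.0 / n for d in docs}
    for _ in range(iterations):
        new = {d: (1.0 - damping) / n for d in docs}
        for src, targets in link_graph.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        # Documents without out-links spread their weight uniformly.
        dangling = sum(rank[d] for d in docs if not link_graph.get(d))
        for d in docs:
            new[d] += damping * dangling / n
        rank = new
    return rank

# Hypothetical step-1 output: doc -> documents it references.
link_graph = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
    "d.html": ["c.html"],
}
ranks = pagerank(link_graph)
# The weightings sum to 1; heavily referenced documents end up highest.
```

Unlike the plain back-link count, the weight a document receives here depends on the weight of the documents linking to it, which is the kind of higher-quality global estimate step 2 is after.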
An example of what can be done with this approach, and
of the kinds of higher-quality ranking algorithms
available in the literature:
Universität zu Köln
Zentrum für Angewandte Informatik
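Item 3 above (running an algorithm on a query's result set within htsearch, as Teoma does) can be sketched with a Kleinberg-style HITS iteration restricted to the result set. Again, the graph shape, names, and result set below are illustrative assumptions, not ht://dig code:

```python
def hits(link_graph, result_set, iterations=30):
    """Kleinberg-style hub/authority scores over a result subset."""
    # Keep only links whose source and target are both in the result set.
    sub = {d: [t for t in link_graph.get(d, []) if t in result_set]
           for d in result_set}
    hub = {d: 1.0 for d in result_set}
    auth = {d: 1.0 for d in result_set}
    for _ in range(iterations):
        # Authority: sum of hub scores of documents pointing at you.
        auth = {d: 0.0 for d in result_set}
        for src, targets in sub.items():
            for dst in targets:
                auth[dst] += hub[src]
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {d: v / norm for d, v in auth.items()}
        # Hub: sum of authority scores of documents you point at.
        hub = {d: sum(auth[t] for t in sub[d]) for d in result_set}
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {d: v / norm for d, v in hub.items()}
    return hub, auth

# Toy collection graph and a hypothetical result set for one query.
collection_links = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
    "d.html": ["c.html"],
}
result_docs = {"a.html", "b.html", "c.html"}
hub, auth = hits(collection_links, result_docs)
```

Because only the query's result set is scored, this per-query pass stays cheap even for a large collection, which is what makes it feasible inside htsearch rather than at collection time.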