My boss and I have just released a few projects under open source licenses.

Scorpion implements a system for automatically classifying Web-accessible
text documents. Scorpion is intended for use by investigators who have a
machine-readable subject classification scheme or thesaurus and wish to
incorporate it into an automatic classification system.

Webutils provides Perl modules for harvesting web pages and extracting
metadata from them.

RDF Topicmaps is a proof-of-concept application intended to demonstrate
the benefits of using automatically generated subject indexes to enhance
discovery and navigation in a collection of web pages. Noun phrases are
extracted from web pages and organized into topic relation maps encoded
in RDF. The package also includes a user interface for browsing and
searching the topic maps.

Hope they're of interest to some of you.

Office of Research
OCLC Online Computer Library Center