Hunspell is a spell checker and morphological analyzer library and program designed for languages with rich morphology and complex compounding or character encoding. Hunspell interfaces: Curses, an Ispell-compatible pipe interface, and an OpenOffice.org UNO module.
Archive your personal history
ResCarta Toolkit offers an open source solution for creating, storing, viewing, and searching digital collections. Applications in the toolkit let users create and edit metadata, convert data to the open-standard ResCarta format, and index and host collections.
The ht://Dig system is a complete indexing and searching system for a domain or intranet. This system is not meant to replace the need for powerful internet-wide search systems like Lycos, Infoseek, Google and AltaVista.
Contineo is a Web-based Document Management System (DMS). Features: folder organization, document versioning, bulk import, and import from mailbox. NOTE: this project has been discontinued in favor of LogicalDOC http://sourceforge.net/projects/logicaldoc
Hyper Estraier is a full-text search system. It works like Google, but is based on a peer-to-peer architecture. Using Hyper Estraier, you can build a large-scale search engine with inexpensive computers.
Lucene has moved to Jakarta. Please visit http://lucene.apache.org/
iVia is an Internet subject portal or virtual library system. It is a hybrid expert- and machine-built collection creation and management system: resources can be crawled, and metadata and selected full text can be automatically generated or extracted.
The BeeGram library is a portable open source search engine toolkit written in C. BeeGram provides a number of building blocks for the construction of powerful general-purpose text-based search tools.
XPath HTML parser
HXPath is a command line tool for extracting data from HTML documents. Like the standard xpath tool, HXPath can select subtrees, but it can also read contents and attributes and output them in a Bash-friendly format. HTML Tidy and HTTP/HTTPS GET support are also built in.
Harvestman is a context-aware metasearch engine that functions as a universal information gatherer and data mining system for the Internet.
A web crawler which uses regular expressions on text downloaded from a site.
LAMP eGovernment Database Project offers state and local governments a free, open source, web-enabled system for developing public information sites. The system can also be used for government-to-government applications.
LANbyrinth is a bot that indexes a LAN and organizes its files. It is initially focused on indexing MP3 files. Features: fully configurable; fast and smart searching; recognition of duplicate files; organization of songs by artist, album, etc.
Lazysearch is a quick & dirty search proxy script designed for use with Mac OS X.
MWIP (Mean What I Play) is a clone of PWIM written in Lua. It is faster than the original PWIM (which was written in Python) and also offers extra features and better documentation. It is meant to be a complete replacement for PWIM.
The Medlane project is an attempt to create a set of tools that will enable librarians to move from the standard MARC (MAchine Readable Cataloging) format to a new library/museum XML format. This move will ensure traditional library/museum data remains
NetarchiveSuite is a web archiving system based around the Heritrix harvester. It features scheduling of mass harvests, automatic division of harvests into smaller jobs, a storage system with replication and a proxy-based viewer of archived material.
Online news and newspaper harvester, like an RSS newsreader with a database. Covers national and international news in great detail and catches hard-to-find news articles. Allows reposting of summaries with comments to Usenet newsgroups, complex searches, and more.
To help build the Next Generation Internet on the foundation of Open Data
Olenc is a Nutch-based crawler for Java, providing easy methods to index specific websites for further web search, via a community-driven portal.
PHP World Portal is being developed as the framework for JLS Web Development's site. After each module is completed it will be released as open source for the public. The core framework will be released by 1/23/04.
PHPLinkMonger is a small project to provide a modular system to maintain a database of links for personal use or integration within a larger website. It will offer compatibility with a variety of databases (including MySQL, PostgreSQL & Oracle).
The project provides an incubator for intelligent agent-assisted, AR gaming-oriented BI applications generated through the STALEMATE Knowledge-based System Design Environment (KBSDE), integrating Web-enabled knowledge bases, data mining and warehousing and directed at asset management and investment banking.
Sciense Searcher is a system that lets you search, organize, and share bibliographic citations of research articles, books, booklets, collections, manuals, theses, proceedings, technical reports, unpublished works, and miscellaneous items.
(This project is participating in the Zend PHP5 Contest. Project information will be released after the event, on Oct 11, 2004.)