OpenAnonymity consists of a module for the Apache 2.0 web server and a framework that lets you control search engine spider indexing at the word level, rather than at the file level as in robots exclusion. OA could force spiders to follow these rules.
A new web crawler with a sophisticated search process specialized by language.
Web crawler and indexer project for university.
"girtools" is an implementation of Grid Information Retrieval (GIR). GIR is an emerging open standard for IR on the grid designed to allow dynamic, secure creation and searching of distributed information systems.
This is a simple command-line tool that solves the problem of full mailboxes with stuff you don't want to lose. It fetches all the mail from any POP3 mailbox account and generates a searchable HTML archive on your local hard drive. OS: Unix/Linux
Digital Comics and Picture Viewer with Database Support. GIF, BMP, JPG, PNG, JPEG 2000 Support, Direct3D with Zoom and Pan. Zip, Rar and StuffitX formats, Database Utilities. XML Embedding support, Web Search on Comic Database Pages.
This project will implement DAV Searching & Locating (DASL), an application of HTTP/1.1 that forms a lightweight search protocol to transport queries and result sets and allows clients to make use of server-side search facilities.
A project to develop specifications and software for a backwards-compatible Gnutella protocol for real-time searches for anything on the Internet, aka 'The Universal Search Protocol', to join the family of established Internet protocols.
A collection of software to implement search engine technology. The overall search technology is built on the individual components of this project; each component is released under the BSD License and is written in the language most suited to its task.
Lucino is a C library with some support for reading and writing Apache Lucene indices. It currently comes with PHP and Python bindings.
Lyfind is a little suite of components for easily searching, modifying and storing song lyrics from a variety of sources (mainly lyrics web sites).
My Community Portal is an all-in-one internet portal that offers a forum, groups, chat, your own e-mail, a search engine, an internet directory, your own home page, polls, dating services, a buddy list, MP3 and file sharing, and much more.
Distributed search engine for the Internet and intranets. Parallel search in heterogeneous indexes, topic-oriented harvester, CORBA interface to legacy document databases. Document clustering with neural networks.
PHP Wrapper Class For ht://Dig is a class I developed while desperately searching for something with similar capabilities. This class is intended to be much more thorough, allowing you to easily change headers, footers, and templates. htdig + PHP = htPHP
A PHP extension to Swish-e
This will be a generic indexing system for the Python language with pluggable engines to store the resulting indexes in either ZODB, MySQL or (at a later date) its own proprietary format.
SYRAH aims to discover and represent concepts expressed in natural languages. NLP, lemma, lemma dictionary, Italian, network, semantics, clustering, semantic
The SiCrawler (or Sensitive Information Crawler) is a web crawler designed to extract user-defined sensitive information from web sites. This could be credit card or social security numbers, or a host of other information defined by regexes and plug-ins.
The Somewhat Intelligent Proxy [SIP] is an effort at an open-source, natural-language, web-accessible instrument that uses Internet sources to return answers to your questions.
Sprawler is the first Open Source internet search engine software and service, built by the community, for the community. It will address the various reasons most search engines today are still far from where they need to be.
Syndicateme.net: an Ajax Atom 1.0 syndication engine. Tell your story, especially if you are a business along Queen St. in Toronto, Canada or King Street in Waterloo, Canada. Content can be syndicated from a POP mailbox and can use XInclude.
Yet Another Open Search Engine