NetarchiveSuite is a web archiving system based around the Heritrix harvester. It features scheduling of mass harvests, automatic division of harvests into smaller jobs, a storage system with replication and a proxy-based viewer of archived material.
To help build the Next Generation Internet on the foundation of Open Data
This is a PHP library for accessing OWL files. OWL is a W3C standard for representing semantic information.
Olenc is a Nutch-based crawler for Java, providing easy methods to index specific websites for further web search, via a community-driven portal.
Open Taxonomy - Web-based end-user navigation and editing of a tag-based taxonomy. Uses Ruby on Rails to support collaborative building of a web directory structure. Popular tags/categories can be "Elected" to become the official taxonomy.
This project aims to provide a clean API, and the corresponding C++ implementation, for parsing travel-focused requests (e.g., "washington dc beijing monday r/t +aa -ua 1 week 2 adults 1 dog").
A script for storing a database of URIs, much like Yahoo and Lycos sans spidering and web-crawling. URIs are added by the administrator or proposed by users for approval. Uses the Eclipse class library available at http://www.sunlight.tmfweb.nl/eclipse/
OpenMKS is a search and navigation tool for large multimedia collections. With pluggable functionality and a core subsystem supporting the Z39.50 ZING Community SRW search and retrieval specification, it can be run either as a Servlet or as a Web Service.
PHP World Portal is being developed as the framework for JLS Web Development's site. After each module is completed it will be released as open source for the public. The core framework will be released by 1/23/04.
A simple PHP solution that periodically generates your website's XML sitemap from your database. It can compress the sitemaps and ping search engines after generation, and it works with many popular RDBMSs such as MySQL, PostgreSQL, and MSSQL.
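The generate-then-compress steps described above can be sketched roughly as follows. This is a minimal illustration in Python, not the project's PHP code: the function names `build_sitemap` and `compress_sitemap` are hypothetical, the rows are assumed to be `(url, lastmod)` pairs already fetched from a database, and the search-engine ping is left out because the endpoints differ per engine.

```python
import gzip
from xml.etree import ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(rows):
    """Build sitemap XML bytes from (url, lastmod) pairs.

    `rows` stands in for the result of a database query (assumption);
    `lastmod` may be None, in which case the element is omitted.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in rows:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    # Prepend the XML declaration by hand so the output is explicit.
    declaration = b'<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(urlset, encoding="utf-8")


def compress_sitemap(xml_bytes):
    """Gzip the sitemap, e.g. for serving it as sitemap.xml.gz."""
    return gzip.compress(xml_bytes)
```

Running this periodically (for example from a cron job) and writing the compressed bytes to a file would approximate the workflow the description mentions.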
PHPLinkMonger is a small project to provide a modular system to maintain a database of links for personal use or integration within a larger website. It will offer compatibility with a variety of databases (including MySQL, PostgreSQL & Oracle).
Polish Flexion Engine provides a ready-to-use Polish flexion dictionary with a flexion engine for full-flexion text search, easily integrated into portals, web search, and database search engines. The first target is Polish flexion (pl. polska fleksja). A demo is available on the home page.
A fat-client price-checking tool. Similar in spirit to PriceRunner and others, except that it checks prices at the source on demand. It is meant to save you from entering the same search criteria on multiple sites and then tabbing between them to compare.
Pyndex is a simple and fast full-text indexer and Bayesian classifier implemented in Python. It uses Metakit as its storage back-end. It works well for quickly adding search to an application, and is also well suited to in-memory indexing and search.
This project provides cross-forge semantic search for the Qualipso Forge. It integrates the A4 AdvDoc prototype (semantic search GUI and engine) with the A3 homogeneous and heterogeneous cross-forge semantic search capabilities. See Qualipso.org for details.
A light network file search engine that crawls FTP servers and SMB shares (Windows shares and UNIX systems running Samba). A web interface written in Perl (Mason) is provided for searching files.
RSSearcher is a Java webapp built primarily on the Informa RSS library. It allows users to register queries against RSS channels and receive notifications when matches occur.
Relational storage for tagged documents
Restad is an indexing and querying tool for tagged documents. It uses a relational database for storage and querying. See the latest news on the blog: https://sourceforge.net/p/restad/blog/ The first Ruby prototype can be found at https://github.com/ymoreau/Restad
A fast way to rate the reading difficulty of a book or text. Unlike well-known readability metrics such as Fog, Kincaid, SMOG, ARI, Flesch, and Coleman-Liau, this metric takes far more factors into account and is standardized against a corpus.
The project provides an incubator for intelligent agent-assisted, AR gaming-oriented BI applications generated through the STALEMATE Knowledge-based System Design Environment (KBSDE), integrating Web-enabled knowledge bases, data mining and warehousing and directed at asset management and investment banking.
ssearchmodel is the Unix name for the Scalable Search Model, originally a text-based information retrieval system. It is written in PHP and uses MySQL. The open source community can add to it and help it grow into more.
Scidoc N6 is a platform for collaborative research and learning based on Semantic Web standards like RDF and OWL. It supports dynamically evolving concept models as a foundation for various kinds of information systems.
Sciense Searcher is a system that lets you search, organize and share bibliographic citations of research articles, books, booklets, collections, manuals, theses, proceedings, technical reports, unpublished publications and misc.
SNT is a search engine for SMB and FTP shares with a crawler that runs on Win32. A web interface is provided for searching files and browsing share contents. A list of shared films with user ratings and comments is also provided.
(This project is participating in the Zend PHP5 Contest. Project information will be released after the event, Oct 11, 2004.)