An object-relational mapping (ORM) library for Java
Hibernate is an object/relational mapping (ORM) tool. It is widely used in Java applications and implements the Java Persistence API. Hibernate ORM enables developers to more easily write applications whose data outlives the application process. As an ORM framework, Hibernate is concerned with data persistence as it applies to relational databases (via JDBC).
CLucene is a C++ port of Lucene, the high-performance, full-featured text search engine written in Java. According to the project, CLucene is faster than Lucene because it is written in C++.
The ht://Dig system is a complete indexing and searching system for a domain or intranet. This system is not meant to replace the need for powerful internet-wide search systems like Lycos, Infoseek, Google and AltaVista.
SENTENSA Knowledge Miner is a platform-independent tool for searching any text. SENTENSA uses robust methods of indexing and searching text, drawing on more than 20 years of experience in information retrieval.
JMdRdf is a tool that creates RDF/RSS. 1. You can generate RDF/RSS for your homepage from your HTML file(s) without programming. JMdRdf automatically extracts information such as the title and description from the HTML. 2. You can paste RDF/RSS into your HTML.
Contineo is a Web-based Document Management System (DMS). Features: folder organization, document versioning, bulk import, import from mailbox. NOTE: this project has been discontinued in favor of LogicalDOC http://sourceforge.net/projects/logicaldoc
WebSPHINX is a web crawler (robot, spider) Java class library, originally developed by Robert Miller of Carnegie Mellon University. It offers multithreading, tolerant HTML parsing, URL filtering and page classification, pattern matching, mirroring, and more.
This project provides cross-forge semantic search for the Qualipso Forge. It integrates the A4 AdvDoc prototype (semantic search GUI and engine) with A3 homogeneous and heterogeneous cross-forge semantic search capabilities. See Qualipso.org for details.
Crawl-By-Example runs a crawl that classifies the processed pages by subject and finds the best pages according to examples provided by the operator. Crawl-By-Example is a plugin for the Heritrix crawler and was developed as part of the GSoC06 program.
EasyGIS simplifies GIS data management, sharing, and publishing. REST interfaces (JSON, HTML views). Lucene-based full-text search (FTS). Thematic maps, business cartography. Integration with external GIS data providers: Google, OSM.
JavaMatch is an engine that can search inside runtime Java data structures and look for objects that best match the criteria that you specify. The extensive query mechanism allows for highly customizable tuning of your match queries.
XmlTvProducer for PHP is an extensible engine that grabs TV/radio listings from websites and produces XMLTV output. Data distribution for TV-Browser is included. The primary focus is on Slovak and Czech channels, but development is open to anybody.
Python app used to download (torrent) files from various RSS feeds. Designed for use with Transmission client...
Easy Spider is a distributed Perl web crawler project from 2006. It includes code for crawling web pages, distributing the results to a server, and generating XML files from them. The client side can be any computer (Windows or Linux), and the server stores all data. Websites that use EasySpider and Perl/PHP backends: https://www.artikelschreiber.com/ https://github.com/thecerial/ https://www.buzzerstar.com/development/ Web crawlers are often one of the first things people write when they start a programming career. It is fun to look at code from a few years ago and see how much one has improved. (c) Sebastian Enger 2005-2015
Classifieds for cars built with MooTools, PHP, and MySQL, entirely in Ajax.
ASPSearch is a search engine project written in ASP that requires no installation on the server. Set up a search engine in less than five minutes.
Bibliophile is a loose grouping of independent open source or GPL bibliographic systems that aims to promote discussion, standards, and the development of common utilities.
Book management system with a web service, written in PHP.
CatMDServices is a Web application for describing and searching web services by means of metadata. Developed by IAAA (Univ. of Zaragoza) and GeoSpatiumLab S.L., sponsored by IGN Spain. Technical details: Java, GWT, XML, multiplatform, multilingual.
DOSE: a distributed platform for semantic elaboration that provides semantic services such as automatic annotation of web resources at the document substructure level, semantic search facilities, semantic annotation storage and retrieval.
dCrawler (Distributed Crawler) alias D-HarvestMan (Distributed HarvestMan) is a distributed Web crawler implemented in the Python programming language. dCrawler is developed on top of the existing open source Web crawler named HarvestMan.
A search engine that gives full control over the search results. The user can search by category and then combine previous search results to build complex search results, without the need for an advanced query language.
An event-driven federated search platform that aggregates search results from distributed content providers.
An extensible framework and user interface for combining various structured search and document clustering techniques.
A collection of Java Servlets relating to searching. Use of these servlets should make future transitions between search appliances less painful as well as simplify the query parameters.