An object-relational mapping (ORM) library for Java
Hibernate is an Object/Relational Mapping (ORM) tool. It is widely used in Java applications and implements the Java Persistence API. Hibernate ORM enables developers to more easily write applications whose data outlives the application process. As an ORM framework, Hibernate is concerned with data persistence as it applies to relational databases (via JDBC).
CLucene is a C++ port of Lucene, the high-performance, full-featured text search engine written in Java. Because it is written in C++, CLucene is claimed to be faster than Lucene.
Hunspell is a spell checker and morphological analyzer library and program designed for languages with rich morphology and complex compounding or character encoding. Hunspell interfaces: curses, Ispell-compatible pipe interface, OpenOffice.org UNO module.
Archive your personal history
ResCarta Toolkit offers an open source solution for creating, storing, viewing, and searching digital collections. Applications in the toolkit let users create and edit metadata, convert data to the open standard ResCarta format, and index and host collections.
Bibliophile is a loose grouping of independent open source or GPL bibliographic systems that aims to promote discussion, standards, and the development of common utilities.
TEK empowers low-connectivity communities by providing a full Internet experience using email as the transport mechanism.
WebSPHINX is a web crawler (robot, spider) Java class library, originally developed by Robert Miller of Carnegie Mellon University. Features include multithreading, tolerant HTML parsing, URL filtering and page classification, pattern matching, mirroring, and more.
The ht://Dig system is a complete indexing and searching system for a domain or intranet. This system is not meant to replace the need for powerful internet-wide search systems like Lycos, Infoseek, Google and AltaVista.
A Java implementation of a flexible and extensible web spider engine. Optional modules allow functionality to be added (searching for dead links, testing the performance and scalability of a site, creating a sitemap, etc.).
High performance distributed in-memory key/value store
Infinispan is an open source, Java-based data grid platform. ***IMPORTANT*** Starting with Infinispan 5.0.0.FINAL, Infinispan releases are no longer hosted on SourceForge. They can now be found at www.jboss.org/infinispan/downloads
SNT is a search engine for SMB and FTP shares, with a crawler that runs on Win32. A web interface is provided for searching files and browsing share contents. A shared film list with user ratings and comments is also provided.
Contineo is a web-based Document Management System (DMS). Features: folder organization, document versioning, bulk import, import from mailbox. NOTE: this project has been DISCONTINUED in favor of LogicalDOC http://sourceforge.net/projects/logicaldoc
Provides a robust and efficient Java implementation of n-gram based classifiers. N-gram algorithms have been shown to be surprisingly good at tasks like guessing the language or encoding of an arbitrary text file, and there are many more applications.
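To illustrate the idea of n-gram based language guessing, here is a minimal sketch in Python. It is not this project's code or API: the `NGramClassifier` class, its method names, and the choice of character trigrams compared by cosine similarity are all assumptions made for the example, and the tiny training texts are only there to make it runnable.

```python
from collections import Counter
import math


def ngrams(text, n=3):
    """Count character n-grams of the text, padded with one space on each side."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class NGramClassifier:
    """Hypothetical classifier: one n-gram profile per label."""

    def __init__(self, n=3):
        self.n = n
        self.profiles = {}

    def train(self, label, text):
        # Accumulate n-gram counts for this label.
        self.profiles.setdefault(label, Counter()).update(ngrams(text, self.n))

    def classify(self, text):
        # Pick the label whose profile is most similar to the sample.
        sample = ngrams(text, self.n)
        return max(self.profiles, key=lambda label: cosine(sample, self.profiles[label]))


classifier = NGramClassifier()
classifier.train("en", "the quick brown fox jumps over the lazy dog and the cat")
classifier.train("de", "der schnelle braune fuchs springt über den faulen hund und die katze")
```

With real data the profiles would be built from much larger corpora. Calling `classifier.classify(...)` on a short phrase returns the label of the closest profile.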
Law Office is a "Concordance" killer. Concordance is client software designed for legal staff. With it, you can upload transcripts from court hearings and depositions, perform full-text searches, enter notes on transcripts for a certain line of text, arch
Semantic Web server for searching in annotated repositories of visual resources. Includes all the software underlying the application that won the International Semantic Web Challenge 2006. Provided by the e-culture.multimedian.nl project
The complete suggestions framework for Java, supporting single- and multi-field suggest, a Java suggest box, client/server with Hessian or JSON-RPC, a GWT AJAX suggest box, and phonetic plugins. Proven high performance for data sets with more than 1 million entries.
Easy Spider is a distributed Perl Web Crawler Project from 2006
Easy Spider features code for crawling webpages, distributing the results to a server, and generating XML files from them. The client side can be any computer (Windows or Linux), and the server stores all data. Websites that use EasySpider and Perl/PHP backends: https://devop.tools/ https://github.com/thecerial/ https://blog.onetopp.com/ https://www.onetopp.com/ Web crawlers are often among the first things people write when starting a programming career. It is fun to look at code from a few years ago and see how one has improved. (c) Sebastian Enger 2005-2015
Job publishing and search engine based on Java EE, Hibernate, PostgreSQL, and Jersey, with a web interface based on jQuery
A web crawler which uses regular expressions on text downloaded from a site.
This project aims to provide a clean API, and the corresponding C++ implementation, for parsing travel-focused requests (e.g., "washington dc beijing monday r/t +aa -ua 1 week 2 adults 1 dog").
A Java library for complying with the standard Web Robot Exclusion Protocol, robots.txt.
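As a rough illustration of what complying with robots.txt involves, the sketch below uses Python's standard `urllib.robotparser` module rather than this Java library, whose API is not described here. The robots.txt content, the `ExampleBot` user agent, and the example.com URLs are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A polite crawler checks each URL before fetching it.
allowed = parser.can_fetch("ExampleBot", "http://example.com/index.html")
blocked = parser.can_fetch("ExampleBot", "http://example.com/private/data.html")
```

Here `allowed` is true and `blocked` is false, because only paths under `/private/` are disallowed for all user agents.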
FlexibleShare provides FlexSpaces Alfresco document management, workflow, and search in pods within a dashboard-style UI, with added Flex UI pods (wiki, blog, discussions, calendar, and document library pods) for the Alfresco Share back end. It is based on FlexibleDashboard, which supports pluggable pod modules for BI, charting, reporting, etc. It is available as an AIR version with desktop file drag and drop, an in-browser version, and a mobile (Android and iOS) version. Downloads and source are now only at http://code.google.com/p/flexibleshare/ Developed by Integrated Semantics: http://integratedsemantics.com blog: http://integratedsemantics.org
HttpFinder is a web content searching tool. It looks for text content that matches a given regular expression in HTML pages, scripts, etc. All navigation is driven by a second regular expression that describes which links to visit.
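The two-regex approach described above (one expression for content, one for links to follow) can be sketched as follows. This is not HttpFinder's code: the `find_matches` function, its parameters, and the in-memory `PAGES` dictionary standing in for real HTTP fetches are assumptions made to keep the example self-contained.

```python
import re
from collections import deque


def find_matches(fetch, start_url, link_regex, content_regex, max_pages=50):
    """Breadth-first crawl: follow links matched by link_regex (group 1 is the URL)
    and collect every content_regex match per page."""
    link_re = re.compile(link_regex)
    content_re = re.compile(content_regex)
    queue = deque([start_url])
    seen = {start_url}
    results = {}
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        visited += 1
        text = fetch(url)  # fetch returns the page text, or None if unavailable
        if text is None:
            continue
        found = content_re.findall(text)
        if found:
            results[url] = found
        for match in link_re.finditer(text):
            link = match.group(1)
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return results


# Hypothetical pages used instead of real network requests.
PAGES = {
    "/index.html": '<a href="/a.html">A</a> <a href="/skip.pdf">PDF</a> contact: admin@example.org',
    "/a.html": '<a href="/index.html">home</a> mail bob@example.org or eve@example.org',
}

results = find_matches(
    PAGES.get,
    "/index.html",
    link_regex=r'href="([^"]+\.html)"',   # only follow links to .html pages
    content_regex=r"[\w.]+@[\w.]+",       # look for e-mail-like strings
)
```

In a real crawler, `fetch` would perform an HTTP request and relative links would need to be resolved against the current page URL; both are left out here for brevity.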