An object relational-mapping (ORM) library for Java
Hibernate is an Object/Relational Mapper tool. It's very popular among Java applications and implements the Java Persistence API. Hibernate ORM enables developers to more easily write applications whose data outlives the application process. As an Object/Relational Mapping (ORM) framework, Hibernate is concerned with data persistence as it applies to relational databases (via JDBC).
An open source search engine with RESTFul API and crawlers
OpenSearchServer is a powerful, enterprise-class, search engine program. Using the web user interface, the crawlers (web, file, database, etc.) and the client libraries (REST/API , Ruby, Rails, Node.js, PHP, Perl) you will be able to integrate quickly and easily advanced full-text search capabilities in your application: Full-text with basic semantic, join queries, boolean queries, facet and filter, document (PDF, Office, etc.) indexation, web scrapping,etc. OpenSearchServer runs on Windows and Linux/Unix/BSD.
The MangaStream Downloader is an open source application written in Java for managing and downloading manga from the site mangastream.com and mangafox.me. It is written under the GNU-GPL license and uses an open source HTML parser - TagSoup. Follow the project page on Facebook for updates: https://www.facebook.com/MangastreamDownloader
Archive your personal history
ResCarta Toolkit offers an open source solution to creating, storing, viewing, and searching digital collections. Applications in the toolkit let users create and edit metadata, convert data to open standard ResCarta format, index and host collections.
A search application to watch and download movies and TV shows
A federated search desktop application to read about, preview, watch, and download any movie and television titles that are being shared online.
TouchGraph provides a set of interfaces for graph visualization using force-based layout and focus+context techniques. For now only older code is available, but we are planning to release new versions as well.
The Wikipedia Miner toolkit provides simplified access to Wikipedia. This open encyclopedia represents a vast, constantly evolving multilingual database of concepts and semantic relations; a promising resource for nlp and related research.
DBPrism is a framework to generate dynamic XML from a database, it provides an high performance DBGenerator for Cocoon2. Also is a J2EE replacement for Oracle mod_plsql. This project also includes a Restlet-Oracle connector exam. and Lucene Domain In
cpDetector is a proxy for codepage detection of documents. It delegates to multiple instances that try to detect the codepage by different techinques. A command line executeable is shipped that allows to sort documents by codepage.
Free Extracts Emails, Phones and custom text from Web using JAVA Regex
In Files there is WebCrawlerMySQL.jar which supports MySql Connection Please follow this link to get latest version https://sourceforge.net/projects/web-spider-web-crawler-extract/ Free Web Spider & Crawler. Extracts Information from Web by parsing millions of pages. Store data into Derby OR MySQL Database and data are not being lost after force closing the spider. - Free Web Spider , Parser, Extractor, Crawler - Extraction of Emails , Phones and Custom Text from Web - Export to Excel File - Data Saved into Derby Database - Written in Java Cross Platform See also Free Email Sender in this link: https://sourceforge.net/projects/gitst-free-email-ender/
A Java implementation of a flexible and extensible web spider engine. Optional modules allow functionality to be added (searching dead links, testing the performance and scalability of a site, creating a sitemap, etc ..
Free Extracts Emails, Phones and custom text from Web using JAVA Regex
In Files there is WebCrawlerMySQL.jar which supports MySql Connection Free Web Spider & Crawler. Extracts Information from Web by parsing millions of pages. Store data into Derby Database and data are not being lost after force closing the spider. - Free Web Spider , Parser, Extractor, Crawler - Extraction of Emails , Phones and Custom Text from Web - Export to Excel File - Data Saved into Derby and MySQL Database - Written in Java Cross Platform Also See Free email Sender : https://sourceforge.net/projects/gitst-free-email-ender/
OpenEphyra is an open framework for question answering (QA). It retrieves answers to natural language questions from the Web and other sources. Visit http://www.ephyra.info/ for more details and information on joining this open research initiative.
The Daisy Open Source CMS is an enterprise-grade, Java-based content and information management solution consisting of a standalone repository server (with an HTTP/XML interface) and a Wiki-like, WYSIWYG editing and publishing web application frontend.
Contineo is a Web-based Document Management System (DMS). Features: Folder organization, document Versioning, Bulk import, import from mailbox. NOTE: this project has been DISMISSED in favor of LogicalDOC http://sourceforge.net/projects/logicaldoc
Web Search by the people, for the people
YaCy is a free search engine that anyone can use to build search the internet (www and ftp) or to create a search portal for others (internet or intranet). The scale of YaCy is limited only by the number of users and can index billions of web pages. In p2p mode it is fully decentralized, all users of the search engine network are equal and it is not possible for anyone to censor the content of the distributed index.
Smart Cache Loader is a very configurable pure Java web grabber with special support for integration with Smart Cache proxy server. It can perform different loading operations based on URL mask, content-type, ...
Oxyus is an open source search engine written in 100% Java, aimed to provide a search button to your website in an easy way. Oxyus uses Apache Lucene for indexing, Quartz for scheduling and other interesting software products.
Classifier4J is a java library that provides an API for automatic classification of text. The default (and only current) implementation of this API is a Bayesian classifier. This library can be used for multiple purposes - as a spam filter or a blog cl
NeuroGrid could be thought of as a "Napster for Bookmarks." It allows the user to store data in a web-like fashion, allowing you to associate bookmarks (files/documents/whatever) with multiple keywords. See http://www.neurogrid.net
Hyper Estraier is a full-text search system. It works as with Google, but based on peer-to-peer architecture. Using Hyper Estraier, we can construct a large-scaled search engine with cheap computers.
LIMO stands for Lucene Index Monitor. It is a web application that gives basic information about indexes used by the Lucene search engine (http://lucene.apache.org). It allows you to browse and search the index, and reconstruct stored fields.
A tool which allows you to download all erotic images and videos hosted on popular image hosting sites tagged with a given tag without clicking you through the web interface. ,,java bayimg yourtag'' is enought.
The Retrieval Component Integrator Project (RECOIN) intends to provide an extensible framework of Java classes to build a meta-search and information retrieval (IR) system based on heterogenous IR components as part of a modular retrieval process. The so
Google() meets the Matrix. Red Piranha combines Lucene (Searching Ability), XML-RDF (ability to learn), Tomcat (for P2P Power) and Spring (Ease of use) to not only let you find anything, anywhere, but to actually understand what you are looking for.