An object relational-mapping (ORM) library for Java
Hibernate is an Object/Relational Mapper tool. It's very popular among Java applications and implements the Java Persistence API. Hibernate ORM enables developers to more easily write applications whose data outlives the application process. As an Object/Relational Mapping (ORM) framework, Hibernate is concerned with data persistence as it applies to relational databases (via JDBC).
CLucene is a C++ port of Lucene: the high-performance, full-featured text search engine written in Java. CLucene is faster than lucene as it is written in C++.
An open source search engine with RESTFul API and crawlers
OpenSearchServer is a powerful, enterprise-class, search engine program. Using the web user interface, the crawlers (web, file, database, etc.) and the client libraries (REST/API , Ruby, Rails, Node.js, PHP, Perl) you will be able to integrate quickly and easily advanced full-text search capabilities in your application: Full-text with basic semantic, join queries, boolean queries, facet and filter, document (PDF, Office, etc.) indexation, web scrapping,etc. OpenSearchServer runs on Windows and Linux/Unix/BSD.
PDFBox is a Java PDF Library. This project will allow access to all of the components in a PDF document. More PDF manipulation features will be added as the project matures. This ships with a utility to take a PDF document and output a text file.
PHPCrawl is a high configurable webcrawler/webspider-library written in PHP. It supports filters, limiters, cookie-handling, robots.txt-handling, multiprocessing and much more.
Xaraya is a web application framework and CMS written in PHP.
Xaraya is an Open Source web application framework and content management solution written in PHP and licensed under the GNU General Public License. Xaraya is extensible, uses robust permissions, and multilingual systems to dynamically manage content. Track and get the latest developments on GitHub: https://github.com/xaraya
Aperture is a Java framework for extracting and querying full-text content and metadata from various information systems (file systems, web sites, mail boxes, ...) and the file formats (documents, images, ...) occurring in these systems.
Regain is a Java search engine based on Jakarta Lucene. It provides indexing and searching files for plenty of formats (HTML,XML,doc(x),xls(x),ppt(x),oo,PDF,RTF,mp3,mp4,Java). A TagLibrary eases integrating search results in your JSP based web page.
AutoIndex is a PHP script that makes a table that lists the files in a directory, and lets users access the files and subdirectories. It includes searching, icons for each file type, an admin panel, uploads, access logging, file descriptions, and more.
A general purpose source code indexer and cross-referencer that provides web-based browsing of source code with links to the definition and usage of any identifier. Supports multiple languages. Up-to-date information in http://lxr.sourceforge.net
The MangaStream Downloader is an open source application written in Java for managing and downloading manga from the site mangastream.com and mangafox.me. It is written under the GNU-GPL license and uses an open source HTML parser - TagSoup. Follow the project page on Facebook for updates: https://www.facebook.com/MangastreamDownloader
This is an ***old archive*** of tools developed for facilitating the use of Creative Commons licenses and metadata. --- For the most up to date representation of any of the projects listed here, please see: http://creativecommons.org/project/Developer.
OpenWebSpider is an Open Source multi-threaded Web Spider (robot, crawler) and search engine with a lot of interesting features!
Interleave is a business process management application. It enables you to model your business process and make it available online. It's meant to replace processes which currently rely on paper or spreadsheets and it has a good workflow engine.
Contineo is a Web-based Document Management System (DMS). Features: Folder organization, document Versioning, Bulk import, import from mailbox. NOTE: this project has been DISMISSED in favor of LogicalDOC http://sourceforge.net/projects/logicaldoc
The Wikipedia Miner toolkit provides simplified access to Wikipedia. This open encyclopedia represents a vast, constantly evolving multilingual database of concepts and semantic relations; a promising resource for nlp and related research.
A search application to watch and download movies and TV shows
A federated search desktop application to read about, preview, watch, and download any movie and television titles that are being shared online.
The stuff here has no documentation and some of it may never be completed. This is my playground, use at your own risk.
The ht://Dig system is a complete indexing and searching system for a domain or intranet. This system is not meant to replace the need for powerful internet-wide search systems like Lycos, Infoseek, Google and AltaVista.
PHP Crawler is a simple website search script for small-to-medium websites. The only requrements are PHP and MySQL, no shell access required.
Download code, Learn code, Share code. The Ongoing Object-oriented Perl Project. scripts, tutorials, modules, case studies, anything OOPerl --not excluding contributions of OO discipline in other programming languages.
Fusker is a tool to create entire image galleries from an single specially constructed URL.
cpDetector is a proxy for codepage detection of documents. It delegates to multiple instances that try to detect the codepage by different techinques. A command line executeable is shipped that allows to sort documents by codepage.
A simple yet very powerful and fast PHP website search engine. TSEP is built to index your site so it can be searched later within seconds. Features: stopwords, logging, MySQL boolean search, localized, CSS formatting. See www.tsep.info for more info
A Python wrapper for the Google web API. Allows you to do Google searches, retrieve pages from the Google cache, and ask Google for spelling suggestions.