Forked from https://sf.net/p/fmd/
The Free Manga Downloader (FMD) is an open source application written in Object Pascal for managing and downloading manga from various websites. This is a mirror of the main repository on GitHub. For feedback or bug reports, visit https://github.com/riderkick/FMD
CLucene is a C++ port of Lucene, the high-performance, full-featured text search engine written in Java. CLucene is faster than Lucene as it is written in C++.
An open source search engine with RESTful API and crawlers
OpenSearchServer is a powerful, enterprise-class search engine program. Using the web user interface, the crawlers (web, file, database, etc.) and the client libraries (REST/API, Ruby, Rails, Node.js, PHP, Perl), you can quickly and easily integrate advanced full-text search capabilities into your application: full-text search with basic semantics, join queries, boolean queries, facets and filters, document (PDF, Office, etc.) indexing, web scraping, etc. OpenSearchServer runs on Windows and Linux/Unix/BSD.
PHPCrawl is a highly configurable web crawler/web spider library written in PHP. It supports filters, limiters, cookie handling, robots.txt handling, multiprocessing and much more.
The stuff here has no documentation and some of it may never be completed. This is my playground, use at your own risk.
The ht://Dig system is a complete indexing and searching system for a domain or intranet. This system is not meant to replace the need for powerful internet-wide search systems like Lycos, Infoseek, Google and AltaVista.
Aperture is a Java framework for extracting and querying full-text content and metadata from various information systems (file systems, web sites, mail boxes, ...) and the file formats (documents, images, ...) occurring in these systems.
Fusker is a tool to create entire image galleries from a single specially constructed URL.
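As a rough illustration of how one specially constructed URL can stand for a whole gallery, here is a minimal Java sketch. It assumes a bracketed numeric range such as `[01-03]` as the template syntax; that syntax, the class name `RangeUrlExpander`, and the example.com address are assumptions made for this sketch, not details taken from the Fusker project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RangeUrlExpander {
    // Hypothetical template syntax: a numeric range in square brackets, e.g. [01-03].
    private static final Pattern RANGE = Pattern.compile("\\[(\\d+)-(\\d+)\\]");

    // Expands the first bracketed range in the template into one URL per number.
    public static List<String> expand(String template) {
        List<String> urls = new ArrayList<>();
        Matcher m = RANGE.matcher(template);
        if (!m.find()) {
            // No range present: the template is already a plain URL.
            urls.add(template);
            return urls;
        }
        int start = Integer.parseInt(m.group(1));
        int end = Integer.parseInt(m.group(2));
        // Preserve the zero padding used by the lower bound (01 -> width 2).
        int width = m.group(1).length();
        String prefix = template.substring(0, m.start());
        String suffix = template.substring(m.end());
        for (int i = start; i <= end; i++) {
            urls.add(prefix + String.format("%0" + width + "d", i) + suffix);
        }
        return urls;
    }

    public static void main(String[] args) {
        for (String url : expand("http://example.com/gallery/img[01-03].jpg")) {
            System.out.println(url);
        }
    }
}
```

Only the first range in a template is expanded here; handling several ranges would mean applying the same step to each generated URL in turn.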
Free tool that extracts emails, phone numbers and custom text from the Web using Java regex
In Files there is WebCrawlerMySQL.jar, which supports MySQL connections. Follow this link to get the latest version: https://sourceforge.net/projects/web-spider-web-crawler-extract/ Free Web Spider & Crawler. Extracts information from the Web by parsing millions of pages. Data is stored in a Derby or MySQL database and is not lost if the spider is force-closed. Features: free web spider, parser, extractor and crawler; extraction of emails, phone numbers and custom text from the Web; export to Excel file; data saved in a Derby database; written in Java, cross-platform. See also Free Email Sender: https://sourceforge.net/projects/gitst-free-email-ender/
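The regex-based extraction described above can be sketched with the standard `java.util.regex` classes. This is a minimal sketch, not the project's own code: the class name `EmailExtractor`, the sample text, and the simplified pattern are all assumptions, and the pattern deliberately does not cover every valid address form.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailExtractor {
    // Simplified email pattern (an assumption for this sketch, not a full RFC 5322 matcher).
    private static final Pattern EMAIL =
        Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Returns every substring of the text that matches the pattern, in order of appearance.
    public static List<String> extractEmails(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String page = "Contact alice@example.com or bob.smith@mail.example.org for details.";
        System.out.println(extractEmails(page));
    }
}
```

Phone numbers or custom text would follow the same loop with a different `Pattern`.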
Imgur Gallery Downloader
Users can now search Imgur for any phrase and ImgurDL/Loadur will automatically search for matching images. ImgurDL/Loadur will download the images while displaying the progress to the user.
===NOTICE=== After releasing a few updates, but far fewer than we wanted, we’ve made the decision to stop the OptimizeGoogle Project. The reason for the decision is that there were not enough people on the team to keep it going. Google is changing things every day and it has become more and more frustrating to watch the functions break piece by piece. The code will remain GPL; perhaps another person or team is interested in picking this up. For now, thank you for all your patience, feedback and support. Description: OptimizeGoogle is a Firefox extension that enhances Google search results and other pages by adding extra information and removing unwanted information. Created to maintain and improve CustomizeGoogle, which seems to have been abandoned.
A torrent search engine plugin for the Azureus/Vuze BitTorrent platform.
OpenEphyra is an open framework for question answering (QA). It retrieves answers to natural language questions from the Web and other sources. Visit http://www.ephyra.info/ for more details and information on joining this open research initiative.
A PHP application that provides a web-based graphical interface similar to an Apache directory listing. Functions: copy, move, delete, rename files, etc. For more detail, please go to the official site.
scr_ipfm is a script written in PHP that graphically shows the amount of data downloaded by users on a local network. To do that, it uses logs generated by the ipfm program (ipfm is available at http://robert.cheramy.net/ipfm/).
pyTube is a Python-based command-line YouTube search tool. Users can search for videos and display them in their default web browser. Requires Python 2.5 and gdata.
IRToolkit is an attempt to build and develop a generic search engine that integrates state-of-the-art Information Retrieval (IR) models. Furthermore, it offers a capability to compare the performance (in terms of precision, recall, index size, search response time and so on) between several open source IR applications. If you use the IRToolkit please cite the following work: https://sites.google.com/site/dinhbaduy/bibtex#Dinh-Phdthesis-2012
Oxyus is an open source search engine written in 100% Java, aimed to provide a search button to your website in an easy way. Oxyus uses Apache Lucene for indexing, Quartz for scheduling and other interesting software products.
Classifier4J is a Java library that provides an API for automatic classification of text. The default (and only current) implementation of this API is a Bayesian classifier. This library can be used for multiple purposes - as a spam filter or a blog cl
With DoCASU, Alfresco users have a simplified and easy to use solution to access, search and manage documents. DoCASU is a Rich Internet Application (RIA) based on Alfresco Web Scripts and ExtJS. Find all details on: http://code.optaros.com/trac/docasu
The Netjuke is a Web-Based Audio Streaming Jukebox powered by PHP 4, a database and all the MP3, Ogg Vorbis and other format files that constitute your digital music collection. Supports images, language packs, multi-level security, random playlists, etc
This was a terrible idea and is equally terribly implemented.
A simple-to-set-up web scraper written in Java. It uses a modified regex syntax for quickly writing complex patterns to parse data out of a website. It contains a GUI tool for testing your configuration scripts and is fully automated through the command line.
CaC is an application to easily download and convert videos from video sites like YouTube, Google Video, etc. It's written in Lazarus / FreePascal and available for Linux, Windows and Mac OS X systems.
Download multiple job postings in XHTML for batch browsing. Can also be input into programs you write to screen, weight, sort, archive, analyse job requirements etc. Currently supports http://www.jobbank.gc.ca