DocTaur is a Web-based searchable directory of reference manuals. You can freely download, install, and administer it on your local Linux intranet server. It is powered by the ht://Dig search engine and contains reference manuals for developers.
It will display the files that you have uploaded through it or by FTP to a given directory. THIS SCRIPT MAY NOT BE LISTED ON ANY OTHER WEBSITE EXCEPT FOR CREAMERSREALM, SOURCEFORGE, HOTSCRIPTS. IF IT IS IT MUST BE REMOVED OR LEGAL ACTION WILL BE TAKEN.
webExtractor is a Java application for extracting specific content from web-based HTML, XML, CSV, and free-form text. The extracted data can be used for data gathering and mining purposes.
High-performance distributed in-memory key/value store
Infinispan is an open source, Java-based data grid platform. ***IMPORTANT*** Starting with Infinispan 5.0.0.FINAL, Infinispan releases are no longer hosted on SourceForge. They can now be found at www.jboss.org/infinispan/downloads
This project aims to create a searchable archive (for several OSes) for the popular webcomic College Roomies From Hell!!!, located at http://www.crfh.net. The final code can hopefully be modified to help other webcomics and similar projects.
DCTViewer is a robust web-based solution, sponsored by Document Conversions Technology, http://docconversions.com , for digital document searching and viewing in an intranet environment. Features include document storage, indexing, searching, and viewing.
The DocConversion project provides a distributed document conversion solution with a well-defined API that makes use of existing conversion tools and/or a centralized conversion server. It is part of the PRONIR research at http://www.pronir.nl
Simple Porn Downloader is a tiny all-Java application that uses a list of keywords and starting URLs to crawl web pages, branching out in search of specific media extensions; matching files are downloaded and presented in an HTML page.
Sitemap Generator is a very simple PHP class that lets you easily generate a site map in the standard formats of different search engines such as Google, Yahoo, and MSN. You can generate the site map in both XML and text format.
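The two output formats such a class produces are easy to picture. A minimal sketch in Python (function names and URLs are illustrative, not taken from the project; the XML follows the standard sitemaps.org schema):

```python
from xml.etree.ElementTree import Element, SubElement, tostring

def make_xml_sitemap(urls):
    """Build a sitemaps.org-style XML sitemap from a list of URLs."""
    urlset = Element("urlset",
                     xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for u in urls:
        url_el = SubElement(urlset, "url")
        SubElement(url_el, "loc").text = u
    return tostring(urlset, encoding="unicode")

def make_text_sitemap(urls):
    """The plain-text format is simply one URL per line."""
    return "\n".join(urls) + "\n"
```

The XML variant is what Google and Yahoo parse; the text variant is accepted as a lightweight fallback.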
A terminal program for downloading torrents from The Pirate Bay
Torrtux is a terminal-based program, written in Perl, for downloading torrents from The Pirate Bay. If you live in a country where TPB is blocked (UK, Finland, Belgium, etc.), you can set a proxy in the config file. With it you can get the magnet link of a torrent, copy it to the clipboard, and open your torrent manager, all from your terminal. It also shows a torrent's details (author, date, type, size, and so on), just as on the TPB site, and it retrieves subtitles from www.opensubtitles.org. It works by fetching the source code of the TPB page and parsing it with regular expressions and the html-parser library. In the config file ~/.torrtuxrc you can choose your display, subtitle, and comment preferences, your torrent manager, and a proxy if needed. Please report any bugs you find!
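The regex-based extraction step the description mentions can be sketched like this (shown in Python rather than the project's Perl; the HTML snippet is a made-up example, not real TPB markup):

```python
import re

# Magnet URIs appear as href attributes on result pages; capture the
# whole URI up to the closing quote.
MAGNET_RE = re.compile(r'href="(magnet:\?xt=urn:btih:[^"]+)"')

def find_magnets(html):
    """Return all magnet URIs found in href attributes of the page."""
    return MAGNET_RE.findall(html)

page = '<a href="magnet:?xt=urn:btih:abc123&dn=example">get</a>'
print(find_magnets(page))  # ['magnet:?xt=urn:btih:abc123&dn=example']
```

Real pages need more care (HTML entities, multiple links per row), which is why the tool also leans on an HTML-parsing library rather than regexes alone.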
Grub is a distributed internet crawler/indexer designed to run on multi-platform systems, interfacing with a central server/database.
WACS is a tool for building Adult Web Sites; equally suitable for managing a private collection or building a commercial site. It has many powerful features including dynamic filtering, model catalogs, automatic download and powerful search engine.
Unlock Google's potential. Use this application to find information that is more relevant to your search. It allows enhanced searching on Google without the need for long modifiers.
lease-parser is a simple daemon that records the lease state changes of an ISC DHCP server to a database for historical reference. The data can be searched via a web search form that is provided with the tool.
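ISC dhcpd records leases as text blocks in dhcpd.leases; a daemon like this has to parse those blocks before writing them to a database. A simplified sketch (the real file has many more fields, and the project's actual parsing code is not shown here):

```python
import re

# Matches "lease <ip> { ... }" blocks in an ISC dhcpd.leases file.
LEASE_RE = re.compile(r"lease\s+(\S+)\s*\{(.*?)\}", re.S)
STATE_RE = re.compile(r"binding state\s+(\w+);")

def parse_leases(text):
    """Return (ip, binding_state) tuples from lease-file text."""
    out = []
    for ip, body in LEASE_RE.findall(text):
        m = STATE_RE.search(body)
        out.append((ip, m.group(1) if m else None))
    return out

sample = """\
lease 192.168.1.10 {
  starts 3 2024/01/10 12:00:00;
  binding state active;
  hardware ethernet 00:11:22:33:44:55;
}
"""
print(parse_leases(sample))  # [('192.168.1.10', 'active')]
```

A daemon would additionally watch the file for changes and insert each state transition as a timestamped row, which is what makes the historical search possible.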
FTP Crawler is designed to provide an easy web interface for searching files on FTP servers, together with a crawler to index the files on those servers.
IRToolkit is an attempt to build a generic search engine that integrates state-of-the-art Information Retrieval (IR) models. It also offers the capability to compare the performance (in terms of precision, recall, index size, search response time, and so on) of several open source IR applications. If you use IRToolkit, please cite the following work: https://sites.google.com/site/dinhbaduy/bibtex#Dinh-Phdthesis-2012
Provides efficient, effective implementations of 32- and 64-bit hash functions based on Rabin fingerprints / irreducible polynomials, in Java. Also provides integration with the java.security.MessageDigest API.
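A Rabin fingerprint treats the message as a polynomial over GF(2) and takes its remainder modulo a fixed irreducible polynomial. The library's Java API is not reproduced here; the sketch below shows only the underlying idea, in Python, using a degree-8 irreducible polynomial (the AES field polynomial) for brevity where the project uses degree 32 or 64:

```python
POLY = 0x11B   # x^8 + x^4 + x^3 + x + 1, irreducible over GF(2)
DEGREE = 8

def rabin_fingerprint(data: bytes) -> int:
    """Remainder of the message polynomial modulo POLY over GF(2).

    Shifts message bits into a register and reduces whenever the
    register's degree reaches DEGREE, like an unreflected CRC.
    """
    fp = 0
    for byte in data:
        for i in range(7, -1, -1):
            fp = (fp << 1) | ((byte >> i) & 1)
            if fp & (1 << DEGREE):
                fp ^= POLY
    return fp
```

Because POLY is irreducible, distinct short messages collide with probability about 2^-DEGREE, which is the property that makes Rabin fingerprints useful for deduplication and chunking.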
Course Crawler is an application that compiles term-definition pairs from multiple web glossaries into a centralized, stable, and searchable location.
The Rainbow project is an open source initiative to build a comprehensive content management system using Microsoft's ASP.NET and C# technologies. It has ASP.NET 1.1 and ASP.NET 2.0 code bases.
CaC is an application to easily download and convert videos from video sites such as YouTube and Google Video. It's written in Lazarus/FreePascal and available for Linux, Windows, and Mac OS X systems.
The Semantic Web Peer - Allegra is an innovative library for Semantic Web application development. It provides a framework for asynchronous network peer programming, a simple stack of Internet standards implementations, and two new network applications.
An Apache 2 DSO module search engine based on the Swish-e C API that returns results by replacing tags in a user-supplied HTML template. People with Swish-e knowledge and the ability to generate a Swish-e index file should find the searchm interface familiar.
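The tag-replacement mechanism is simple substitution: the module fills placeholders in the user's template with rendered result rows. A minimal sketch in Python (the `{{name}}` tag syntax is hypothetical, not the module's actual tag set):

```python
def render(template: str, values: dict) -> str:
    """Replace {{name}} placeholders in template with the given values."""
    out = template
    for key, val in values.items():
        out = out.replace("{{" + key + "}}", val)
    return out

html = render("<ul>{{results}}</ul>", {"results": "<li>doc1</li>"})
print(html)  # <ul><li>doc1</li></ul>
```

This design keeps presentation entirely in the user's hands: the engine produces result fragments, and the template decides where they appear.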
A smart Apache logfile sniffer that maps given IPs to names. It's a tool to check whether special guests are visiting your web server, or whether Big Brother is watching you. It works with a CSV list of IPs or IP ranges.
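Matching a logged address against a CSV of named IPs and ranges can be sketched like this in Python (the CSV layout and the ranges shown are illustrative assumptions, not the tool's actual format):

```python
import csv
import io
from ipaddress import ip_address, ip_network

def load_watchlist(csv_text):
    """Read (network, name) pairs from CSV text; single IPs become /32."""
    table = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) >= 2:
            table.append((ip_network(row[0].strip(), strict=False),
                          row[1].strip()))
    return table

def name_for(ip, table):
    """Return the name whose network contains ip, or None if unmatched."""
    addr = ip_address(ip)
    for net, name in table:
        if addr in net:
            return name
    return None

watch = load_watchlist("66.249.64.0/19,googlebot\n127.0.0.1,localhost\n")
print(name_for("66.249.66.1", watch))  # prints: googlebot
```

Each line of the access log would be checked this way, tagging hits from watched networks with their configured names.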
A new web crawler with a sophisticated searching process specialized by language.
A PHP library/framework for the development of websites. The main features are: database independence, template-driven content, theme-able content generation, integrated WML generation, user content management, Lucene server integration.