An open source search engine with RESTful API and crawlers
OpenSearchServer is a powerful, enterprise-class search engine. Using the web user interface, the crawlers (web, file, database, etc.), and the client libraries (REST API, Ruby, Rails, Node.js, PHP, Perl), you can quickly and easily integrate advanced full-text search capabilities into your application: full-text search with basic semantics, join queries, boolean queries, facets and filters, document (PDF, Office, etc.) indexing, web scraping, and more. OpenSearchServer runs on Windows and Linux/Unix/BSD.
OpenWebSpider is an Open Source multi-threaded Web Spider (robot, crawler) and search engine with a lot of interesting features!
The stuff here has no documentation, and some of it may never be completed. This is my playground; use at your own risk.
Simple yet feature-rich Document Management System
XODA is a KISSed (Keep It Simple, Stupid) system for organizing documents using AJAX. It is a document management system without a backend database that still lets you organize files and directories by descriptions, filters, and more. Visit xoda.org
A Java implementation of a flexible and extensible web spider engine. Optional modules allow functionality to be added (searching for dead links, testing the performance and scalability of a site, creating a sitemap, etc.).
Bookmark-Manager is an advanced bookmark management utility for Windows supporting importing/exporting and merging of Internet Explorer favorites, Opera hotlists, Mozilla, Netscape, and Firefox bookmarks, XBEL, and HTML lists.
TouchGraph provides a set of interfaces for graph visualization using force-based layout and focus+context techniques. For now only older code is available, but we are planning to release new versions as well.
The Wikipedia Miner toolkit provides simplified access to Wikipedia. This open encyclopedia represents a vast, constantly evolving multilingual database of concepts and semantic relations; a promising resource for NLP and related research.
PHP Crawler is a simple website search script for small-to-medium websites. The only requirements are PHP and MySQL; no shell access is required.
This is the official collaborative development environment of the Large Knowledge Collider (LarKC), a platform for massive distributed reasoning that aims to remove the scalability barriers of currently existing reasoning systems for the Semantic Web.
Sitemap Generator is a very simple PHP class that lets you easily generate a site map in the standard format used by search engines such as Google, Yahoo, and MSN. You can generate the site map in both XML and text format.
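As a rough illustration of the two output formats mentioned above, here is a minimal Python sketch (not the PHP class itself). It assumes the sitemaps.org 0.9 XML schema and the plain-text convention of one URL per line; the function names `sitemap_xml` and `sitemap_text` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_xml(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as '&' when serializing.
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def sitemap_text(urls):
    """Return the plain-text variant: one URL per line."""
    return "\n".join(urls) + "\n"
```

Calling `sitemap_xml(["https://example.com/"])` yields a `urlset` element containing a single `url` entry, while `sitemap_text` produces the same list as plain lines.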
Media Cloud allows automated downloading and analysis of on-line media. It is intended to automate what was previously a tedious process of painstaking manual content analysis.
Auto Rescanning - Search Terms - Regularly Updated With New Features
NOTE (as of 11/05/2015): 4chan's HTML structure has changed, so full images are downloaded along with their thumbnails. A fix to stop the thumbnails from downloading is coming shortly (after my exams are over).
This is the first release of my 4chan image downloader. It packs loads of great features, such as search. Check the features section, and be sure to let me know if you want a feature added.
Coming soon:
- Wiki explaining in depth how to use it more quickly (although it's already pretty simple to use)
- Ability to download the whole thread, not just images
- Better multithreading
- Ability to use proxies
- Sorting images downloaded from searches into folders
- Keeping original image names
- More responsive GUI
Unlock Google's potential. Use this application to find information that is more relevant to your search. It allows enhanced searching on Google without the need for long modifiers.
A set of scripts that aims to help people search for files over the internet.
Other spiders have a limited link depth, do not follow links in random order, or are combined with heavy indexing machines. This spider has no link depth limit and picks at random the next URL to be checked for new URLs.
Websitemirror is a small program to download complete websites into a specific directory for offline viewing.
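The crawl strategy described above (no depth limit, next URL chosen at random) can be sketched roughly as follows. This is an illustrative Python sketch, not the project's code: `random_crawl`, the `get_links` callback, and the `max_pages` budget are all assumptions introduced here, and the budget only caps the total number of pages so the loop terminates.

```python
import random


def random_crawl(start_url, get_links, max_pages, rng=None):
    """Visit pages in random order with no depth limit.

    get_links(url) is a caller-supplied function returning the URLs
    found on a page (hypothetical; it would fetch and parse in practice).
    """
    rng = rng or random.Random()
    frontier = [start_url]   # URLs discovered but not yet visited
    seen = {start_url}       # every URL ever queued, to avoid repeats
    visited = []
    while frontier and len(visited) < max_pages:
        # Pick a random pending URL: swap it to the end, then pop in O(1).
        i = rng.randrange(len(frontier))
        frontier[i], frontier[-1] = frontier[-1], frontier[i]
        url = frontier.pop()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited
```

Because the frontier is sampled uniformly instead of being processed as a queue or stack, the crawl is neither breadth-first nor depth-first, and no page is excluded because of its distance from the start URL.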
Seeks is a free and open technical design and application for enabling social websearch. Its specific purpose is to regroup users whose queries are similar so they can share both the query results and their experience on these results.
Torrential is a PHP BitTorrent tracker based on TorrentBits. Currently unstable. DO NOT USE.
DocInfoRetriever is a web-based document full-text search engine based on Lucene. It allows you to search the contents and metadata of documents. Supported formats include doc, xls, pdf, odt, jpg, and others, as well as torrent files.
My Community Portal is an all-in-one internet portal that offers a forum, groups, chat, your own e-mail, a search engine, an internet directory, your own home page, polls, dating services, a buddy list, MP3 and file sharing, and much more.
WAP-based search engine written in Perl. With this script you can add a search engine to your WAP site.
WebCollector is an open source web crawler framework based on Java.
It provides some simple interfaces for crawling the Web; you can set up a multi-threaded web crawler in less than 5 minutes. GitHub: https://github.com/CrawlScript/WebCollector Demo: https://github.com/CrawlScript/WebCollector/blob/master/YahooCrawler.java
A browser for quickly searching movie and actor details on IMDb. Supports multi-tab browsing. Easy to use, fast (multithreaded), and able to work offline. Uses the IMDBServices library. Based on .NET Framework 2.0; the target is Avalon. Visual, Easy, Fast, Useful!
PornSeer, a smart porn detector, precisely locates breasts, vulvas, and other pornographic features in images and videos. It applies mosaic patterns to the illicit content of pornographic images and videos and provides indexes of pornographic content for image/video databases.