An open source search engine with a RESTful API and crawlers
OpenSearchServer is a powerful, enterprise-class search engine. Using the web user interface, the crawlers (web, file, database, etc.) and the client libraries (REST API, Ruby, Rails, Node.js, PHP, Perl), you can quickly and easily integrate advanced full-text search capabilities into your application: full-text search with basic semantics, join queries, Boolean queries, facets and filters, document indexing (PDF, Office, etc.), web scraping, and more. OpenSearchServer runs on Windows and Linux/Unix/BSD.
OpenWebSpider is an Open Source multi-threaded Web Spider (robot, crawler) and search engine with a lot of interesting features!
The stuff here has no documentation and some of it may never be completed. This is my playground, use at your own risk.
TouchGraph provides a set of interfaces for graph visualization using force-based layout and focus+context techniques. For now only older code is available, but we are planning to release new versions as well.
The Wikipedia Miner toolkit provides simplified access to Wikipedia. This open encyclopedia represents a vast, constantly evolving multilingual database of concepts and semantic relations; a promising resource for NLP and related research.
Bookmark-Manager is an advanced bookmark management utility for Windows supporting importing/exporting and merging of Internet Explorer favorites, Opera hotlists, Mozilla, Netscape, and Firefox bookmarks, XBEL, and HTML lists.
PHP Crawler is a simple website search script for small-to-medium websites. The only requirements are PHP and MySQL; no shell access is required.
A Java implementation of a flexible and extensible web spider engine. Optional modules allow functionality to be added (searching for dead links, testing the performance and scalability of a site, creating a sitemap, etc.).
Web Search by the people, for the people
YaCy is a free search engine that anyone can use to search the internet (WWW and FTP) or to create a search portal for others (internet or intranet). The scale of YaCy is limited only by the number of users, and it can index billions of web pages. In P2P mode it is fully decentralized: all users of the search engine network are equal, and it is not possible for anyone to censor the content of the distributed index.
Media Cloud allows automated downloading and analysis of on-line media. It is intended to automate what was previously a tedious process of painstaking manual content analysis.
This was a UI course project in which we built an interface prototype of an online travel reservation system. The service was meant to revolutionize the travel industry in several ways, for occasional travelers as well as for large businesses.
NeuroGrid could be thought of as a "Napster for Bookmarks." It lets users store data in a web-like fashion, associating bookmarks (files, documents, or anything else) with multiple keywords. See http://www.neurogrid.net
A learner may use Picture Dictionary to search for pictures by criteria including location, tags, labels, and licenses; it also lets a teacher build an image database tailored to the needs of the project at hand.
A set of scripts that helps people search for files over the internet.
This was a terrible idea and is equally terribly implemented.
This is the official collaborative development environment of the Large Knowledge Collider (LarKC), a platform for massive distributed reasoning that aims to remove the scalability barriers of existing reasoning systems for the Semantic Web.
The Retrieval Component Integrator Project (RECOIN) intends to provide an extensible framework of Java classes for building a meta-search and information retrieval (IR) system from heterogeneous IR components as part of a modular retrieval process.
It is basically a program that lets you build a search engine. It includes a web crawler, the full website source code (in ASP, with PHP planned), and a MySQL database.
Websitemirror is a small program that downloads complete websites into a specified directory for offline viewing.
Bibliophile is a loose grouping of independent open source or GPL bibliographic systems and aims to promote discussion, standards, and the development of common utilities.
Unlock Google's potential. Use this application to find information more relevant to your search. It enables enhanced searching on Google without the need for long query modifiers.
IDRA (InDexing and Retrieving Automatically) is a tool that allows indexing a wide range of text files (TXT, DOC, PDF) and image annotation files (XML), query-based searching, visualizing an index, saving it for reuse, evaluation, etc.
XML bindings and a GUI for creating and editing XBMC Scrapers
This program is an editor for creating XBMC Scrapers. It is similar to ScraperEditor, another editor based on ScraperXML that runs in the .NET environment. This program runs on Sun/Oracle's Java Runtime. HELP WANTED! I am looking for someone to help me write documentation, such as a user's manual and online help. Translated language files are also always welcome.
WebCollector is an open source web crawler framework based on Java. It provides simple interfaces for crawling the web, letting you set up a multi-threaded web crawler in less than five minutes. GitHub: https://github.com/CrawlScript/WebCollector Demo: https://github.com/CrawlScript/WebCollector/blob/master/YahooCrawler.java
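To illustrate the kind of multi-threaded crawling such frameworks handle, here is a minimal, self-contained sketch in plain Java. It is NOT the WebCollector API; it is a generic illustration using a fixed thread pool, a pluggable link-fetching function (so the example runs against an in-memory link graph instead of the network), and a concurrent "seen" set to avoid revisiting pages.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Function;

// Generic multi-threaded breadth-first crawler sketch (hypothetical, not WebCollector).
public class MiniCrawler {
    private final Set<String> seen = ConcurrentHashMap.newKeySet();
    private final Function<String, List<String>> fetchLinks; // URL -> outlinks on that page

    public MiniCrawler(Function<String, List<String>> fetchLinks) {
        this.fetchLinks = fetchLinks;
    }

    // Crawl breadth-first from the seed, fetching each frontier level in parallel.
    public Set<String> crawl(String seed, int threads, int maxDepth) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<String> frontier = new ArrayList<>(List.of(seed));
        seen.add(seed);
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (String url : frontier) {
                tasks.add(() -> fetchLinks.apply(url)); // each fetch runs on the pool
            }
            List<String> next = new ArrayList<>();
            for (Future<List<String>> f : pool.invokeAll(tasks)) {
                for (String link : f.get()) {
                    if (seen.add(link)) { // atomically dedupe; only new URLs advance
                        next.add(link);
                    }
                }
            }
            frontier = next;
        }
        pool.shutdown();
        return seen;
    }

    public static void main(String[] args) throws Exception {
        // Fake link graph standing in for real HTTP fetches.
        Map<String, List<String>> web = Map.of(
                "a", List.of("b", "c"),
                "b", List.of("c", "d"));
        MiniCrawler crawler = new MiniCrawler(
                url -> web.getOrDefault(url, List.of()));
        System.out.println(crawler.crawl("a", 4, 3));
    }
}
```

Real frameworks add politeness delays, robots.txt handling, and persistent frontiers on top of this basic pattern; level-by-level `invokeAll` is used here only because it keeps the termination logic trivial.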