An open source search engine with RESTful API and crawlers
OpenSearchServer is a powerful, enterprise-class search engine program. Using the web user interface, the crawlers (web, file, database, etc.) and the client libraries (REST API, Ruby, Rails, Node.js, PHP, Perl), you will be able to integrate advanced full-text search capabilities into your application quickly and easily: full-text search with basic semantics, join queries, boolean queries, facets and filters, document (PDF, Office, etc.) indexing, web scraping, etc. OpenSearchServer runs on Windows and Linux/Unix/BSD.
A Java implementation of a flexible and extensible web spider engine. Optional modules allow functionality to be added (searching for dead links, testing the performance and scalability of a site, creating a sitemap, etc.).
The Wikipedia Miner toolkit provides simplified access to Wikipedia. This open encyclopedia represents a vast, constantly evolving multilingual database of concepts and semantic relations; a promising resource for NLP and related research.
TouchGraph provides a set of interfaces for graph visualization using force-based layout and focus+context techniques. For now only older code is available, but we are planning to release new versions as well.
Sharehound is a network file system indexer and searcher written in Java. It currently supports SMB file shares (i.e. MS Windows-based shares) and FTP resources. A web UI is used for search and crawl monitoring, and an RSS feed is provided for search results.
This is the official collaborative development environment of the Large Knowledge Collider (LarKC), a platform for massive distributed reasoning that aims to remove the scalability barriers of currently existing reasoning systems for the Semantic Web.
WebCollector is an open source web crawler framework based on Java.
It provides simple interfaces for crawling the Web, so you can set up a multi-threaded web crawler in less than 5 minutes. Github: https://github.com/CrawlScript/WebCollector Demo: https://github.com/CrawlScript/WebCollector/blob/master/YahooCrawler.java
Other spiders have a limited link depth, follow links in a fixed rather than randomized order, or are combined with heavy indexing machines. This spider has no link-depth limit: it randomizes the next URL, which is then checked for new URLs.
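The randomized selection described above can be sketched as a small crawl frontier that hands back the next URL at random instead of in FIFO order. This is an illustrative sketch, not the project's actual code; the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Illustrative sketch: a crawl frontier that returns the next URL
// uniformly at random, so the spider is not bounded by link depth.
public class RandomFrontier {
    private final List<String> frontier = new ArrayList<>();
    private final Set<String> seen = new HashSet<>();
    private final Random random = new Random();

    // Add a newly discovered URL, skipping duplicates.
    public void add(String url) {
        if (seen.add(url)) {
            frontier.add(url);
        }
    }

    // Pick the next URL at random and remove it from the frontier.
    public String next() {
        if (frontier.isEmpty()) return null;
        int i = random.nextInt(frontier.size());
        String url = frontier.get(i);
        // Swap-remove keeps removal O(1).
        frontier.set(i, frontier.get(frontier.size() - 1));
        frontier.remove(frontier.size() - 1);
        return url;
    }

    public boolean isEmpty() {
        return frontier.isEmpty();
    }
}
```

Each URL returned by `next()` would be fetched, scanned for new URLs, and those fed back through `add()`.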
PHP Band Manager has now moved all development to www.ooza.co.uk **Thanks**
SmartCrawler is a Java-based, fully configurable, multi-threaded and extensible crawler, which is able to fetch and analyze the contents of a web site by using dynamically pluggable filters.
XQEngine is a Java component for searching collections of XML documents that uses an XQuery front end. The engine has a straightforward API that allows it to be easily embedded in end user applications. Requires some basic Java programming skills.
The complete suggestions framework for Java, supporting single- and multi-field suggest, a Java suggest box, client/server with Hessian or JSON-RPC, a GWT AJAX suggest box, and phonetic plugins. Proven high performance for data sets of more than one million entries.
Ptarmigan is a SAX event generator that produces schema-conforming XML content from the metadata found in media files and streams. It supports MP3 ID3 (v1 & v2), Vorbis/Ogg, FLAC, WMA and playlists (M3U, PLS, ASX and B4S). Initial implementation in Java.
XML bindings and a GUI for creating and editing XBMC Scrapers
This program is an editor for creating XBMC Scrapers. It is similar to ScraperEditor, another editor using ScraperXML, which runs in the .NET environment. This program runs under Sun/Oracle's Java Runtime. HELP WANTED! I am looking for someone who would help me write documentation, such as a user's manual and on-line help. Also, if someone wants to help, translated language files are always welcome...
A configurable knowledge management framework. It works out of the box, but it is meant mainly as a framework for building complex information retrieval and analysis systems. Its three major components (Crawler, Analyzer and Indexer) can also be used separately.
jBingAPI is a Java library for querying Microsoft's Bing search engine (http://www.bing.com/) through its public API. jBingAPI simply makes it much easier to communicate with this API.
nxs crawler is a program that crawls the internet. The program generates random IP numbers and attempts to connect to the hosts. If a host answers, the result is saved in an XML file. After that the crawler disconnects... Additionally you can
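The probe loop described above can be sketched with the standard library alone: generate a random IPv4 address, attempt a TCP connection, and record the outcome as a small XML fragment. This is a minimal illustration, not the project's code; the class name, port choice (80) and XML shape are assumptions.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Random;

// Illustrative sketch of a random-IP probe: pick an address, try to
// connect, and serialize the result as a tiny XML record.
public class RandomIpProbe {
    private static final Random RANDOM = new Random();

    // Build a random dotted-quad IPv4 address.
    public static String randomIp() {
        return RANDOM.nextInt(256) + "." + RANDOM.nextInt(256) + "."
                + RANDOM.nextInt(256) + "." + RANDOM.nextInt(256);
    }

    // Attempt a TCP connection; true if the host accepted within the timeout.
    public static boolean hostAnswers(String ip, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(ip, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, unreachable, or timed out
        }
    }

    // Minimal XML record, in the spirit of the result file mentioned above.
    public static String toXml(String ip, boolean answered) {
        return "<host ip=\"" + ip + "\" answered=\"" + answered + "\"/>";
    }
}
```

A driver would simply loop: `toXml(ip, hostAnswers(ip, 80, 2000))` for each generated address, appending the records to the output file.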
An ALVIS J2EE-based web search front end for the Zebra indexing and retrieval server.
A robust, feature-rich, multi-threaded CLI web spider written in Java using Apache Commons HttpClient v3.0. ASpider downloads any files matching your given MIME types from a website. By default it tries to match email addresses with regular expressions, logging all results using log4j.
An intelligent search engine with the ability to suggest the best solution to your problem or question. Based on the semantic concepts of Web 3.0.
This application generates an index of a website using information stored in the pages' meta tags.
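The first step of meta-tag indexing can be sketched as extracting name/content pairs from a page's `<meta>` tags. This is an illustrative sketch, not the application's code; the class name is invented, and a real indexer would use an HTML parser rather than a regex.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: pull name/content pairs out of <meta> tags,
// the raw material for a meta-tag based site index.
public class MetaTagExtractor {
    // Matches <meta name="..." content="...">; intentionally simple,
    // so it only handles this attribute order and double quotes.
    private static final Pattern META = Pattern.compile(
            "<meta\\s+name=\"([^\"]+)\"\\s+content=\"([^\"]*)\"",
            Pattern.CASE_INSENSITIVE);

    public static Map<String, String> extract(String html) {
        Map<String, String> tags = new HashMap<>();
        Matcher m = META.matcher(html);
        while (m.find()) {
            // Normalize tag names so "Keywords" and "keywords" collide.
            tags.put(m.group(1).toLowerCase(), m.group(2));
        }
        return tags;
    }
}
```

An indexer would run this over every fetched page and file the page URL under each extracted keyword.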
AIS (Associative Indexing Service) is an application for storing bookmarks and memos and for indexing big (lifetime) archives, giving fast future access to the data via (personalized) keywords. In other words, it is an extension of human associative memory :)
BMW (Bags of Multiple Words) is a project based on Lucene 2.0 that tries to work with query-term dependency. BMW offers a simple method that can be applied to several standard ranking functions to exploit a simple type of term dependency.
BTPastry is a torrent search engine based on the Pastry P2P network. It is easy to use for searching and sharing your torrent files. A demo rendezvous peer is at btpastry.getmyip.com:5050.
An implementation of the Bee Hive @ Work algorithm, which simulates the foraging behavior of honey bees in nature. The aim is to provide an extensible framework that researchers can use to easily create new applications of this algorithm.