An automated client developed for downloading sequenced files.
BTG Web Search Engine is developed for students and developers who are interested in search engine technology. The engine will be written in C++, Java, Perl, and Python. At the beginning of the project, a simple search engine design will be used.
This project aims to help a user with specific research interests search for and retrieve citations and full-text scientific papers from the web.
The goal of this project is to develop a protocol and a server built on top of TCP/IP for managing bookmarks over the network. The protocol will be based on XBEL, the XML Bookmark Exchange Language.
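As a rough illustration of the bookmark data such a protocol would carry, the sketch below reads a small XBEL document with Python's standard library. The sample document and the `list_bookmarks` helper are hypothetical and are not part of the project; only the `xbel`, `folder`, `bookmark`, `title`, and `href` names come from the XBEL format itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the URLs and titles are placeholders.
SAMPLE_XBEL = """<xbel version="1.0">
  <folder>
    <title>Search</title>
    <bookmark href="http://example.org/engine">
      <title>Example engine</title>
    </bookmark>
  </folder>
  <bookmark href="http://example.org/">
    <title>Example home</title>
  </bookmark>
</xbel>"""


def list_bookmarks(xbel_text):
    """Return (title, href) pairs for every bookmark, in document order."""
    root = ET.fromstring(xbel_text)
    return [
        (bookmark.findtext("title", default=""), bookmark.get("href"))
        for bookmark in root.iter("bookmark")  # iter() also descends into folders
    ]


if __name__ == "__main__":
    for title, href in list_bookmarks(SAMPLE_XBEL):
        print(title, href)
```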
arachne is a C++ library for HTTP crawling and for link, text, and metadata extraction, designed to run in a distributed environment.
I use the conventional client/server (C/S) architecture, which is different from the parker. When the user enters a keyword, the client passes it to the server, the server returns several results to the client, and the user can then use the client to download.
Processes adopted to share academic knowledge are inefficient because they are publication-centered and lack a high-level knowledge definition. CPSKP offers a problem-centered framework for managing knowledge associated with Computational Problem Solving.
CitemaPP is a Google Sitemap generator written in C++. Instead of crawling the html-doc directory on your server, CitemaPP crawls the content of your server over HTTP.
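To show what a generator of this kind produces, here is a minimal sketch that turns a list of already-discovered URLs into a sitemap document. It is written in Python rather than C++ and does not reflect CitemaPP's actual code; the `build_sitemap` function is a hypothetical name, and the crawling step is assumed to have happened elsewhere.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string listing each URL once, in first-seen order."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for url in urls:
        if url in seen:
            continue  # a crawler often reaches the same page by several links
        seen.add(url)
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap(["http://example.com/", "http://example.com/about"]))
```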
CoverYourASP.com - complete Active Server Pages source (JScript) for this popular web site. Includes full membership system, diary, online db admin, banner ad system and loads more.
A DNA search engine based on statistical n-gram segmentation and Lucene.Net. Paper: http://arxiv.org/abs/1006.4114
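The general idea of indexing sequences by n-grams can be sketched as follows. This uses plain overlapping n-grams and an in-memory inverted index, which is a simplification: it is not the statistical segmentation described in the paper, and it does not use Lucene.Net. All function names and the sample sequences are hypothetical.

```python
from collections import defaultdict


def ngrams(sequence, n=3):
    """Split a sequence into overlapping n-grams, e.g. ACGT -> ACG, CGT."""
    seq = sequence.upper()
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def build_index(sequences, n=3):
    """Map each n-gram to the set of sequence ids that contain it."""
    index = defaultdict(set)
    for seq_id, seq in sequences.items():
        for gram in ngrams(seq, n):
            index[gram].add(seq_id)
    return index


def search(index, query, n=3):
    """Return ids of sequences containing every n-gram of the query.

    These are candidates only: the n-grams need not be contiguous in the
    matched sequence, so a real engine would verify or rank them further.
    """
    grams = ngrams(query, n)
    if not grams:
        return set()
    result = set(index.get(grams[0], set()))
    for gram in grams[1:]:
        result &= index.get(gram, set())
    return result


if __name__ == "__main__":
    idx = build_index({"seq_a": "ACGTAC", "seq_b": "TTTACG"})
    print(search(idx, "ACGT"))
```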
DisSearch is an FTP server crawler with a web interface for searching files.
Dissert is a program that collects and stores mailing list postings and newsgroup articles and makes them browsable and searchable. It is a combination of the old Deja and Geocrawler.
The DocConversion project provides a distributed document conversion solution with a well-defined API that makes use of existing conversion tools and/or a centralized conversion server. This is part of the PRONIR research at http://www.pronir.nl
A system to retrieve and display in 3D the structure of the Internet (or as much as can be analysed). It should allow for an interesting perspective of the way pages are linked and clustered. It will hopefully also provide a more intuitive way of browsing
Fast SMB Search is a search engine for local SMB-based networks (e.g., Windows networks). Its key feature is the ability to quickly search for a file in a large network. It also supports FTP search, so the project name is not strictly accurate.
FirteX is a high-performance, full-featured text indexing and retrieval platform. It provides a flexible and feasible experiment platform for researchers, as well as a scalable platform for Web search development. It is very fast, and well support for Chi
A project to develop specifications and software for a backwards-compatible Gnutella protocol for real-time searches for anything on the Internet, a.k.a. 'The Universal Search Protocol', to join the family of established Internet protocols.
GImageSpider (GIS) is an image spider with two capabilities. It can search the web through image search engines to find images, and it can crawl an arbitrary site, within constraints you set, to find images.
Watch web pages for changes. Enter the URLs of the pages (or generic files) and the program will connect to all the sites and report the ones that have been modified since the last time you checked them. You can trigger actions on changed sites.
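One common way to detect such modifications is to store a hash of each page and compare it on the next check. The sketch below assumes that approach; the description does not say how the program itself detects changes, and it may instead rely on HTTP headers or timestamps. Page bodies are passed in as bytes so that fetching stays out of the example, and the function names are hypothetical.

```python
import hashlib


def digest(content):
    """Return a SHA-256 hex digest of a page body given as bytes."""
    return hashlib.sha256(content).hexdigest()


def find_changed(pages, previous):
    """Compare freshly fetched pages against digests from the last check.

    pages:    dict mapping URL -> page body (bytes)
    previous: dict mapping URL -> digest stored at the last check
    Returns (changed_urls, current_digests). URLs seen for the first time
    are recorded but not reported as changed.
    """
    changed = []
    current = {}
    for url, body in pages.items():
        current[url] = digest(body)
        if url in previous and previous[url] != current[url]:
            changed.append(url)
    return changed, current


if __name__ == "__main__":
    _, state = find_changed({"http://example.com/": b"version 1"}, {})
    changed, state = find_changed({"http://example.com/": b"version 2"}, state)
    print(changed)
```

Persisting `current_digests` between runs (for example in a small JSON file) is left out; any action on a changed site would hook in where the URL is appended to `changed`.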
Alternative web server technology for publication, s and searching.
Harvestman is a context-aware metasearch engine that functions as a universal information gatherer and data mining system for the Internet.
This program aims to search an index of the computer science forum http://tud.hicknhack.org/forum/ , which is very useful and at the same time distracting for many students at the Dresden University of Technology.
HooDoo is designed to provide most of the same functionality as Google, but available to everyone for their own websites.
This project aims to build a system that discovers hotspot news, mainly Chinese news. First, a web crawler fetches news starting from a set of seed URLs; then the TDT module processes the crawled pages and displays the results in a user-friendly way.