A drop-in framework for adding tagging (folksonomy) capabilities to existing applications
dCrawler (Distributed Crawler), also known as D-HarvestMan (Distributed HarvestMan), is a distributed Web crawler implemented in the Python programming language. dCrawler is built on top of the existing open source Web crawler HarvestMan.
Webhunter is a distributed, multi-threaded web crawler designed for both general indexing and crawling the web for focused content.
Python Aggregator for information we are fond of.
Espigo is a web-based service that takes English input, semantically processes it and translates the phrase into Esperanto, the international language. Other features may include generation of parse trees and grammar-checking for Esperanto text.
Automated rewards and instrumentation system to recognize top online contributors.
Downloads and formats stories from your favorite Web-based, RSS- or Atom-syndicated news sources for display on your iPod. Provides an easy interface for creating and installing adapters for new news sources.
Fseek is a Python-based web crawler. The user interface is implemented using Django, the back-end uses pyCurl to fetch pages, and Pyro is used for IPC.
"Filtered communication" is the source code for a website which facilitates collaborative filtering of information on the internet. Users can create "filters", criteria which are defined in English. Activity mode (http://bayleshanks.com/pamv1): aslee
A project to develop specifications and software for a backwards-compatible Gnutella protocol for real-time searches for anything on the internet, also known as 'The Universal Search Protocol', intended to join the family of established internet protocols.
HORUS is a system for knowledge acquisition, hypothesis generation, inference and learning. It is an interactive internet environment accessible to a diverse community of users (on a public-access or membership basis). See also the UMKAILASH project.
Network-centric Graphic/Photo database program with web-based interface, automatic thumbnail generation, flexible indexing, and automatic file updates.
A collection of software to implement search engine technology. The overall search technology is built on the individual components of this project; each component is released under the BSD License and is written in the language most suited to its task.
JeeZez is a distributed information management/publishing framework. Currently the focus is to develop a module for software management, but more modules will follow.
A program that shows Istanbul bus timetables in a graphical interface, without the need to visit the IETT website. Features such as saving frequently used routes and reporting the next bus departure time are planned for the near future.
A spider that collects data from the MySpace social network. At present, it is designed only to extract information on Native American people, because it is used for a social science study at UNAM (Universidad Nacional Autónoma de México).
The project is a searcher and indexer for links and images in an HTML file.
MailCrawler is a piece of software intended to search automatically for email addresses on the internet. It is developed with the Python language.
A Webcrawler written in Python
A web crawler that can be used from the command line. It gathers information, such as tags, from the URLs it scans.
A plugin for Google Desktop Search, developed in Python, that indexes CAD files (i.e. DWG). It is also an example of a COM client and server.
Web-based knowledge base system written in Python with a MySQL backend. Allows search/edit/delete/update of articles, plus a directory and FAQ-style layout of articles.
This will be a generic indexing system for the Python language, with pluggable engines to store the resulting indexes in either ZODB, MySQL or (at a later date) its own proprietary format.
Web spider and SERP scraper
SemanticDoc is a documentation search engine that provides context-specific listings of DocBook XML books. Its goal is to provide accurate searches of web documentation that uses semantic tags.
A News Aggregator - not a news reader - to collect news from subscribed RSS channels.