An HTML scraper that uses machine learning frameworks to extract labelled fields from raw HTML. The project also involves the development of a tool to display the semi-structured data generated by the scraper component.
GameFAQs board and user indexer fetches pages from GameFAQs and stores board and user names in a database for browsing.
"Gobble" is a GUI for accessing search results from www.Google.com that lets the user download files of a selected type. Several advanced functions are also included.
Exports Google search result links to a file.
Google Mass Search (GMS) is a small script written in Python that collects as many URLs as you need from the Google search results for a specified string. It is simple to use, yet fast and powerful. You can specify a search string, the number of results, a filename, and some optional fields. GMS retrieves all the required links in a few seconds and saves them to the file. It also eliminates redundant links. You can apply filters, such as keeping only links that contain a given string or links that do not contain it. If you know a bit of Python programming, you can customize GMS as you wish.
HORUS is a system for knowledge acquisition, hypothesis generation, inference and learning. It is an interactive internet environment accessible to a diverse community of users (on a public-access or membership basis). See also the UMKAILASH project for more.
HyperSQL is like Doxygen plus Javadoc for SQL, hypermapping SQL views, packages, procedures, and functions to HTML source code listings and showing all code locations where these are used.
Image2DocInfo was made for quickly tagging digital pictures. A GUI allows you to set attributes for an image and then store them in XML files. Those files follow the Dublin Core naming scheme and are stored in the same directories as the pictures.
Network-centric Graphic/Photo database program with web-based interface, automatic thumbnail generation, flexible indexing, and automatic file updates.
High performance distributed in-memory key/value store
Infinispan is an open source, Java-based data grid platform. ***IMPORTANT*** Starting with Infinispan 5.0.0.FINAL, Infinispan releases are no longer hosted on SourceForge. They can now be found at www.jboss.org/infinispan/downloads
A collection of software to implement search engine technology. The overall search technology is built on the individual components of this project. Each component is released under the BSD License and is written in the language most suited to its task.
Bible study and Christian library management multilingual software.
Jake is a console-based app written in Python and Qt4. Plugins let you do almost anything, for example search Google, translate, view images, or talk with it (as an AI bot). A skinning system also lets you choose how Jake should look.
A program that shows Istanbul bus times in a graphical interface, without the need to visit the IETT website. Features such as saving frequently used lines and reporting the next bus time are planned to be added soon.
A spider that collects data from the MySpace social network. At present, it is designed only to extract information on Native American people, because it is used for a social science study at UNAM (Universidad Nacional Autónoma de México).
The project searches for and indexes links and images in an HTML file.
Lupy is a full text indexer for Python. It is a port of Jakarta Lucene 1.2 to Python. Specifically, it reads and writes indexes in Lucene binary format. Like Lucene, it is sophisticated and scalable.
MailCrawler is a piece of software intended to search automatically for email addresses on the internet. It is developed with the Python language.
Markov Search Buster is an open source anti-Markov Engine technology that combats spammy Google results. Markov Buster will interface with the Google Ajax Search API to eliminate auto-generated pages from your search results.
MedusWiki is a Python Wiki engine intended to be used as a personal knowledge management system. It uses Topic Maps (XTM) to store metadata, so meaningful associations can be created between wiki pages. Zope Page Templates (ZPT) are used to produce HTML.
The Meme Machine generates a visual and spoken narrative among connected nodes and the world wide web, in an expanding network using open source algorithms. The computers exhibit eerie self-deterministic behaviours.
Milim fetches the lyrics for your Hebrew songs from the web. The project features plugins for various media players.
MindRetrieve is a personal search engine. It helps you organize and retrieve web pages you have visited. MindRetrieve is a lightweight, cross-platform, open source application available under the BSD license. It works with all popular web browsers.
A web crawler written in Python.
A web crawler that can be used from the CLI. It gathers information, such as tags, from the URLs it scans.
The Minotaur Project is an anonymizing proxy for Google searches. It distributes a search query among multiple bots, effectively hiding the search in the anonymity of the crowd.
The NLADA E-Library is an add-on product for the Zope web application server. It is designed to be a drop-in web application and content management system for creating web-based document libraries.