webStraktor is a programmable World Wide Web data extraction client. Its purpose is to scrape HTML-based content over HTTP and extract relevant information. webStraktor features a scripting language to facilitate the collection, extraction and storage of information available on the web, including images. The scripting language uses elements of regular expression and XPath syntax, has a small instruction set, and is easy to master. The standard webStraktor output format is XML-based, encoded as ASCII, UTF-8 or ISO-8859-1 (Latin-1). webStraktor relies on the Apache HttpClient for retrieving content over HTTP. It adheres to the Robots Exclusion Protocol and can be configured to operate anonymously by connecting through the predominant types of web proxy servers. webStraktor extends the functionality of web crawlers, spiders or bots by integrating scraping and crawling capabilities.
This is a simple Java web application search framework intended to address common web search requirements, i.e. providing search forms for your application's domain objects that are persisted in a backend RDBMS, e.g. Oracle.
Musifire is a free radio service with search by similar artists, tracks, and tags, and the option to download the music you like! LISTEN!
Compact and fast open source desktop CMS (content management system) for generating documentation and small or medium-sized websites. Features: full-text search, integrated browser, multilingual projects, support for external HTML editors.
GameFAQs board and user indexer fetches pages from GameFAQs and stores board and user names in a database for browsing.
A web interface for the apturl protocol. It allows applications to be searched for and installed with a single click in any browser that supports this protocol. An online preview is available at www.apturl.net.
FidoAccess is a .NET assembly that provides classes for read access to FIDO-based message bases such as Squish, JAM and MSG. SquishIndexer is a console-based utility that indexes all your FIDO message bases with Google Desktop Software.
The search aggregator allows users to initiate searches across multiple applications and receive aggregated results. This project is based on Lucene, written in Java, exposes web and plugin interfaces, and supports the OpenSearch and JSON standards.
Automatic Indexer. Easy To Use And Modify.
Easyspider is a web spider written in Perl that is capable of scanning websites and other documents (100000 p/h). The spider parses htm, xml, rss, doc, ppt, rtf, xls and pdf files, extracts the content, and generates XML output. Its main feature is a client/server architecture.
Job publishing and search engine based on Java2EE, Hibernate, PostgreSQL and Jersey, with a web interface based on jQuery.
Light network file search engine: a crawler for FTP servers and SMB shares (Windows shares and UNIX systems running Samba). A web interface written in Perl (Mason) is provided for searching files.
phpopensearch makes it easy to offer a search plugin to the users of your site. Works with Firefox, IE7+ and other browsers that support the OpenSearch XML format.
Search for MediaWiki based on Lucene. This is the daemon only; see mediawiki.org for the PHP extension.
A 3D model search engine that provides an intuitive query interface for searching a large database of indexed 3D objects for a query model, both accurately and efficiently, using a novel shape-matching algorithm that is invariant under similarity transformations.
Data migration/conversion library based on STX and XSLT transformation
Infofuze is a Java library and server application that can be used to transform and combine data from various sources into a specific XML or other text output format that can be stored or indexed.
Written in Python, Reverse Phone Lookup is a simple program that, when given a phone number, searches the white pages and displays the information returned (first name, last name, address, city, state, and ZIP code). Note: This program no longer works.
The WhereIsNow Web Service Client Library project is a Java library used to query the WhereIsNow web services. You can freely embed it in your code to develop new clients easily and integrate the WhereIsNow features into your own applications.
DELETE ME ... See you on github somewhere
Digital Library Search Engine
SeerSuite is an application toolkit for digital libraries and search engines, e.g., CiteSeerX. CiteSeerX has moved to GitHub; please get the latest code from: https://github.com/SeerLabs/CiteSeerX
Jbox is a Java full-text search engine framework. It is not a complete application, but rather a code library and API that can easily be used for constructing a search engine.
The goal of this project is to provide a reusable library to transform any web page or data to content objects by generic, configurable ContentProvider plugins for the iQser GIN Semantic Middleware (www.iqser.com).
A CGI search script able to build and use an index of words.
e-DSG: Electronic Government Service Discovery (Descoberta de Serviço Eletrônico Governamental)
GUDDI is a free solution developed with the Demoiselle Framework that implements the e-DSG concept (Electronic Government Service Discovery) and follows the e-PING standards to help public entities publicize their services.
CLucene is a C++ port of Lucene, the high-performance, full-featured text search engine written in Java. According to the project, CLucene is faster than Lucene because it is written in C++.