This is a browser/navigator for the FedoraForum (http://forum.fedoraforum.org/), based on wxPython.
Cortez aims to create a new news service model for RSS and blogging. It simply offers an environment to create posts, read news through RSS (Atom), and syndicate across multiple blogs.
PandaList is a perfect directory listing script. Written in PHP and using MySQL, PandaList is multilingual and supports multiple templates. It includes SEO-friendly URLs, Google Maps, a WYSIWYG editor for descriptions, and a CMS editor in the admin area.
bee-rain is a web crawler that harvests and indexes files over the network. You can see the results on the bee-rain website: http://bee-rain.internetcollaboratif.info/
Wordindex takes huge numbers of document files (HTML, PDF, ZIP, text, PS, etc.) and full-text indexes them. All data is stored in a MySQL database, and an alpha version of a web-based search utility has been written.
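As a rough illustration of what full-text indexing means here, the following is a minimal in-memory inverted index in Python. It is only a sketch: Wordindex itself stores its data in MySQL, and the names `build_index` and `search` are hypothetical, not part of that project.

```python
import re
from collections import defaultdict

def build_index(documents):
    """Map each lowercase word to the set of document ids that contain it.

    `documents` is a dict of {doc_id: extracted_text}; extracting text from
    PDF, ZIP, etc. is assumed to have happened already.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

A real indexer would also persist the word-to-document mapping (for example in a database table) instead of keeping it in memory.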
High-performance software for information retrieval research. Emphasis on semi-structured text retrieval, especially for HTML and XML. The goal is to facilitate information retrieval research by providing an interchangeable toolkit of functions.
AVD is a continuation of the swim project. The goal is to create a suitable SQL server from swim's not-installed DB, and to maintain the swim client. AVD will be used as a gBootRoot method.
This was a terrible idea and is equally terribly implemented.
OpenAnonymity consists of a module for the Apache 2.0 web server and a framework that lets you control search engine spider indexing at the word level, rather than at the file level as in the Robots Exclusion standard. OA could force spiders to follow these rules.
A Perl solution to display a nice directory listing if indexes are turned off on a *NIX-based server. Nominally named index.cgi, it reads the directory via 'ls -l' and parses the output as needed for display, using the default Apache icons, or others if specified.
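The parsing step described above can be sketched as follows. This is a Python illustration rather than the script's actual Perl, and it assumes the common nine-column `ls -l` layout; the function name `parse_ls_line` is hypothetical.

```python
def parse_ls_line(line):
    """Split one `ls -l` output line into a dict.

    Returns None for lines that are not regular entries, such as the
    leading "total 12" line or device files whose size column is not a
    plain number. Assumes the usual nine-column layout.
    """
    parts = line.rstrip("\n").split(None, 8)
    if len(parts) < 9:
        return None
    mode, _links, owner, group, size, month, day, time_or_year, name = parts
    if not size.isdigit():
        return None
    return {
        "is_dir": mode.startswith("d"),
        "mode": mode,
        "owner": owner,
        "group": group,
        "size": int(size),
        "modified": f"{month} {day} {time_or_year}",
        "name": name,  # keeps spaces; symlinks appear as "name -> target"
    }
```

Limiting the split to eight separators keeps file names that contain spaces in one piece.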
Bookmark-Manager is an advanced bookmark management utility for Windows supporting importing/exporting and merging of Internet Explorer favorites, Opera hotlists, Mozilla, Netscape, and Firefox bookmarks, XBEL, and HTML lists.
IGLU is a Java class library designed to facilitate sharing of code among Artificial Intelligence/Information Retrieval researchers to illustrate how various problems can be solved in Java. It is developed and maintained by the IGLU Research Group.
A search application to watch and download movies and TV shows
A federated search desktop application to read about, preview, watch, and download any movie and television titles that are being shared online.
Zoozle 2008 - 2010 Webpage, Tools and SQL Files
Download search engine and directory with Rapidshare and Torrent (zoozle Download Suchmaschine). All the files that ran the world-leading German download search engine in 2010, with 500,000 unique visitors a day: all the tools you need to set up a clone. Source code used and enhanced by: https://devop.tools/ https://github.com/thecerial/ https://blog.onetopp.com/ https://www.onetopp.com/ Code contains: PHP files for zoozleNET and zoozleORG; a Perl crawler for gathering new content into the database; and all the other tools I have created. (c) Sebastian Enger 2005-2014
YAR (Yet Another Repository): a collection of Perl modules to expose metadata using the OAI-PMH protocol. Includes XMLTape, which creates OAI-PMH repositories on gzipped XML archives.
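For context, a harvester talks to an OAI-PMH repository with plain HTTP requests that carry a `verb` parameter. The sketch below, in Python rather than YAR's Perl, only builds such request URLs; the base URL is a placeholder and `oai_request_url` is a hypothetical helper, not part of YAR.

```python
from urllib.parse import urlencode

def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL for the given verb and arguments.

    `base_url` is the repository endpoint (a placeholder in the example
    below); `verb` is an OAI-PMH verb such as "Identify" or "ListRecords".
    """
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"

# Ask a repository for all records in unqualified Dublin Core.
url = oai_request_url("http://example.org/oai", "ListRecords",
                      metadataPrefix="oai_dc")
```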
NWA Toolset, a software package for accessing archived web documents
Contineo is a Web-based Document Management System (DMS). Features: folder organization, document versioning, bulk import, and import from a mailbox. NOTE: this project has been DISCONTINUED in favor of LogicalDOC http://sourceforge.net/projects/logicaldoc
Reads the full HTML page behind each RSS feed item, then scrapes and summarizes just the article content, stripped of ads, etc. Converts the summaries to speech (Ogg/MP3) and creates a podcast of all of them. Works with Slashdot, weather, CNN, NewsForge, Groklaw, Pirillo, and more!
The LEADERS toolkit is a generic toolset that enables the creation of an online environment which integrates EAD finding aids and EAC authority records with TEI transcripts and digitised images of archival material suitable to a wide variety of archives.
Open Source Intelligence Automation.
SpiderFoot is an open source intelligence automation tool. Its goal is to automate the process of gathering intelligence about a given target, which may be an IP address, domain name, hostname, or network subnet. SpiderFoot can be used offensively, i.e. as part of a black-box penetration test to gather information about the target, or defensively, to identify what information your organisation is freely providing for attackers to use against you.
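The target types listed above can be told apart with Python's standard `ipaddress` module. The sketch below is an illustration only and is not SpiderFoot's own code; `classify_target` and its return labels are hypothetical.

```python
import ipaddress

def classify_target(target):
    """Label a target string as an IP address, a subnet, or a name.

    Anything that is neither a valid IP address nor a CIDR subnet is
    treated as a hostname or domain name, without validating it further.
    """
    try:
        ipaddress.ip_address(target)
        return "ip_address"
    except ValueError:
        pass
    if "/" in target:
        try:
            # strict=False also accepts a host address with a prefix length
            ipaddress.ip_network(target, strict=False)
            return "subnet"
        except ValueError:
            pass
    return "hostname_or_domain"
```

Telling a hostname from a domain name would need extra information, such as a public suffix list, so the sketch leaves them in one bucket.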
IDV Directory Viewer is a web application written in PHP for navigating through directories and browsing files on an HTTP server. It is highly customizable and outputs standards-compliant XHTML 1.0. The appearance of listings is customizable with CSS.
"Swish-e is a fast, flexible, and free open source system for indexing collections of Web pages or other files" (http://swish-e.org/ ) This module provides a Python API for this software.
Referer spam (also known as log spam or referer bombing)
Requires PHP CLI and PHP cURL. Referer spam (also known as log spam or referer bombing) is a kind of spamdexing (spamming aimed at search engines). The technique involves making repeated web site requests using a fake referer URL that points to the site the spammer wishes to advertise. Sites that publicize their access logs, including referer statistics, will then inadvertently link back to the spammer's site. These links will be indexed by search engines as they crawl the access logs. This benefits the spammer because of the free link, which gives the spammer's site improved search engine ranking due to link-counting algorithms that search engines use.
A crawler which can get contents from web sites.
The crawler can crawl many types of web sites, including portals, digital newspapers, and Twitter-like sites, among others.