Hunspell is a spell checker and morphological analyzer library and program designed for languages with rich morphology and complex compounding or character encoding. Hunspell interfaces: Curses, Ispell-compatible pipe interface, OpenOffice.org UNO module.
Search engine and data mining applications and ClueWeb datasets.
The Lemur Project develops search engines, browser toolbars, text analysis tools, and data resources that support research and development of information retrieval and text mining software, including the Indri search engine in C++, the Galago search engine research framework in Java, the RankLib learning to rank library, ClueWeb09 and ClueWeb12 datasets and the Sifaka data mining application.
CLucene is a C++ port of Lucene, the high-performance, full-featured text search engine written in Java. CLucene is intended to be faster than Lucene, as it is written in C++.
Open Source Intelligence Automation.
SpiderFoot is an open source intelligence automation tool. Its goal is to automate the process of gathering intelligence about a given target, which may be an IP address, domain name, hostname or network subnet. SpiderFoot can be used offensively, i.e. as part of a black-box penetration test to gather information about the target, or defensively, to identify what information your organisation is freely providing for attackers to use against you.
Virtuoso is a scalable cross-platform server that combines Relational, Graph, and Document Data Management with Web Application Server and Web Services Platform functionality.
Web data extraction (web data mining, web scraping) tool. It leverages well-proven XML and text processing technologies to easily extract useful data from arbitrary web pages.
Geoportal Server is a standards-based, open source product that enables discovery and use of geospatial resources including data and services.
The stuff here has no documentation and some of it may never be completed. This is my playground, use at your own risk.
A search application to watch and download movies and TV shows
A federated search desktop application to read about, preview, watch, and download any movie and television titles that are being shared online.
Lurker is a mailing list archiver designed for capacity, speed, simplicity, and configurability, in that order. Noteworthy features include Google-style searching on all fields, chronology-preserving threads, and multilingual and attachment support.
KSearch is a website search engine written in Perl that is fully customizable and supports unlimited page search. It can use a DBM or flat-file database. Search results are output as XHTML 1.0 Strict, making it easy to match the HTML and CSS of your existing website.
Extension for the OpenOffice suite, based on the Desk.Now Java client for the WhereIsNow web service.
Web-based RSS search engine that learns user preferences to return results. Demo available at http://ec2-50-16-215-243.compute-1.amazonaws.com/
Command-line application written in Java for automating downloads and filtering the contents of downloaded files. jDownloader uses a simple script file to configure the downloading and filtering processes.
Monitors web pages for changes and emails the differences to subscribers. Supports user accounts and registration. PHP/MySQL.
Switchboard is a conceptual-level interface to many web and network related functions (SOAP, REST, XML parsing, screen-scraping, FTP, network sniffing), designed for the Processing environment.
Auto Rescanning - Search Terms - Regularly Updated With New Features
NOTE (as of 11/05/2015): 4chan's HTML structure has changed; full images are downloaded as well as the thumbnails. A fix to stop the thumbnails from downloading is coming shortly (after my exams are over).
This is the first release of my 4chan image downloader. This downloader packs loads of great features such as the search ability. Check the features section and be sure to let me know if you want a feature added.
Coming Soon:
- Wiki, explaining in depth how to use it more quickly (although it's already pretty simple to use)
- Ability to download the whole thread, not just images
- Better multithreading
- Ability to use proxies
- Sort images downloaded from searches into folders
- Keep original image names
- More responsive GUI
Be sure to let me know if you want any other features.
Our aim is to enable Web applications to consume linked data from the Web. With SQUIN (Semantic Web Query Interface) we will provide a Web data query service as an addition to the LAMP technology stack. This service executes queries over the whole We
CMS-Bandits is a set of PHP scripts with an online HTML editor, calendar, search engine, RSS reader, revision log, personal nickpage, comment system, web crawler, and more.
A network asset management tool written in PHP and MySQL. Maintains a list of servers that can be cross-referenced by multiple items. Features: locations, manufacturers, vendors (contact names and phone numbers), device log, list of network ports, software manager, file manager
A PHP + MySQL script that attempts to recreate features similar to those of Mininova, SeedPeer, and the TPB frontend. Features a new design, crawling ability, cURL for single torrent scrapes + mass scrapes, and much much more! Based on T-Xore 0.4, Ibitz
MAOS (Meta-Attribute Object Store) is a light-weight Java library / framework implementing simple Object persistence using search-engine technology
Larbin is a Web crawler intended to fetch a large number of Web pages; it should be able to fetch more than 100 million pages on a standard PC with much u/d. This set of PHP and Perl scripts, called webtools4larbin, can handle the output of Larbin and p
The project consists of two parts. One is a J2ME app that collects information such as photo, position, speed, and course from GPS and transfers it to the web server. The other is a web app for managing and displaying the received data using GoogleMap