The development of this project has ended. Please take a look at Constellio Enterprise Search, which is based on Apache Solr, Apache Tika, and Google Search Appliance connectors: http://www.constellio.com
WebNews Crawler is a specialized web crawler (spider, fetcher) designed to acquire and clean news articles from RSS feeds and HTML pages. It can perform site-specific extraction to keep only the actual news content, filtering out advertising and other cruft.
Crawl-By-Example runs a crawl that classifies the processed pages by subject and finds the best pages according to examples provided by the operator. Crawl-By-Example is a plugin for the Heritrix crawler and was developed as part of the Google Summer of Code 2006 (GSoC06) program.
Pandora's Jar enables timeshifting on Pandora-based (www.pandora.com) radio stations. A distributed voting system provides error correction and detection of damaged files. Per-station playlist support lets you replay by genre as well as by time.
Photospace is an open platform for searching, viewing and annotating digital media in time and space. Photospace integrates easily with photo blogs, map blogs, SOAP clients, RSS and RDF readers and your own custom applications.
Aracnis is a Java-based framework for building distributed web spiders. These spiders can be used for a variety of tasks, such as screen-scraping and link integrity checking.
The project consists of two parts. The first is a J2ME application that collects information such as photos and GPS position, speed, and course, and transfers it to a web server. The second is a web application that manages and displays the received data using Google Maps.
JAMP provides several functions for indexing and managing your media files on resources such as storage systems or DVDs. The user interface is web-based and written entirely in Java.
SCAM is a development environment for building metadata stores for RDF and the Semantic Web. SCAM is built on international technology and metadata standards such as RDF, Dublin Core, IEEE/LOM, and IMS.
A front end to the Swedish public transportation search engine. It makes the same requests as the WAP page would, but adds the functionality of a standard J2ME application.
Switchboard is a conceptual-level interface to many web and network related functions (SOAP, REST, XML parsing, screen-scraping, FTP, network sniffing), designed for the Processing environment.
VDC has been superseded by DVN: https://sourceforge.net/projects/dvn/ ---- The Virtual Data Center project is building an operational, open-source, digital library to enable the sharing of quantitative research data, and the development of distribute
list2db reads the email digest files generated by the Mailman mailing list software and converts them into SQL for a relational database. The project also includes a PHP front end that lets users search and browse archived list emails.
Analysis and interactive visualization of a web-based community. It supports different perspectives on the given social network for presenting community groups to the user, and it also provides specific information about each member.
This project combines a JSP tag, Factory-pattern classes, and XML to display directory structures. The directory is specified in the JSP tag, which then calls the package to generate an XML document describing the directory structure.
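The directory-to-XML step might be sketched like this. It is a minimal illustration that uses only the JDK's DOM and Transformer APIs. The class name `DirectoryXmlFactory`, the method `describe`, and the `<directory>`/`<file>` element names are hypothetical and are not taken from the project. The JSP tag wiring is omitted.

```java
import java.io.File;
import java.io.StringWriter;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical factory that builds an XML description of a directory tree. */
public class DirectoryXmlFactory {

    /** Returns XML with nested <directory> and <file> elements mirroring the tree under root. */
    public static String describe(File root) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        doc.appendChild(toElement(doc, root));

        // Serialize the DOM without an XML declaration so the output starts at the root element.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Recursively maps a file-system entry to an element; directories contain their children.
    private static Element toElement(Document doc, File f) {
        Element e = doc.createElement(f.isDirectory() ? "directory" : "file");
        e.setAttribute("name", f.getName());
        if (f.isDirectory()) {
            File[] children = f.listFiles();
            if (children != null) {
                Arrays.sort(children); // deterministic, path-name order
                for (File c : children) {
                    e.appendChild(toElement(doc, c));
                }
            }
        }
        return e;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describe(new File(args.length > 0 ? args[0] : ".")));
    }
}
```

A JSP custom tag could call `describe` with the directory named in its attribute and then render the returned XML, for example through an XSLT stylesheet.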
phpByteBazar is a web-based, operating-system-independent file management and exchange application with multi-user support and comprehensive indexing and searching capabilities.
The goal of the project is to guide developers in designing web applications that use open-source frameworks such as Spring and Hibernate to build scalable, efficient, and reliable systems.
J-Obey is a Java library that gives developers who write their own crawlers a stable robots.txt parser. If you are writing a web crawler of any kind, you can use J-Obey to avoid the hassle of writing your own robots.txt parser/interpreter.
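The sketch below shows the kind of work such a parser handles: choosing the group that applies to a crawler's user agent and deciding whether a path may be fetched. It is a deliberately simplified illustration and is not J-Obey's API. The class `SimpleRobotsRules` and its methods are hypothetical. The sketch assumes two things: agent names are matched case-insensitively as substrings, and the longest matching path prefix wins, with Allow winning ties, as in RFC 9309. It does not support `*` or `$` wildcards in paths.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical minimal robots.txt matcher for illustration; NOT the J-Obey API. */
public class SimpleRobotsRules {

    // One Allow/Disallow line: a path prefix and whether it permits access.
    private static final class Rule {
        final boolean allow;
        final String prefix;
        Rule(boolean allow, String prefix) { this.allow = allow; this.prefix = prefix; }
    }

    private final List<Rule> rules;

    /** Keeps the rules of the group(s) naming this agent, otherwise those of the "*" group. */
    public SimpleRobotsRules(String robotsTxt, String agent) {
        String ua = agent.toLowerCase(Locale.ROOT);
        List<Rule> specific = new ArrayList<>();
        List<Rule> wildcard = new ArrayList<>();
        boolean sawSpecific = false;            // some group names this agent
        boolean inSpecific = false, inWildcard = false;
        boolean readingAgents = false;          // inside a run of User-agent lines

        for (String raw : robotsTxt.split("\\R")) {
            String line = raw.replaceFirst("#.*", "").trim(); // drop comments
            int colon = line.indexOf(':');
            if (colon < 0) continue;
            String key = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();

            if (key.equals("user-agent")) {
                if (!readingAgents) { inSpecific = false; inWildcard = false; } // a new group starts
                readingAgents = true;
                String name = value.toLowerCase(Locale.ROOT);
                if (name.equals("*")) {
                    inWildcard = true;
                } else if (!name.isEmpty() && ua.contains(name)) {
                    inSpecific = true;
                    sawSpecific = true;
                }
            } else if (key.equals("allow") || key.equals("disallow")) {
                readingAgents = false;
                if (value.isEmpty()) continue; // "Disallow:" with no path allows everything
                Rule r = new Rule(key.equals("allow"), value);
                if (inSpecific) specific.add(r);
                else if (inWildcard) wildcard.add(r);
            }
        }
        this.rules = sawSpecific ? specific : wildcard;
    }

    /** Longest matching prefix wins; on a tie Allow wins. No '*' or '$' pattern support. */
    public boolean isAllowed(String path) {
        Rule best = null;
        for (Rule r : rules) {
            if (!path.startsWith(r.prefix)) continue;
            if (best == null || r.prefix.length() > best.prefix.length()
                    || (r.prefix.length() == best.prefix.length() && r.allow)) {
                best = r;
            }
        }
        return best == null || best.allow;
    }
}
```

A crawler would build one `SimpleRobotsRules` per host from the fetched `/robots.txt` and call `isAllowed` before each request. A production parser also has to deal with wildcards, redirects, fetch errors, and size limits.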