JobHunter aims to automatically find job listings on large sites such as chinahr, 51job, and zhaopin. It looks up the email address for each job listing and automatically sends an application email to it.
Aracnis is a Java-based framework for building distributed web spiders. These spiders can be used for a variety of tasks, such as screen scraping and link integrity checking.
The Informa library provides a convenient Java API for handling news channels and metadata about them. Different syntax formats (RSS 0.91, 1.0, 2.0 and Atom 0.3, 1.0) for feeds are supported. Also support for channel information descriptions (OPML) avail
jMarks is a full-blown multi-user web-based bookmark solution, written in Java. jMarks allows people to mark their online bookmarks as public or private, and can track the last time each bookmarked site was updated.
Open source Web page segmentation tool.
This project aims to provide a clean API, and a corresponding C++ implementation, for parsing travel-focused requests (e.g., "washington dc beijing monday r/t +aa -ua 1 week 2 adults 1 dog").
A small collection of Python 3000 scripts/modules that automate searching craigslist.org cities and categories for interesting stuff. The scripts currently use HTML screen scraping, since craigslist currently has no API.
Javacrim: multilingual tools and resources in Java
Group file share with advanced text parsing capability for easy search
Originally created as a church resource sharing system, phpShare&Search allows users to create accounts, share documents, search documents, and like or report documents. phpShare&Search's power comes from its advanced document parser, which extracts text from .PDF, .TXT, .DOC, and .DOCX files, and from its community features for liking resources and reporting them as inappropriate or spam. Users can also subscribe to weekly updates of new content. Users may choose to download, host, install, configure, modify, and manage this code themselves, or contract the code writer to do so for them. Contact me for a reasonable quote: eedrew <at> users <dot> sourceforge <dot> net. To support future revisions or to contribute based on the value you found in this code, check out the External Link drop-down in the menu. Also, if you do not wish to create and maintain your own installation, email firstname.lastname@example.org for a quote on a turnkey solution.
A cache system that works together with Squid, caching and cataloging files. It is the only system that caches many kinds of files, including videos from YouTube and Windows Update downloads, completely efficiently.
1. ANts p2p II - a mod of the anonymous P2P file-sharing client to make it more user-friendly. 2. kerjodando - ipfilter.dat sharing using a central tracker to build a small-world network. 3. itsDargens.com - a Web 2.0 Ruby on Rails content layer.
Search through multiple (configurable) information sources at once; results are presented side by side. Intended to support all information sources that implement OpenSearch, and more.
Advanced search for MediaWiki
This extension for MediaWiki is based on Woogle; over the last year we have improved and enhanced it. The main differences between Woogle and apMWsearch are the following:
* no remote server; search is restricted to the included wiki
* more file types can be searched
* different engines can be used for indexing files
* more flexible configuration for embedding into MediaWiki
WebCommic (newest picture) > PDF converter (history)
Geek & Poke Atom / nichtlustig.de > PDF converter (versioning style)
WebCollector is an open source web crawler framework based on Java.
WebCollector is an open source web crawler framework based on Java. It provides simple interfaces for crawling the Web; you can set up a multi-threaded web crawler in less than 5 minutes. GitHub: https://github.com/CrawlScript/WebCollector Demo: https://github.com/CrawlScript/WebCollector/blob/master/YahooCrawler.java
The jGroups Package is a Java package providing a few classes that abstract over the structure and contents of Yahoo! Groups message archives. Due to its current state, the package is only available from its CVS repository.
A simple contact management system, aka a browser-based Rolodex. Includes mass mailing of HTML MIME messages to contact lists, export to CSV, full-text boolean searches, etc. Requires: PHP >= 4.3.1 (with register_globals OFF) & MySQL >= 4.0.1
Indexing solution for a large set of radiological reports and images, based on the Swish-e search engine.
A BitTorrent tracker with a web front-end, written in C# ASP.NET.
Syncato is a Weblog Web Services system built on top of Berkeley DB XML, Webware and Python. It has a number of unique features: XPath access to all content via URLs, XSL-T presentation, and an extremely flexible database structure.
The Cornell Web Lab Collaboration Server is a suite of tools and services for GUI-based extraction, analysis and sharing of archived web data. See http://weblab.infosci.cornell.edu/ and http://www.cs.cornell.edu/~weigel for details about the project.
phpByteBazar is a web-based, operating-system-independent file management and exchange application with multi-user support and comprehensive indexing and searching capabilities.
XML abstraction interface for Lucene and reference implementation
myICS lets you know where your books, CDs, DVDs, software, etc., are. You can even add your own categories! Keep track of their location or who's got them at the moment, for example. Use it locally or online.
Swaggle is an easy, simple, and lightweight Perl tool for making web image galleries.