Page 2 | java crawler free download

Media Crawler

The “Media Crawler” is an extensible Eclipse RCP-based desktop application that crawls a given file system, extracts metadata from files, maps the metadata to internal schemas and stores it in a database. This project is ANDS-funded.

1 Review

Downloads: 1 This Week

Last Update: 2014-04-21

See Project

Leopdo search engine

A web search engine and crawler written in Java/MySQL, with full-text and vertical search and a word segmentation system.

1 Review

Downloads: 0 This Week

Last Update: 2014-07-03

See Project

EssentialScanner

RiverGlass EssentialScanner is an open source web and file system crawler which indexes the text content of discovered files so they can be retrieved and analyzed. It provides simple scanner capabilities as part of larger enterprise search solutions.

Downloads: 0 This Week

Last Update: 2015-04-24

See Project

Java web crawler

A minimal Java web crawler.

Downloads: 0 This Week

Last Update: 2016-07-23

See Project

Ex-Crawler

Ex-Crawler is divided into three subprojects (Crawler Daemon, distributed GUI Client, (web) search engine) that together provide a flexible and powerful search engine supporting distributed computing. More information: http://ex-crawler.sourceforge.net

1 Review

Downloads: 0 This Week

Last Update: 2013-04-26

See Project

Folksonomy Web Crawler

A Web crawler prototype designed to index pages of certain resource sharing platforms based on folksonomy tags. The results are displayed in an Excel spreadsheet.

Downloads: 0 This Week

Last Update: 2015-02-08

See Project

ItSucks

This project is a Java web spider (web crawler) with the ability to download (and resume) files. It is also highly customizable with regular expressions and download templates. All backend functionality is also available in a separate library.

3 Reviews

Downloads: 6 This Week

Last Update: 2013-04-29

See Project

Project AWESOME

A school project consisting of a crawler, a server and a searchpage.

Downloads: 0 This Week

Last Update: 2013-05-16

See Project

MuSE-CIR

MuSE-CIR is a Multigram-based Search Engine and Collaborative Information Retrieval system. It is written in Java/JSP and supports any JDBC-connectable database, although it has been thoroughly tested only with OracleXE and somewhat with MySQL, with JSP running on Apache Tomcat 5.5.

Downloads: 0 This Week

Last Update: 2013-05-22

See Project

FaceBukkCraw

This is a simple web crawler for FaceBook (TM) written in Java. The crawler surfs the public user pages (so you do not need to provide an account) to reconstruct the friendship graph for further study and analysis.

Downloads: 0 This Week

Last Update: 2013-04-18

See Project

nxs Crawler

nxs crawler is a program to crawl the internet. The program generates random IP addresses and attempts to connect to the hosts. If a host answers, the result is saved in an XML file. After that the crawler disconnects... Additionally you can

Downloads: 0 This Week

Last Update: 2013-04-18

See Project

jSEO: Pluggable SEO for JEE

jSEO -- Pluggable SEO (Search Engine Optimization) for dynamic JEE web applications

1 Review

Downloads: 0 This Week

Last Update: 2014-03-04

See Project

Java Sitemap Parser

The Java Sitemap Parser can parse a website's Sitemap (http://www.sitemaps.org/). This is useful for web crawlers that want to discover URLs from a website that is using the Sitemap Protocol. This project has been incorporated into crawler-commons (https://github.com/crawler-commons/crawler-commons) and is no longer being maintained.

Downloads: 0 This Week

Last Update: 2016-02-11

See Project
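
To illustrate what discovering URLs from a Sitemap involves, here is a minimal sketch using only the JDK's built-in XML parser. It is not the API of Java Sitemap Parser or crawler-commons; the class name `SitemapSketch` and the method `extractLocations` are hypothetical, and the sketch assumes a plain `urlset` document already loaded into a string (no sitemap index files, no gzip).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not part of any listed project.
public class SitemapSketch {

    /** Returns the text of every <loc> element found in a sitemap XML string. */
    public static List<String> extractLocations(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Reject DOCTYPE declarations so untrusted sitemaps cannot pull in external entities.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // "*" matches <loc> in any namespace, including the sitemaps.org one.
        NodeList locs = doc.getElementsByTagNameNS("*", "loc");
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < locs.getLength(); i++) {
            urls.add(locs.item(i).getTextContent().trim());
        }
        return urls;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">"
                + "<url><loc>http://www.example.com/</loc></url>"
                + "<url><loc>http://www.example.com/about</loc></url>"
                + "</urlset>";
        for (String url : extractLocations(xml)) {
            System.out.println(url);
        }
    }
}
```

A crawler would typically feed the returned URLs into its fetch queue; a maintained library such as crawler-commons is likely the better choice for real use.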

Retriever: a light, extensible crawler

Retriever is a simple crawler packed as a Java library that allows developers to collect and manipulate documents reachable by a variety of protocols (e.g. http, smb). You'll easily crawl documents shared in a LAN, on the Web, and many other sources.

Downloads: 0 This Week

Last Update: 2013-04-23

See Project

DeDuplicator (Heritrix add-on)

The DeDuplicator is an add-on module (plug-in) for the web crawler Heritrix. It offers a means to reduce the amount of duplicate data collected in a series of snapshot crawls.

Downloads: 0 This Week

Last Update: 2013-04-02

See Project

LogCrawler

LogCrawler is an ANT task for automatic testing of web applications. Using an HTTP crawler, it visits all pages of a website and checks the server log files for errors. Use it as a "smoketest" with your CI system, such as CruiseControl.

Downloads: 0 This Week

Last Update: 2013-04-19

See Project

Course Crawler

Course Crawler is an application that compiles term-definition pairs from multiple web glossaries into a centralized, stable, and searchable location.

Downloads: 0 This Week

Last Update: 2013-03-11

See Project

WebNews Crawler

WebNews Crawler is a specialized web crawler (spider, fetcher) designed to acquire and clean news articles from RSS and HTML pages. It can perform site-specific extraction to keep only the actual news content, filtering out advertising and other cruft.

Downloads: 0 This Week

Last Update: 2013-04-23

See Project

Crawl-By-Example (Heritrix plugin)

Crawl-By-Example runs a crawl that classifies the processed pages by subject and finds the best pages according to examples provided by the operator. Crawl-By-Example is a plugin for the Heritrix crawler and was developed as part of the GSoC06 program.

Downloads: 0 This Week

Last Update: 2014-12-14

See Project

GronoSpy

GronoSpy is a WWW crawler which tries to extract knowledge based on the data from grono.net - a community portal.

Downloads: 0 This Week

Last Update: 2013-03-08

See Project

J-Obey (Robots.txt Crawler Module)

J-Obey is a Java library/package that gives people writing their own crawlers a stable Robots.txt parser. If you are writing a web crawler of some sort, you can use J-Obey to take out the hassle of writing a Robots.txt parser/interpreter.

Downloads: 0 This Week

Last Update: 2015-08-05

See Project
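
As a rough idea of the work a Robots.txt parser saves, the following is a deliberately simplified sketch. It is not J-Obey's API; the class `RobotsSketch` and its methods are hypothetical. It assumes only `User-agent` and `Disallow` lines matter, treats each `Disallow` value as a plain path prefix, and applies rules from groups naming either `*` or the exact agent. Real parsers also handle `Allow`, wildcards and group precedence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified robots.txt checker for illustration only.
public class RobotsSketch {
    private final List<String> disallowed = new ArrayList<>();

    public RobotsSketch(String robotsTxt, String agent) {
        boolean applies = false;      // does the current group apply to our agent?
        boolean inAgentLines = false; // are we inside a run of User-agent lines?
        for (String raw : robotsTxt.split("\\r?\\n")) {
            // Strip comments, then split "field: value".
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();

            if (field.equals("user-agent")) {
                boolean match = value.equals("*") || value.equalsIgnoreCase(agent);
                // Consecutive User-agent lines share one group of rules.
                applies = inAgentLines ? (applies || match) : match;
                inAgentLines = true;
            } else {
                inAgentLines = false;
                // An empty Disallow value means "nothing is disallowed".
                if (field.equals("disallow") && applies && !value.isEmpty()) {
                    disallowed.add(value);
                }
            }
        }
    }

    /** Returns false if the path starts with any recorded Disallow prefix. */
    public boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String robots = "User-agent: *\nDisallow: /private/\n";
        RobotsSketch rules = new RobotsSketch(robots, "mybot");
        System.out.println(rules.isAllowed("/index.html"));
        System.out.println(rules.isAllowed("/private/data.html"));
    }
}
```

A crawler would call `isAllowed` before fetching each path on a host and skip any URL for which it returns false.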

Crawler/Load Tester in Java

JCrawler is a perfect crawling/load-testing tool which is cookie-enabled and follows a human crawling pattern (hits/second).

Downloads: 2 This Week

Last Update: 2013-04-25

See Project

isobel

A configurable knowledge management framework. It works out of the box, but it is meant mainly as a framework for building complex information retrieval and analysis systems. The three major components (Crawler, Analyzer and Indexer) can also be used separately.

Downloads: 0 This Week

Last Update: 2013-03-22

See Project

SmartCrawler

SmartCrawler is a Java-based, fully configurable, multi-threaded and extensible crawler that can fetch and analyze the contents of a web site by using dynamically pluggable filters.

Downloads: 1 This Week

Last Update: 2013-03-22

See Project

webloupe

WebLoupe is a Java-based tool for analysis, interactive visualization (sitemap), and exploration of the information architecture and specific properties of local or publicly accessible websites. It is based on web spider (or web crawler) technology.

Downloads: 0 This Week

Last Update: 2015-01-06

See Project

Search Results for "java crawler" - Page 2

Showing 58 open source projects for "java crawler"
