java crawler free download

13 projects for "java crawler" with 2 filters applied:

Web Scrapers, BSD

1

phoneutria

A Java web crawler: multi-threaded, scalable, high-performance, extensible, and polite. It can be used to crawl and index any web or enterprise domain and is configurable through an XML configuration file.

Downloads: 0 This Week

Last Update: 2017-05-22
2

Constellio Enterprise Search engine

Open source Search Engine and Enterprise Search

Constellio is an enterprise search engine that allows companies to search all their organization's information through a single interface (Web, CRM, ERP, ECM, Mail, etc.). Constellio is based on Apache Solr and Google Search Appliance's connector. Constellio has a powerful web crawler.

Downloads: 0 This Week

Last Update: 2015-03-31
3

Regular Expression web replication

Yet another web crawler? Yes, but this one uses the full power of regular expressions to accept or reject, examine or ignore, save or refuse pages. You can also use MIME types to do all this. Powerful and flexible.

Downloads: 0 This Week

Last Update: 2013-05-30
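
To illustrate the kind of filtering the description above refers to, here is a minimal sketch of accepting or rejecting pages by URL regular expression and MIME type. It is not this project's code or API: the class name PageFilter, its constructor arguments, and the rule that reject patterns take priority are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical filter (not the project's API): decides whether a crawled page is saved. */
public class PageFilter {
    private final List<Pattern> acceptUrls;
    private final List<Pattern> rejectUrls;
    private final List<Pattern> acceptMimeTypes;

    public PageFilter(List<String> acceptUrls, List<String> rejectUrls, List<String> acceptMimeTypes) {
        this.acceptUrls = compile(acceptUrls);
        this.rejectUrls = compile(rejectUrls);
        this.acceptMimeTypes = compile(acceptMimeTypes);
    }

    // Compile each regular expression once so it can be reused for every page.
    private static List<Pattern> compile(List<String> regexes) {
        List<Pattern> patterns = new ArrayList<>();
        for (String regex : regexes) {
            patterns.add(Pattern.compile(regex));
        }
        return patterns;
    }

    // True if the whole input matches at least one of the patterns.
    private static boolean matchesAny(List<Pattern> patterns, String input) {
        for (Pattern pattern : patterns) {
            if (pattern.matcher(input).matches()) {
                return true;
            }
        }
        return false;
    }

    /** Assumed policy: a reject match always wins; otherwise URL and MIME type must both be accepted. */
    public boolean shouldSave(String url, String mimeType) {
        if (matchesAny(rejectUrls, url)) {
            return false;
        }
        return matchesAny(acceptUrls, url) && matchesAny(acceptMimeTypes, mimeType);
    }
}
```

With accept pattern `https?://example\.com/.*`, reject pattern `.*\.(jpg|png)`, and MIME pattern `text/html`, this sketch would save an HTML page on example.com and refuse its images as well as pages on other hosts.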
4

Heritrix: Internet Archive Web Crawler

The archive-crawler project is building Heritrix: a flexible, extensible, robust, and scalable web crawler capable of fetching, archiving, and analyzing the full diversity and breadth of internet-accessible content.

21 Reviews

Downloads: 8 This Week

Last Update: 2013-06-05
5

ItSucks

This project is a Java web spider (web crawler) with the ability to download (and resume) files. It is also highly customizable with regular expressions and download templates. All backend functionality is also available in a separate library.

3 Reviews

Downloads: 5 This Week

Last Update: 2013-04-29
6

DeDuplicator (Heritrix add-on)

The DeDuplicator is an add-on module (plug-in) for the web crawler Heritrix. It offers a means to reduce the amount of duplicate data collected in a series of snapshot crawls.

Downloads: 0 This Week

Last Update: 2013-04-02
7

WebNews Crawler

WebNews Crawler is a specialized web crawler (spider, fetcher) designed to acquire and clean news articles from RSS and HTML pages. It can do a site-specific extraction to extract only the actual news content, filtering out the advertising and other cruft.

Downloads: 0 This Week

Last Update: 2013-04-23
8

J-Obey (Robots.txt Crawler Module)

J-Obey is a Java library/package that gives people writing their own crawlers a stable Robots.txt parser. If you are writing a web crawler of some sort, you can use J-Obey to take out the hassle of writing a Robots.txt parser/interpreter.

Downloads: 0 This Week

Last Update: 2015-08-05
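
For readers unfamiliar with what a Robots.txt parser does, the following is a deliberately simplified sketch. It is not J-Obey's API: the class SimpleRobotsTxt and its methods are hypothetical, and the sketch assumes only User-agent and Disallow lines matter, uses plain prefix matching (no wildcards, no Allow rules), and merges the rules of every group that applies to the crawler instead of preferring the most specific group.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified Robots.txt reader (not J-Obey's API). */
public class SimpleRobotsTxt {
    private final List<String> disallowed = new ArrayList<>();

    public SimpleRobotsTxt(String robotsTxt, String userAgent) {
        String agent = userAgent.toLowerCase();
        boolean applies = false;       // does the current group apply to our agent?
        boolean inAgentLines = false;  // consecutive User-agent lines belong to one group
        for (String raw : robotsTxt.split("\\R")) {
            String line = raw;
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash); // strip comments
            }
            line = line.trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase();
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                if (!inAgentLines) {
                    applies = false; // a new group starts here
                }
                inAgentLines = true;
                if (value.equals("*") || (!value.isEmpty() && agent.contains(value.toLowerCase()))) {
                    applies = true;
                }
            } else {
                inAgentLines = false;
                if (field.equals("disallow") && applies && !value.isEmpty()) {
                    disallowed.add(value);
                }
            }
        }
    }

    /** True if no collected Disallow prefix matches the start of the path. */
    public boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }
}
```

A crawler would build one such object per host from the fetched file and call isAllowed before requesting each path. A production parser has to handle many more cases, which is the hassle a dedicated library is meant to remove.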
9

webloupe

WebLoupe is a Java-based tool for analysis, interactive visualization (sitemap), and exploration of the information architecture and specific properties of local or publicly accessible websites. It is based on web spider (or web crawler) technology.

Downloads: 0 This Week

Last Update: 2015-01-06
10

Arn0lD

A new web crawler with a sophisticated search process specialized by language.

Downloads: 0 This Week

Last Update: 2013-03-07
11

WebSPHINX

WebSPHINX is a web crawler (robot, spider) Java class library, originally developed by Robert Miller of Carnegie Mellon University. Features include multithreading, tolerant HTML parsing, URL filtering and page classification, pattern matching, mirroring, and more.

2 Reviews

Downloads: 0 This Week

Last Update: 2015-11-12
12

Spider

Spider is a web crawler written in Java. Based on a regular expression string, the spider searches the internet for web pages matching this string and stores them in a MySQL database.

Downloads: 0 This Week

Last Update: 2014-08-09
13

studiMaps

studiMaps is a web-based application for visualization and analysis of social networks. It consists of two software components: a web crawler for getting data and the web-based application for visualization.

Downloads: 0 This Week

Last Update: 2014-08-03