java crawler free download

Showing 64 open source projects for "java crawler"

1

WebMagic

A scalable web crawler framework for Java

WebMagic is a simple but scalable crawler framework that covers the whole crawler lifecycle: downloading, URL management, content extraction, and persistence. It simplifies the development of a specific crawler. WebMagic has a simple, highly flexible core and a simple API for HTML extraction. It also supports annotated POJOs to customize a crawler, with no configuration needed. Some other...
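
The four stages named above can be illustrated without any dependencies. The sketch below is a hypothetical mini-framework, not WebMagic's API: `Downloader`, `Page`, `extract`, and `crawl` are invented names, an in-memory map stands in for HTTP so the example runs offline, and regex-based title and link extraction is a simplification that a real framework would replace with an HTML parser.

```java
import java.util.*;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical four-stage crawl loop: download, schedule (URL management),
// extract, persist. NOT WebMagic's API; all names are illustrative only.
class MiniCrawler {

    /** Downloading stage: maps a URL to raw HTML, or null if unreachable. */
    interface Downloader { String fetch(String url); }

    /** Extraction result: fields to persist plus newly discovered links. */
    record Page(String url, Map<String, String> fields, List<String> links) {}

    private static final Pattern LINK = Pattern.compile("href=\"([^\"]+)\"");
    private static final Pattern TITLE = Pattern.compile("<title>(.*?)</title>");

    /** Extraction stage: pull the title and outgoing links with regexes. */
    static Page extract(String url, String html) {
        Map<String, String> fields = new HashMap<>();
        Matcher t = TITLE.matcher(html);
        if (t.find()) fields.put("title", t.group(1));
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) links.add(m.group(1));
        return new Page(url, fields, links);
    }

    /** Runs the loop; the scheduler is a FIFO queue plus a seen-set for de-duplication. */
    static List<Page> crawl(String seed, Downloader downloader, Consumer<Page> pipeline, int maxPages) {
        Deque<String> queue = new ArrayDeque<>(List.of(seed));
        Set<String> seen = new HashSet<>(List.of(seed));
        List<Page> done = new ArrayList<>();
        while (!queue.isEmpty() && done.size() < maxPages) {
            String url = queue.poll();
            String html = downloader.fetch(url);
            if (html == null) continue;                  // skip unreachable pages
            Page page = extract(url, html);
            pipeline.accept(page);                       // persistence stage
            done.add(page);
            for (String link : page.links()) {
                if (seen.add(link)) queue.add(link);     // enqueue unseen links only
            }
        }
        return done;
    }

    public static void main(String[] args) {
        Map<String, String> web = Map.of(
            "/a", "<title>A</title><a href=\"/b\">b</a><a href=\"/c\">c</a>",
            "/b", "<title>B</title><a href=\"/a\">a</a>",
            "/c", "<title>C</title>");
        crawl("/a", web::get, p -> System.out.println(p.url() + " " + p.fields()), 10);
    }
}
```

Here URL management is only a FIFO queue with a seen-set. A framework would typically swap in a priority or persistent scheduler without touching the other stages, which is the point of keeping the stages separate.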

Downloads: 2 This Week

Last Update: 2025-02-10
2

Heritrix

Internet Archive's open-source, web-scale, web crawler project

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, or misspelled or mis-said as heratrix/heritix/heretix/heratix) is an archaic word for heiress (a woman who inherits). Since our crawler seeks to collect and preserve the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt. Heritrix is designed to respect the robots.txt exclusion directives and...
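
Respecting robots.txt means checking each URL path against the site's exclusion rules before fetching it. The class below is a deliberately simplified, hypothetical sketch of that check, not Heritrix's code. It reads only the `User-agent: *` group and `Disallow` prefix rules. It ignores `Allow` lines, wildcards, and agent-specific groups, all of which a complete parser (see RFC 9309) must handle.

```java
import java.util.*;

// Simplified robots.txt check: only the "User-agent: *" group and only
// Disallow prefix rules. Not Heritrix's implementation; real parsers also
// handle Allow, wildcards, and agent-specific groups.
class RobotsRules {
    private final List<String> disallowed = new ArrayList<>();

    RobotsRules(String robotsTxt) {
        boolean inStarGroup = false;
        boolean lastWasAgent = false;
        for (String raw : robotsTxt.split("\\R")) {
            String line = raw.replaceAll("#.*", "").trim();   // drop comments
            int colon = line.indexOf(':');
            if (colon < 0) continue;                          // blank or malformed line
            String key = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (key.equals("user-agent")) {
                // Consecutive User-agent lines share one group of rules.
                inStarGroup = (lastWasAgent && inStarGroup) || value.equals("*");
                lastWasAgent = true;
            } else {
                lastWasAgent = false;
                // An empty Disallow value means "allow everything", so skip it.
                if (key.equals("disallow") && inStarGroup && !value.isEmpty()) {
                    disallowed.add(value);
                }
            }
        }
    }

    /** True if no Disallow prefix matches the given URL path. */
    boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }
}
```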

Downloads: 5 This Week

Last Update: 5 days ago
3

fess

Open source enterprise search server for websites, files, and data

Fess is an open source enterprise search server designed to provide powerful full-text search capabilities across multiple data sources. It enables organizations to quickly deploy a scalable search environment without requiring deep knowledge of underlying search technologies. Fess is built on top of OpenSearch and offers an integrated solution for crawling, indexing, and searching documents from websites, file systems, and various data stores. Fess includes a built-in crawler that can...
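
The crawl, index, and search flow can be pictured as a toy inverted index that maps each term to the documents containing it. This is an illustration under stated assumptions only. Fess relies on OpenSearch for indexing and search, and the `TinyIndex` class, its tokenizer, and its AND-only query semantics are invented here.

```java
import java.util.*;

// Toy inverted index illustrating crawl -> index -> search.
// Fess delegates this work to OpenSearch; this is NOT how Fess is implemented.
class TinyIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Index a crawled document: lowercase it and split on non-letter/non-digit runs. */
    void add(String docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Boolean AND search: IDs of documents that contain every query term. */
    Set<String> search(String query) {
        Set<String> result = null;
        for (String term : query.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) continue;
            Set<String> docs = postings.getOrDefault(term, Set.of());
            if (result == null) result = new TreeSet<>(docs);
            else result.retainAll(docs);                 // intersect posting lists
        }
        return result == null ? Set.of() : result;
    }
}
```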

Downloads: 1 This Week

Last Update: 2026-06-25
4

Remixed Dungeon

Traditional roguelike game with pixel-art graphics

Remixed Dungeon is an open-source roguelike dungeon crawler for Android, derived from the classic Pixel Dungeon. It expands upon the original with new classes, items, levels, and mechanics, while remaining faithful to the permadeath and procedural generation hallmarks of the genre.

Downloads: 1 This Week

Last Update: 2026-06-06
5

Web Spider, Web Crawler, Email Extractor

Free tool that extracts emails, phone numbers, and custom text from the web using Java regex

The Files section includes WebCrawlerMySQL.jar, which supports MySQL connections. Free web spider and crawler: it extracts information from the web by parsing millions of pages. Data is stored in a Derby database and is not lost if the spider is force-closed.
- Free web spider, parser, extractor, and crawler
- Extraction of emails, phone numbers, and custom text from the web
- Export to Excel file
- Data saved into Derby and MySQL databases
- Written in Java, cross-platform
See also Free Email Sender: https://sourceforge.net/projects/gitst-free-email-ender/ Please install Microsoft OpenJDK to start the application: https://www.microsoft.com/openjdk
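
Regex extraction of this kind scans page text for substrings that match a pattern. The patterns below are simple assumptions, not the ones this project ships. They only demonstrate the `Pattern`/`Matcher` loop, and they will miss or over-match some real-world address and phone formats.

```java
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Regex-based extraction of e-mail addresses and phone numbers from page text.
// The patterns are deliberately simple assumptions, not the project's own.
class ContactExtractor {
    static final Pattern EMAIL =
        Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");
    // Optional '+', then a digit, at least five digits/spaces/dots/dashes, and a final digit.
    static final Pattern PHONE =
        Pattern.compile("\\+?\\d[\\d .-]{5,}\\d");

    /** Returns unique matches in first-seen order. */
    static List<String> findAll(Pattern p, String text) {
        Set<String> found = new LinkedHashSet<>();
        Matcher m = p.matcher(text);
        while (m.find()) found.add(m.group());
        return new ArrayList<>(found);
    }
}
```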

Downloads: 9 This Week

Last Update: 2025-11-23
6

WFDownloader App

Free batch downloader for image, wallpaper, video, audio, document,

...Note that this cross-platform version requires Java (minimum version Java 8) to be installed on your Operating System. For non-java required OS specific versions, check app's official website (https://www.wfdownloader.xyz).

3 Reviews

Downloads: 281 This Week

Last Update: 2026-06-07
7

Crawlab

Distributed web crawler admin platform for spiders management

Crawlab is a Golang-based distributed web crawler management platform. It supports various languages, including Python, NodeJS, Go, Java, and PHP, and various web crawler frameworks, including Scrapy, Puppeteer, and Selenium. Use docker-compose to start it with a single command; this way, you don't even have to configure the MongoDB database. The frontend app interacts with the master node, which communicates with other components such as MongoDB, SeaweedFS, and worker nodes.

Downloads: 1 This Week

Last Update: 2023-07-26
8

ACHE Focused Crawler

ACHE is a web crawler for domain-specific search

ACHE is a focused web crawler. It collects web pages that satisfy specific criteria, e.g., pages that belong to a given domain or that contain a user-specified pattern. ACHE differs from generic crawlers in the sense that it uses page classifiers to distinguish between relevant and irrelevant pages in a given domain. A page classifier can be defined as a simple regular expression (e.g., one that matches every page that contains a specific word) or a machine-learning-based classification model....
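
The key difference from a generic crawler is that links are followed only from pages the classifier accepts. Below is a minimal sketch, assuming a regex classifier and an in-memory link graph. `FocusedCrawl` and its parameters are hypothetical and do not reflect ACHE's configuration format.

```java
import java.util.*;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Focused-crawl sketch: a page classifier decides relevance, and only links
// found on relevant pages are followed. The regex classifier and in-memory
// "web" are illustrative assumptions, not ACHE's configuration format.
class FocusedCrawl {
    /** A simple regex classifier: relevant if the pattern occurs anywhere in the text. */
    static Predicate<String> regexClassifier(String regex) {
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        return text -> p.matcher(text).find();
    }

    /** Returns the relevant pages reached from the seed, in visit order. */
    static List<String> crawl(String seed, Map<String, String> text,
                              Map<String, List<String>> links, Predicate<String> isRelevant) {
        List<String> relevant = new ArrayList<>();
        Deque<String> frontier = new ArrayDeque<>(List.of(seed));
        Set<String> seen = new HashSet<>(frontier);
        while (!frontier.isEmpty()) {
            String url = frontier.poll();
            if (!isRelevant.test(text.getOrDefault(url, ""))) continue; // prune: do not expand
            relevant.add(url);
            for (String next : links.getOrDefault(url, List.of())) {
                if (seen.add(next)) frontier.add(next);
            }
        }
        return relevant;
    }
}
```

Pages reachable only through irrelevant pages are never visited, which is what keeps a focused crawl small.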

Downloads: 0 This Week

Last Update: 2023-04-12
9

Web Spider, Web Crawler, Email Extractor

Free tool that extracts emails, phone numbers, and custom text from the web using Java regex

The Files section includes WebCrawlerMySQL.jar, which supports MySQL connections. Please follow this link to get the latest version: https://sourceforge.net/projects/web-spider-web-crawler-extract/ Free web spider and crawler: it extracts information from the web by parsing millions of pages. Data is stored in a Derby or MySQL database and is not lost if the spider is force-closed.
- Free web spider, parser, extractor, and crawler
- Extraction of emails, phone numbers, and custom text from the web
- Export to Excel file
- Data saved into Derby database
- Written in Java, cross-platform
See also Free Email Sender at this link: https://sourceforge.net/projects/gitst-free-email-ender/ Please install Microsoft OpenJDK to start the application: https://www.microsoft.com/openjdk

3 Reviews

Downloads: 0 This Week

Last Update: 2022-12-24
10

File System Crawler for Elasticsearch

Elasticsearch File System Crawler (FS Crawler)

This crawler helps index binary documents such as PDF, OpenOffice, and MS Office files. It crawls a local file system (or a mounted drive), indexing new files, updating existing ones, and removing old ones. It can also crawl remote file systems over SSH/FTP, and it provides a REST interface that lets you “upload” your binary documents to Elasticsearch.
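
Incremental crawling of this kind can be reduced to comparing two snapshots of the directory tree. The sketch below assumes last-modified time is a good-enough change signal (a content hash would be more robust). It is not FSCrawler's implementation, and `FsDiff` and `Changes` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.*;
import java.util.stream.Stream;

// Sketch of incremental file-system crawling: compare the current tree with the
// previous snapshot (path -> last-modified millis) to decide what to add,
// update, or remove in the index. Illustration only, not FSCrawler's code.
class FsDiff {
    /** Paths to index for the first time, to re-index, and to delete from the index. */
    record Changes(Set<Path> added, Set<Path> updated, Set<Path> removed) {}

    /** Records the last-modified time of every regular file under root. */
    static Map<Path, Long> snapshot(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).toList();
        }
        Map<Path, Long> snap = new TreeMap<>();
        for (Path p : files) {
            snap.put(p, Files.getLastModifiedTime(p).toMillis());
        }
        return snap;
    }

    /** Compares two snapshots taken at different times. */
    static Changes diff(Map<Path, Long> before, Map<Path, Long> now) {
        Set<Path> added = new TreeSet<>();
        Set<Path> updated = new TreeSet<>();
        for (Map.Entry<Path, Long> e : now.entrySet()) {
            Long old = before.get(e.getKey());
            if (old == null) added.add(e.getKey());                      // new file
            else if (!old.equals(e.getValue())) updated.add(e.getKey()); // modified file
        }
        Set<Path> removed = new TreeSet<>(before.keySet());               // gone since last run
        removed.removeAll(now.keySet());
        return new Changes(added, updated, removed);
    }
}
```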

Downloads: 0 This Week

Last Update: 2023-08-25
11

appcrawler

Automated mobile app crawler and testing tool built on Appium

AppCrawler is an automated mobile application testing tool designed to explore and interact with app user interfaces automatically. Built on top of the Appium automation framework, it systematically crawls through application screens and performs actions such as clicking buttons, navigating menus, and interacting with UI elements to simulate user behavior. It is commonly used for automated functional testing, UI exploration, and detecting crashes or unexpected behaviors in mobile...

Downloads: 0 This Week

Last Update: 2026-03-11
12

lxspider

Educational Python web scraping case collection for many sites

lxSpider is a collection of web scraping examples designed primarily for learning and experimentation with data extraction techniques. It gathers numerous crawler implementations that demonstrate how to collect data from a wide range of websites and online services. It focuses heavily on practical cases that illustrate how different platforms handle requests, authentication parameters, and anti-scraping protections. lxSpider includes examples targeting areas such as e-commerce platforms,...

Downloads: 0 This Week

Last Update: 2026-03-11
13

crawler4j

Open source web crawler for Java

crawler4j is an open source web crawler for Java that provides a simple interface for crawling the web. With it, you can set up a multi-threaded web crawler in a few minutes. You need to create a crawler class that extends WebCrawler. This class decides which URLs should be crawled and handles the downloaded pages. The shouldVisit function decides whether a given URL should be crawled.
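
In crawler4j, `shouldVisit` is a method you override in your `WebCrawler` subclass, and using it requires the library. To stay dependency-free, the sketch below shows only the kind of decision such an override typically makes: stay on one host and skip static assets. `VisitPolicy`, the allowed host, and the extension list are assumptions for illustration, and the method signature here is not crawler4j's.

```java
import java.net.URI;
import java.util.Locale;
import java.util.regex.Pattern;

// Dependency-free sketch of a typical shouldVisit decision. It does not use
// crawler4j's types; the allowed host and extension filter are assumptions.
class VisitPolicy {
    private static final Pattern SKIP_EXTENSIONS =
        Pattern.compile(".*\\.(css|js|gif|jpe?g|png|mp3|mp4|zip|gz)$");

    private final String allowedHost;

    VisitPolicy(String allowedHost) {
        this.allowedHost = allowedHost.toLowerCase(Locale.ROOT);
    }

    /** Crawl only http(s) pages on the allowed host that are not static assets. */
    boolean shouldVisit(String url) {
        URI uri;
        try {
            uri = URI.create(url);
        } catch (IllegalArgumentException e) {
            return false;                                  // malformed URL
        }
        String scheme = uri.getScheme();
        String host = uri.getHost();
        if (scheme == null || host == null) return false;  // e.g. mailto: or relative URLs
        if (!scheme.equalsIgnoreCase("http") && !scheme.equalsIgnoreCase("https")) return false;
        if (!host.toLowerCase(Locale.ROOT).equals(allowedHost)) return false;
        String path = uri.getPath() == null ? "" : uri.getPath().toLowerCase(Locale.ROOT);
        return !SKIP_EXTENSIONS.matcher(path).matches();
    }
}
```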

Downloads: 0 This Week

Last Update: 2022-01-12
14

OpenSearchServer Search Engine

An open source search engine with a RESTful API and crawlers

OpenSearchServer is a powerful, enterprise-class search engine program. Using the web user interface, the crawlers (web, file, database, etc.), and the client libraries (REST API, Ruby, Rails, Node.js, PHP, Perl), you can quickly and easily integrate advanced full-text search capabilities into your application: full-text search with basic semantics, join queries, boolean queries, facets and filters, document (PDF, Office, etc.) indexing, web scraping, etc. OpenSearchServer runs on...

31 Reviews

Downloads: 1 This Week

Last Update: 2018-08-26
15

Teachingbox

The Teachingbox uses advanced machine learning techniques to relieve developers from programming hand-crafted, sophisticated behaviors for autonomous agents (such as robots, game players, etc.). In its current state, it implements a well-founded reinforcement learning core in Java with many popular use cases, environments, policies, and learners. Obtaining the Teachingbox: FOR USERS: If you want to download the latest releases, please visit:...
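
The core of tabular reinforcement learning is the Q-learning update Q(s,a) ← Q(s,a) + α(r + γ·max Q(s′,a′) − Q(s,a)). The sketch below applies that rule on a tiny deterministic chain. The environment, rewards, and hyperparameters are assumptions, and nothing here reflects Teachingbox's API. To keep the result deterministic, it sweeps every state-action pair instead of sampling episodes with exploration.

```java
// Tabular Q-learning update rule on a tiny deterministic chain world.
// The chain, rewards, and hyperparameters are assumptions for illustration;
// this is not Teachingbox's API.
class ChainQLearning {
    static final int LEFT = 0, RIGHT = 1;

    /** States 0..n-1; reaching the last (terminal) state yields reward 1. */
    static double[][] train(int n, double alpha, double gamma, int sweeps) {
        double[][] q = new double[n][2];
        for (int k = 0; k < sweeps; k++) {
            for (int s = 0; s < n - 1; s++) {                 // last state is terminal
                for (int a = LEFT; a <= RIGHT; a++) {
                    int next = (a == RIGHT) ? s + 1 : Math.max(0, s - 1);
                    double reward = (next == n - 1) ? 1.0 : 0.0;
                    // No future value after entering the terminal state.
                    double future = (next == n - 1) ? 0.0 : Math.max(q[next][LEFT], q[next][RIGHT]);
                    // Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
                    q[s][a] += alpha * (reward + gamma * future - q[s][a]);
                }
            }
        }
        return q;
    }

    /** Greedy policy read off the learned table. */
    static int greedy(double[][] q, int s) {
        return q[s][RIGHT] >= q[s][LEFT] ? RIGHT : LEFT;
    }
}
```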

Downloads: 1 This Week

Last Update: 2018-04-30
16

Gecco

Lightweight Java web crawler framework with jQuery-style extraction

Gecco is a lightweight web crawler framework written in Java that simplifies the process of building web scraping applications. It is designed to make crawler development straightforward by allowing developers to extract page elements using jQuery-style selectors rather than complex parsing logic. It integrates several well-known Java libraries and frameworks, including tools for HTTP requests, HTML parsing, JSON processing, and application development. ...

Downloads: 0 This Week

Last Update: 2026-03-10
17

YouSeer

YouSeer is an open source search engine framework built on top of other open source components. It is part of the general SeerSuite framework. YouSeer uses Heritrix as its crawler and Solr as its indexing system.

1 Review

Downloads: 0 This Week

Last Update: 2017-12-02
18

phoneutria

A Java web crawler: multi-threaded, scalable, high-performance, extensible, and polite. It can be used to crawl and index any web or enterprise domain and is configurable through an XML configuration file.
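
Politeness usually means spacing out requests to the same host, which in a multi-threaded crawler requires shared, synchronized state. Below is a minimal sketch, assuming a fixed per-host delay and a caller-supplied clock. `PolitenessGate` is hypothetical, and this listing does not describe phoneutria's actual mechanism.

```java
import java.util.HashMap;
import java.util.Map;

// Per-host politeness for a multi-threaded crawler: each host may be fetched
// at most once per delay window. The fixed delay and injected clock are
// assumptions; this is not phoneutria's implementation.
class PolitenessGate {
    private final long delayMillis;
    private final Map<String, Long> nextAllowed = new HashMap<>();

    PolitenessGate(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    /**
     * Reserves the next fetch slot for the host and returns how long the caller
     * must wait before fetching (0 if it may fetch now). Synchronized so that
     * worker threads never claim the same slot for one host.
     */
    synchronized long reserve(String host, long nowMillis) {
        long slot = Math.max(nowMillis, nextAllowed.getOrDefault(host, 0L));
        nextAllowed.put(host, slot + delayMillis);
        return slot - nowMillis;
    }
}
```

A worker thread would call `Thread.sleep(gate.reserve(host, System.currentTimeMillis()))` before fetching. Because the slot is reserved before sleeping, threads queue up for a host in order instead of racing for it.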

Downloads: 0 This Week

Last Update: 2017-05-22
19

sourcegreed

A Java-based crawler

Downloads: 0 This Week

Last Update: 2016-07-27
20

Pathfinder Wiki-fr Crawler

All spells, monsters, feats, and magic items in French (VF)

All information comes from http://www.pathfinder-fr.org/Wiki/Pathfinder-RPG.MainPage.ashx. The software can also create detailed spell lists and export each type of data.

Downloads: 0 This Week

Last Update: 2016-01-02
21

WebCollector

WebCollector is an open source web crawler framework based on Java.

WebCollector is an open source web crawler framework based on Java. It provides some simple interfaces for crawling the web; you can set up a multi-threaded web crawler in less than 5 minutes. GitHub: https://github.com/CrawlScript/WebCollector Demo: https://github.com/CrawlScript/WebCollector/blob/master/YahooCrawler.java

Downloads: 0 This Week

Last Update: 2015-06-04
22

webStraktor

webStraktor is a programmable World Wide Web data extraction client. Its purpose is to scrape HTML based content via the HTTP protocol and extract relevant information. webStraktor features a scripting language to facilitate the collection, the extraction and the storage of information available on the web, including images. The scripting language uses elements of the Regular Expression and xPath syntax. The webStraktor scripting language has a small instruction set and its syntax is easy...
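
The xPath half of this approach can be tried with the XML APIs bundled with the JDK. This is not webStraktor's scripting language. `XPathExtract` is a hypothetical helper, and it assumes well-formed XHTML; real-world HTML usually needs a lenient HTML parser before XPath can be applied.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// XPath extraction with the JDK's built-in XML stack. Not webStraktor's
// scripting language; it assumes the input is well-formed XHTML.
class XPathExtract {
    /** Returns the trimmed text content of every node matched by the XPath expression. */
    static List<String> extract(String xhtml, String xpath) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xhtml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(xpath, doc, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent().trim());
        }
        return out;
    }
}
```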

Downloads: 1 This Week

Last Update: 2014-04-25
23

Battlelance

Multiplayer dungeon crawler

Battlelance is a turn-based multiplayer RPG based on D&D and the like. You can control a wizard, fighter, thief, cleric, or a whole party of these and pit them against a dungeon master or other characters/parties. You can also take on the role of dungeon master and spawn enemies, set traps, and play other nasty tricks to make the characters' lives as difficult as possible.

Downloads: 0 This Week

Last Update: 2013-08-01
24

Constellio Enterprise Search engine

Open source Search Engine and Enterprise Search

Constellio is an enterprise search engine that allows companies to search all of their organization's information through a single interface (Web, CRM, ERP, ECM, mail, etc.). Constellio is based on Apache Solr and the Google Search Appliance connector. Constellio has a powerful web crawler.

Downloads: 0 This Week

Last Update: 2015-03-31
25

Regular Expression web replication

Yet another web crawler? Yes, but this one uses the full power of regular expressions to accept or reject, examine or ignore, and save or refuse pages. You can also use MIME types for all of this. Powerful and flexible.
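
Accept/reject filtering of this kind can be modeled as an ordered list of rules, each checked against both the URL and the MIME type. The rule format and the "first matching rule wins, otherwise reject" policy below are assumptions for illustration, not the project's documented behavior.

```java
import java.util.List;
import java.util.regex.Pattern;

// Regex-driven accept/reject rules for a replicating crawler, applied to both
// the URL and the response MIME type. Rule format and default policy are
// assumptions, not this project's documented behavior.
class ReplicationRules {
    enum Action { ACCEPT, REJECT }

    /** A rule fires only when both regexes fully match. */
    record Rule(Action action, Pattern url, Pattern mime) {
        boolean matches(String u, String m) {
            return url.matcher(u).matches() && mime.matcher(m).matches();
        }
    }

    static Rule rule(Action a, String urlRegex, String mimeRegex) {
        return new Rule(a, Pattern.compile(urlRegex), Pattern.compile(mimeRegex));
    }

    /** First matching rule wins; reject when nothing matches. */
    static Action decide(List<Rule> rules, String url, String mimeType) {
        for (Rule r : rules) {
            if (r.matches(url, mimeType)) return r.action();
        }
        return Action.REJECT;
    }
}
```

Putting specific REJECT rules before broad ACCEPT rules is what lets a single list express exceptions such as "everything on this site except /private/".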

Downloads: 0 This Week

Last Update: 2013-05-30