crawler free download

Showing 157 open source projects for "crawler"

View related business solutions

Mac Clear Filters & Widen Search

Our Free Plans just got better! | Auth0 by Okta
With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your secuirty. Auth0 now, thank yourself later.

Try free now
Bright Data - All in One Platform for Proxies and Web Scraping
Say goodbye to blocks, restrictions, and CAPTCHAs

Bright Data offers the highest quality proxies with automated session management, IP rotation, and advanced web unlocking technology. Enjoy reliable, fast performance with easy integration, a user-friendly dashboard, and enterprise-grade scaling. Powered by ethically-sourced residential IPs for seamless web scraping.

Get Started
1

SiteOne Crawler

SiteOne Crawler is a website analyzer and exporter

SiteOne Crawler is a very useful and easy-to-use tool you'll ♥ as a Dev/DevOps, website owner or consultant. Works on all popular platforms - Windows, macOS, and Linux (x64 and arm64 too). It will crawl your entire website in depth, analyze and report problems, show useful statistics and reports, generate an offline version of the website, generate sitemaps, or send reports via email. Watch a detailed video with a sample report for Astro. build website. This crawler can be used as a command...

1 Review

Downloads: 5 This Week

Last Update: 2024-09-14
See Project
2

ACHE Focused Crawler

ACHE is a web crawler for domain-specific search

ACHE is a focused web crawler. It collects web pages that satisfy some specific criteria, e.g., pages that belong to a given domain or that contain a user-specified pattern. ACHE differs from generic crawlers in sense that it uses page classifiers to distinguish between relevant and irrelevant pages in a given domain. A page classifier can be defined as a simple regular expression (e.g., that matches every page that contains a specific word) or a machine-learning-based classification model...

Downloads: 7 This Week

Last Update: 2023-04-12
See Project
3

EasySpider

A visual no-code/code-free web crawler/spider

A visual code-free/no-code web crawler/spider, supporting both Chinese and English.

Downloads: 9 This Week

Last Update: 2024-04-23
See Project
4

crawley

The unix-way web crawler

Crawls web pages and prints any link it can find. Fast HTML SAX-parser (powered by golang.org/x/net/html) Small (below 1500 SLOC), idiomatic, 100% test-covered codebase. Grabs most of useful resources URLs (pics, videos, audios, forms, etc...) Found URLs are streamed to stdout and guaranteed to be unique (with fragments omitted) Scan depth (limited by starting host and path, by default - 0) can be configured. Can crawl rules and sitemaps from robots.txt. Brute mode - scan HTML comments for...

Downloads: 42 This Week

Last Update: 2024-09-08
See Project
Cybersecurity Solutions to Protect, Detect and Respond Against Cyberattacks
Kroll's elite cyber risk experts deliver end-to-end cyber security services for organizations in a wide range of sectors, across the globe.

From system upgrades or a move to the cloud … to applications meant to improve the customer experience … and to integral third-party relationships, one misstep can cascade into IP theft, wire fraud, ransomware, data breaches and more; not to mention regulatory action, civil litigation and reputational damage. That’s why we’ve structured end-to-end solutions to manage the entire threat lifecycle.

Learn More
5

WebMagic

A scalable web crawler framework for Java

WebMagic is a scalable crawler framework. It covers the whole lifecycle of crawler, downloading, url management, content extraction and persistent. It can simplify the development of a specific crawler. WebMagic is a simple but scalable crawler framework. You can develop a crawler easily based on it. WebMagic has a simple core with high flexibility, a simple API for html extracting. It also provides annotation with POJO to customize a crawler, and no configuration is needed. Some other features...

Downloads: 2 This Week

Last Update: 2023-12-05
See Project
6

Crawlab

Distributed web crawler admin platform for spiders management

Golang-based distributed web crawler management platform, supporting various languages including Python, NodeJS, Go, Java, PHP and various web crawler frameworks including Scrapy, Puppeteer, Selenium. Please use docker-compose to one-click to start up. By doing so, you don't even have to configure MongoDB database. The frontend app interacts with the master node, which communicates with other components such as MongoDB, SeaweedFS and worker nodes. Master node and worker nodes communicate...

Downloads: 7 This Week

Last Update: 2023-07-26
See Project
7

Goutte

Goutte, a simple PHP Web Scraper

Goutte is a screen scraping and web crawling library for PHP. Goutte provides a nice API to crawl websites and extract data from the HTML/XML responses. Goutte depends on PHP 7.1+. Add fabpot/goutte as a require dependency in your composer.json file. Create a Goutte Client instance (which extends Symfony\Component\BrowserKit\HttpBrowser). Make requests with the request() method. The method returns a Crawler object (Symfony\Component\DomCrawler\Crawler). To use your own HTTP settings, you may...

Downloads: 6 This Week

Last Update: 2023-04-01
See Project
8

SiteOne Crawler (desktop app)

A free, feature-rich web analyzer and exporter/cloner you will love!

A free in-depth website analyzer providing audits of security, performance, SEO, accessibility and other technical aspects. Available as a desktop application for Windows/macOS/Linux and as a CLI tool for advanced users and CI/CD processes. It also includes an offline web page exporter (website clone, mirror).

Downloads: 4 This Week

Last Update: 2024-10-02
See Project
9

Web Spider, Web Crawler, Email Extractor

Free Extracts Emails, Phones and custom text from Web using JAVA Regex

In Files there is WebCrawlerMySQL.jar which supports MySql Connection Free Web Spider & Crawler. Extracts Information from Web by parsing millions of pages. Store data into Derby Database and data are not being lost after force closing the spider. - Free Web Spider , Parser, Extractor, Crawler - Extraction of Emails , Phones and Custom Text from Web - Export to Excel File - Data Saved into Derby and MySQL Database - Written in Java Cross Platform Also See Free email Sender...

Downloads: 94 This Week

Last Update: 2022-12-25
See Project
Business Continuity Solutions | ConnectWise BCDR
Build a foundation for data security and disaster recovery to fit your clients’ needs no matter the budget.

Whether natural disaster, cyberattack, or plain-old human error, data can disappear in the blink of an eye. ConnectWise BCDR (formerly Recover) delivers reliable and secure backup and disaster recovery backed by powerful automation and a 24/7 NOC to get your clients back to work in minutes, not days.

Learn More
10

Heritrix

Internet Archive's open-source, web-scale, web crawler project

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix/heritix/heretix/heratix) is an archaic word for heiress (woman who inherits). Since our crawler seeks to collect and preserve the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt. Heritrix is designed to respect the robots.txt exclusion directives...

Downloads: 3 This Week

Last Update: 2024-09-09
See Project
11

Gerapy

Distributed Crawler Management Framework Based on Scrapy

Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Scrapyd-Client, Scrapyd-API, Django and Vue.js. Someone who has worked as a crawler with Python may use Scrapy. Scrapy is indeed a very powerful crawler framework. It has high crawling efficiency and good scalability. It is basically a necessary tool for developing crawlers using Python. If you use Scrapy as a crawler, then of course we can use our own host to crawl when crawling, but when the crawl is very large, we can’t run...

Downloads: 1 This Week

Last Update: 2023-07-19
See Project
12

DNS Crawler

Take a domain list and fetch DNS query into tab-delimited file

Very basic utility to assess various parameter from a domain. Useful for small-ISP and individual who own multiple domain. Currently the following values are fetched: SOA, NS, MX, TXT/SPF and A record.

Downloads: 3 This Week

Last Update: 2023-08-04
See Project
13

Laravel Sitemap

Create and generate sitemaps with ease

... it in the callable you pass to hasCrawled. You can also instruct the underlying crawler to not crawl some pages by passing a callable to shouldCrawl. You can configure the crawler used by the sitemap generator. The sitemap generator can execute JavaScript on each page so it will discover links that are generated by your JS scripts. You can enable this feature by setting execute_javascript in the config file to true.

Downloads: 3 This Week

Last Update: 2024-05-21
See Project
14

Snap Lens Web Crawler

Crawl and download Snap Lenses from lens.snapchat.com with ease.

Crawl and download Snap Lenses from lens.snapchat.com with ease. This crawler is a dependency of Snap Camera Server https://snap-camera-server.sourceforge.io

Downloads: 1 This Week

Last Update: 2024-09-08
See Project
15

Nebula libp2p DHT

A libp2p DHT crawler, monitor, and measurement tool

A libp2p DHT crawler and monitor that tracks the liveness of peers. The crawler connects to DHT bootstrap peers and then recursively follows all entries in their k-buckets until all peers have been visited. The crawler supports the IPFS, Filecoin, Polkadot, Kusama, Rococo, Westend networks and more. The crawler can store its results as JSON documents or in a postgres database - the --dry-run flag prevents it from doing either. Nebula will print a summary of the crawl at the end instead. A crawl...

Downloads: 0 This Week

Last Update: 2024-05-10
See Project
16

crwlr

Library for Rapid (Web) Crawler and Scraper Development

This library provides kind of a framework and a lot of ready-to-use, so-called steps, that you can use as building blocks, to build your own crawlers and scrapers with. Before diving into the library, let's have a look at the terms crawling and scraping. For most real-world use cases, those two things go hand in hand, which is why this library helps with and combines both. A (web) crawler is a program that (down)loads documents and follows the links in it to load them as well. A crawler could...

Downloads: 0 This Week

Last Update: 1 day ago
See Project
17

Roach

The complete web scraping toolkit for PHP

Roach is a complete web scraping toolkit for PHP. It is a shameless clone heavily inspired by the popular Scrapy package for Python. Roach allows us to define spiders that crawl and scrape web documents. But wait, there’s more. Roach isn’t just a simple crawler, but includes an entire pipeline to clean, persist and otherwise process extracted data as well. It’s your all-in-one resource for web scraping in PHP. Roach doesn’t depend on a specific framework. Instead, you can use the core package...

Downloads: 2 This Week

Last Update: 2024-04-04
See Project
18

miniblink49

Lighter, faster browser kernel of blink to integrate HTML UI in apps

... electron). Customize as you wish, simulate another browser environment. Perfect HTML5 support, friendly to various front-end libraries (support HTML5, and friendly to front framework). After turning off the cross-domain switch, you can use various cross-domain functions (support cross-domain). Headless mode, which greatly saves resources and is suitable for crawlers (headless mode, be suitable for Web Crawler).

Downloads: 2 This Week

Last Update: 2023-12-25
See Project
19

Proxy_Pool

Python crawler proxy IP pool (proxy pool)

The main function of the crawler agent IP pool project is to regularly collect free agents published on the Internet for verification and storage, and to regularly verify and store agents to ensure the availability of agents, and to provide API and CLI. At the same time, you can also expand the proxy source to increase the quality and quantity of the proxy pool IP.

Downloads: 0 This Week

Last Update: 2023-08-14
See Project
20

LambdaHack

Haskell game engine library for roguelike dungeon crawlers

Haskell game engine library for roguelike dungeon crawlers. LambdaHack is a Haskell game engine library for ASCII roguelike games of arbitrary theme, size and complexity, with optional tactical squad combat. It's packaged together with a sample dungeon crawler in a quirky fantasy setting. To use the engine, you need to specify the content to be procedurally generated. You declare what the game world is made of (entities, their relations, physics and lore) and the engine builds the world...

Downloads: 0 This Week

Last Update: 2023-10-29
See Project
21

DocSearch

The easiest way to add search to your documentation

Initially created to fulfill our own developers' needs, DocSearch quickly evolved into a successful community project. Over the years, we've explored new ways to address the complexities of search for the open-source community. DocSearch understands how the user input fits into the context of your project and instantly presents the most relevant content with fewer interactions than any other method. With a design very close to the native experience on mobile, we leverage users acquaintance...

Downloads: 0 This Week

Last Update: 2024-09-27
See Project
22

Easyspider - Distributed Web Crawler

Easy Spider is a distributed Perl Web Crawler Project from 2006

Easy Spider is a distributed Perl Web Crawler Project from 2006. It features code from crawling webpages, distributing it to a server and generating xml files from it. The client site can be any computer (Windows or Linux) and the Server stores all data. Websites that use EasySpider Crawling for Article Writing Software: https://www.artikelschreiber.com/en/ https://www.unaique.net/en/ https://www.unaique.com/ https://www.artikelschreiber.com/marketing/ https://www.paraphrasingtool1.com...

1 Review

Downloads: 3 This Week

Last Update: 2023-06-24
See Project
23

AnimeGAN

A simple PyTorch Implementation of Generative Adversarial Networks

A simple PyTorch Implementation of Generative Adversarial Networks, focusing on anime face drawing. The images are generated from a DCGAN model trained on 143,000 anime character faces for 100 epochs. Manipulating latent codes enables the transition from images in the first row to the last row. The images are not clean, some outliers can be observed, which degrades the quality of the generated images. Anime-style images of 126 tags are collected from danbooru.donmai.us using the crawler tool...

Downloads: 3 This Week

Last Update: 2023-03-21
See Project
24

Web Spider, Web Crawler, Email Extractor

Free Extracts Emails, Phones and custom text from Web using JAVA Regex

In Files there is WebCrawlerMySQL.jar which supports MySql Connection Please follow this link to get latest version https://sourceforge.net/projects/web-spider-web-crawler-extract/ Free Web Spider & Crawler. Extracts Information from Web by parsing millions of pages. Store data into Derby OR MySQL Database and data are not being lost after force closing the spider. - Free Web Spider , Parser, Extractor, Crawler - Extraction of Emails , Phones and Custom Text from Web - Export...

3 Reviews

Downloads: 2 This Week

Last Update: 2022-12-24
See Project
25

ngx_waf

Handy, High performance, ModSecurity compatible Nginx firewall module

Handy, High-performance Nginx firewall module. Such as black and white list of IPs or IP range, uri black and white list, and request body black list, etc. Directives and rules are easy to write and readable. The IP detection is a constant-time operation. Most of the remaining inspections use caching to improve performance. Compatible with ModSecurity's rules, you can use OWASP ModSecurity Core Rule Set. Supports verifying Google, Bing, Baidu and Yandex crawlers and allowing them...

Downloads: 1 This Week

Last Update: 2023-08-02
See Project