Showing 19 open source projects for "crawling"

  • 1
    X-Crawl

    Flexible Node.js AI-assisted crawler library

    A high-performance web crawling and scraping framework for Node.js, designed for large-scale data extraction.
    Downloads: 0 This Week
    See Project
  • 2
    Spatie Crawler

    An easy-to-use, powerful crawler implemented in PHP

    Spatie Crawler is a PHP library that allows developers to crawl websites and extract information efficiently. It can be used for web scraping, link checking, or automated testing of web pages. The library is simple to use and supports customizable crawling strategies, including controlling crawl depth and handling redirects. It’s suitable for building crawlers that navigate large or dynamically generated websites.
    Downloads: 0 This Week
    See Project
  • 3
    Scrapy

    A fast, high-level web crawling and web scraping framework

    Scrapy is a fast, open source, high-level framework for crawling websites and extracting structured data from their pages. Portable and written in Python, it runs on Windows, Linux, macOS, and BSD. Scrapy is powerful, fast, simple, and easily extensible: write the rules to extract the data, and add new functionality when needed without touching the core. Scrapy does the rest, and can be used in a number of applications; a minimal spider sketch follows this entry.
    Downloads: 18 This Week
    See Project
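A minimal sketch of the pattern Scrapy's description refers to ("write the rules to extract the data"), not tied to any particular project: a spider that scrapes quotes.toscrape.com, the demo site used by Scrapy's own tutorial, and follows pagination. The site, selectors, and output file name are illustrative.

```python
import scrapy
from scrapy.crawler import CrawlerProcess


class QuotesSpider(scrapy.Spider):
    """Extract quote text and author from each listing page."""
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/"]

    def parse(self, response):
        # "The rules to extract the data": CSS selectors yielding plain dicts.
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }
        # Follow the "Next" pagination link with this same callback.
        yield from response.follow_all(response.css("li.next a"), callback=self.parse)


if __name__ == "__main__":
    # FEEDS writes the yielded items to a JSON file; Scrapy handles the rest
    # (scheduling, deduplication, politeness settings, retries).
    process = CrawlerProcess(settings={"FEEDS": {"quotes.json": {"format": "json"}}})
    process.crawl(QuotesSpider)
    process.start()
```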
  • 4
    Douyin TikTok Download API

    Fast, asynchronous Douyin / TikTok data crawling tool

    Fast, asynchronous, free, open source, ad-free, and under long-term maintenance. Built on PyWebIO, FastAPI, and HTTPX, this project supports online batch parsing and downloading of watermark-free videos and image galleries through a web interface, a data-crawling API, and iOS Shortcuts. You can deploy or adapt the project yourself to add more functionality, call scraper.py directly from your own project, or install the existing pip package and use it as a parsing library to easily crawl data, etc. ...
    Downloads: 11 This Week
    See Project
  • 5
    Python-Spider

    Python3 web crawler practice

    Python-Spider is a repository intended to teach, and provide examples for, writing web spiders/crawlers in Python, part of a broader learning and resource collection by its author. The code and documentation are oriented toward beginner and intermediate learners who want to learn how to fetch, parse, and extract data from websites programmatically. As one of the author's public learning-path repositories, python-spider likely includes examples of HTTP requests, HTML parsing, and similar basics; a generic sketch of that fetch-parse-extract loop follows this entry.
    Downloads: 1 This Week
    See Project
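Since the repository's exact contents are only hinted at above, here is a generic sketch of the fetch-parse-extract loop such tutorials typically teach, using the common requests + BeautifulSoup stack. The start URL, page limit, and link filter are illustrative, not taken from the repo.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def crawl(start_url: str, max_pages: int = 10) -> None:
    """Breadth-first fetch/parse/extract loop, confined to the start site."""
    seen, queue = set(), [start_url]
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        resp = requests.get(url, timeout=10)            # fetch
        soup = BeautifulSoup(resp.text, "html.parser")  # parse
        title = soup.title.string if soup.title else "(no title)"
        print(url, "->", title)                         # extract
        # Queue same-site links to grow the crawl frontier.
        for a in soup.select("a[href]"):
            link = urljoin(url, a["href"])
            if link.startswith(start_url):
                queue.append(link)


if __name__ == "__main__":
    crawl("https://quotes.toscrape.com/")
```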
  • 6
    Laravel Sitemap

    Create and generate sitemaps with ease

    This package can generate a sitemap without you having to add URLs to it manually. It works by crawling your entire site. The generator can execute JavaScript on each page, so links injected into the DOM by JavaScript are crawled as well. The easiest way is to crawl the given domain and generate a sitemap with all found links. The destination of the sitemap should be specified by $path. If you don't want a crawled link to appear in the sitemap, just don't return it in the callable you pass to hasCrawled. ...
    Downloads: 0 This Week
    See Project
  • 7
    SiteOne Crawler (desktop app)

    A free, feature-rich web analyzer and exporter/cloner you will love!

    A free in-depth website analyzer providing audits of security, performance, SEO, accessibility and other technical aspects. Available as a desktop application for Windows/macOS/Linux and as a CLI tool for advanced users and CI/CD processes. It also includes an offline web page exporter (website clone, mirror).
    Downloads: 14 This Week
    See Project
  • 8
    Gerapy

    Distributed Crawler Management Framework Based on Scrapy

    Distributed crawler management framework based on Scrapy, Scrapyd, Scrapyd-Client, Scrapyd-API, Django, and Vue.js. Anyone who has written crawlers in Python has probably used Scrapy: it is a very powerful crawler framework with high crawling efficiency and good scalability, practically a required tool for developing crawlers in Python. You can of course run Scrapy crawls from your own host, but when a crawl is very large your own machine is no longer sufficient; a better method is to deploy Scrapy to a remote server for execution, which is the workflow Gerapy manages (see the sketch after this entry). ...
    Downloads: 0 This Week
    See Project
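Gerapy itself is driven from its web UI, but the Scrapyd layer underneath exposes a plain HTTP JSON API for exactly the remote-execution workflow described above. A minimal sketch against Scrapyd's documented schedule.json and listjobs.json endpoints; the host, project name, and spider name are placeholders.

```python
import requests

SCRAPYD = "http://localhost:6800"  # Scrapyd's default address and port

# Schedule a run of a spider that is already deployed to the Scrapyd server.
resp = requests.post(
    f"{SCRAPYD}/schedule.json",
    data={"project": "myproject", "spider": "quotes"},
)
print("scheduled job:", resp.json()["jobid"])

# List pending/running/finished jobs for the project.
jobs = requests.get(
    f"{SCRAPYD}/listjobs.json", params={"project": "myproject"}
).json()
print("running jobs:", [job["id"] for job in jobs["running"]])
```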
  • 9
    Goutte

    Goutte, a simple PHP Web Scraper

    Goutte is a screen scraping and web crawling library for PHP. It provides a nice API to crawl websites and extract data from HTML/XML responses. Goutte requires PHP 7.1+. Add fabpot/goutte as a require dependency in your composer.json file, create a Goutte Client instance (which extends Symfony\Component\BrowserKit\HttpBrowser), and make requests with the request() method.
    Downloads: 5 This Week
    See Project
  • 10
    File System Crawler for Elasticsearch

    Elasticsearch File System Crawler (FS Crawler)

    This crawler helps index binary documents such as PDF, OpenOffice, and MS Office files. It crawls a local file system (or a mounted drive), indexing new files, updating existing ones, and removing old ones; it can also crawl a remote file system over SSH/FTP. A REST interface lets you “upload” your binary documents to Elasticsearch (see the sketch after this entry).
    Downloads: 0 This Week
    See Project
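The REST interface mentioned above can be exercised with any HTTP client. A hedged sketch in Python, assuming FS Crawler's REST service is enabled on its default 127.0.0.1:8080 address; the _upload path and "file" form field follow FS Crawler's documentation, but verify both against your installed version.

```python
import requests

# Assumed default REST endpoint of a locally running FS Crawler instance;
# check the docs of your FS Crawler version before relying on this path.
FSCRAWLER_URL = "http://127.0.0.1:8080/fscrawler/_upload"

# Send a binary document as multipart form data; FS Crawler extracts its
# content and indexes it into Elasticsearch.
with open("report.pdf", "rb") as f:
    resp = requests.post(FSCRAWLER_URL, files={"file": f})
print(resp.json())
```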
  • 11
    GoSpider

    Gospider - Fast web spider written in Go

    GoSpider is a fast web spider written in Go. Features: fast web crawling; brute-forcing and parsing of sitemap.xml; parsing of robots.txt; generating and verifying links from JavaScript files; a link finder; finding AWS S3 buckets in response sources; finding subdomains in response sources; fetching URLs from the Wayback Machine, Common Crawl, VirusTotal, and AlienVault; grep-friendly output formatting; Burp Suite input support; and crawling multiple sites in parallel.
    Downloads: 3 This Week
    See Project
  • 12
    django-dynamic-scraper

    Creating Scrapy scrapers via the Django admin interface

    Django Dynamic Scraper (DDS) is an app for Django built on top of the scraping framework Scrapy. While preserving many of Scrapy's features, it lets you dynamically create and manage spiders via the Django admin interface and save your scraped items in the database you defined for your Django project. Because it simplifies things, DDS is not usable for all kinds of scrapers, but...
    Downloads: 0 This Week
    See Project
  • 13
    DHT

    BitTorrent DHT Protocol && DHT Spider.

    DHT implements the BitTorrent DHT protocol in Go. It has two modes: the standard mode, which follows the BEPs and can be used as a standard DHT server, and the crawling mode, which does not follow the standard BEP protocol and instead aims to crawl as much metadata as possible. With the crawling mode you can build another BTDigg. The default crawl-mode configuration costs about 300 MB of RAM; set MaxNodes and BlackListMaxSize to fit your needs. ...
    Downloads: 1 This Week
    See Project
  • 14
    Node Crawler

    Web Crawler/Spider for NodeJS + server-side jQuery

    A powerful, popular, production-grade crawling/scraping package for Node.js. Happy hacking.
    Downloads: 0 This Week
    See Project
  • 15
    druid4arduino

    An automatic, configuration-less GUI for Arduino projects.

    Druid4Arduino provides a simple GUI (graphical user interface) for interacting with any SerialUI-based Arduino project. It works its magic by crawling the menu hierarchy (commands and sub-menus) provided by SerialUI and automatically re-configuring its user interface to match whatever options you've provided. It connects to your Arduino project through the USB serial port and displays a reflection of all the commands and sub-menus defined in your program/sketch. It will also request and transmit any required input or error messages.
    Downloads: 12 This Week
    See Project
  • 16
    JLinkCheck is an Ant task written in Java for checking links in websites. Rather than checking a single page, it crawls a whole site like a spider and generates a report in XML and (X)HTML. JReptator will be its successor, with many more features.
    Downloads: 0 This Week
    See Project
  • 17
    JCrawler is a cookie-enabled crawling and load-testing tool that follows a human-like crawling pattern (hits per second).
    Downloads: 0 This Week
    See Project
  • 18
    LCGML (Legacy Crawling Game Markup Language) makes it possible to script old-fashioned text adventures. It is an XML-driven document format. A Perl interpreter is also planned for the stable product.
    Downloads: 0 This Week
    See Project
  • 19
    Blackfire Player

    Web Crawling, Web Testing, and Web Scraping application

    Blackfire Player is a powerful web crawling, web testing, and web scraping application. It provides a nice DSL to crawl HTTP services, assert responses, and extract data from HTML/XML/JSON responses. Some Blackfire Player use cases: crawl a website/API and check expectations (acceptance tests); scrape a website/API and extract values; monitor a website; test code with unit-test integration (PHPUnit, Behat, Codeception, ...); test code behavior from the outside thanks to the native Blackfire Profiler integration (unit tests from the HTTP layer). ...
    Downloads: 0 This Week
    See Project