Showing 22 open source projects for "crawling"

  • 1
    Crawl4AI

    Open-source LLM Friendly Web Crawler & Scraper

    Crawl4AI is a high-performance, AI‑ready web crawler tailored for LLM data ingestion and RAG pipelines. It supports adaptive crawling heuristics (stopping when enough info is gathered), structured markdown output, and high-speed parallel execution. Designed to operate at scale with optional Docker deployment and framework integrations.
    Downloads: 2 This Week
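    A minimal usage sketch of Crawl4AI's async API (the URL is a placeholder; result fields can vary by version):

        import asyncio
        from crawl4ai import AsyncWebCrawler

        async def main():
            # crawl one page and emit LLM-ready markdown
            async with AsyncWebCrawler() as crawler:
                result = await crawler.arun(url="https://example.com")
                print(result.markdown)

        asyncio.run(main())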
  • 2
    Scrapy

    A fast, high-level web crawling and web scraping framework

    Scrapy is a fast, open source, high-level framework for crawling websites and extracting structured data from them. Portable and written in Python, it runs on Windows, Linux, macOS, and BSD. Scrapy is powerful, fast, and simple, yet easily extensible: write the rules to extract the data and, if you need to, add new functionality without touching the core. Scrapy does the rest and is used in a wide range of applications; a minimal spider sketch follows this entry.
    Downloads: 19 This Week
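    A minimal spider sketch in the spirit of the official Scrapy tutorial (the target site and selectors are illustrative):

        import scrapy

        class QuotesSpider(scrapy.Spider):
            name = "quotes"
            start_urls = ["https://quotes.toscrape.com"]

            def parse(self, response):
                # extraction rules: yield one item per quote block
                for quote in response.css("div.quote"):
                    yield {
                        "text": quote.css("span.text::text").get(),
                        "author": quote.css("small.author::text").get(),
                    }
                # follow pagination; Scrapy handles scheduling and deduplication
                next_page = response.css("li.next a::attr(href)").get()
                if next_page is not None:
                    yield response.follow(next_page, callback=self.parse)

    Saved as quotes_spider.py, this runs with "scrapy runspider quotes_spider.py -o quotes.json" and writes the scraped items to a JSON file.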
  • 3
    Trafilatura

    Python & command-line tool to gather text on the Web

    Trafilatura is a Python package and command-line tool designed to gather text on the Web. It includes discovery, extraction, and text-processing components. Its main applications are web crawling, downloads, scraping, and extraction of main texts, metadata, and comments. It aims to stay handy and modular: no database is required, and the output can be converted to various commonly used formats. Going from raw HTML to the essential parts alleviates many problems related to text quality, first by avoiding the noise caused by recurring elements (headers, footers, links/blogrolls, etc.) and second by including information such as author and date in order to make sense of the data. ...
    Downloads: 1 This Week
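    A minimal sketch of Trafilatura's Python API (the URL is a placeholder):

        import trafilatura

        # download a page, then extract the main text without boilerplate
        downloaded = trafilatura.fetch_url("https://example.com/article")
        if downloaded is not None:
            text = trafilatura.extract(downloaded)
            print(text)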
  • 4
    Douyin TikTok Download API

    ...Fast, asynchronous, free, open source, ad-free, and maintained long-term. Built on PyWebIO, FastAPI, and HTTPX, this fast, asynchronous Douyin/TikTok data-crawling tool supports online batch parsing and downloading of watermark-free videos or image galleries through its web interface, a data-crawling API, and iOS Shortcuts for watermark-free downloads, among other functions. You can deploy or adapt the project yourself to add more functionality, call scraper.py directly from your own project, or install the existing pip package as a parsing library to crawl data easily. ...
    Downloads: 9 This Week
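    A hedged sketch of querying a self-hosted deployment over HTTP; the base URL, endpoint path, and parameter name here are assumptions for illustration, so check the deployed API's own docs for the real routes:

        import requests

        BASE = "http://localhost:8000"  # assumed local deployment address
        video_url = "https://www.douyin.com/video/XXXXXXXX"  # placeholder share link

        # hypothetical hybrid-parsing endpoint; verify against the API docs
        resp = requests.get(f"{BASE}/api/hybrid/video_data", params={"url": video_url})
        resp.raise_for_status()
        print(resp.json())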
  • 5
    DeerFlow

    Deep Research framework, combining language models with tools

    DeerFlow is an open-source, community-driven “deep research” framework / multi-agent orchestration platform developed by ByteDance. It aims to combine the reasoning power of large language models (LLMs) with automated tool-use — such as web search, web crawling, Python execution, and data processing — to enable complex, end-to-end research workflows. Instead of a monolithic AI assistant, DeerFlow defines multiple specialized agents (e.g. “planner,” “searcher,” “coder,” “report generator”) that collaborate in a structured workflow, allowing tasks like literature reviews, data gathering, data analysis, code execution, and final report generation to be largely automated. ...
    Downloads: 1 This Week
  • 6
    DB-GPT

    Revolutionizing Database Interactions with Private LLM Technology

    DB-GPT is an experimental open-source project that uses locally deployed GPT-style large models to interact with your data and environment. With this solution, there is no risk of data leakage, and your data stays 100% private and secure.
    Downloads: 1 This Week
  • 7
    LinkChecker

    Check links in web documents or full websites

    LinkChecker is a free, GPL-licensed website validator that checks links in web documents or full websites. It runs on Python 3 systems and requires Python 3.8 or later. The version in the pip repository may be old; to find out how to get the latest code, plus platform-specific information and other advice, see doc/install.txt in the source code archive. If you do not want to install any additional libraries/dependencies, you can use the Docker image, which is published on GitHub...
    Downloads: 0 This Week
  • 8
    Scrapy-Redis

    Redis-based components for Scrapy

    You can start multiple spider instances that share a single Redis queue, which makes it best suited for broad multi-domain crawls. Scraped items get pushed into a Redis queue, meaning you can start as many post-processing processes as needed, all sharing the items queue. It provides a scheduler with a duplication filter, an item pipeline, and base spiders. The default request serializer is pickle, but it can be changed to any module that provides loads and dumps functions; note that pickle is not compatible between Python versions. Version...
    Downloads: 0 This Week
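    A minimal settings sketch for sharing one crawl across several spider processes (the Redis URL is a placeholder):

        # settings.py of a Scrapy project using scrapy-redis
        SCHEDULER = "scrapy_redis.scheduler.Scheduler"              # Redis-backed scheduler
        DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"  # shared duplication filter
        SCHEDULER_PERSIST = True                                    # keep the queue between runs
        ITEM_PIPELINES = {
            "scrapy_redis.pipelines.RedisPipeline": 300,            # push scraped items into Redis
        }
        REDIS_URL = "redis://localhost:6379"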
  • 9
    Python-Spider

    Python3 web crawler practice

    Python-Spider is a repository of examples for writing web spiders/crawlers in Python, part of a broader learning-and-resource collection by its author. The code and documentation are aimed at beginners and intermediate learners who want to learn how to fetch, parse, and extract data from websites programmatically (a generic sketch of that pattern follows this entry). As part of the author's public learning-path repositories, python-spider likely includes examples of HTTP requests, HTML parsing, maybe...
    Downloads: 1 This Week
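    The repository's exact contents aren't listed here; as a generic sketch of the fetch-and-parse pattern such tutorials teach (requests and BeautifulSoup are assumptions, not necessarily the repo's choices):

        import requests
        from bs4 import BeautifulSoup

        # fetch a page, parse the HTML, and list every link on it
        resp = requests.get("https://example.com", timeout=10)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        for a in soup.select("a[href]"):
            print(a.get_text(strip=True), a["href"])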
  • 10
    Gerapy

    Distributed Crawler Management Framework Based on Scrapy

    A distributed crawler management framework based on Scrapy, Scrapyd, Scrapyd-Client, Scrapyd-API, Django, and Vue.js. Anyone who has written crawlers in Python has probably used Scrapy: a very powerful crawler framework with high crawling efficiency and good extensibility, practically a must-have tool for developing crawlers in Python. You can of course run Scrapy crawls on your own host, but when a crawl is very large your own machine can't keep up; a better method is to deploy Scrapy to a remote server for execution. ...
    Downloads: 0 This Week
  • 11
    django-dynamic-scraper

    Creating Scrapy scrapers via the Django admin interface

    Django Dynamic Scraper (DDS) is an app for Django built on top of the scraping framework Scrapy. While preserving many of Scrapy's features, it lets you dynamically create and manage spiders via the Django admin interface. With DDS you can define your Scrapy scrapers dynamically via the Django admin interface and save your scraped items in the database defined for your Django project. Since it simplifies things, DDS is not usable for all kinds of scrapers, but...
    Downloads: 0 This Week
  • 12
    Tushare

    TuShare is a utility for crawling historical data on Chinese stocks

    Tushare is a Python library that provides access to a wide range of financial data focused on the Chinese stock market. It allows users to retrieve real-time and historical market data, financial reports, index data, and macroeconomic indicators. Tushare is widely used in quantitative trading, data analysis, and academic research. It supports both free and premium data tiers via Tushare Pro, which requires an API token.
    Downloads: 0 This Week
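    A minimal sketch of the Tushare Pro API (the token, stock code, and dates are placeholders):

        import tushare as ts

        pro = ts.pro_api("YOUR_TOKEN")  # Tushare Pro requires an API token
        # daily bars for one Shenzhen-listed stock over a date range
        df = pro.daily(ts_code="000001.SZ", start_date="20240101", end_date="20240630")
        print(df.head())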
  • 13
    LAIR! the game

    LAIR is a Sandbox Dungeon Crawling Game

    LAIR! is a game designed to provide virtually limitless variety in the environment it presents. When you create a new character, it generates a dungeon for you to explore, with thousands of creatures, treasures, and secrets. Every piece of content can be generated procedurally by the game engine, so new dungeons and treasures can appear at any time.
    Downloads: 0 This Week
  • 14

    SE Auditor

    Free SEO audit software.

    SE Auditor is a program for analyzing web pages for search engines. You can use it to view statistical data about your website in order to improve its position in web search results. SE Auditor is addressed to SEO professionals, website designers, developers, website testers, and owners. It enables you to check the meta description, keywords, sitemap, the number of links, keyword consistency, the text/HTML ratio, and many more ranking /...
    Downloads: 0 This Week
  • 15

    Search Engine in python

    All students and developers are invited to join this web search engine

    ...Importance is placed on how web pages are ranked by a quick and efficient algorithm. All hands on deck is how we are calling this project: algorithm design is really important, and creative methods for better search results or web crawling are welcome from amateur programmers, students, web developers, software developers, and computer science students around the globe. We believe there's a better way to search, and that's why we are working on this project, so why not join us? An illustrative ranking sketch follows this entry.
    Downloads: 0 This Week
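    The project's own ranking code isn't shown here; as an illustration of the kind of algorithm involved, a compact PageRank power iteration:

        def pagerank(links, damping=0.85, iterations=50):
            """links: dict mapping each page to the list of pages it links to."""
            pages = list(links)
            n = len(pages)
            rank = {p: 1.0 / n for p in pages}
            for _ in range(iterations):
                new = {p: (1.0 - damping) / n for p in pages}
                for page, outs in links.items():
                    targets = outs if outs else pages  # dangling pages spread rank evenly
                    share = rank[page] / len(targets)
                    for t in targets:
                        new[t] += damping * share
                rank = new
            return rank

        print(pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]}))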
  • 16

    LinkChecker

    check links in web documents or full websites

    New Homepage: http://wummel.github.io/linkchecker/
    LinkChecker features:
    - recursive and multithreaded checking and site crawling
    - output in colored or normal text, HTML, SQL, CSV, XML, or a sitemap graph in different formats
    - support for HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:, Telnet, and local file links
    - restricting link checking with regular expression filters for URLs
    - proxy support
    - username/password authorization for HTTP and FTP
    Downloads: 2 This Week
  • 17

    Web Crawler Security Tool

    A web crawler oriented to information security.

    ...The crawler has been completely rewritten in v1.0, bringing a lot of improvements: improved data visualization, an interactive option to download files, increased crawling speed, export of the list of found files into a separate file (useful to crawl a site once, then download the files and analyse them with FOCA), output logging in Common Log Format (CLF), basic-authentication handling, and more! Many of the old features have been reimplemented, and the most interesting one is the crawler's ability to search for directory indexing.
    Downloads: 0 This Week
  • 18

    Python Crawler Library

    Python Web Crawler Library

    A simple library for crawling the web. This library gives you the ability to create macros for crawling web sites and performing simple actions, such as logging in, on those sites; a generic sketch of that pattern follows this entry.
    Downloads: 0 This Week
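    The library's own macro API isn't documented in this listing; this generic sketch shows the log-in-then-crawl pattern it describes, using requests.Session (the URLs and form fields are hypothetical):

        import requests

        session = requests.Session()  # carries cookies across requests

        # hypothetical login form; field names depend on the target site
        session.post("https://example.com/login",
                     data={"username": "user", "password": "secret"})

        # later requests reuse the authenticated session automatically
        page = session.get("https://example.com/members")
        print(page.status_code, len(page.text))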
  • 19
    The Legend of Irithed

    The Legend of Irithed is a text-based adventure game.

    It's not much to look at right now, but The Legend of Irithed aims to be an immersive, text-based, sometimes serious, sometimes humorous adventure game with an emphasis on dungeon crawling, treasure hunting, and combat. Use the discussion forum to report bugs and make suggestions. If you would like to get in contact with me directly, you can send me an email at m.william.schatz@gmail.com. For development news, follow @w_schatz on Twitter. Non-Windows users: you need to have Python 2 installed. If your computer doesn't have Python 2, or if you're not sure, you can get it from http://python.org/ . Then you should be able to navigate to the folder containing irithed_alpha.py in a terminal and type "python irithed_alpha.py" without the quotes. ...
    Downloads: 0 This Week
  • 20
    Ruya is a Python-based breadth-first, level-based, delayed, event-based crawler for crawling English and Japanese websites. It is aimed solely at developers who want crawling functionality in their own projects via an API, with control over the crawl; a sketch of this pattern follows the entry.
    Downloads: 0 This Week
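    Ruya's actual API isn't shown in this listing; this generic sketch illustrates the breadth-first, delayed crawl pattern the description refers to (requests and BeautifulSoup are assumptions):

        import time
        from collections import deque
        from urllib.parse import urljoin

        import requests
        from bs4 import BeautifulSoup

        def bfs_crawl(start_url, max_depth=2, delay=1.0):
            seen = {start_url}
            queue = deque([(start_url, 0)])  # (url, depth), crawled level by level
            while queue:
                url, depth = queue.popleft()
                time.sleep(delay)  # politeness delay between requests
                html = requests.get(url, timeout=10).text
                if depth < max_depth:
                    for a in BeautifulSoup(html, "html.parser").select("a[href]"):
                        link = urljoin(url, a["href"])
                        if link.startswith("http") and link not in seen:
                            seen.add(link)
                            queue.append((link, depth + 1))
            return seen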
  • 21
    An intelligent web-crawling agent that builds invariant representations of the data it collects.
    Downloads: 0 This Week
  • 22
    Webhunter is a distributed, multi-threaded web crawler designed for both general indexing and crawling the web for focused content.
    Downloads: 0 This Week