Page 2 | python web crawler free download

Showing 1049 open source projects for "python web crawler"

View related business solutions

Internet Clear Filters & Widen Search

Our Free Plans just got better! | Auth0
With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.

Try free now
Picsart Enterprise Background Removal API for Stunning eCommerce Visuals
Instantly remove the background from your images in just one click.

With our Remove Background API tool, you can access the transformative capabilities of automation , which will allow you to turn any photo asset into compelling product imagery. With elevated visuals quality on your digital platforms, you can captivate your audience, and therefore achieve higher engagement and sales.

Learn More
1

SST

Build serverless apps. Set breakpoints and test your functions locally

SST is an open-source serverless application platform that deploys to your AWS account, helping you go from idea to IPO. Work on your local Lambda functions lives, without mocking or redeploying your app. Higher-level CDK constructs are made specifically for building serverless apps. Manage the resources in your application with the SST Console. Traditionally, we’ve built and deployed web applications where we have some degree of control over the HTTP requests that are made to our server. Our...

Downloads: 3 This Week

Last Update: 2025-07-04
See Project
2

Selectolax

Python binding to Modest and Lexbor engines

A fast HTML5 parser with CSS selectors using Modest and Lexbor engines. Selectolax supports two backends: Modest and Lexbor. By default, all examples use the Modest backend. Most of the features between backends are almost identical, but there are still some differences. Currently, the Lexbor backend is in beta and missing some of the features. To use lexbor, just import the parser and use it in the similar way to the HTMLParser.

Downloads: 2 This Week

Last Update: 2025-07-19
See Project
3

TinyStatus

Tiny status page generated by a Python script

TinyStatus is a simple, customizable status page generator that allows you to monitor the status of various services and display them on a clean, responsive web page.

Downloads: 2 This Week

Last Update: 2024-10-24
See Project
4

crwlr

Library for Rapid (Web) Crawler and Scraper Development

This library provides kind of a framework and a lot of ready-to-use, so-called steps, that you can use as building blocks, to build your own crawlers and scrapers with. Before diving into the library, let's have a look at the terms crawling and scraping. For most real-world use cases, those two things go hand in hand, which is why this library helps with and combines both. A (web) crawler is a program that (down)loads documents and follows the links in it to load them as well. A crawler could...

Downloads: 0 This Week

Last Update: 1 day ago
See Project
Gen AI apps are built with MongoDB Atlas
The database for AI-powered applications.

MongoDB Atlas is the developer-friendly database used to build, scale, and run gen AI and LLM-powered apps—without needing a separate vector database. Atlas offers built-in vector search, global availability across 115+ regions, and flexible document modeling. Start building AI apps faster, all in one place.

Start Free
5

SeleniumBase

A framework for browser automation and testing with Selenium

SeleniumBase automatically handles common WebDriver actions such as launching web browsers before tests, saving screenshots during failures, and closing web browsers after tests. SeleniumBase lets you customize test runs from the command line. SeleniumBase uses simple syntax for commands. pytest includes automatic test discovery. If you don't specify a specific file or folder to run, pytest will automatically search through all subdirectories for tests to run. No More Flaky Tests! SeleniumBase...

Downloads: 2 This Week

Last Update: 2025-07-15
See Project
6

mitmproxy

A free and open source interactive HTTPS proxy

mitmproxy is an open source, interactive SSL/TLS-capable intercepting HTTP proxy, with a console interface fit for HTTP/1, HTTP/2, and WebSockets. It's the ideal tool for penetration testers and software developers, able to debug, test, and make privacy measurements. It can intercept, inspect, modify and replay web traffic, and can even prettify and decode a variety of message types. Its web-based interface mitmweb gives you a similar experience as Chrome's DevTools, with the addition...

Downloads: 2 This Week

Last Update: 2025-05-25
See Project
7

JobFunnel

Scrape job websites into a single spreadsheet with no duplicates.

Scrape job websites into a single spreadsheet with no duplicates. Automated tool for scraping job postings into a .csv file. You can search for jobs with YAML configuration files or by passing command arguments. By performing regular scraping and reviewing, you can cut through the noise of even the busiest job markets. Run funnel with your settings YAML to populate your master CSV file with jobs from available providers. JobFunnel can be easily automated to run nightly with crontab. If you...

Downloads: 2 This Week

Last Update: 2024-09-29
See Project
8

Playwright for .NET

.NET version of the Playwright testing and automation library

..., JavaScript, Python, .NET, Java. Test Mobile Web. Native mobile emulation of Google Chrome for Android and Mobile Safari. The same rendering engine works on your Desktop and in the Cloud. Auto-wait. Playwright waits for elements to be actionable prior to performing actions. It also has a rich set of introspection events. The combination of the two eliminates the need for artificial timeouts - the primary cause of flaky tests.

Downloads: 2 This Week

Last Update: 2025-07-22
See Project
9

Requests for PHP

Requests for PHP is a humble HTTP request library

Requests is a HTTP library written in PHP, for human beings. It is roughly based on the API from the excellent Requests Python library. Requests is ISC Licensed (similar to the new BSD license) and has no dependencies, except for PHP 5.6+. Despite PHP’s use as a language for the web, its tools for sending HTTP requests are severely lacking. cURL has an interesting API, to say the least, and you can’t always rely on it being available. Sockets provide only low-level access and require you...

Downloads: 2 This Week

Last Update: 2025-01-21
See Project
Powering the best of the internet | Fastly
Fastly's edge cloud platform delivers faster, safer, and more scalable sites and apps to customers.

Ensure your websites, applications and services can effortlessly handle the demands of your users with Fastly. Fastly’s portfolio is designed to be highly performant, personalized and secure while seamlessly scaling to support your growth.

Try for free
10

Agent360

360 monitoring agent

360 Monitoring is a web service that monitors and displays statistics of your server performance. Agent360 is OS-agnostic software compatible with Python 3.7 and 3.8. It's been optimized to have low CPU consumption and comes with an extendable set of useful plugins.

Downloads: 1 This Week

Last Update: 2025-07-07
See Project
11

finvizfinance

Finviz analysis python library

finvizfinance is a package that collects financial information from FinViz website. Stock charts, fundamental & technical information, insider information and stock news. Forex charts and performance. Crypto charts and performance. Screener and Group provide data frames for comparing stocks according to different filters and trading signals. Getting information (fundament, description, outer rating, stock news, inside trader) of an individual stock.

Downloads: 1 This Week

Last Update: 2025-07-04
See Project
12

ScrapeGraphAI

Python scraper based on AI

Extracting content from websites and local documents using LLM. ScrapeGraphAI is a web scraping python library that uses LLM and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). Just say which information you want to extract and the library will do it for you.

Downloads: 1 This Week

Last Update: 2025-07-03
See Project
13

socketify.py

Bringing Http/Https and WebSockets High Performance servers for PyPy3

Socketify.py is a reliable, high-performance Python web framework for building large-scale app backends and microservices. With no precedents websocket performance and a really fast HTTP server that can delivery encrypted TLS 1.3 quicker than most alternative servers can do even unencrypted, cleartext messaging.

Downloads: 1 This Week

Last Update: 2024-10-29
See Project
14

Trafilatura

Python & command-line tool to gather text on the Web

Trafilatura is a Python package and command-line tool designed to gather text on the Web. It includes discovery, extraction and text-processing components. Its main applications are web crawling, downloads, scraping, and extraction of main texts, metadata and comments. It aims at staying handy and modular: no database is required, the output can be converted to various commonly used formats. Going from raw HTML to essential parts can alleviate many problems related to text quality, first...

Downloads: 1 This Week

Last Update: 2024-12-03
See Project
15

ScrapydWeb

Web app for Scrapyd cluster management

Web app for Scrapyd cluster management, with support for Scrapy log analysis & visualization. Make sure that Scrapyd has been installed and started on all of your hosts. Start ScrapydWeb via command scrapydweb. (a config file would be generated for customizing settings on the first startup.) Add your Scrapyd servers, both formats of string and tuple are supported, you can attach basic auth for accessing the Scrapyd server, as well as a string for grouping or labeling. You can select any number...

Downloads: 1 This Week

Last Update: 2025-02-16
See Project
16

img2dataset

Easily turn large sets of image urls to an image dataset

Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine. Also supports saving captions for url+caption datasets. Opt-out directives: Websites can pass the http headers X-Robots-Tag: noai, X-Robots-Tag: noindex , X-Robots-Tag: noimageai and X-Robots-Tag: noimageindex By default img2dataset will ignore images with such headers.

Downloads: 1 This Week

Last Update: 2024-01-22
See Project
17

dude uncomplicated data extraction

dude uncomplicated data extraction: A simple framework

Dude is a very simple framework for writing web scrapers using Python decorators. The design, inspired by Flask, was to easily build a web scraper in just a few lines of code. Dude has an easy-to-learn syntax. Dude is currently in Pre-Alpha. Please expect breaking changes. You can run your scraper from terminal/shell/command-line by supplying URLs, the output filename of your choice and the paths to your python scripts to dude scrape command.

Downloads: 1 This Week

Last Update: 2024-03-02
See Project
18

Basketball Reference

NBA Stats API via Basketball Reference

Basketball Reference is a great site (especially for a basketball stats nut like me), and hopefully, they don't get too pissed off at me for creating this. I initially wrote this library as an exercise for creating my first PyPi package, hope you find it valuable! This library was created for another Python project where I was trying to estimate an NBA player's productivity. A lot of sports-related APIs are expensive - luckily, Basketball Reference provides a free service which can be scraped...

Downloads: 1 This Week

Last Update: 2024-11-28
See Project
19

Gunicorn

WSGI HTTP Server for UNIX, fast clients and sleepy applications

Gunicorn 'Green Unicorn' is a Python WSGI HTTP Server for UNIX. It's a pre-fork worker model. The Gunicorn server is broadly compatible with various web frameworks, simply implemented, light on server resources, and fairly speedy. You can run Gunicorn by using commands or integrate with popular frameworks like Django, Pyramid, or TurboGears. For deploying Gunicorn in production see Deploying Gunicorn. After installing Gunicorn you will have access to the command line script gunicorn. Gunicorn...

Downloads: 1 This Week

Last Update: 2024-08-10
See Project
20

LinkChecker

Check links in web documents or full websites

LinkChecker is a free, GPL licensed website validator. LinkChecker checks links in web documents or full websites. It runs on Python 3 systems, requiring Python 3.8 or later. The version in the pip repository may be old, to find out how to get the latest code, plus platform-specific information and other advice see doc/install.txt in the source code archive. If you do not want to install any additional libraries/dependencies you can use the Docker image which is published on GitHub Packages.

Downloads: 1 This Week

Last Update: 1 day ago
See Project
21

Scrapy-Redis

Redis-based components for Scrapy

You can start multiple spider instances that share a single redis queue. Best suitable for broad multi-domain crawls. Scraped items gets pushed into a redis queued meaning that you can start as many as needed post-processing processes sharing the items queue. Scheduler + Duplication Filter, Item Pipeline, Base Spiders. Default requests serializer is pickle, but it can be changed to any module with loads and dumps functions. Note that pickle is not compatible between python versions. Version 0.3...

Downloads: 1 This Week

Last Update: 2024-07-06
See Project
22

HTTPie

A CLI, cURL-like tool for humans

HTTPie is a modern command-line HTTP client that makes CLI interaction with web services as human-friendly as possible. It offers a plethora of friendly features that make it an excellent curl alternative. It is equipped with an intuitive UI, JSON support, syntax highlighting and so much more. HTTPie gives a single http command for sending arbitrary HTTP requests with a simple, natural syntax, and displayed in a formatted, colorized terminal output. HTTPie can be installed on macOS, Windows...

Downloads: 1 This Week

Last Update: 2025-03-06
See Project
23

Letterboxd Recommendations

Scraping publicly-accessible Letterboxd data for movie recommendations

Scraping publicly-accessible Letterboxd data and creating a movie recommendation model with it that can generate recommendations when provided with a Letterboxd username. A user's "star" ratings are scraped from their Letterboxd profile and assigned numerical ratings from 1 to 10 (accounting for half stars). Their ratings are then combined with a sample of ratings from the top 4000 most active users on the site to create a collaborative filtering recommender model using singular value...

Downloads: 1 This Week

Last Update: 2024-12-27
See Project
24

crawley

The unix-way web crawler

Crawls web pages and prints any link it can find. Fast HTML SAX-parser (powered by golang.org/x/net/html) Small (below 1500 SLOC), idiomatic, 100% test-covered codebase. Grabs most of useful resources URLs (pics, videos, audios, forms, etc...) Found URLs are streamed to stdout and guaranteed to be unique (with fragments omitted) Scan depth (limited by starting host and path, by default - 0) can be configured. Can crawl rules and sitemaps from robots.txt. Brute mode - scan HTML comments for URLs...

Downloads: 0 This Week

Last Update: 2025-06-18
See Project
25

Sanic

Async Python 3.6+ web server/framework

Build fast, run fast with Sanic! Sanic is a Python 3.6+ web server and web framework designed to go fast. It provides a way to get a highly performant HTTP server up and running fast, while also making it easy to build, expand, and eventually scale. Sanic aspires to be as simple as possible while delivering the performance that you require. It allows the usage of the async/await syntax added in Python 3.5, so your code is guaranteed to be non-blocking and speedy. It's also ASGI compliant, so...

Downloads: 0 This Week

Last Update: 2025-03-31
See Project