Showing 921 open source projects for "python web crawler"

View related business solutions
  • Gen AI apps are built with MongoDB Atlas Icon
    Gen AI apps are built with MongoDB Atlas

    The database for AI-powered applications.

    MongoDB Atlas is the developer-friendly database used to build, scale, and run gen AI and LLM-powered apps—without needing a separate vector database. Atlas offers built-in vector search, global availability across 115+ regions, and flexible document modeling. Start building AI apps faster, all in one place.
    Start Free
  • Simple, Secure Domain Registration Icon
    Simple, Secure Domain Registration

    Get your domain at wholesale price. Cloudflare offers simple, secure registration with no markups, plus free DNS, CDN, and SSL integration.

    Register or renew your domain and pay only what we pay. No markups, hidden fees, or surprise add-ons. Choose from over 400 TLDs (.com, .ai, .dev). Every domain is integrated with Cloudflare's industry-leading DNS, CDN, and free SSL to make your site faster and more secure. Simple, secure, at-cost domain registration.
    Sign up for free
  • 1
    EasySpider

    EasySpider

    A visual no-code/code-free web crawler/spider

    A visual code-free/no-code web crawler/spider, supporting both Chinese and English.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 2
    WebMagic

    WebMagic

    A scalable web crawler framework for Java

    WebMagic is a scalable crawler framework. It covers the whole lifecycle of crawler, downloading, url management, content extraction and persistent. It can simplify the development of a specific crawler. WebMagic is a simple but scalable crawler framework. You can develop a crawler easily based on it. WebMagic has a simple core with high flexibility, a simple API for html extracting. It also provides annotation with POJO to customize a crawler, and no configuration is needed. Some other...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 3
    Playwright for Python

    Playwright for Python

    Python version of the Playwright testing and automation library

    Playwright enables reliable end-to-end testing for modern web apps. Single API to automate Chromium, Firefox and WebKit. Capable automation for single page apps that rely on the modern web platform. Use the Playwright API in JavaScript & TypeScript, Python, .NET and, Java. With Playwright, test how your app behaves in Apple Safari with WebKit builds for Windows, Linux and macOS. Test locally and on CI.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 4
    crwlr

    crwlr

    Library for Rapid (Web) Crawler and Scraper Development

    ...Before diving into the library, let's have a look at the terms crawling and scraping. For most real-world use cases, those two things go hand in hand, which is why this library helps with and combines both. A (web) crawler is a program that (down)loads documents and follows the links in it to load them as well. A crawler could just load actually all links it is finding (and is allowed to load according to the robots.txt file), then it would just load the whole internet (if the URL(s) it starts with are no dead end). Or it can be restricted to load only links matching certain criteria (on same domain/host, URL path starts with "/foo",...) or only to a certain depth. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 5
    Proxy_Pool

    Proxy_Pool

    Python crawler proxy IP pool (proxy pool)

    The main function of the crawler agent IP pool project is to regularly collect free agents published on the Internet for verification and storage, and to regularly verify and store agents to ensure the availability of agents, and to provide API and CLI. At the same time, you can also expand the proxy source to increase the quality and quantity of the proxy pool IP.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 6
    Tabby Web

    Tabby Web

    An SSH/Telnet/Serial client in your browser

    Tabby Web brings a modern terminal experience to the browser by pairing a web UI with a backend gateway that brokers TCP connections over WebSockets. It aims to deliver an experience similar to the desktop Tabby terminal—sessions, profiles, and rich configuration—while being accessible anywhere through a login. The architecture splits concerns: a Django-based control plane manages users, auth, and configuration, while a gateway service handles network transport so browser clients can reach...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7
    DNS Crawler

    DNS Crawler

    A Bulk Domain Assessment Tool

    DNS Crawler is a lightweight, Python-based utility designed for efficient batch processing and assessment of internet domain names. It reads from a list of domains formatted as: domain_name <tab> or ; optional_comment and generates a detailed, Excel-compatible CSV report with columns including: DOMAIN: Domain name REG: Registrar SOA, NS, MX, TXT, SPF, DMARC, MS, A, PTR: Common DNS records for comprehensive domain analysis NOTE: Optional comments from the original input file Whether you're managing multiple domains or auditing for cross-platform DNS consistency, DNS Crawler simplifies the process, offering a clear, structured output for easy review and reporting.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Roach

    Roach

    The complete web scraping toolkit for PHP

    Roach is a complete web scraping toolkit for PHP. It is a shameless clone heavily inspired by the popular Scrapy package for Python. Roach allows us to define spiders that crawl and scrape web documents. But wait, there’s more. Roach isn’t just a simple crawler, but includes an entire pipeline to clean, persist and otherwise process extracted data as well.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 9
    uvicorn

    uvicorn

    An ASGI web server, for Python

    Uvicorn is an ASGI web server implementation for Python. Until recently Python has lacked a minimal low-level server/application interface for async frameworks. The ASGI specification fills this gap, and means we're now able to start building a common set of tooling usable across all async frameworks. Uvicorn currently supports HTTP/1.1 and WebSockets.
    Downloads: 6 This Week
    Last Update:
    See Project
  • Keep company data safe with Chrome Enterprise Icon
    Keep company data safe with Chrome Enterprise

    Protect your business with AI policies and data loss prevention in the browser

    Make AI work your way with Chrome Enterprise. Block unapproved sites and set custom data controls that align with your company's policies.
    Download Chrome
  • 10
    Certbot

    Certbot

    Get free HTTPS certificates forever from Let's Encrypt

    Certbot is a fully-featured, easy-to-use, extensible client for the Let's Encrypt CA. It fetches a digital certificate from Let’s Encrypt, an open certificate authority launched by the EFF, Mozilla, and others. This certificate then lets browsers verify the identity of web servers and ensures secure communication over the Web. Obtaining and maintaining a certificate is usually such a hassle, but with Certbot and Let’s Encrypt it becomes automated and hassle-free. With just a few simple...
    Downloads: 198 This Week
    Last Update:
    See Project
  • 11
    XX-Net

    XX-Net

    A web proxy tool

    XX-Net is an easy-to-use, anti-censorship web proxy tool from China. It includes GAE_proxy and X-Tunnel, with support for multiple platforms.
    Downloads: 14 This Week
    Last Update:
    See Project
  • 12
    Agent360

    Agent360

    360 monitoring agent

    360 Monitoring is a web service that monitors and displays statistics of your server performance. Agent360 is OS-agnostic software compatible with Python 3.7 and 3.8. It's been optimized to have low CPU consumption and comes with an extendable set of useful plugins.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 13
    Scrapy

    Scrapy

    A fast, high-level web crawling and web scraping framework

    Scrapy is a fast, open source, high-level framework for crawling websites and extracting structured data from these websites. Portable and written in Python, it can run on Windows, Linux, macOS and BSD. Scrapy is powerful, fast and simple, and also easily extensible. Simply write the rules to extract the data, and add new functionality if you wish without having to touch the core. Scrapy does the rest, and can be used in a number of applications. It can be used for data mining, monitoring...
    Downloads: 17 This Week
    Last Update:
    See Project
  • 14
    ScrapeGraphAI

    ScrapeGraphAI

    Python scraper based on AI

    Extracting content from websites and local documents using LLM. ScrapeGraphAI is a web scraping python library that uses LLM and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). Just say which information you want to extract and the library will do it for you.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 15
    finvizfinance

    finvizfinance

    Finviz analysis python library

    finvizfinance is a package that collects financial information from FinViz website. Stock charts, fundamental & technical information, insider information and stock news. Forex charts and performance. Crypto charts and performance. Screener and Group provide data frames for comparing stocks according to different filters and trading signals. Getting information (fundament, description, outer rating, stock news, inside trader) of an individual stock.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 16
    Helium Browser

    Helium Browser

    Private, fast, and honest web browser

    Helium is a Chromium-based web browser designed to deliver privacy, speed, and simplicity by removing Google’s proprietary services, telemetry, and bloat. It’s built atop ungoogled-chromium, extending its philosophy with additional privacy features, design refinements, and user experience improvements aimed at transparency and control. Helium blocks ads and trackers by default through an integrated, unbiased uBlock Origin extension prepackaged as a native browser component. Its UI and...
    Downloads: 83 This Week
    Last Update:
    See Project
  • 17
    Sanic

    Sanic

    Async Python 3.6+ web server/framework

    Build fast, run fast with Sanic! Sanic is a Python 3.6+ web server and web framework designed to go fast. It provides a way to get a highly performant HTTP server up and running fast, while also making it easy to build, expand, and eventually scale. Sanic aspires to be as simple as possible while delivering the performance that you require. It allows the usage of the async/await syntax added in Python 3.5, so your code is guaranteed to be non-blocking and speedy. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    SiteOne Crawler (desktop app)

    SiteOne Crawler (desktop app)

    A free, feature-rich web analyzer and exporter/cloner you will love!

    A free in-depth website analyzer providing audits of security, performance, SEO, accessibility and other technical aspects. Available as a desktop application for Windows/macOS/Linux and as a CLI tool for advanced users and CI/CD processes. It also includes an offline web page exporter (website clone, mirror).
    Downloads: 4 This Week
    Last Update:
    See Project
  • 19
    PostHog

    PostHog

    PostHog provides open-source web & product analytics

    PostHog is an all‑in‑one open‑source platform for product and web analytics—offering event-based analytics, session recording, feature flagging, A/B testing, cohorts, and more—that you can self‑host, with full support for data privacy and enterprise compliance. Sync data from external tools like Stripe, Hubspot, your data warehouse, and more. Query it alongside your product data. Run custom filters and transformations on your incoming data. Send it to 25+ tools or any webhook in real time or...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 20
    Selectolax

    Selectolax

    Python binding to Modest and Lexbor engines

    A fast HTML5 parser with CSS selectors using Modest and Lexbor engines. Selectolax supports two backends: Modest and Lexbor. By default, all examples use the Modest backend. Most of the features between backends are almost identical, but there are still some differences. Currently, the Lexbor backend is in beta and missing some of the features. To use lexbor, just import the parser and use it in the similar way to the HTMLParser.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 21
    ungoogled-chromium

    ungoogled-chromium

    A lightweight approach to removing Google web service dependency

    In descending order of significance (i.e. most important objective first), ungoogled-chromium is Google Chromium, sans dependency on Google web services, ungoogled-chromium retains the default Chromium experience as closely as possible. Unlike other Chromium forks that have their own visions of a web browser, ungoogled-chromium is essentially a drop-in replacement for Chromium. ungoogled-chromium features tweaks to enhance privacy, control, and transparency. However, almost all of these...
    Downloads: 26 This Week
    Last Update:
    See Project
  • 22
    MechanicalSoup

    MechanicalSoup

    A Python library for automating interaction with websites

    A Python library for automating interaction with websites. MechanicalSoup automatically stores and sends cookies, follows redirects, and can follow links and submit forms. It doesn't do JavaScript. MechanicalSoup was created by M Hickford, who was a fond user of the Mechanize library. Unfortunately, Mechanize was incompatible with Python 3 until 2019 and its development stalled for several years. MechanicalSoup provides a similar API, built on Python giants Requests (for HTTP sessions) and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    mitmproxy

    mitmproxy

    A free and open source interactive HTTPS proxy

    mitmproxy is an open source, interactive SSL/TLS-capable intercepting HTTP proxy, with a console interface fit for HTTP/1, HTTP/2, and WebSockets. It's the ideal tool for penetration testers and software developers, able to debug, test, and make privacy measurements. It can intercept, inspect, modify and replay web traffic, and can even prettify and decode a variety of message types. Its web-based interface mitmweb gives you a similar experience as Chrome's DevTools, with the addition of...
    Downloads: 13 This Week
    Last Update:
    See Project
  • 24
    Mercury Browser

    Mercury Browser

    Privacy-focused web browser fork of Firefox

    Mercury Browser is an optimized, privacy-focused web browser that is a fork of Mozilla Firefox. It incorporates compiler optimizations such as AVX, AES, LTO, and PGO to enhance performance and security. With features derived from projects like LibreWolf, Waterfox, and Ghostery, Mercury disables telemetry and debugging elements by default, ensuring a more private browsing experience. It also includes usability patches that bring back features like the classic top bar and supports unsigned...
    Downloads: 59 This Week
    Last Update:
    See Project
  • 25
    miniblink49

    miniblink49

    Lighter, faster browser kernel of blink to integrate HTML UI in apps

    ...After turning off the cross-domain switch, you can use various cross-domain functions (support cross-domain). Headless mode, which greatly saves resources and is suitable for crawlers (headless mode, be suitable for Web Crawler).
    Downloads: 4 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next