Showing 1977 open source projects for "python web crawler"

  • 1. Python Web

    Course to learn frontend web development

    This repository is a beginner-friendly template for creating Python web applications using Flask. Designed by @mouredev for learning and practice, it provides a simple, minimalistic structure for serving HTML pages and static content. Ideal for educational purposes and small-scale web projects, it also includes preconfigured files to simplify deployment and local development.
  • 2. Spatie Crawler

    An easy-to-use, powerful crawler implemented in PHP

    Spatie Crawler is a PHP library that allows developers to crawl websites and extract information efficiently. It can be used for web scraping, link checking, or automated testing of web pages. The library is simple to use and supports customizable crawling strategies, including controlling crawl depth and handling redirects. It’s suitable for building crawlers that navigate large or dynamically generated websites.
  • 3. EasySpider

    A visual, no-code web crawler/spider

    A visual, no-code web crawler/spider that supports both Chinese and English.
  • 4. Crawl4AI

    Open-source, LLM-friendly web crawler and scraper

    Crawl4AI is a high-performance, AI-ready web crawler tailored for LLM data ingestion and RAG pipelines. It supports adaptive crawling heuristics (stopping when enough information has been gathered), structured Markdown output, and high-speed parallel execution. It is designed to operate at scale, with optional Docker deployment and framework integrations.
  • 5. Best-of Web Development with Python

    A ranked list of awesome Python libraries for web development

    ...If you would like to add or update projects, feel free to open an issue, submit a pull request, or edit projects.yaml directly. Contributions are very welcome! The list is updated weekly.
  • 6. WebMagic

    A scalable web crawler framework for Java

    WebMagic is a simple but scalable crawler framework that covers the whole crawler lifecycle: downloading, URL management, content extraction, and persistence. It simplifies the development of a specific crawler. WebMagic has a simple, highly flexible core and a simple API for HTML extraction. It also supports annotated POJOs for customizing a crawler, with no configuration needed. Some other...
  • 7. Playwright for Python

    Python version of the Playwright testing and automation library

    Playwright enables reliable end-to-end testing for modern web apps. A single API automates Chromium, Firefox, and WebKit, with capable automation for single-page apps that rely on the modern web platform. The Playwright API is available in JavaScript and TypeScript, Python, .NET, and Java. With Playwright, you can test how your app behaves in Apple Safari using WebKit builds for Windows, Linux, and macOS, both locally and on CI.
  • 8. Heritrix

    Internet Archive's open-source, web-scale, web crawler project

    Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix/heritix/heretix/heratix) is an archaic word for heiress (woman who inherits). Since our crawler seeks to collect and preserve the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt.
  • 9. FastAPI Python

    FastAPI framework, high performance, easy to learn, fast to code

    FastAPI framework, high performance, easy to learn, fast to code, ready for production. FastAPI is a modern, fast (high-performance), web framework for building APIs with Python based on standard Python type hints.
  • 10. crwlr

    Library for Rapid (Web) Crawler and Scraper Development

    ...Before diving into the library, let's look at the terms crawling and scraping. In most real-world use cases the two go hand in hand, which is why this library helps with and combines both. A (web) crawler is a program that (down)loads documents and follows the links in them to load those as well. A crawler could simply load every link it finds (that it is allowed to load according to the robots.txt file), in which case it would end up loading the whole internet (unless the URL(s) it starts with are a dead end). Alternatively, it can be restricted to load only links matching certain criteria (on the same domain/host, URL path starting with "/foo", ...) or only up to a certain depth. ...
  • 11. Best-of Python

    A ranked list of awesome Python open-source libraries

    ...If you would like to add or update projects, feel free to open an issue, submit a pull request, or edit projects.yaml directly. Contributions are very welcome! Ranked list of awesome Python libraries for web development. Correctly generate plurals, ordinals, and indefinite articles; convert numbers. Libraries for loading, collecting, and extracting data from a variety of data sources and formats. Libraries for batch and stream data processing, workflow automation, job scheduling, and other data pipeline tasks.
  • 12. AUTOMATIC1111 Stable Diffusion web UI
    AUTOMATIC1111's stable-diffusion-webui is a powerful, user-friendly web interface built on the Gradio library that allows users to easily interact with Stable Diffusion models for AI-powered image generation. Supporting both text-to-image (txt2img) and image-to-image (img2img) generation, this open-source UI offers a rich feature set including inpainting, outpainting, attention control, and multiple advanced upscaling options. With a flexible installation process across Windows, Linux, and...
  • 13. MDServer Web

    Simple Linux Panel

    MDServer-Web is an open-source, web-based control panel for managing web servers and hosting environments. It supports popular web servers like Nginx and Apache, along with databases such as MySQL and Redis. The panel provides a user-friendly interface to manage websites, databases, SSL certificates, and more, making server administration accessible even to those with limited technical knowledge.
  • 14. Proxy_Pool

    Python crawler proxy IP pool (proxy pool)

    The project's main job is to periodically collect free proxies published on the Internet, verify them, and store them; to periodically re-verify the stored proxies so that they stay usable; and to expose the pool through an API and a CLI. You can also add proxy sources to increase the quality and quantity of the pool's proxy IPs.
  • 15. Tabby Web

    An SSH/Telnet/Serial client in your browser

    Tabby Web brings a modern terminal experience to the browser by pairing a web UI with a backend gateway that brokers TCP connections over WebSockets. It aims to deliver an experience similar to the desktop Tabby terminal—sessions, profiles, and rich configuration—while being accessible anywhere through a login. The architecture splits concerns: a Django-based control plane manages users, auth, and configuration, while a gateway service handles network transport so browser clients can reach...
  • 16. Web Spider, Web Crawler, Email Extractor

    Free tool that extracts emails, phone numbers, and custom text from the web using Java regex

    The project's Files section includes WebCrawlerMySQL.jar, which adds MySQL connection support to this free web spider and crawler. It extracts information from the web by parsing millions of pages and stores the data in a Derby database, and data is not lost if the spider is force-closed. Features:
    - Free web spider, parser, extractor, and crawler
    - Extraction of emails, phone numbers, and custom text from the web
    - Export to an Excel file
    - Data saved to Derby and MySQL databases
    - Written in Java; cross-platform
    Also see Free Email Sender: https://sourceforge.net/projects/gitst-free-email-ender/
    Please install Microsoft OpenJDK to run the application: https://www.microsoft.com/openjdk
  • 17. Selenium-python Helium

    Selenium-python, but lighter: Helium is the best Python library

    Under the hood, Helium forwards each call to Selenium. The difference is that Helium's API is much more high-level. In Selenium, you need to use HTML IDs, XPaths and CSS selectors to identify web page elements. Helium on the other hand lets you refer to elements by user-visible labels. As a result, Helium scripts are typically 30-50% shorter than similar Selenium scripts. What's more, they are easier to read and more stable with respect to changes in the underlying web page. Selenium-python is great for web automation. Helium makes it easier to use. ...
  • 18. Web Dev for Beginners

    24 lessons, 12 weeks: get started as a web developer

    Web-Dev-For-Beginners is Microsoft’s open source, project-based curriculum for learning web development from scratch. Designed as a 12-week, 24-lesson course, it covers HTML, CSS, and JavaScript fundamentals through hands-on projects like terrariums, browser extensions, and space games. Each lesson includes a mix of pre-lecture quizzes, written content, assignments, challenges, and post-lecture quizzes to reinforce learning. The course also offers global accessibility with translations in...
  • 19. OpenAI Quickstart Python

    Python example app from the OpenAI API quickstart tutorial

    ...The examples folder includes small, self-contained projects showcasing common use cases like chat completions, tool usage, and interactive interfaces. Each example is designed to be easily runnable with minimal setup—requiring only Python, a virtual environment, and an API key. The repository also includes environment setup guides and example scripts, such as a simple Flask web app for chat interactions, allowing developers to test OpenAI API integrations locally. Overall, openai-quickstart-python serves as an essential starting point for developers looking to prototype and experiment with OpenAI-powered apps.
  • 20. Text Generation Web UI

    A Gradio web UI for running large language models such as LLaMA

    A Gradio web UI for running large language models such as LLaMA, llama.cpp, GPT-J, Pythia, OPT, and GALACTICA. Features include a dropdown menu for switching between models; a notebook mode that resembles OpenAI's playground; a chat mode for conversation and role-playing; an instruct mode compatible with the Alpaca and Open Assistant formats; nice HTML output for GPT-4chan; Markdown output for GALACTICA, including LaTeX rendering; custom chat characters; and advanced chat features (send images, get audio responses with TTS)....
  • 21. Snap Lens Web Crawler

    Crawl and download Snap Lenses from lens.snapchat.com with ease.

    This crawler is a dependency of Snap Camera Server (https://snap-camera-server.sourceforge.io).
  • 22. Python 100 Days

    Python - From Novice to Master in 100 Days

    Python-100-Days is a comprehensive, practice-first learning roadmap by Luo Hao that spans 100 days from absolute Python basics to professional, production-grade skills. It starts with foundational syntax, control flow, data structures, and functions, then advances through object-oriented programming, file I/O, exceptions, and modules. The middle sections focus on real-world Python applications, including working with CSV, Excel, Word, PowerPoint, PDFs, images, email/SMS, and regular expressions. ...
  • 23. Zappa - Serverless Python

    Serverless Python

    Zappa makes it super easy to build and deploy serverless, event-driven Python applications (including, but not limited to, WSGI web apps) on AWS Lambda + API Gateway. Think of it as "serverless" web hosting for your Python apps. That means infinite scaling, zero downtime, and zero maintenance, at a fraction of the cost of your current deployments! With a traditional HTTP server, the server is online 24/7, processing requests one by one as they come in. ...
  • 24. DNS Crawler

    A Bulk Domain Assessment Tool

    DNS Crawler is a lightweight, Python-based utility for efficient batch processing and assessment of internet domain names. It reads a list of domains, each formatted as domain_name followed by a tab or a semicolon and an optional comment, and generates a detailed, Excel-compatible CSV report with these columns:
    - DOMAIN: domain name
    - REG: registrar
    - SOA, NS, MX, TXT, SPF, DMARC, MS, A, PTR: common DNS records for comprehensive domain analysis
    - NOTE: optional comment from the original input file
    Whether you are managing multiple domains or auditing cross-platform DNS consistency, DNS Crawler simplifies the process with clear, structured output that is easy to review and report on.
  • 25. Complete-Python-3-Bootcamp

    Course Files for Complete Python 3 Bootcamp Course on Udemy

    ...In addition, it includes applied exercises in areas such as web scraping, working with APIs, and using Python libraries like NumPy, pandas, Matplotlib, and Seaborn for data analysis and visualization. Learners can progress from beginner-friendly basics to more advanced programming skills while reinforcing their knowledge with practice problems and projects. Because it mirrors the course content, this repository is widely used by students taking the Udemy course.
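The crwlr entry describes a crawler that follows links but is restricted to the same host, to a maximum depth, and to URLs that robots.txt allows. crwlr itself is a PHP library and its API is not shown here; the following is only a minimal, standard-library Python sketch of that restricted crawl. The fetch callback (URL in, HTML or None out) is a hypothetical hook that keeps the sketch network-free; a real crawler would plug an HTTP client in there and load robots.txt with RobotFileParser.read().

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.robotparser import RobotFileParser


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(
    start_url: str,
    fetch: Callable[[str], Optional[str]],  # hypothetical hook: URL -> HTML or None
    max_depth: int = 2,
    robots: Optional[RobotFileParser] = None,
    user_agent: str = "*",
) -> dict[str, int]:
    """Breadth-first crawl limited to the start URL's host and to max_depth.

    Returns {url: depth} for every page that was fetched successfully.
    """
    host = urlparse(start_url).netloc
    seen: set[str] = set()
    fetched: dict[str, int] = {}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        # Respect robots.txt when a parsed RobotFileParser is supplied.
        if robots is not None and not robots.can_fetch(user_agent, url):
            continue
        html = fetch(url)
        if html is None:
            continue  # fetch failed or page missing
        fetched[url] = depth
        if depth >= max_depth:
            continue  # depth limit reached: keep the page, don't follow its links
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            # Resolve relative links against the current page and drop #fragments.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                queue.append((absolute, depth + 1))
    return fetched
```

Passing the get method of an in-memory dict of pages as fetch lets you check the host, depth, and robots.txt rules without touching the network. Breadth-first order is a deliberate choice: each page is recorded at the shallowest depth at which it is reachable, which is usually what a depth limit is meant to express.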
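The Proxy_Pool entry boils down to a collect, verify, store, and re-verify loop. Below is a minimal in-memory Python sketch of that loop, not Proxy_Pool's own code or API: the ProxyPool class, its check callback (which would normally make a test request through the proxy), and the max_age re-verification interval are hypothetical names chosen for illustration, and scheduling, the storage backend, the API, and the CLI are left out.

```python
import time
from typing import Callable, Iterable


class ProxyPool:
    """Hypothetical in-memory pool that keeps only proxies passing a check."""

    def __init__(self, check: Callable[[str], bool], max_age: float = 300.0) -> None:
        self.check = check        # returns True if the proxy currently works
        self.max_age = max_age    # seconds before a stored proxy is re-verified
        self._verified: dict[str, float] = {}  # proxy -> time of last successful check

    def collect(self, candidates: Iterable[str]) -> None:
        """Verify newly scraped candidates and store the ones that work."""
        now = time.monotonic()
        for proxy in candidates:
            if proxy not in self._verified and self.check(proxy):
                self._verified[proxy] = now

    def refresh(self) -> None:
        """Re-verify proxies whose last check is at least max_age old; drop failures."""
        now = time.monotonic()
        for proxy, checked_at in list(self._verified.items()):
            if now - checked_at >= self.max_age:
                if self.check(proxy):
                    self._verified[proxy] = now
                else:
                    del self._verified[proxy]

    def all(self) -> list[str]:
        """Return the currently verified proxies in sorted order."""
        return sorted(self._verified)
```

In a real deployment, collect would run on each scheduled scrape of free proxy lists and refresh on a timer; keeping check injectable makes the pool logic testable without network access.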
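The Web Spider, Web Crawler, Email Extractor entry pulls emails and phone numbers out of pages with Java regular expressions. A rough Python analogue of that extraction step is sketched below; the two patterns are simplified assumptions, not the project's own expressions, and will miss or over-match some real-world formats (dates or order numbers can look like phone numbers, for instance).

```python
import re

# Simplified, illustrative patterns (assumptions, not the project's regexes).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# An optional '+', then at least 8 digits/spaces/()./- that start and end with a digit.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def extract_contacts(text: str) -> dict[str, list[str]]:
    """Return unique emails and phone-like strings, in order of first appearance."""
    emails = list(dict.fromkeys(EMAIL_RE.findall(text)))
    phones = list(dict.fromkeys(PHONE_RE.findall(text)))
    return {"emails": emails, "phones": phones}
```

On the text "Write to sales@example.com or call +1 (555) 123-4567." it returns one email and one phone number; dict.fromkeys is used to drop duplicates while preserving order.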
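The Zappa entry mentions deploying WSGI web apps to AWS Lambda and API Gateway. To make "WSGI app" concrete, here is a minimal sketch of a plain WSGI callable of the kind such a deployment wraps, plus a hypothetical call_app helper that invokes it in-process using the standard library's wsgiref.util.setup_testing_defaults. No Zappa configuration or AWS calls are shown; in practice the app object usually comes from a web framework.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Minimal WSGI application: responds with the requested path."""
    path = environ.get("PATH_INFO", "/")
    body = f"Hello from {path}\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(path: str):
    """Hypothetical helper: run the app in-process (no server), return (status, headers, body)."""
    environ = {"SCRIPT_NAME": "", "PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the other standard WSGI keys with test defaults
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

Because the app only depends on the WSGI calling convention, the same callable can be exercised locally like this and then handed to whatever hosts it, whether a traditional always-on server or a serverless platform.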
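The DNS Crawler input format (a domain name, then a tab or a semicolon, then an optional comment) and its report columns can be illustrated with a short Python sketch. This is not the tool's code: the parsing rules (blank lines skipped, only the first separator splits a line) are assumptions, and the DNS record columns are left empty because performing the lookups is out of scope here. Python's csv module writes in its default "excel" dialect, which suits the Excel-compatible report described.

```python
import csv
import io
import re
from typing import Iterable, Iterator, Tuple

# Column order as listed in the DNS Crawler description.
COLUMNS = ["DOMAIN", "REG", "SOA", "NS", "MX", "TXT", "SPF",
           "DMARC", "MS", "A", "PTR", "NOTE"]


def parse_domain_list(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (domain, comment) from lines shaped like 'domain<TAB or ;>comment'."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        # Split on the first tab or semicolon only, so the comment may contain either.
        parts = re.split(r"[\t;]", line, maxsplit=1)
        domain = parts[0].strip()
        comment = parts[1].strip() if len(parts) > 1 else ""
        yield domain, comment


def report_skeleton(lines: Iterable[str]) -> str:
    """Return CSV text with one row per domain; DNS columns are left empty here."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS)  # default dialect is "excel"
    writer.writeheader()
    for domain, comment in parse_domain_list(lines):
        writer.writerow({"DOMAIN": domain, "NOTE": comment})
    return out.getvalue()
```

DictWriter fills any column missing from a row with an empty string, so a lookup step could later populate REG through PTR without changing the row layout.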