Showing 1977 open source projects for "python web crawler"

  • 1
    Python Web

    Course to learn frontend web development

    This repository is a beginner-friendly template for creating Python web applications using Flask. Designed by @mouredev for learning and practice, it provides a simple, minimalistic structure for serving HTML pages and static content. Ideal for educational purposes and small-scale web projects, it also includes preconfigured files to simplify deployment and local development.
    Downloads: 0 This Week
  • 2
    Spatie Crawler

    An easy to use, powerful crawler implemented in PHP

    Spatie Crawler is a PHP library that allows developers to crawl websites and extract information efficiently. It can be used for web scraping, link checking, or automated testing of web pages. The library is simple to use and supports customizable crawling strategies, including controlling crawl depth and handling redirects. It’s suitable for building crawlers that navigate large or dynamically generated websites.
    Downloads: 2 This Week
  • 3
    EasySpider

    A visual no-code/code-free web crawler/spider

    A visual code-free/no-code web crawler/spider, supporting both Chinese and English.
    Downloads: 11 This Week
  • 4
    Best-of Web Development with Python

    A ranked list of awesome python libraries for web development

    ...If you would like to add or update projects, feel free to open an issue, submit a pull request, or directly edit the projects.yaml. Contributions are very welcome! A ranked list of awesome Python libraries for web development. Updated weekly.
    Downloads: 20 This Week
  • 5
    WebMagic

    A scalable web crawler framework for Java

    WebMagic is a simple but scalable crawler framework that covers the whole crawler lifecycle: downloading, URL management, content extraction, and persistence. It simplifies the development of a specific crawler. WebMagic has a simple core with high flexibility and a simple API for HTML extraction. It also provides POJO annotations to customize a crawler, with no configuration needed. Some other...
    Downloads: 0 This Week
  • 6
    Playwright for Python

    Python version of the Playwright testing and automation library

    Playwright enables reliable end-to-end testing for modern web apps. Single API to automate Chromium, Firefox and WebKit. Capable automation for single page apps that rely on the modern web platform. Use the Playwright API in JavaScript & TypeScript, Python, .NET, and Java. With Playwright, test how your app behaves in Apple Safari with WebKit builds for Windows, Linux and macOS. Test locally and on CI.
    Downloads: 4 This Week
  • 7
    Crawl4AI

    Open-source LLM Friendly Web Crawler & Scraper

    Crawl4AI is a high-performance, AI‑ready web crawler tailored for LLM data ingestion and RAG pipelines. It supports adaptive crawling heuristics (stopping when enough info is gathered), structured markdown output, and high-speed parallel execution. Designed to operate at scale with optional Docker deployment and framework integrations.
    Downloads: 0 This Week
  • 8
    Heritrix

    Internet Archive's open-source, web-scale, web crawler project

    Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix/heritix/heretix/heratix) is an archaic word for heiress (woman who inherits). Since our crawler seeks to collect and preserve the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt.
    Downloads: 11 This Week
  • 9
    Best-of Python

    A ranked list of awesome Python open-source libraries

    ...If you like to add or update projects, feel free to open an issue, submit a pull request, or directly edit the projects.yaml. Contributions are very welcome! Ranked list of awesome python libraries for web development. Correctly generate plurals, ordinals, indefinite articles; convert numbers. Libraries for loading, collecting, and extracting data from a variety of data sources and formats. Libraries for data batch- and stream-processing, workflow automation, job scheduling, and other data pipeline tasks.
    Downloads: 6 This Week
  • 10
    MDServer Web

    Simple Linux Panel

    MDServer-Web is an open-source, web-based control panel for managing web servers and hosting environments. It supports popular web servers like Nginx and Apache, along with databases such as MySQL and Redis. The panel provides a user-friendly interface to manage websites, databases, SSL certificates, and more, making server administration accessible even to those with limited technical knowledge.
    Downloads: 6 This Week
  • 11
    FastAPI Python

    FastAPI framework, high performance, easy to learn, fast to code

    FastAPI framework, high performance, easy to learn, fast to code, ready for production. FastAPI is a modern, fast (high-performance), web framework for building APIs with Python based on standard Python type hints.
    Downloads: 9 This Week
  • 12
    crwlr

    Library for Rapid (Web) Crawler and Scraper Development

    ...Before diving into the library, let's have a look at the terms crawling and scraping. For most real-world use cases, those two things go hand in hand, which is why this library helps with and combines both. A (web) crawler is a program that (down)loads documents and follows the links in them to load those as well. A crawler could simply load every link it finds (and is allowed to load according to the robots.txt file), in which case it would load the whole internet (as long as the URL(s) it starts with are not dead ends). Or it can be restricted to load only links matching certain criteria (on the same domain/host, URL path starts with "/foo", ...) or only to a certain depth. ...
    Downloads: 2 This Week
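
    The restricted crawl described above (same host, a URL path prefix, a depth limit) can be sketched in a few lines. This is a minimal illustration in Python using only the standard library, not crwlr's own PHP API. The `crawl` function, the `LinkParser` class, and the `fetch` callable are hypothetical names introduced here, and robots.txt handling is left out for brevity.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_depth=2, path_prefix="/"):
    """Breadth-first crawl limited to the start host, a path prefix, and a depth.

    `fetch` is a hypothetical callable: URL -> HTML string, or None on failure.
    Returns the URLs that were loaded, in visit order.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            link, _ = urldefrag(urljoin(url, href))
            parts = urlparse(link)
            if parts.netloc != host or not parts.path.startswith(path_prefix):
                continue  # off-host or outside the allowed path
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

    Passing `fetch` in as a parameter keeps the sketch testable without network access; a real crawler would plug in an HTTP client there and consult robots.txt before each request.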
  • 13
    AUTOMATIC1111 Stable Diffusion web UI
    AUTOMATIC1111's stable-diffusion-webui is a powerful, user-friendly web interface built on the Gradio library that allows users to easily interact with Stable Diffusion models for AI-powered image generation. Supporting both text-to-image (txt2img) and image-to-image (img2img) generation, this open-source UI offers a rich feature set including inpainting, outpainting, attention control, and multiple advanced upscaling options. With a flexible installation process across Windows, Linux, and...
    Downloads: 109 This Week
  • 14
    Tabby Web

    An SSH/Telnet/Serial client in your browser

    Tabby Web brings a modern terminal experience to the browser by pairing a web UI with a backend gateway that brokers TCP connections over WebSockets. It aims to deliver an experience similar to the desktop Tabby terminal—sessions, profiles, and rich configuration—while being accessible anywhere through a login. The architecture splits concerns: a Django-based control plane manages users, auth, and configuration, while a gateway service handles network transport so browser clients can reach...
    Downloads: 4 This Week
  • 15
    Proxy_Pool

    Python crawler proxy IP pool (proxy pool)

    The main function of the crawler proxy IP pool project is to regularly collect free proxies published on the Internet, verify them, and store them. Stored proxies are re-verified periodically to ensure they remain available. The project provides both an API and a CLI. You can also add proxy sources to increase the quality and quantity of IPs in the pool.
    Downloads: 0 This Week
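
    The collect, verify, store, and re-verify cycle described above can be illustrated with a small in-memory sketch. This is not Proxy_Pool's actual code or API: the `ProxyPool` class, its method names, and the `check` callable are hypothetical, and a real pool would persist proxies to a database and test them over the network.

```python
import time


class ProxyPool:
    """In-memory pool that keeps only proxies passing a health check."""

    def __init__(self, check):
        # `check` is a hypothetical callable: proxy string -> bool (True if usable).
        self._check = check
        self._proxies = {}  # proxy -> timestamp of the last successful check

    def collect(self, candidates):
        """Verify freshly collected candidates and store the ones that work."""
        for proxy in candidates:
            if self._check(proxy):
                self._proxies[proxy] = time.time()

    def revalidate(self):
        """Re-check stored proxies and drop the ones that stopped working."""
        for proxy in list(self._proxies):
            if self._check(proxy):
                self._proxies[proxy] = time.time()
            else:
                del self._proxies[proxy]

    def all(self):
        """Return the currently stored proxies, sorted for stable output."""
        return sorted(self._proxies)
```

    In a scheduled setup, `collect` would run whenever new candidates are scraped and `revalidate` would run on a timer, so callers only ever see proxies that passed a recent check.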
  • 16
    Selenium-python Helium

    Selenium-python but lighter: Helium is the best Python library

    Under the hood, Helium forwards each call to Selenium. The difference is that Helium's API is much more high-level. In Selenium, you need to use HTML IDs, XPaths and CSS selectors to identify web page elements. Helium on the other hand lets you refer to elements by user-visible labels. As a result, Helium scripts are typically 30-50% shorter than similar Selenium scripts. What's more, they are easier to read and more stable with respect to changes in the underlying web page. Selenium-python is great for web automation. Helium makes it easier to use. ...
    Downloads: 2 This Week
  • 17
    Web Spider, Web Crawler, Email Extractor

    Free tool that extracts emails, phone numbers, and custom text from the web using Java regex

    WebCrawlerMySQL.jar, available under Files, supports MySQL connections. A free web spider and crawler that extracts information from the web by parsing millions of pages. Data is stored in a Derby database and is not lost if the spider is force-closed.
    - Free web spider, parser, extractor, and crawler
    - Extracts emails, phone numbers, and custom text from the web
    - Exports to Excel files
    - Saves data to Derby and MySQL databases
    - Written in Java; cross-platform
    See also the free email sender: https://sourceforge.net/projects/gitst-free-email-ender/
    To start the application, install Microsoft OpenJDK: https://www.microsoft.com/openjdk
    Downloads: 12 This Week
  • 18
    OpenAI Quickstart Python

    Python example app from the OpenAI API quickstart tutorial

    ...The examples folder includes small, self-contained projects showcasing common use cases like chat completions, tool usage, and interactive interfaces. Each example is designed to be easily runnable with minimal setup—requiring only Python, a virtual environment, and an API key. The repository also includes environment setup guides and example scripts, such as a simple Flask web app for chat interactions, allowing developers to test OpenAI API integrations locally. Overall, openai-quickstart-python serves as an essential starting point for developers looking to prototype and experiment with OpenAI-powered apps.
    Downloads: 5 This Week
  • 19
    Web Dev for Beginners

    About 24 Lessons, 12 Weeks, Get Started as a Web Developer

    Web-Dev-For-Beginners is Microsoft’s open source, project-based curriculum for learning web development from scratch. Designed as a 12-week, 24-lesson course, it covers HTML, CSS, and JavaScript fundamentals through hands-on projects like terrariums, browser extensions, and space games. Each lesson includes a mix of pre-lecture quizzes, written content, assignments, challenges, and post-lecture quizzes to reinforce learning. The course also offers global accessibility with translations in...
    Downloads: 2 This Week
  • 20
    Text Generation Web UI

    A gradio web UI for running Large Language Models like LLaMA

    A gradio web UI for running Large Language Models like LLaMA, llama.cpp, GPT-J, Pythia, OPT, and GALACTICA. Dropdown menu for switching between models. Notebook mode that resembles OpenAI's playground. Chat mode for conversation and role playing. Instruct mode compatible with Alpaca and Open Assistant formats. Nice HTML output for GPT-4chan. Markdown output for GALACTICA, including LaTeX rendering. Custom chat characters. Advanced chat features (send images, get audio responses with TTS)....
    Downloads: 24 This Week
  • 21
    Snap Lens Web Crawler

    Crawl and download Snap Lenses from lens.snapchat.com with ease.

    This crawler is a dependency of Snap Camera Server: https://snap-camera-server.sourceforge.io
    Downloads: 0 This Week
  • 22
    Python 100 Days

    Python - From Novice to Master in 100 Days

    Python-100-Days is a comprehensive, practice-first learning roadmap by Luo Hao that spans 100 days from absolute Python basics to professional, production-grade skills. It starts with foundational syntax, control flow, data structures, and functions, then advances through object-oriented programming, file I/O, exceptions, and modules. The middle sections focus on real-world Python applications, including working with CSV, Excel, Word, PowerPoint, PDFs, images, email/SMS, and regular expressions. ...
    Downloads: 3 This Week
  • 23
    Complete-Python-3-Bootcamp

    Course Files for Complete Python 3 Bootcamp Course on Udemy

    ...In addition, it includes applied exercises in areas such as web scraping, working with APIs, and using Python libraries like NumPy, pandas, Matplotlib, and Seaborn for data analysis and visualization. Learners can progress from beginner-friendly basics to more advanced programming skills while reinforcing their knowledge with practice problems and projects. Because it mirrors the course content, this repository is widely used by students taking the Udemy course.
    Downloads: 7 This Week
  • 24
    Zappa - Serverless Python

    Serverless Python

    Zappa makes it super easy to build and deploy server-less, event-driven Python applications (including, but not limited to, WSGI web apps) on AWS Lambda + API Gateway. Think of it as "serverless" web hosting for your Python apps. That means infinite scaling, zero downtime, zero maintenance - and at a fraction of the cost of your current deployments! With a traditional HTTP server, the server is online 24/7, processing requests one by one as they come in. ...
    Downloads: 0 This Week
  • 25
    web.py

    Web framework for python that is as simple as it is powerful

    web.py is a web framework for Python that is as simple as it is powerful. web.py is in the public domain; you can use it for whatever purpose with absolutely no restrictions. web.py was originally published while Aaron Swartz worked at reddit.com, where the site used it as it grew to become one of the top 1000 sites according to Alexa and served millions of daily page views.
    Downloads: 0 This Week