Showing 70 open source projects for "python web crawler"

View related business solutions
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • Secure remote access solution to your private network, in the cloud or on-prem. Icon
    Secure remote access solution to your private network, in the cloud or on-prem.

    Deliver secure remote access with OpenVPN.

    OpenVPN is here to bring simple, flexible, and cost-effective secure remote access to companies of all sizes, regardless of where their resources are located.
    Get started — no credit card required.
  • 1
    Best-of Web Development with Python

    Best-of Web Development with Python

    A ranked list of awesome python libraries for web development

    This curated list contains 570 awesome open-source projects with a total of 2.4M stars grouped into 26 categories. All projects are ranked by a project-quality score, which is calculated based on various metrics automatically collected from Github and different package managers. If you like to add or update projects, feel free to open an issue, submit a pull request, or directly edit the projects.yaml. Contributions are very welcome! A ranked list of awesome python libraries for web development...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 2
    Spatie Crawler

    Spatie Crawler

    An easy to use, powerful crawler implemented in PHP

    Spatie Crawler is a PHP library that allows developers to crawl websites and extract information efficiently. It can be used for web scraping, link checking, or automated testing of web pages. The library is simple to use and supports customizable crawling strategies, including controlling crawl depth and handling redirects. It’s suitable for building crawlers that navigate large or dynamically generated websites.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    Best-of Python

    Best-of Python

    A ranked list of awesome Python open-source libraries

    This curated list contains 390 awesome open-source projects with a total of 1.4M stars grouped into 28 categories. All projects are ranked by a project-quality score, which is calculated based on various metrics automatically collected from GitHub and different package managers. If you like to add or update projects, feel free to open an issue, submit a pull request, or directly edit the projects.yaml. Contributions are very welcome! Ranked list of awesome python libraries for web development...
    Downloads: 11 This Week
    Last Update:
    See Project
  • 4
    Selenium-python Helium

    Selenium-python Helium

    Selenium-python but lighter: Helium is the best Python library

    Under the hood, Helium forwards each call to Selenium. The difference is that Helium's API is much more high-level. In Selenium, you need to use HTML IDs, XPaths and CSS selectors to identify web page elements. Helium on the other hand lets you refer to elements by user-visible labels. As a result, Helium scripts are typically 30-50% shorter than similar Selenium scripts. What's more, they are easier to read and more stable with respect to changes in the underlying web page. Selenium-python...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Powering the best of the internet | Fastly Icon
    Powering the best of the internet | Fastly

    Fastly's edge cloud platform delivers faster, safer, and more scalable sites and apps to customers.

    Ensure your websites, applications and services can effortlessly handle the demands of your users with Fastly. Fastly’s portfolio is designed to be highly performant, personalized and secure while seamlessly scaling to support your growth.
    Try for free
  • 5
    Heritrix

    Heritrix

    Internet Archive's open-source, web-scale, web crawler project

    Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix/heritix/heretix/heratix) is an archaic word for heiress (woman who inherits). Since our crawler seeks to collect and preserve the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt. Heritrix is designed to respect the robots.txt exclusion directives...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 6
    WeasyPrint

    WeasyPrint

    The awesome document factory

    WeasyPrint is a smart solution helping people to create PDF documents. You can generate gorgeous statistical reports, invoices, tickets, and anything you want as long as you have some webdesign skills! Design your documents just as you design your websites! WeasyPrint follows the widely used HTML and CSS specifications from the W3C. You can use your usual web tools, languages and frameworks, but for print. Creating high-quality digital documents requires features that you love to use as readers...
    Downloads: 19 This Week
    Last Update:
    See Project
  • 7
    QR Code generator library

    QR Code generator library

    High-quality QR Code generator library in Java, TypeScript/JavaScript

    ... to TypeScript, Python, Rust, C++, and C. It is open source under the MIT License. For each language, the codebase is roughly 1000 lines of code and has no dependencies other than the respective language’s standard library.
    Downloads: 18 This Week
    Last Update:
    See Project
  • 8
    X-Crawl

    X-Crawl

    Flexible Node.js AI-assisted crawler library

    A high-performance web crawling and scraping framework for Node.js, designed for large-scale data extraction.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 9
    jQuery Terminal

    jQuery Terminal

    JavaScript library for creating web-based terminals

    jQuery Terminal is a JavaScript library for creating command-line interpreters in your applications. You can use this JavaScript Terminal library to create interactive web-based terminal applications on your website. Where commands are defined by you. You can define them on the server or in the browser's JavaScript. It can automatically call JSON-RPC service when the user types a command. Alternatively, you can provide an object with methods; each method will be invoked on the user's command...
    Downloads: 7 This Week
    Last Update:
    See Project
  • Build Securely on AWS with Proven Frameworks Icon
    Build Securely on AWS with Proven Frameworks

    Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

    Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.
    Download Now
  • 10
    Metaflow

    Metaflow

    A framework for real-life data science

    Metaflow is a human-friendly Python library that helps scientists and engineers build and manage real-life data science projects. Metaflow was originally developed at Netflix to boost productivity of data scientists who work on a wide variety of projects from classical statistics to state-of-the-art deep learning.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 11
    PyMySQL

    PyMySQL

    MySQL client library for Python

    PyMySQL is a 100% Python implementation of the MySQL client protocol, allowing Python applications to connect to MySQL and MariaDB databases without requiring binary extensions. It supports standard DB‑API 2.0 features, such as cursors, transactions, and parameterized queries. PyMySQL is versatile for web applications, scripts, and tools, offering compatibility with ORMs like SQLAlchemy and frameworks like Django.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 12
    BentoCache

    BentoCache

    Bentocache is a robust multi-tier caching library for Node.js app

    Bentocache is a flexible caching library for Python that supports multiple backends like memory, disk, and Redis. It offers decorators for easy function-level caching and is designed to be lightweight, extensible, and developer-friendly. Bentocache is well-suited for performance optimization in web apps, scripts, and data pipelines.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 13
    Synapse Machine Learning

    Synapse Machine Learning

    Simple and distributed Machine Learning

    ..., and OpenCV. These tools enable powerful and highly-scalable predictive and analytical models for a variety of data sources. SynapseML also brings new networking capabilities to the Spark Ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. For production-grade deployment, the Spark Serving project enables high throughput, sub-millisecond latency web services, backed by your Spark cluster.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 14
    Playwright for .NET

    Playwright for .NET

    .NET version of the Playwright testing and automation library

    ..., JavaScript, Python, .NET, Java. Test Mobile Web. Native mobile emulation of Google Chrome for Android and Mobile Safari. The same rendering engine works on your Desktop and in the Cloud. Auto-wait. Playwright waits for elements to be actionable prior to performing actions. It also has a rich set of introspection events. The combination of the two eliminates the need for artificial timeouts - the primary cause of flaky tests.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 15
    Requests for PHP

    Requests for PHP

    Requests for PHP is a humble HTTP request library

    Requests is a HTTP library written in PHP, for human beings. It is roughly based on the API from the excellent Requests Python library. Requests is ISC Licensed (similar to the new BSD license) and has no dependencies, except for PHP 5.6+. Despite PHP’s use as a language for the web, its tools for sending HTTP requests are severely lacking. cURL has an interesting API, to say the least, and you can’t always rely on it being available. Sockets provide only low-level access and require you...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 16
    Lexbor

    Lexbor

    Lexbor is development of an open source HTML Renderer library

    Lexbor is the development of a web browser engine available as a software library; it ships with a free license and has no extra dependencies. For us, speed is an absolute must-have. In our development process, we focus on fastest parsing techniques for HTML, CSS, and fonts, fastest data processing methods, and fastest ways to serve content to end users. Whether you are building a backend that handles millions of HTML documents or a UI-heavy user app, your software’s response rate always...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 17
    Venom

    Venom

    Venom is the most complete javascript library for Whatsapp

    Venom is a high-performance system developed with JavaScript to create a bot for WhatsApp, support for creating any interaction, such as customer service, media sending, sentence recognition based on artificial intelligence and all types of design architecture for WhatsApp. It's a high-performance alternative API to whatzapp, you can send, text messages, files, images, videos and more. Remember, the API was developed on a platform called RESTful Web services, providing interoperability between...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 18
    redis-py

    redis-py

    Redis Python client

    redis-py is the official Python client for interacting with Redis, the in-memory data structure store. It supports all Redis commands and data types, making it easy to build caching, messaging, or real-time analytics features in Python applications. With both synchronous and asyncio support, redis-py is suited for modern Python projects and integrates smoothly into web frameworks, task queues, and backend services.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 19
    Werkzeug

    Werkzeug

    The comprehensive WSGI web application library

    Werkzeug is a comprehensive WSGI web application library. It began as a simple collection of various utilities for WSGI applications and has become one of the most advanced WSGI utility libraries. Werkzeug doesn’t enforce any dependencies. It is up to the developer to choose a template engine, database adapter, and even how to handle requests. Includes an interactive debugger that allows inspecting stack traces and source code in the browser with an interactive interpreter for any frame...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 20
    Parsera

    Parsera

    Lightweight library for scraping web-sites with LLMs

    Scrape data from any website with only a link and column descriptions. Parsera is a tool designed to scrape web content, specifically handling poorly structured or messy websites.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Helium

    Helium

    Lighter web automation with Python

    Helium is a Python library built on top of Selenium to make browser automation more intuitive and human-friendly. It replaces verbose boilerplate code with natural language-like API calls such as click("Login") or write("hello", into="Name"). Helium manages browser setup, waits, and teardown, enabling quick development of scripts for testing, scraping, or task automation without requiring deep Selenium knowledge.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Tortoise ORM

    Tortoise ORM

    Familiar asyncio ORM for python, built with relations in mind

    Tortoise ORM is an easy-to-use asyncio ORM (Object Relational Mapper) for Python, inspired by Django's ORM. It is designed to work with asynchronous frameworks, providing a simple and familiar API for interacting with databases. Tortoise ORM supports various relational databases and is suitable for building high-performance web applications.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    WTForms

    WTForms

    A flexible forms validation and rendering library for Python

    WTForms is a flexible forms validation and rendering library for Python web development. It can work with whatever web framework and template engine you choose. It supports data validation, CSRF protection, internationalization (I18N), and more. There are various community libraries that provide closer integration with popular frameworks. WTForms is designed to work with any web framework and template engine. There are a number of community-provided libraries that make integrating...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    notebooker

    notebooker

    Productionise & schedule your Jupyter Notebooks

    Productionise and schedule your Jupyter Notebooks, just as interactively as you wrote them. Notebooker is a webapp which can execute and parametrise Jupyter Notebooks as soon as they have been committed to git. The results are stored in MongoDB and searchable via the web interface, essentially turning your Jupyter Notebook into a production-style web-based report in a few clicks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    django-viewflow

    django-viewflow

    Reusable workflow library for Django

    Viewflow is a lightweight reusable workflow library that helps to organize people collaboration business logic in Django applications. In conjunction with Django-material, they could be used as the framework to build ready-to-use business applications in minutes. Django web framework solves only technical problems related to the client-server interaction on top of the stateless HTTP protocol. Model-View-Template separation pattern helps to maintain simple CRUD-based logic. Viewflow...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • Next
Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.