41 projects for "python web crawler" with 2 filters applied:

  • 1
    Spatie Crawler

    An easy to use, powerful crawler implemented in PHP

    Spatie Crawler is a PHP library that allows developers to crawl websites and extract information efficiently. It can be used for web scraping, link checking, or automated testing of web pages. The library is simple to use and supports customizable crawling strategies, including controlling crawl depth and handling redirects. It’s suitable for building crawlers that navigate large or dynamically generated websites.
    Downloads: 1 This Week
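As a rough illustration of the "controlling crawl depth" idea in the Spatie Crawler description, here is a minimal Python sketch of a depth-limited, breadth-first crawl. It is not Spatie Crawler's API (that library is PHP); `LINKS`, `crawl`, and `get_links` are hypothetical names, and an in-memory link graph stands in for real HTTP fetches.

```python
from collections import deque

# Hypothetical in-memory link graph standing in for real pages (assumption).
LINKS = {
    "/": ["/a", "/b"],
    "/a": ["/a/1", "/"],
    "/b": ["/b/1"],
    "/a/1": ["/a/1/x"],
    "/b/1": [],
    "/a/1/x": [],
}


def crawl(start, max_depth, get_links):
    """Visit pages breadth-first, following links at most max_depth hops from start."""
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            continue  # depth limit reached: do not follow links from this page
        for link in get_links(url):
            if link not in seen:  # skip pages that are already queued or visited
                seen.add(link)
                queue.append((link, depth + 1))
    return order


visited = crawl("/", 2, lambda url: LINKS.get(url, []))
print(visited)
```

With a depth limit of 2, the page three hops away (`/a/1/x`) is never visited; a real crawler would replace `get_links` with a function that fetches and parses each page.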
  • 2
    Heritrix

    Internet Archive's open-source, web-scale, web crawler project

    Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix/heritix/heretix/heratix) is an archaic word for heiress (woman who inherits). Since our crawler seeks to collect and preserve the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt. Heritrix is designed to respect the robots.txt exclusion directives...
    Downloads: 2 This Week
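The robots.txt exclusion check mentioned in the Heritrix description can be sketched in Python with the standard library's `urllib.robotparser`. This is only an illustration of the general technique, not Heritrix's own implementation; the rules in `ROBOTS_TXT`, the `example-bot` user agent, and the helper names are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser, user_agent, url):
    """Return True if the parsed rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
print(is_allowed(parser, "example-bot", "https://example.com/index.html"))
print(is_allowed(parser, "example-bot", "https://example.com/private/data.html"))
```

A polite crawler would run a check like `is_allowed` before every request and skip any URL for which it returns `False`.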
  • 3
    Basic Computer Games

    An updated version of the classic "Basic Computer Games" book

    Basic Computer Games is a modern revitalization of the classic “Basic Computer Games” book’s collection of games, ported and expanded into various modern, memory-safe and scripting languages. It includes illustrative code examples of many classic games (e.g. Blackjack, Bowling) in multiple languages, with the goal of making the historical games accessible and educational in safe modern environments. Definitely use the most recent versions and features of the target language, but also try to...
    Downloads: 5 This Week
  • 4
    PyMySQL

    MySQL client library for Python

    PyMySQL is a 100% Python implementation of the MySQL client protocol, allowing Python applications to connect to MySQL and MariaDB databases without requiring binary extensions. It supports standard DB‑API 2.0 features, such as cursors, transactions, and parameterized queries. PyMySQL is versatile for web applications, scripts, and tools, offering compatibility with ORMs like SQLAlchemy and frameworks like Django.
    Downloads: 2 This Week
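The DB-API 2.0 features named in the PyMySQL description (cursors, transactions, parameterized queries) look roughly like the sketch below. So that it runs without a MySQL server, it uses the standard library's `sqlite3` module as a stand-in driver, which is an assumption of this example. With PyMySQL the overall shape is the same, but placeholders are written `%s` rather than `?`, and the table and values here are hypothetical.

```python
import sqlite3  # stand-in DB-API 2.0 driver (assumption); not PyMySQL itself


def demo_parameterized_queries():
    """Insert and query rows using placeholders instead of string formatting."""
    conn = sqlite3.connect(":memory:")
    try:
        cur = conn.cursor()
        cur.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, status INTEGER)")

        # The driver binds values separately from the SQL text, so a hostile
        # string is stored as plain data rather than executed.
        hostile_url = "https://example.com/'); DROP TABLE pages; --"
        rows = [("https://example.com/", 200), (hostile_url, 404)]
        cur.executemany("INSERT INTO pages (url, status) VALUES (?, ?)", rows)
        conn.commit()  # make the inserts permanent

        cur.execute("SELECT url, status FROM pages WHERE status = ?", (404,))
        return cur.fetchall()
    finally:
        conn.close()


result = demo_parameterized_queries()
print(result)
```

Passing values as a separate tuple, rather than formatting them into the SQL string, is what protects against SQL injection in any DB-API 2.0 driver.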
  • 5
    X-Crawl

    Flexible Node.js AI-assisted crawler library

    A high-performance web crawling and scraping framework for Node.js, designed for large-scale data extraction.
    Downloads: 0 This Week
  • 6
    redis-py

    Redis Python client

    redis-py is the official Python client for interacting with Redis, the in-memory data structure store. It supports all Redis commands and data types, making it easy to build caching, messaging, or real-time analytics features in Python applications. With both synchronous and asyncio support, redis-py is suited for modern Python projects and integrates smoothly into web frameworks, task queues, and backend services.
    Downloads: 0 This Week
  • 7
    Parsera

    Lightweight library for scraping web-sites with LLMs

    Scrape data from any website with only a link and column descriptions. Parsera is a tool designed to scrape web content, specifically handling poorly structured or messy websites.
    Downloads: 0 This Week
  • 8
    Helium

    Lighter web automation with Python

    Helium is a Python library built on top of Selenium to make browser automation more intuitive and human-friendly. It replaces verbose boilerplate code with natural language-like API calls such as click("Login") or write("hello", into="Name"). Helium manages browser setup, waits, and teardown, enabling quick development of scripts for testing, scraping, or task automation without requiring deep Selenium knowledge.
    Downloads: 0 This Week
  • 9
    Tortoise ORM

    Familiar asyncio ORM for python, built with relations in mind

    Tortoise ORM is an easy-to-use asyncio ORM (Object Relational Mapper) for Python, inspired by Django's ORM. It is designed to work with asynchronous frameworks, providing a simple and familiar API for interacting with databases. Tortoise ORM supports various relational databases and is suitable for building high-performance web applications.
    Downloads: 0 This Week
  • 10
    WTForms

    A flexible forms validation and rendering library for Python

    WTForms is a flexible forms validation and rendering library for Python web development. It can work with whatever web framework and template engine you choose. It supports data validation, CSRF protection, internationalization (I18N), and more. There are various community libraries that provide closer integration with popular frameworks.
    Downloads: 0 This Week
  • 11
    Awesome Free ChatGPT

    List of free ChatGPT mirror sites, continuously updated

    This is a curated directory of freely accessible ChatGPT-style services and mirror sites that offer AI chatbot interfaces without login or payment requirements. Listed services often support multiple models such as GPT-4, Claude, and Gemini, and some include image upload and drawing capabilities. Entries are collected from multiple independent sites, with descriptions and tags, and the list is continually updated to maintain availability.
    Downloads: 0 This Week
  • 12
    notebooker

    Productionise & schedule your Jupyter Notebooks

    Productionise and schedule your Jupyter Notebooks, just as interactively as you wrote them. Notebooker is a webapp which can execute and parametrise Jupyter Notebooks as soon as they have been committed to git. The results are stored in MongoDB and searchable via the web interface, essentially turning your Jupyter Notebook into a production-style web-based report in a few clicks.
    Downloads: 0 This Week
  • 13
    BentoCache

    Bentocache is a robust multi-tier caching library for Node.js apps

    Bentocache is a flexible caching library for Node.js that supports multiple backends like memory, disk, and Redis. It offers decorators for easy function-level caching and is designed to be lightweight, extensible, and developer-friendly. Bentocache is well-suited for performance optimization in web apps, scripts, and data pipelines.
    Downloads: 0 This Week
  • 14
    Papis

    Powerful and highly extensible command-line based document and bibliography manager

    Papis is a powerful and highly extensible CLI document and bibliography manager. With Papis, you can search your library for books and papers, add documents and notes, import and export to and from other formats, and much much more. Papis uses a human-readable and easily hackable .yaml file to store each entry's bibliographical data. It strives to be easy to use while providing a wide range of features. And for those who still want more, Papis makes it easy to write scripts that extend its...
    Downloads: 0 This Week
  • 15
    Siddhi Core Libraries

    Stream Processing and Complex Event Processing Engine

    ... to various endpoints in real time. Agile development experience with SQL-like query language and graphical drag-and-drop editor supporting event simulation. Lightweight runtime that can natively run on Kubernetes, Docker, VM, or bare metal, and embedded in any Java or Python application. Scalable, and highly available distributed event processing on Kubernetes, with NATS Streaming and Siddhi Kubernetes Operator.
    Downloads: 0 This Week
  • 16
    Requests for PHP

    Requests for PHP is a humble HTTP request library

    Requests is an HTTP library written in PHP, for human beings. It is roughly based on the API from the excellent Requests Python library. Requests is ISC Licensed (similar to the new BSD license) and has no dependencies, except for PHP 5.6+. Despite PHP’s use as a language for the web, its tools for sending HTTP requests are severely lacking. cURL has an interesting API, to say the least, and you can’t always rely on it being available. Sockets provide only low-level access and require you...
    Downloads: 0 This Week
  • 17
    Lexbor

    Lexbor is an open source HTML renderer library in development

    Lexbor is a web browser engine, under development, that is available as a software library; it ships with a free license and has no extra dependencies. For us, speed is an absolute must-have. In our development process, we focus on the fastest parsing techniques for HTML, CSS, and fonts, the fastest data processing methods, and the fastest ways to serve content to end users. Whether you are building a backend that handles millions of HTML documents or a UI-heavy user app, your software’s response rate always...
    Downloads: 0 This Week
  • 18
    Whisper Library

    Whisper is a file-based time-series database format for Graphite

    Whisper is one of three components within the Graphite project. Whisper is a fixed-size database, similar in design and purpose to RRD (round-robin-database). It provides fast, reliable storage of numeric data over time. Whisper allows for higher resolution (seconds per point) of recent data to degrade into lower resolutions for long-term retention of historical data. Copies data from src in dst, if missing. Unlike whisper-merge, don't overwrite data that's already present in the target...
    Downloads: 0 This Week
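The idea in the Whisper description of recent high-resolution data degrading into lower resolutions can be sketched as a simple roll-up. The example below averages fixed-size groups of points; it does not use the Whisper library or its file format, and the function name, sample values, and the choice of averaging as the aggregation are assumptions made for illustration.

```python
def downsample(points, factor):
    """Average each consecutive group of `factor` values into one coarser point."""
    if factor <= 0:
        raise ValueError("factor must be a positive integer")
    chunks = (points[i:i + factor] for i in range(0, len(points), factor))
    return [sum(chunk) / len(chunk) for chunk in chunks]


# Hypothetical readings taken at one point per 10 seconds.
recent = [10, 20, 30, 40, 50, 60]

# Rolling up by a factor of 3 gives one point per 30 seconds.
archived = downsample(recent, 3)
print(archived)
```

Each archived point summarizes three original readings, which trades detail for a fixed, smaller storage footprint over long retention periods.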
  • 19
    Spyne

    A transport agnostic sync/async RPC library

    Spyne is a Python RPC toolkit that makes it easy to expose online services that have a well-defined API using multiple protocols and transports. It integrates with popular Python web frameworks as well as libraries like SQLAlchemy to keep your code as DRY as possible. Spyne aims to save the protocol implementers the hassle of implementing their own remote procedure call api and the application programmers the hassle of jumping through hoops just to expose their services using multiple protocols...
    Downloads: 0 This Week
  • 20
    Hack-Tools

    Hack tools

    hack-tools is a collection of various hacking tools and utilities. It serves as a comprehensive toolkit for penetration testers and cybersecurity enthusiasts, encompassing a wide range of functionalities.
    Downloads: 11 This Week
  • 21
    Functional, Data Science Intro To Python

    [tutorial] A functional, Data Science focused introduction to Python

    The first section is an intentionally brief, functional, data science-centric introduction to Python. The assumption is that someone with zero experience in programming can follow this tutorial and learn Python with the smallest amount of information possible. The sections after that involve varying levels of difficulty and cover topics as diverse as Machine Learning, Linear Optimization, build systems, command line tools, recommendation engines, Sentiment Analysis and Cloud Computing.
    Downloads: 0 This Week
  • 22
    Assorted projects. General-purpose libraries for Python, C++, Scala, bash, and others. Meta-programming tools. System utilities. UI components. Web APIs. Configuration files. Benchmarks. Programming competition entries. And much more.
    Downloads: 0 This Week
  • 23
    Icon Font to PNG

    Python script (and library) for exporting icons from icon fonts

    Python script (and library) for easy and simple export of icons from web icon fonts (e.g. Font Awesome, Octicons) as PNG images. The best part is the provided shell script, but you can also use its functionality directly in your (probably awesome) Python project. There’s also font-awesome-to-png script for backward compatibility with the first iteration of the concept. You can use IconFont (and IconFontDownloader for that matter) directly inside your Python project. There's no proper...
    Downloads: 0 This Week
  • 24
    C++ Standard Airline IT Object Library
    This project aims to provide a clean API, and the corresponding C++ implementation, as the basis of an Airline IT Business Object Model (BOM), i.e., one to be used by several other open source projects, such as RMOL, Air-Sched, Travel-CCM, OpenTREP, etc.
    Downloads: 0 This Week
  • 25
    Node Crawler

    Web Crawler/Spider for NodeJS + server-side jQuery

    Most powerful, popular and production crawling/scraping package for Node, happy hacking.
    Downloads: 1 This Week