468 projects for "extensible web spider" with 1 filter applied:

  • 1
    Spider

    High-performance Rust web crawler and scraper for large-scale data

    Spider is a high-performance web crawler and web scraping library written in Rust that enables developers to crawl and index websites efficiently. It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents.
    Downloads: 6 This Week
  • 2
    Python-Spider

    Python3 web crawler practice

    ...As part of the author’s public learning-path repositories, python-spider likely includes examples of HTTP requests, HTML parsing, and possibly concurrency or scheduling for crawling multiple pages, along with techniques for handling common web-scraping issues. For people who want hands-on practice building scrapers, collecting data, or learning web programming in Python, this repository serves as a didactic reference or starting point.
    Downloads: 1 This Week
  • 3
    spider_collection

    Collection of Python web scraping scripts for data extraction tasks

    spider_collection is a collection of Python web crawler scripts created primarily for experimentation, learning, and practical scraping tasks. spider_collection gathers multiple independent spiders designed to collect data from different platforms and services, demonstrating a variety of scraping techniques and workflows. These crawlers make use of common Python scraping tools such as requests, parsel, BeautifulSoup, and the Scrapy framework to extract structured information from web pages....
    Downloads: 7 This Week
  • 4
    FEAPDER

    Powerful Python crawler framework for scalable web scraping tasks

    feapder is a Python-based web crawling framework designed to simplify the process of building scalable and efficient web scrapers. It focuses on providing a developer-friendly environment that makes it easier to create, run, and manage crawlers for a variety of data collection tasks. It includes several built-in spider types, such as AirSpider, Spider, TaskSpider, and BatchSpider, which address different crawling scenarios ranging from lightweight scraping to distributed and batch-based jobs. feapder supports features such as breakpoint resume, allowing crawlers to continue from where they stopped without losing progress. ...
    Downloads: 0 This Week
  • 5
    DotnetSpider

    Lightweight .NET framework for fast web crawling and data scraping

    DotnetSpider is a web crawling and data extraction framework built on the .NET Standard platform. It is designed to help developers create efficient and scalable crawlers for collecting structured data from websites. It provides a high-level API that simplifies the process of defining spiders, managing requests, and extracting content from web pages. Developers can create custom spiders by extending base classes and configuring pipelines that handle downloading, parsing, and storing collected data. ...
    Downloads: 4 This Week
  • 6
    Scrapling

    An adaptive Web Scraping framework

    ...Its powerful spider system supports multi-session crawling, pause and resume functionality, and real-time streaming of scraped data. Scrapling combines high performance, memory efficiency, and extensive async support to deliver blazing-fast scraping workflows. With a developer-friendly API, CLI tools, MCP server integration for AI-assisted extraction, and Docker support, it offers a complete solution for modern web scrapers.
    Downloads: 1 This Week
  • 7
    Node Crawler

    Web Crawler/Spider for NodeJS + server-side jQuery

    The most powerful, popular, production-ready crawling/scraping package for Node. Happy hacking.
    Downloads: 13 This Week
  • 8
    Caddy

    Powerful, enterprise-ready, open source web server w/ automatic HTTPS

    ...Caddy is the only web server that uses HTTPS automatically and by default. It automatically renews TLS certificates, staples OCSP responses and more. Though used mostly as an HTTPS server, Caddy can be used to run Go applications, offering automated documentation, graceful on-line config changes via API and more to these apps. Caddy is very extensible, with a powerful plugin system unlike any other web server.
    Downloads: 60 This Week
  • 9
    PentestGPT

    Automated Penetration Testing Agentic Framework Powered by LLMs

    ...Built with a modular and extensible architecture, PentestGPT supports cloud and local LLMs, making it suitable for research, education, and authorized security testing.
    Downloads: 540 This Week
  • 10
    BrowserGym

    A Gym environment for web task automation

    ...One of its main strengths is that it bundles several important benchmarks by default, including MiniWoB, WebArena, VisualWebArena, WorkArena, AssistantBench, WebLINX, and OpenApps. This gives researchers a unified way to compare agent behavior across diverse web environments and task types without stitching together separate evaluation stacks. BrowserGym is also designed to be extensible, and the repository notes that creating new benchmarks mainly involves inheriting its abstract task interface.
    Downloads: 12 This Week
  • 11
    gain

    Asyncio-based Python framework for building fast web crawling spiders

    Gain is a Python web crawling framework designed to simplify the process of building efficient and scalable web scrapers. It is built on top of asynchronous technologies such as asyncio, aiohttp, and uvloop to support high-performance crawling with concurrent network requests. It provides a structured framework for creating spiders that can navigate websites, extract structured data, and process the collected results. Developers define crawlers using components such as spiders, parsers, and...
    Downloads: 8 This Week
  • 12
    Heritrix

    Internet Archive's open-source, web-scale, web crawler project

    Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix/heritix/heretix/heratix) is an archaic word for heiress (woman who inherits). Since our crawler seeks to collect and preserve the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt.
    Downloads: 6 This Week
  • 13
    Ash

    A declarative, extensible framework for building Elixir applications

    Ash is a declarative framework for building resource-oriented apps in Elixir. It emphasizes composability, DSL-driven definitions of resources/actions/relationships, and extensibility through plugins for API, database, and UI layers.
    Downloads: 7 This Week
  • 14
    FOSSBilling

    Empower your hosting business with FOSSBilling

    FOSSBilling is a free and open-source billing and client management solution tailored for online services businesses, particularly those in the web hosting space. It delivers a suite of tools that automate the creation and delivery of invoices, track payments, manage customer accounts, and handle service provisioning, all from a centralized web interface. Because it’s self-hosted and licensed under the Apache 2.0 license, organizations have full control over their data and can customize or extend the system to fit unique workflows or branding requirements. ...
    Downloads: 8 This Week
  • 15
    Mako

    An extremely fast, production-grade web bundler based on Rust

    Mako is a new web bundler for web apps, libraries, and frameworks. It's designed to be fast, reliable, and easy to use. It has been used in production in hundreds of projects by Ant Group and other companies. If you are looking for a modern web bundler, Mako is the right choice.
    Downloads: 3 This Week
  • 16
    Open ChatGPT Atlas

    Open Source and Free Alternative to ChatGPT Atlas

    Open ChatGPT Atlas is an open-source toolkit and interface for working with OpenAI’s ChatGPT models in a more extensible, adaptable, and composable way than standard web UIs allow. It provides an architecture where developers and power users can manage state, tool integrations, and multi-turn workflows with more control, enabling custom UIs, automation layers, and advanced routing logic. Unlike a fixed chat app, Atlas is designed as a foundation that can be extended with plugins, external APIs, and custom logic to support domain-specific assistants, agent-like behaviors, and multi-task workflows. ...
    Downloads: 14 This Week
  • 17
    Logbook

    An extensible Java library for HTTP request and response logging

    Logbook is an extensible Java library that enables complete request and response logging for different client- and server-side technologies. It satisfies a special need by (a) allowing web application developers to log any HTTP traffic that an application receives or sends, (b) in a way that makes it easy to persist and analyze later. This can be useful for traditional log analysis, meeting audit requirements, or investigating individual historic traffic issues.
    Downloads: 6 This Week
  • 18
    Flight Core

    An extensible micro-framework for PHP

    FlightPHP is a lightweight, fast, and flexible micro-framework for PHP, designed to build RESTful web applications and APIs. It provides a simple routing system, middleware support, and a powerful templating engine. FlightPHP is ideal for developers looking for a minimalist framework that doesn't impose a lot of structure, while still offering essential features for building modern web applications.
    Downloads: 8 This Week
  • 19
    XRAY

    XRay for recon, mapping and OSINT gathering from public networks

    XRAY is a modular security toolset that helps developers and security professionals analyze, fuzz, and test web applications, protocols, and network services for vulnerabilities. It provides a framework for writing and executing inspection modules that can parse structured data (JSON, XML, HTML), traverse graphs of endpoints, and perform intelligent probing guided by discovered surface area. XRay is typically used as a reconnaissance and vulnerability discovery engine in red-team or app-security workflows: it leverages extensible plugins to adapt to different protocols, inject payloads, and detect common bug classes such as injection flaws, misconfigurations, and unsafe endpoints. ...
    Downloads: 3 This Week
  • 20
    Prism

    Lightweight, robust, elegant syntax highlighting

    Prism is a lightweight, extensible syntax highlighter, built with modern web standards in mind. It’s used in millions of websites, including some of those you visit daily.
    Downloads: 11 This Week
  • 21
    Scrapy

    A fast, high-level web crawling and web scraping framework

    Scrapy is a fast, open source, high-level framework for crawling websites and extracting structured data from these websites. Portable and written in Python, it can run on Windows, Linux, macOS and BSD. Scrapy is powerful, fast and simple, and also easily extensible. Simply write the rules to extract the data, and add new functionality if you wish without having to touch the core. Scrapy does the rest, and can be used in a number of applications. It can be used for data mining, monitoring...
    Downloads: 20 This Week
  • 22
    Monaco Editor

    A browser based code editor

    Monaco Editor is the rich, browser-based code editor that powers Visual Studio Code, providing advanced editing capabilities as a standalone embeddable library for web applications. Models are at the heart of Monaco editor. It's what you interact with when managing content. A model represents a file that has been opened. This could represent a file that exists on a file system, but it doesn't have to. For example, the model holds the text content, determines the language of the content, and...
    Downloads: 22 This Week
  • 23
    Django

    The Web framework for perfectionists with deadlines

    Django is a high-level, free and open-source Python web framework founded on the Model–Template–View (MTV) pattern, designed to facilitate rapid development of secure, maintainable, and scalable database-driven websites. First, read docs/intro/install.txt for instructions on installing Django. Next, work through the tutorials in order (docs/intro/tutorial01.txt, docs/intro/tutorial02.txt, etc.). If you want to set up an actual deployment server, read docs/howto/deployment/index.txt for...
    Downloads: 29 This Week
  • 24
    Behat

    BDD in PHP

    Behat is a Behavior-Driven Development (BDD) framework for PHP that helps developers write tests in a human-readable format. It uses Gherkin syntax to describe expected application behavior and allows developers to write scenarios that map to automated tests. Behat is highly extensible, making it suitable for testing both web applications and APIs, and it is often used alongside Mink for browser automation.
    Downloads: 3 This Week
  • 25
    Markdownify MCP Server

    Convert files and web content into clean, usable Markdown easily

    Markdownify MCP is a Model Context Protocol server that converts many types of files and web content into clean Markdown. It supports formats such as PDFs, images, audio with transcription, DOCX, XLSX, and PPTX, along with web sources like YouTube transcripts, Bing results, and general webpages. Markdownify MCP is designed to simplify content extraction and make data easier to read, share, and reuse in structured workflows. Developers can install dependencies, build, and run the server...
    Downloads: 6 This Week