373 projects for "extensible web spider" with 1 filter applied:

  • 1
    Spider

    High-performance Rust web crawler and scraper for large-scale data

    Spider is a high-performance web crawler and web scraping library written in Rust that enables developers to crawl and index websites efficiently. It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents.
    Downloads: 2 This Week
  • 2
    Python-Spider

    Python3 web crawler practice

    ...As part of the author’s public learning-path repositories, python-spider likely includes examples of HTTP requests, HTML parsing, maybe concurrency or scheduling to crawl multiple pages, and techniques to handle common web-scraping issues. For people wanting to get hands-on with building scrapers, collecting data, or learning how to navigate web programming in Python, this repository acts as a didactic reference or starting point.
    Downloads: 0 This Week
  • 3
    spider_collection

    Collection of Python web scraping scripts for data extraction tasks

    spider_collection is a collection of Python web crawler scripts created primarily for experimentation, learning, and practical scraping tasks. spider_collection gathers multiple independent spiders designed to collect data from different platforms and services, demonstrating a variety of scraping techniques and workflows. These crawlers make use of common Python scraping tools such as requests, parsel, BeautifulSoup, and the Scrapy framework to extract structured information from web pages....
    Downloads: 4 This Week
  • 4
    FEAPDER

    Powerful Python crawler framework for scalable web scraping tasks

    feapder is a Python-based web crawling framework designed to simplify the process of building scalable and efficient web scrapers. It focuses on providing a developer-friendly environment that makes it easier to create, run, and manage crawlers for a variety of data collection tasks. It includes several built-in spider types, such as AirSpider, Spider, TaskSpider, and BatchSpider, which address different crawling scenarios ranging from lightweight scraping to distributed and batch-based jobs. feapder supports features such as breakpoint resume, allowing crawlers to continue from where they stopped without losing progress. ...
    Downloads: 0 This Week
  • 5
    DotnetSpider

    Lightweight .NET framework for fast web crawling and data scraping

    DotnetSpider is a web crawling and data extraction framework built on the .NET Standard platform. It is designed to help developers create efficient and scalable crawlers for collecting structured data from websites. It provides a high-level API that simplifies the process of defining spiders, managing requests, and extracting content from web pages. Developers can create custom spiders by extending base classes and configuring pipelines that handle downloading, parsing, and storing collected data. ...
    Downloads: 1 This Week
  • 6
    Scrapling

    An adaptive Web Scraping framework

    ...Its powerful spider system supports multi-session crawling, pause and resume functionality, and real-time streaming of scraped data. Scrapling combines high performance, memory efficiency, and extensive async support to deliver blazing-fast scraping workflows. With a developer-friendly API, CLI tools, MCP server integration for AI-assisted extraction, and Docker support, it offers a complete solution for modern web scrapers.
    Downloads: 1 This Week
  • 7
    Node Crawler

    Web Crawler/Spider for NodeJS + server-side jQuery

    The most powerful and popular production-ready crawling/scraping package for Node. Happy hacking.
    Downloads: 13 This Week
  • 8
    PentestGPT

    Automated Penetration Testing Agentic Framework Powered by LLMs

    ...Built with a modular and extensible architecture, PentestGPT supports cloud and local LLMs, making it suitable for research, education, and authorized security testing.
    Downloads: 693 This Week
  • 9
    Heritrix

    Internet Archive's open-source, web-scale, web crawler project

    Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix/heritix/heretix/heratix) is an archaic word for heiress (woman who inherits). Since our crawler seeks to collect and preserve the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt.
    Downloads: 5 This Week
  • 10
    BrowserGym

    A Gym environment for web task automation

    ...One of its main strengths is that it bundles several important benchmarks by default, including MiniWoB, WebArena, VisualWebArena, WorkArena, AssistantBench, WebLINX, and OpenApps. This gives researchers a unified way to compare agent behavior across diverse web environments and task types without stitching together separate evaluation stacks. BrowserGym is also designed to be extensible, and the repository notes that creating new benchmarks mainly involves inheriting its abstract task interface.
    Downloads: 4 This Week
  • 11
    Ash

    A declarative, extensible framework for building Elixir applications

    Ash is a declarative framework for building resource-oriented apps in Elixir. It emphasizes composability, DSL-driven definitions of resources/actions/relationships, and extensibility through plugins for API, database, and UI layers.
    Downloads: 6 This Week
  • 12
    gain

    Asyncio-based Python framework for building fast web crawling spiders

    Gain is a Python web crawling framework designed to simplify the process of building efficient and scalable web scrapers. It is built on top of asynchronous technologies such as asyncio, aiohttp, and uvloop to support high-performance crawling with concurrent network requests. It provides a structured framework for creating spiders that can navigate websites, extract structured data, and process the collected results. Developers define crawlers using components such as spiders, parsers, and...
    Downloads: 3 This Week
  • 13
    FOSSBilling

    Empower your hosting business with FOSSBilling

    FOSSBilling is a free and open-source billing and client management solution tailored for online services businesses, particularly those in the web hosting space. It delivers a suite of tools that automate the creation and delivery of invoices, track payments, manage customer accounts, and handle service provisioning, all from a centralized web interface. Because it’s self-hosted and licensed under the Apache 2.0 license, organizations have full control over their data and can customize or extend the system to fit unique workflows or branding requirements. ...
    Downloads: 10 This Week
  • 14
    Mako

    An extremely fast, production-grade web bundler based on Rust

    Mako is a new web bundler for web apps, libraries, and frameworks. It's designed to be fast, reliable, and easy to use. It has been used in production in hundreds of projects by Ant Group and other companies. If you are looking for a modern web bundler, Mako is the right choice.
    Downloads: 2 This Week
  • 15
    Open ChatGPT Atlas

    Open Source and Free Alternative to ChatGPT Atlas

    Open ChatGPT Atlas is an open-source toolkit and interface for working with OpenAI’s ChatGPT models in a more extensible, adaptable, and composable way than standard web UIs allow. It provides an architecture where developers and power users can manage state, tool integrations, and multi-turn workflows with more control, enabling custom UIs, automation layers, and advanced routing logic. Unlike a fixed chat app, Atlas is designed as a foundation that can be extended with plugins, external APIs, and custom logic to support domain-specific assistants, agent-like behaviors, and multi-task workflows. ...
    Downloads: 8 This Week
  • 16
    Django

    The Web framework for perfectionists with deadlines

    Django is a high-level, free and open-source Python web framework founded on the Model–Template–View (MTV) pattern, designed to facilitate rapid development of secure, maintainable, and scalable database-driven websites. First, read docs/intro/install.txt for instructions on installing Django. Next, work through the tutorials in order (docs/intro/tutorial01.txt, docs/intro/tutorial02.txt, etc.). If you want to set up an actual deployment server, read docs/howto/deployment/index.txt for...
    Downloads: 32 This Week
  • 17
    Flight Core

    An extensible micro-framework for PHP

    FlightPHP is a lightweight, fast, and flexible micro-framework for PHP, designed to build RESTful web applications and APIs. It provides a simple routing system, middleware support, and a powerful templating engine. FlightPHP is ideal for developers looking for a minimalist framework that doesn't impose a lot of structure, while still offering essential features for building modern web applications.
    Downloads: 4 This Week
  • 18
    TanStack Form

    Headless, performant, and type-safe form state management

    TanStack Form is a powerful, headless form management library designed to simplify form handling in web applications. It offers a flexible and extensible API that allows developers to manage form state, validation, and submission with ease. By providing a headless architecture, TanStack Form enables seamless integration with various UI frameworks and custom components, promoting a clean separation between form logic and presentation.
    Downloads: 0 This Week
  • 19
    XRAY

    XRay for recon, mapping and OSINT gathering from public networks

    XRAY is a modular security toolset that helps developers and security professionals analyze, fuzz, and test web applications, protocols, and network services for vulnerabilities. It provides a framework for writing and executing inspection modules that can parse structured data (JSON, XML, HTML), traverse graphs of endpoints, and perform intelligent probing guided by discovered surface area. XRay is typically used as a reconnaissance and vulnerability discovery engine in red-team or app-security workflows: it leverages extensible plugins to adapt to different protocols, inject payloads, and detect common bug classes such as injection flaws, misconfigurations, and unsafe endpoints. ...
    Downloads: 1 This Week
  • 20
    Monaco Editor

    A browser based code editor

    Monaco Editor is the rich, browser-based code editor that powers Visual Studio Code, providing advanced editing capabilities as a standalone embeddable library for web applications. Models are at the heart of Monaco editor. It's what you interact with when managing content. A model represents a file that has been opened. This could represent a file that exists on a file system, but it doesn't have to. For example, the model holds the text content, determines the language of the content, and...
    Downloads: 17 This Week
  • 21
    Grafana

    Leading open-source visualization and observability platform

    Grafana OSS is the leading open-source platform for visualization and observability. It enables teams to query, visualize, alert on, and explore telemetry data from multiple sources in a single interface. With support for 100+ data source plugins—including Prometheus, Loki, Elasticsearch, InfluxDB, SQL/NoSQL databases, and OpenTelemetry—Grafana helps teams correlate metrics, logs, and traces across applications and infrastructure. Users can build interactive dashboards with rich...
    Downloads: 30 This Week
  • 22
    Logbook

    An extensible Java library for HTTP request and response logging

    Logbook is an extensible Java library to enable complete request and response logging for different client- and server-side technologies. It satisfies a special need by a) allowing web application developers to log any HTTP traffic that an application receives or sends b) in a way that makes it easy to persist and analyze it later. This can be useful for traditional log analysis, meeting audit requirements or investigating individual historic traffic issues.
    Downloads: 1 This Week
  • 23
    Prism

    Lightweight, robust, elegant syntax highlighting

    Prism is a lightweight, extensible syntax highlighter, built with modern web standards in mind. It’s used in millions of websites, including some of those you visit daily.
    Downloads: 4 This Week
  • 24
    xgplayer

    A HTML5 video player with a parser that saves traffic

    xgplayer is a web-friendly, open-source media player library maintained by ByteDance, designed for playing audio/video streams in browsers or web applications with robust control, flexibility, and extensibility. It abstracts many of the lower-level complexities of HTML5 media, providing a consistent API for playback control, custom UI overlays, adaptive streaming, plugin hooks, and cross-browser compatibility. Because of its emphasis on modularity and extensibility, xgplayer can be embedded...
    Downloads: 4 This Week
  • 25
    Markdownify MCP Server

    Convert files and web content into clean, usable Markdown easily

    Markdownify MCP is a Model Context Protocol server that converts many types of files and web content into clean Markdown. It supports formats such as PDFs, images, audio with transcription, DOCX, XLSX, and PPTX, along with web sources like YouTube transcripts, Bing results, and general webpages. Markdownify MCP is designed to simplify content extraction and make data easier to read, share, and reuse in structured workflows. Developers can install dependencies, build, and run the server...
    Downloads: 4 This Week