Showing 241 open source projects for "extensible web spider"

View related business solutions
  • Stop vibe-debugging. Icon
    Stop vibe-debugging.

    Plug Claude into your app's actual errors.

    AppSignal's MCP server hands Claude, Cursor, or Zed your real errors, traces, and the deploy that shipped them. AI writes the fix; you review the diff.
    Free 30 days.
  • Enterprise-grade ITSM, for every business Icon
    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity.

    Freshservice is an intuitive, AI-powered platform that helps IT, operations, and business teams deliver exceptional service without the usual complexity. Automate repetitive tasks, resolve issues faster, and provide seamless support across the organization. From managing incidents and assets to driving smarter decisions, Freshservice makes it easy to stay efficient and scale with confidence.
    Try it Free
  • 1
    Spider

    Spider

    High-performance Rust web crawler and scraper for large-scale data

    Spider is a high-performance web crawler and web scraping library written in Rust that enables developers to crawl and index websites efficiently. It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 2
    spider_collection

    spider_collection

    Collection of Python web scraping scripts for data extraction tasks

    spider_collection is a collection of Python web crawler scripts created primarily for experimentation, learning, and practical scraping tasks. spider_collection gathers multiple independent spiders designed to collect data from different platforms and services, demonstrating a variety of scraping techniques and workflows. These crawlers make use of common Python scraping tools such as requests, parsel, BeautifulSoup, and the Scrapy framework to extract structured information from web pages....
    Downloads: 7 This Week
    Last Update:
    See Project
  • 3
    EasySpider

    EasySpider

    A visual no-code/code-free web crawler/spider

    A visual code-free/no-code web crawler/spider, supporting both Chinese and English.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    FEAPDER

    FEAPDER

    Powerful Python crawler framework for scalable web scraping tasks

    feapder is a Python-based web crawling framework designed to simplify the process of building scalable and efficient web scrapers. It focuses on providing a developer-friendly environment that makes it easier to create, run, and manage crawlers for a variety of data collection tasks. It includes several built-in spider types, such as AirSpider, Spider, TaskSpider, and BatchSpider, which address different crawling scenarios ranging from lightweight scraping to distributed and batch-based jobs. feapder supports features such as breakpoint resume, allowing crawlers to continue from where they stopped without losing progress. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 5
    Web Spider, Web Crawler, Email Extractor

    Web Spider, Web Crawler, Email Extractor

    Free Extracts Emails, Phones and custom text from Web using JAVA Regex

    In Files there is WebCrawlerMySQL.jar which supports MySql Connection Free Web Spider & Crawler. Extracts Information from Web by parsing millions of pages. Store data into Derby Database and data are not being lost after force closing the spider. - Free Web Spider , Parser, Extractor, Crawler - Extraction of Emails , Phones and Custom Text from Web - Export to Excel File - Data Saved into Derby and MySQL Database - Written in Java Cross Platform Also See Free email Sender : https://sourceforge.net/projects/gitst-free-email-ender/ Please install Microsoft OpenJDK to start the application https://www.microsoft.com/openjdk
    Downloads: 2 This Week
    Last Update:
    See Project
  • 6
    DotnetSpider

    DotnetSpider

    Lightweight .NET framework for fast web crawling and data scraping

    DotnetSpider is a web crawling and data extraction framework built on the .NET Standard platform. It is designed to help developers create efficient and scalable crawlers for collecting structured data from websites. It provides a high-level API that simplifies the process of defining spiders, managing requests, and extracting content from web pages. Developers can create custom spiders by extending base classes and configuring pipelines that handle downloading, parsing, and storing collected data. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 7
    Scrapy-Redis

    Scrapy-Redis

    Redis-based components for Scrapy

    You can start multiple spider instances that share a single redis queue. Best suitable for broad multi-domain crawls. Scraped items gets pushed into a redis queued meaning that you can start as many as needed post-processing processes sharing the items queue. Scheduler + Duplication Filter, Item Pipeline, Base Spiders. Default requests serializer is pickle, but it can be changed to any module with loads and dumps functions. Note that pickle is not compatible between python versions. Version...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 8
    Scrapling

    Scrapling

    An adaptive Web Scraping framework

    ...Its powerful spider system supports multi-session crawling, pause and resume functionality, and real-time streaming of scraped data. Scrapling combines high performance, memory efficiency, and extensive async support to deliver blazing-fast scraping workflows. With a developer-friendly API, CLI tools, MCP server integration for AI-assisted extraction, and Docker support, it offers a complete solution for modern web scrapers.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 9
    Caddy

    Caddy

    Powerful, enterprise-ready, open source web server w/ automatic HTTPS

    ...Caddy is the only web server that uses HTTPS automatically and by default. It automatically renews TLS certificates, staples OCSP responses and more. Though used mostly as an HTTPS server, Caddy can be used to run Go applications, offering automated documentation, graceful on-line config changes via API and more to these apps. Caddy is very extensible, with a powerful plugin system unlike any other web server.
    Downloads: 60 This Week
    Last Update:
    See Project
  • Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure Icon
    Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure

    Native application identity and user-based security for your Azure cloud

    Gain integrated visibility across all traffic in a single pass. Deploy Palo Alto Networks VM-Series to determine application identity and content while automating security policy updates via rich APIs.
    Get a free trial
  • 10
    gain

    gain

    Asyncio-based Python framework for building fast web crawling spiders

    Gain is a Python web crawling framework designed to simplify the process of building efficient and scalable web scrapers. It is built on top of asynchronous technologies such as asyncio, aiohttp, and uvloop to support high-performance crawling with concurrent network requests. It provides a structured framework for creating spiders that can navigate websites, extract structured data, and process the collected results. Developers define crawlers using components such as spiders, parsers, and...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 11
    Ferret

    Ferret

    Declarative web scraping

    A web scraping system aiming to simplify data extraction from the web. ferret has a declarative query language that makes it easy to focus on the data that you need to get. ferret has the ability to scrape JS rendered pages, handle all page events, and emulate user interactions. the ferret was designed as a library from the ground up. it can be easily embedded into any Go application. ferret helps you to focus on the data you need using an easy-to-learn declarative language. ferret uses Chrome/Chromium via Chrome Devtools Protocol to handle dynamically rendered web pages. ferret is extremely extensible, and creating custom functions and types is super easy. ferret allows users to focus on the data. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 12
    Certbot

    Certbot

    Get free HTTPS certificates forever from Let's Encrypt

    Certbot is a fully-featured, easy-to-use, extensible client for the Let's Encrypt CA. It fetches a digital certificate from Let’s Encrypt, an open certificate authority launched by the EFF, Mozilla, and others. This certificate then lets browsers verify the identity of web servers and ensures secure communication over the Web. Obtaining and maintaining a certificate is usually such a hassle, but with Certbot and Let’s Encrypt it becomes automated and hassle-free. ...
    Downloads: 99 This Week
    Last Update:
    See Project
  • 13
    Lighthouse

    Lighthouse

    Automated auditing, performance metrics, & best practices for the web

    Lighthouse is an open-source, automated tool that analyzes and audits web apps and web pages in order to improve their quality. Lighthouse collects modern performance metrics and insights on developer best practices; auditing for performance, accessibility, SEO and more. After auditing it produces a report either in JSON or HTML. Included in the report is a reference doc that explains the importance of the audit and how to fix the problem areas, which you can use to improve the web app or...
    Downloads: 14 This Week
    Last Update:
    See Project
  • 14
    Open ChatGPT Atlas

    Open ChatGPT Atlas

    Open Source and Free Alternative to ChatGPT Atlas

    Open ChatGPT Atlas is an open-source toolkit and interface for working with OpenAI’s ChatGPT models in a more extensible, adaptable, and composable way than standard web UIs allow. It provides an architecture where developers and power users can manage state, tool integrations, and multi-turn workflows with more control, enabling custom UIs, automation layers, and advanced routing logic. Unlike a fixed chat app, Atlas is designed as a foundation that can be extended with plugins, external APIs, and custom logic to support domain-specific assistants, agent-like behaviors, and multi-task workflows. ...
    Downloads: 14 This Week
    Last Update:
    See Project
  • 15
    Puter

    Puter

    🌐 The Internet Computer! Free, Open-Source, and Self-Hostable

    ...The platform allows developers to build, publish, and monetize web applications through its integrated App Store ecosystem. Its self-hosting capabilities give organizations and individuals full control over their data, applications, and deployment environments. With a highly extensible architecture and broad community support, Puter serves as a modern alternative to traditional cloud desktops and personal cloud platforms.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 16
    Strapi

    Strapi

    API creation made simple, secure and fast

    Strapi is the most advanced open-source headless CMS for creating powerful and customizable APIs with no effort. Built with 100% JavaScript, Strapi lets you easily create self-hosted, customizable, and performant content APIs. Strapi projects can be hosted on any platform of your choice, and you can work with any database you prefer. All your favorite dev tools-- from static site generators and databases to hosting platforms work with Strapi, so you're never locked in. Strapi is designed...
    Downloads: 14 This Week
    Last Update:
    See Project
  • 17
    Scrapy

    Scrapy

    A fast, high-level web crawling and web scraping framework

    Scrapy is a fast, open source, high-level framework for crawling websites and extracting structured data from these websites. Portable and written in Python, it can run on Windows, Linux, macOS and BSD. Scrapy is powerful, fast and simple, and also easily extensible. Simply write the rules to extract the data, and add new functionality if you wish without having to touch the core. Scrapy does the rest, and can be used in a number of applications. It can be used for data mining, monitoring...
    Downloads: 20 This Week
    Last Update:
    See Project
  • 18
    Rybbit

    Rybbit

    Open-source and privacy-friendly alternative to Google Analytics

    Rybbit is an open-source, privacy-friendly web and product analytics platform positioned as a modern alternative to heavyweight tracking suites. It focuses on a clean UI and intuitive metrics so non-analysts can answer everyday questions without wading through dozens of reports. The data model is event-driven, enabling funnels, retention views, and simple user journeys without complex configuration. Because privacy is a first-class goal, it aims to minimize personal data collection and...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 19
    FastTunnel

    FastTunnel

    Expose a local server to the internet

    FastTunnel is a high-performance cross-platform intranet penetration tool. With it, you can expose intranet services to the public network for yourself or anyone to access. Unlike other penetration tools, the FastTunnel project is committed to creating an easy-to-extensible and easy-to-maintain intranet penetration framework. You can build your own penetration application by referencing the nuget package of FastTunnel.Core, and target the business extension functions you need.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 20
    QueryList

    QueryList

    Progressive PHP web crawler framework with jQuery-like DOM parsing

    QueryList is an extensible PHP web scraping and crawling framework designed to extract and process data from web pages. It provides a simple and expressive API that allows developers to collect structured information from HTML documents using familiar DOM traversal techniques. It is built on top of phpQuery and uses CSS3 selectors similar to those found in jQuery, making it easy for developers to query and manipulate page elements during scraping tasks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Eclipse Jetty Canonical Repository

    Eclipse Jetty Canonical Repository

    Eclipse Jetty - Web Container & Clients - supports HTTP/2, HTTP

    Jetty provides a web server and servlet container, additionally providing support for HTTP/2, WebSocket, OSGi, JMX, JNDI, JAAS and many other integrations. These components are open source and are freely available for commercial use and distribution. Jetty is used in a wide variety of projects and products, both in development and production. Jetty has long been loved by developers due to its long history of being easily embedded in devices, tools, frameworks, application servers, and modern...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 22
    Silverstripe CMS

    Silverstripe CMS

    Silverstripe CMS - this is a module for Silverstripe Framework

    Silverstripe CMS is an intuitive content management system and flexible framework loved by editors and developers alike. Equip your web teams to achieve outstanding results. Silverstripe CMS fits the outcomes you want and doesn't force your business outcomes into an out-of-the-box solution. Customize to your needs. You can be the CMS expert in no time. Get started quickly and deliver your content to your users fast. Don’t stay awake at night worrying! Silverstripe CMS is solid as a rock,...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 23
    Melt UI

    Melt UI

    A set of headless, accessible component builders for Svelte

    Melt UI is an open-source headless component builder library created specifically for the Svelte ecosystem, designed to help developers construct accessible and highly customizable user interface components. Rather than providing fully styled widgets, the library focuses on supplying the behavioral logic and accessibility patterns needed to build UI components while allowing developers to control the visual appearance. Melt UI introduces a builder-based architecture where component logic is...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 24
    Guzzle

    Guzzle

    An extensible PHP HTTP client

    Guzzle is a PHP HTTP client that makes it easy to send HTTP requests and trivial to integrate with web services. Simple interface for building query strings, POST requests, streaming large uploads, streaming large downloads, using HTTP cookies, uploading JSON data, etc... Can send both synchronous and asynchronous requests using the same interface. Uses PSR-7 interfaces for requests, responses, and streams. This allows you to utilize other PSR-7 compatible libraries with Guzzle. Abstracts...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 25
    Mezzanine

    Mezzanine

    CMS framework for Django

    Mezzanine is a powerful open source content management platform built using the Django framework. In many ways it is like many other content management tools, offering an intuitive interface for managing all of your content. But Mezzanine is different in that it provides most of its functionality by default. While other platforms rely heavily on modules or reusable applications, Mezzanine comes ready with all the functionality you need, making it the more efficient choice. Mezzanine has a...
    Downloads: 4 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next
Auth0 Logo