Showing 295 open source projects for "extensible web spider"

  • 1
    Spider

    High-performance Rust web crawler and scraper for large-scale data

    Spider is a high-performance web crawler and web scraping library written in Rust that enables developers to crawl and index websites efficiently. It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents.
    Downloads: 6 This Week
  • 2
    EasySpider

    A visual no-code/code-free web crawler/spider

    A visual, no-code web crawler (spider) that supports both Chinese and English.
    Downloads: 1 This Week
  • 3
    spider_collection

    Collection of Python web scraping scripts for data extraction tasks

    spider_collection is a collection of Python web crawler scripts created primarily for experimentation, learning, and practical scraping tasks. It gathers multiple independent spiders designed to collect data from different platforms and services, demonstrating a variety of scraping techniques and workflows. These crawlers use common Python scraping tools such as requests, parsel, BeautifulSoup, and the Scrapy framework to extract structured information from web pages....
    Downloads: 7 This Week
  • 4
    FEAPDER

    Powerful Python crawler framework for scalable web scraping tasks

    feapder is a Python-based web crawling framework designed to simplify the process of building scalable and efficient web scrapers. It focuses on providing a developer-friendly environment that makes it easier to create, run, and manage crawlers for a variety of data collection tasks. It includes several built-in spider types, such as AirSpider, Spider, TaskSpider, and BatchSpider, which address different crawling scenarios ranging from lightweight scraping to distributed and batch-based jobs. feapder supports features such as breakpoint resume, allowing crawlers to continue from where they stopped without losing progress. ...
    Downloads: 0 This Week
  • 5
    DotnetSpider

    Lightweight .NET framework for fast web crawling and data scraping

    DotnetSpider is a web crawling and data extraction framework built on the .NET Standard platform. It is designed to help developers create efficient and scalable crawlers for collecting structured data from websites. It provides a high-level API that simplifies the process of defining spiders, managing requests, and extracting content from web pages. Developers can create custom spiders by extending base classes and configuring pipelines that handle downloading, parsing, and storing collected data. ...
    Downloads: 4 This Week
  • 6
    Web Spider, Web Crawler, Email Extractor

    Free tool that extracts emails, phone numbers, and custom text from the web using Java regex

    A free web spider and crawler that extracts information from the web by parsing millions of pages. Data is stored in a Derby database and is not lost if the spider is force-closed. The Files section includes WebCrawlerMySQL.jar, which supports a MySQL connection.
    - Free web spider, parser, extractor, and crawler
    - Extracts emails, phone numbers, and custom text from the web
    - Exports to an Excel file
    - Saves data to Derby and MySQL databases
    - Written in Java; cross-platform
    Also see the free email sender: https://sourceforge.net/projects/gitst-free-email-ender/
    Please install Microsoft OpenJDK to start the application: https://www.microsoft.com/openjdk
    Downloads: 2 This Week
  • 7
    Scrapy-Redis

    Redis-based components for Scrapy

    You can start multiple spider instances that share a single Redis queue, which is best suited to broad multi-domain crawls. Scraped items get pushed into a Redis queue, meaning that you can start as many post-processing processes as needed, all sharing the items queue. Components include a scheduler with a duplication filter, an item pipeline, and base spiders. The default request serializer is pickle, but it can be changed to any module with loads and dumps functions. Note that pickle is not compatible between Python versions. Version...
    Downloads: 4 This Week
  • 8
    Grab Framework Project

    Web Scraping Framework

    ...The API is built on top of the urllib3 and lxml libraries. The Spider API lets you build asynchronous web crawlers. You write classes that define handlers for each type of network request. Each handler is able to spawn new network requests. Network requests are processed concurrently with a pool of asynchronous web sockets. Grab provides an interface called Spider for developing multithreaded website scrapers.
    Downloads: 0 This Week
  • 9
    Scrapling

    An adaptive Web Scraping framework

    ...Its powerful spider system supports multi-session crawling, pause and resume functionality, and real-time streaming of scraped data. Scrapling combines high performance, memory efficiency, and extensive async support to deliver blazing-fast scraping workflows. With a developer-friendly API, CLI tools, MCP server integration for AI-assisted extraction, and Docker support, it offers a complete solution for modern web scrapers.
    Downloads: 2 This Week
  • 10
    Caddy

    Powerful, enterprise-ready, open source web server w/ automatic HTTPS

    ...Caddy is the only web server that uses HTTPS automatically and by default. It automatically renews TLS certificates, staples OCSP responses and more. Though used mostly as an HTTPS server, Caddy can be used to run Go applications, offering automated documentation, graceful on-line config changes via API and more to these apps. Caddy is very extensible, with a powerful plugin system unlike any other web server.
    Downloads: 59 This Week
  • 11
    Luakit

    Fast, small, webkit based browser framework extensible by Lua

    Luakit is a highly configurable browser framework based on the WebKit web content engine and the GTK+ toolkit. It is very fast, extensible with Lua, and licensed under the GNU GPLv3 license. It is primarily targeted at power users, developers and anyone who wants to have fine-grained control over their web browser’s behavior and interface. While switching to the WebKit 2 API means a vastly improved security situation, not all distributions of Linux package the most up-to-date version of WebKitGTK+, and several package very outdated versions that have many known vulnerabilities. ...
    Downloads: 9 This Week
  • 12
    Ferret

    Declarative web scraping

    A web scraping system that aims to simplify data extraction from the web. Ferret has a declarative query language that makes it easy to focus on the data that you need to get. Ferret can scrape JS-rendered pages, handle all page events, and emulate user interactions. Ferret was designed as a library from the ground up, so it can be easily embedded into any Go application. Ferret helps you to focus on the data you need using an easy-to-learn declarative language. Ferret uses Chrome/Chromium via the Chrome DevTools Protocol to handle dynamically rendered web pages. Ferret is extremely extensible, and creating custom functions and types is super easy. Ferret allows users to focus on the data. ...
    Downloads: 4 This Week
  • 13
    gain

    Asyncio-based Python framework for building fast web crawling spiders

    Gain is a Python web crawling framework designed to simplify the process of building efficient and scalable web scrapers. It is built on top of asynchronous technologies such as asyncio, aiohttp, and uvloop to support high-performance crawling with concurrent network requests. It provides a structured framework for creating spiders that can navigate websites, extract structured data, and process the collected results. Developers define crawlers using components such as spiders, parsers, and...
    Downloads: 8 This Week
  • 14
    Certbot

    Get free HTTPS certificates forever from Let's Encrypt

    Certbot is a fully-featured, easy-to-use, extensible client for the Let's Encrypt CA. It fetches a digital certificate from Let’s Encrypt, an open certificate authority launched by the EFF, Mozilla, and others. This certificate then lets browsers verify the identity of web servers and ensures secure communication over the Web. Obtaining and maintaining a certificate is usually such a hassle, but with Certbot and Let’s Encrypt it becomes automated and hassle-free. ...
    Downloads: 90 This Week
  • 15
    Lighthouse

    Automated auditing, performance metrics, & best practices for the web

    Lighthouse is an open-source, automated tool that analyzes and audits web apps and web pages in order to improve their quality. Lighthouse collects modern performance metrics and insights on developer best practices, auditing for performance, accessibility, SEO and more. After auditing, it produces a report in either JSON or HTML. Included in the report is a reference doc that explains the importance of the audit and how to fix the problem areas, which you can use to improve the web app or...
    Downloads: 14 This Week
  • 16
    Open ChatGPT Atlas

    Open Source and Free Alternative to ChatGPT Atlas

    Open ChatGPT Atlas is an open-source toolkit and interface for working with OpenAI’s ChatGPT models in a more extensible, adaptable, and composable way than standard web UIs allow. It provides an architecture where developers and power users can manage state, tool integrations, and multi-turn workflows with more control, enabling custom UIs, automation layers, and advanced routing logic. Unlike a fixed chat app, Atlas is designed as a foundation that can be extended with plugins, external APIs, and custom logic to support domain-specific assistants, agent-like behaviors, and multi-task workflows. ...
    Downloads: 13 This Week
  • 17
    Puter

    🌐 The Internet Computer! Free, Open-Source, and Self-Hostable

    ...The platform allows developers to build, publish, and monetize web applications through its integrated App Store ecosystem. Its self-hosting capabilities give organizations and individuals full control over their data, applications, and deployment environments. With a highly extensible architecture and broad community support, Puter serves as a modern alternative to traditional cloud desktops and personal cloud platforms.
    Downloads: 3 This Week
  • 18
    Strapi

    API creation made simple, secure and fast

    Strapi is the most advanced open-source headless CMS for creating powerful and customizable APIs with no effort. Built with 100% JavaScript, Strapi lets you easily create self-hosted, customizable, and performant content APIs. Strapi projects can be hosted on any platform of your choice, and you can work with any database you prefer. All your favorite dev tools, from static site generators and databases to hosting platforms, work with Strapi, so you're never locked in. Strapi is designed...
    Downloads: 14 This Week
  • 19
    Scrapy

    A fast, high-level web crawling and web scraping framework

    Scrapy is a fast, open source, high-level framework for crawling websites and extracting structured data from these websites. Portable and written in Python, it can run on Windows, Linux, macOS and BSD. Scrapy is powerful, fast and simple, and also easily extensible. Simply write the rules to extract the data, and add new functionality if you wish without having to touch the core. Scrapy does the rest, and can be used in a number of applications. It can be used for data mining, monitoring...
    Downloads: 20 This Week
  • 20
    Rybbit

    Open-source and privacy-friendly alternative to Google Analytics

    Rybbit is an open-source, privacy-friendly web and product analytics platform positioned as a modern alternative to heavyweight tracking suites. It focuses on a clean UI and intuitive metrics so non-analysts can answer everyday questions without wading through dozens of reports. The data model is event-driven, enabling funnels, retention views, and simple user journeys without complex configuration. Because privacy is a first-class goal, it aims to minimize personal data collection and...
    Downloads: 5 This Week
  • 21
    FastTunnel

    Expose a local server to the internet

    FastTunnel is a high-performance, cross-platform intranet penetration tool. With it, you can expose intranet services to the public network for yourself or anyone else to access. Unlike other penetration tools, the FastTunnel project is committed to creating an easy-to-extend and easy-to-maintain intranet penetration framework. You can build your own penetration application by referencing the FastTunnel.Core NuGet package and adding the business-specific extensions you need.
    Downloads: 5 This Week
  • 22
    QueryList

    Progressive PHP web crawler framework with jQuery-like DOM parsing

    QueryList is an extensible PHP web scraping and crawling framework designed to extract and process data from web pages. It provides a simple and expressive API that allows developers to collect structured information from HTML documents using familiar DOM traversal techniques. It is built on top of phpQuery and uses CSS3 selectors similar to those found in jQuery, making it easy for developers to query and manipulate page elements during scraping tasks.
    Downloads: 0 This Week
  • 23
    Eclipse Jetty Canonical Repository

    Eclipse Jetty - Web Container & Clients - supports HTTP/2, HTTP

    Jetty provides a web server and servlet container, additionally providing support for HTTP/2, WebSocket, OSGi, JMX, JNDI, JAAS and many other integrations. These components are open source and are freely available for commercial use and distribution. Jetty is used in a wide variety of projects and products, both in development and production. Jetty has long been loved by developers due to its long history of being easily embedded in devices, tools, frameworks, application servers, and modern...
    Downloads: 1 This Week
  • 24
    Silverstripe CMS

    Silverstripe CMS - this is a module for Silverstripe Framework

    Silverstripe CMS is an intuitive content management system and flexible framework loved by editors and developers alike. Equip your web teams to achieve outstanding results. Silverstripe CMS fits the outcomes you want and doesn't force your business outcomes into an out-of-the-box solution. Customize to your needs. You can be the CMS expert in no time. Get started quickly and deliver your content to your users fast. Don’t stay awake at night worrying! Silverstripe CMS is solid as a rock,...
    Downloads: 5 This Week
  • 25
    Guzzle

    An extensible PHP HTTP client

    Guzzle is a PHP HTTP client that makes it easy to send HTTP requests and trivial to integrate with web services. Simple interface for building query strings, POST requests, streaming large uploads, streaming large downloads, using HTTP cookies, uploading JSON data, etc... Can send both synchronous and asynchronous requests using the same interface. Uses PSR-7 interfaces for requests, responses, and streams. This allows you to utilize other PSR-7 compatible libraries with Guzzle. Abstracts...
    Downloads: 5 This Week