Showing 646 open source projects for "extensible web spider"

  • 1
    Spider

    High-performance Rust web crawler and scraper for large-scale data

    Spider is a high-performance web crawler and web scraping library written in Rust that enables developers to crawl and index websites efficiently. It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents.
    Downloads: 6 This Week
  • 2
    Python-Spider

    Python3 web crawler practice

    ...As part of the author’s public learning-path repositories, python-spider likely includes examples of HTTP requests and HTML parsing, possibly concurrency or scheduling for crawling multiple pages, and techniques for handling common web-scraping issues. For people who want hands-on practice building scrapers, collecting data, or working with web programming in Python, this repository serves as a teaching reference or starting point.
    Downloads: 0 This Week
  • 3
    AppFlowy Web

    Bring projects, wikis, and teams together with AI

    AppFlowy‑Web is the TypeScript/React‑based web frontend of AppFlowy, the open‑source, AI‑powered Notion alternative. It aims to deliver full parity with the desktop app, supporting self‑hosting, collaborative editing, and extensible workspace building.
    Downloads: 5 This Week
  • 4
    EasySpider

    A visual, no-code web crawler/spider

    A visual, no-code web crawler/spider that supports both Chinese and English.
    Downloads: 1 This Week
  • 5
    spider_collection

    Collection of Python web scraping scripts for data extraction tasks

    spider_collection is a collection of Python web crawler scripts created primarily for experimentation, learning, and practical scraping tasks. spider_collection gathers multiple independent spiders designed to collect data from different platforms and services, demonstrating a variety of scraping techniques and workflows. These crawlers make use of common Python scraping tools such as requests, parsel, BeautifulSoup, and the Scrapy framework to extract structured information from web pages....
    Downloads: 7 This Week
  • 6
    FEAPDER

    Powerful Python crawler framework for scalable web scraping tasks

    feapder is a Python-based web crawling framework designed to simplify the process of building scalable and efficient web scrapers. It focuses on providing a developer-friendly environment that makes it easier to create, run, and manage crawlers for a variety of data collection tasks. It includes several built-in spider types, such as AirSpider, Spider, TaskSpider, and BatchSpider, which address different crawling scenarios ranging from lightweight scraping to distributed and batch-based jobs. feapder supports features such as breakpoint resume, allowing crawlers to continue from where they stopped without losing progress. ...
    Downloads: 0 This Week
  • 7
    DotnetSpider

    Lightweight .NET framework for fast web crawling and data scraping

    DotnetSpider is a web crawling and data extraction framework built on the .NET Standard platform. It is designed to help developers create efficient and scalable crawlers for collecting structured data from websites. It provides a high-level API that simplifies the process of defining spiders, managing requests, and extracting content from web pages. Developers can create custom spiders by extending base classes and configuring pipelines that handle downloading, parsing, and storing collected data. ...
    Downloads: 4 This Week
  • 8
    Web Spider, Web Crawler, Email Extractor

    Free tool that extracts emails, phone numbers, and custom text from the web using Java regex

    The Files section includes WebCrawlerMySQL.jar, which supports a MySQL connection. This free web spider and crawler extracts information from the web by parsing millions of pages. Data is stored in a Derby database and is not lost if the spider is force-closed.
    - Free web spider, parser, extractor, and crawler
    - Extraction of emails, phone numbers, and custom text from the web
    - Export to Excel file
    - Data saved to Derby and MySQL databases
    - Written in Java, cross-platform
    Also see the free email sender: https://sourceforge.net/projects/gitst-free-email-ender/
    Please install Microsoft OpenJDK to start the application: https://www.microsoft.com/openjdk
    Downloads: 2 This Week
  • 9
    Scrapy-Redis

    Redis-based components for Scrapy

    You can start multiple spider instances that share a single Redis queue, which is best suited to broad multi-domain crawls. Scraped items are pushed into a Redis queue, meaning that you can start as many post-processing processes as needed, all sharing the items queue. Components: Scheduler + Duplication Filter, Item Pipeline, Base Spiders. The default requests serializer is pickle, but it can be changed to any module with loads and dumps functions. Note that pickle is not compatible between Python versions. Version...
    Downloads: 4 This Week
  • 10
    Elfeed Emacs Web Feed Reader

    An Emacs web feeds client

    Elfeed is an extensible web feed reader for Emacs, supporting both Atom and RSS. It requires Emacs 24.3 and is available for download from MELPA or el-get.
    Downloads: 4 This Week
  • 11
    Bot Framework Web Chat

    A highly-customizable web-based client for Azure Bot Services

    This repository contains code for the Bot Framework Web Chat component. The Bot Framework Web Chat component is a highly-customizable web-based client for the Bot Framework V4 SDK. The Bot Framework SDK v4 enables developers to model conversation and build sophisticated bot applications. This repo is part of the Microsoft Bot Framework, a comprehensive framework for building enterprise-grade conversational AI experiences. Create a bot with the ability to speak, listen, understand, and learn...
    Downloads: 13 This Week
  • 12
    Scrapling

    An adaptive Web Scraping framework

    ...Its powerful spider system supports multi-session crawling, pause and resume functionality, and real-time streaming of scraped data. Scrapling combines high performance, memory efficiency, and extensive async support to deliver blazing-fast scraping workflows. With a developer-friendly API, CLI tools, MCP server integration for AI-assisted extraction, and Docker support, it offers a complete solution for modern web scrapers.
    Downloads: 2 This Week
  • 13
    Node Crawler

    Web Crawler/Spider for NodeJS + server-side jQuery

    The most powerful, popular, and production-ready crawling/scraping package for Node. Happy hacking.
    Downloads: 13 This Week
  • 14
    pico-web-database

    Web spider/database/indexer system programmed in the Pico language

    Downloads: 0 This Week
  • 15
    Serverless Adapter

    Run REST APIs and other web applications using existing Node.js app

    Run REST APIs and other web applications using your existing Node.js application framework (NestJS, Express, Koa, Hapi, Fastify and many others), on top of AWS, Azure, Digital Ocean and many other clouds. The library was designed to be very extensible and easy to use. We currently support AWS, Azure, Firebase, Digital Ocean, Google Cloud Functions and Huawei.
    Downloads: 5 This Week
  • 16
    Caddy

    Powerful, enterprise-ready, open source web server w/ automatic HTTPS

    ...Caddy is the only web server that uses HTTPS automatically and by default. It automatically renews TLS certificates, staples OCSP responses and more. Though used mostly as an HTTPS server, Caddy can be used to run Go applications, offering automated documentation, graceful on-line config changes via API and more to these apps. Caddy is very extensible, with a powerful plugin system unlike any other web server.
    Downloads: 59 This Week
  • 17
    Next.js

    The React Framework

    ...It is fully extensible and ready for production. It’s no wonder Next.js is used in tens of thousands of production-facing websites and web applications from some of the world’s biggest brands.
    Downloads: 63 This Week
  • 18
    Eclipse GLSP

    Graphical language server platform for building web-based diagram

    The Graphical Language Server Platform (GLSP) is an extensible open-source framework for building custom diagram editors based on web technologies. Alongside an extensible client framework and a server framework, GLSP provides a language server protocol (LSP) for diagrams. With that, GLSP enables the development of modern, web-based diagram editors, while the heavy lifting, such as loading, interpreting, and editing according to the rules of the modeling language, is encapsulated in the server. ...
    Downloads: 4 This Week
  • 19
    Nikto

    Web server vulnerability scanner for security assessments

    Nikto is an open-source web server scanner that performs comprehensive tests to detect potentially dangerous files, outdated server software, and configuration issues. It’s widely used by penetration testers and security professionals for auditing web applications and infrastructure. Nikto supports multiple output formats and can integrate with other tools for automated scanning workflows.
    Downloads: 82 This Week
  • 20
    Flask

    The Python micro framework for building web applications

    Flask is a lightweight WSGI web application framework designed to help developers get started with their web applications quickly and easily with the ability to scale up to complex applications. Being a “micro” framework does not mean that your whole web application must fit into a single Python file (although it can) or that it be limited; rather it means that Flask aims to keep the core simple but extensible.
    Downloads: 151 This Week
  • 21
    PentestGPT

    Automated Penetration Testing Agentic Framework Powered by LLMs

    ...Built with a modular and extensible architecture, PentestGPT supports cloud and local LLMs, making it suitable for research, education, and authorized security testing.
    Downloads: 606 This Week
  • 22
    BrowserGym

    A Gym environment for web task automation

    ...One of its main strengths is that it bundles several important benchmarks by default, including MiniWoB, WebArena, VisualWebArena, WorkArena, AssistantBench, WebLINX, and OpenApps. This gives researchers a unified way to compare agent behavior across diverse web environments and task types without stitching together separate evaluation stacks. BrowserGym is also designed to be extensible, and the repository notes that creating new benchmarks mainly involves inheriting its abstract task interface.
    Downloads: 12 This Week
  • 23
    Ferret

    Declarative web scraping

    A web scraping system aiming to simplify data extraction from the web. Ferret has a declarative query language that makes it easy to focus on the data that you need to get. Ferret has the ability to scrape JS-rendered pages, handle all page events, and emulate user interactions. Ferret was designed as a library from the ground up, so it can be easily embedded into any Go application. Ferret helps you to focus on the data you need using an easy-to-learn declarative language. Ferret uses Chrome/Chromium via the Chrome DevTools Protocol to handle dynamically rendered web pages. Ferret is extremely extensible, and creating custom functions and types is super easy. Ferret allows users to focus on the data. ...
    Downloads: 4 This Week
  • 24
    Heritrix

    Internet Archive's open-source, web-scale, web crawler project

    Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix/heritix/heretix/heratix) is an archaic word for heiress (woman who inherits). Since our crawler seeks to collect and preserve the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt.
    Downloads: 6 This Week
  • 25
    gain

    Asyncio-based Python framework for building fast web crawling spiders

    Gain is a Python web crawling framework designed to simplify the process of building efficient and scalable web scrapers. It is built on top of asynchronous technologies such as asyncio, aiohttp, and uvloop to support high-performance crawling with concurrent network requests. It provides a structured framework for creating spiders that can navigate websites, extract structured data, and process the collected results. Developers define crawlers using components such as spiders, parsers, and...
    Downloads: 8 This Week