Showing 843 open source projects for "extensible web spider"

  • 1
    Spider

    High-performance Rust web crawler and scraper for large-scale data

    Spider is a high-performance web crawler and web scraping library written in Rust that enables developers to crawl and index websites efficiently. It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents.
    Downloads: 2 This Week
  • 2
    xhs-spider

    Desktop tool for collecting and exporting Xiaohongshu post data

    XHS-Spider is a desktop data collection tool designed to gather content and metadata from the Xiaohongshu platform. It provides a graphical interface that allows users to explore posts, collect information, and download media such as images and videos from individual notes or search results. It was developed primarily as a learning project to demonstrate approaches to building web crawlers and experimenting with technologies such as WebView2 and WPF UI.
    Downloads: 5 This Week
  • 3
    Python-Spider

    Python3 web crawler practice

    ...As part of the author’s public learning-path repositories, python-spider likely includes examples of HTTP requests, HTML parsing, maybe concurrency or scheduling to crawl multiple pages, and techniques to handle common web-scraping issues. For people wanting to get hands-on with building scrapers, collecting data, or learning how to navigate web programming in Python, this repository acts as a didactic reference or starting point.
    Downloads: 0 This Week
  • 4
    EasySpider

    A visual no-code/code-free web crawler/spider

    A visual code-free/no-code web crawler/spider, supporting both Chinese and English.
    Downloads: 9 This Week
  • 5
    AppFlowy Web

    Bring projects, wikis, and teams together with AI

    AppFlowy‑Web is the TypeScript/React‑based web frontend of AppFlowy, the open‑source, AI‑powered Notion alternative. It aims to deliver full parity with the desktop app, supporting self‑hosting, collaborative editing, and extensible workspace building.
    Downloads: 4 This Week
  • 6
    spider_collection

    Collection of Python web scraping scripts for data extraction tasks

    spider_collection is a collection of Python web crawler scripts created primarily for experimentation, learning, and practical scraping tasks. spider_collection gathers multiple independent spiders designed to collect data from different platforms and services, demonstrating a variety of scraping techniques and workflows. These crawlers make use of common Python scraping tools such as requests, parsel, BeautifulSoup, and the Scrapy framework to extract structured information from web pages....
    Downloads: 4 This Week
  • 7
    The Falcon Web Framework

    The no-nonsense REST API and microservices framework

    Falcon is a minimalist WSGI library for building speedy web APIs and app backends. We like to think of Falcon as the Dieter Rams of web frameworks. When it comes to building HTTP APIs, other frameworks weigh you down with tons of dependencies and unnecessary abstractions. Falcon cuts to the chase with a clean design that embraces HTTP and the REST architectural style. Highly optimized, extensible code base.
    Downloads: 4 This Week
  • 8
    FEAPDER

    Powerful Python crawler framework for scalable web scraping tasks

    feapder is a Python-based web crawling framework designed to simplify the process of building scalable and efficient web scrapers. It focuses on providing a developer-friendly environment that makes it easier to create, run, and manage crawlers for a variety of data collection tasks. It includes several built-in spider types, such as AirSpider, Spider, TaskSpider, and BatchSpider, which address different crawling scenarios ranging from lightweight scraping to distributed and batch-based jobs. feapder supports features such as breakpoint resume, allowing crawlers to continue from where they stopped without losing progress. ...
    Downloads: 0 This Week
  • 9
    Web Spider, Web Crawler, Email Extractor

    Free tool that extracts emails, phone numbers, and custom text from the web using Java regex

    The Files section includes WebCrawlerMySQL.jar, which supports a MySQL connection. This free web spider and crawler extracts information from the web by parsing millions of pages. Data is stored in a Derby database and is not lost if the spider is force-closed. Features: free web spider, parser, extractor, and crawler; extraction of emails, phone numbers, and custom text from the web; export to Excel file; data saved into Derby and MySQL databases; written in Java, cross-platform. Also see the free email sender: https://sourceforge.net/projects/gitst-free-email-ender/ Please install Microsoft OpenJDK to start the application: https://www.microsoft.com/openjdk
    Downloads: 7 This Week
  • 10
    DotnetSpider

    Lightweight .NET framework for fast web crawling and data scraping

    DotnetSpider is a web crawling and data extraction framework built on the .NET Standard platform. It is designed to help developers create efficient and scalable crawlers for collecting structured data from websites. It provides a high-level API that simplifies the process of defining spiders, managing requests, and extracting content from web pages. Developers can create custom spiders by extending base classes and configuring pipelines that handle downloading, parsing, and storing collected data. ...
    Downloads: 1 This Week
  • 11
    Scrapy-Redis

    Redis-based components for Scrapy

    You can start multiple spider instances that share a single Redis queue, which is best suited to broad multi-domain crawls. Scraped items get pushed into a Redis queue, meaning that you can start as many post-processing processes as needed, all sharing the items queue. Components include a Scheduler + Duplication Filter, an Item Pipeline, and Base Spiders. The default requests serializer is pickle, but it can be changed to any module with loads and dumps functions. Note that pickle is not compatible between Python versions. Version...
    Downloads: 1 This Week
  • 12
    Elfeed Emacs Web Feed Reader

    An Emacs web feeds client

    Elfeed is an extensible web feed reader for Emacs, supporting both Atom and RSS. It requires Emacs 24.3 and is available for download from MELPA or el-get.
    Downloads: 1 This Week
  • 13
    Grab Framework Project

    Web Scraping Framework

    ...The API is built on top of the urllib3 and lxml libraries. The Spider API is for building asynchronous web crawlers. You write classes that define handlers for each type of network request. Each handler is able to spawn new network requests. Network requests are processed concurrently with a pool of asynchronous web sockets. Grab provides an interface called Spider for developing multithreaded website scrapers.
    Downloads: 0 This Week
  • 14
    Bot Framework Web Chat

    A highly-customizable web-based client for Azure Bot Services

    This repository contains code for the Bot Framework Web Chat component. The Bot Framework Web Chat component is a highly-customizable web-based client for the Bot Framework V4 SDK. The Bot Framework SDK v4 enables developers to model conversation and build sophisticated bot applications. This repo is part of the Microsoft Bot Framework, a comprehensive framework for building enterprise-grade conversational AI experiences. Create a bot with the ability to speak, listen, understand, and learn...
    Downloads: 9 This Week
  • 15
    Scrapling

    An adaptive Web Scraping framework

    ...Its powerful spider system supports multi-session crawling, pause and resume functionality, and real-time streaming of scraped data. Scrapling combines high performance, memory efficiency, and extensive async support to deliver blazing-fast scraping workflows. With a developer-friendly API, CLI tools, MCP server integration for AI-assisted extraction, and Docker support, it offers a complete solution for modern web scrapers.
    Downloads: 1 This Week
  • 16
    Node Crawler

    Web Crawler/Spider for NodeJS + server-side jQuery

    The most powerful, popular, production-ready crawling/scraping package for Node. Happy hacking.
    Downloads: 13 This Week
  • 17

    pico-web-database

    Web spider/database/indexer system programmed in the Pico language

    Downloads: 0 This Week
  • 18
    Next.js

    The React Framework

    ...It is fully extensible and ready for production. It’s no wonder Next.js is used in tens of thousands of production-facing websites and web applications from some of the world’s biggest brands.
    Downloads: 64 This Week
  • 19
    Caddy

    Powerful, enterprise-ready, open source web server w/ automatic HTTPS

    ...Caddy is the only web server that uses HTTPS automatically and by default. It automatically renews TLS certificates, staples OCSP responses and more. Though used mostly as an HTTPS server, Caddy can be used to run Go applications, offering automated documentation, graceful on-line config changes via API and more to these apps. Caddy is very extensible, with a powerful plugin system unlike any other web server.
    Downloads: 45 This Week
  • 20
    Nikto

    Web server vulnerability scanner for security assessments

    Nikto is an open-source web server scanner that performs comprehensive tests to detect potentially dangerous files, outdated server software, and configuration issues. It’s widely used by penetration testers and security professionals for auditing web applications and infrastructure. Nikto supports multiple output formats and can integrate with other tools for automated scanning workflows.
    Downloads: 82 This Week
  • 21
    Eclipse GLSP

    Graphical language server platform for building web-based diagram

    The Graphical Language Server Platform (GLSP) is an extensible open-source framework for building custom diagram editors based on web technologies. Alongside an extensible client framework and a server framework, GLSP provides a language server protocol (LSP) for diagrams. With that, GLSP enables the development of modern, web-based diagram editors, whereas the heavy lifting, such as loading, interpreting, and editing according to the rules of the modeling language, is encapsulated in the server. ...
    Downloads: 3 This Week
  • 22
    Serverless Adapter

    Run REST APIs and other web applications using existing Node.js app

    Run REST APIs and other web applications using your existing Node.js application framework (NestJS, Express, Koa, Hapi, Fastify and many others), on top of AWS, Azure, Digital Ocean and many other clouds. The library was designed to be very extensible and easy to use. We currently support AWS, Azure, Firebase, Digital Ocean, Google Cloud Functions and Huawei.
    Downloads: 3 This Week
  • 23
    PentestGPT

    Automated Penetration Testing Agentic Framework Powered by LLMs

    ...Built with a modular and extensible architecture, PentestGPT supports cloud and local LLMs, making it suitable for research, education, and authorized security testing.
    Downloads: 693 This Week
  • 24
    Flask

    The Python micro framework for building web applications

    Flask is a lightweight WSGI web application framework designed to help developers get started with their web applications quickly and easily with the ability to scale up to complex applications. Being a “micro” framework does not mean that your whole web application must fit into a single Python file (although it can) or that it be limited; rather it means that Flask aims to keep the core simple but extensible.
    Downloads: 135 This Week
  • 25
    Luakit

    Fast, small, webkit based browser framework extensible by Lua

    Luakit is a highly configurable browser framework based on the WebKit web content engine and the GTK+ toolkit. It is very fast, extensible with Lua, and licensed under the GNU GPLv3 license. It is primarily targeted at power users, developers and anyone who wants to have fine-grained control over their web browser’s behavior and interface. While switching to the WebKit 2 API means a vastly improved security situation, not all distributions of Linux package the most up-to-date version of WebKitGTK+, and several package very outdated versions that have many known vulnerabilities. ...
    Downloads: 3 This Week