Showing 26 open source projects for "scraping"

  • 1
    LLM Scraper

    Extract structured data from webpages using LLM-powered scraping

    ...Multiple content processing modes are supported, including raw HTML, cleaned HTML, Markdown, extracted text, screenshots, and custom inputs, making it adaptable to a wide range of scraping scenarios. LLM Scraper also provides streaming output and code generation capabilities that help developers build reusable scraping workflows.
    Downloads: 3 This Week
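Extracting structured data with an LLM usually means asking the model for JSON in a fixed shape and checking the reply before using it. The sketch below covers only that validation step, in plain Python. It is not LLM Scraper's own API, and the `Article` schema and `parse_llm_output` helper are hypothetical.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Article:
    """Hypothetical target schema for one scraped page."""
    title: str
    author: str
    points: int


def parse_llm_output(raw: str) -> Article:
    """Parse the model's JSON reply and check it against Article.

    Raises ValueError if a field is missing or has the wrong type.
    """
    data = json.loads(raw)
    kwargs = {}
    for f in fields(Article):
        if f.name not in data:
            raise ValueError(f"missing field: {f.name}")
        value = data[f.name]
        # f.type is the real class (str, int) because annotations are not postponed here
        if not isinstance(value, f.type):
            raise ValueError(f"field {f.name!r} should be {f.type.__name__}")
        kwargs[f.name] = value
    return Article(**kwargs)


article = parse_llm_output('{"title": "Example post", "author": "alice", "points": 42}')
```

Rejecting a malformed reply early lets the caller re-prompt the model instead of passing bad data downstream.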
  • 2
    ScrapeGraphAI

    Python scraper based on AI

    ScrapeGraphAI is a Python web scraping library that uses LLMs and direct graph logic to build scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). You state which information you want to extract, and the library handles the extraction.
    Downloads: 0 This Week
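The "graph logic" idea can be illustrated as a pipeline whose steps are nodes in a dependency graph, run in topological order over a shared state. This is a minimal sketch using the standard library's `graphlib`, not ScrapeGraphAI's actual classes. The `fetch`, `parse`, and `extract` nodes are hypothetical stand-ins, and `extract` uses a trivial rule where a real pipeline would call an LLM.

```python
import re
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set

State = Dict[str, object]
Node = Callable[[State], State]


def run_pipeline(nodes: Dict[str, Node], deps: Dict[str, Set[str]], state: State) -> State:
    """Run every node after its dependencies, merging each node's output into the state."""
    for name in TopologicalSorter(deps).static_order():
        state.update(nodes[name](state))
    return state


def fetch(state: State) -> State:
    # Hypothetical: a real node would download a URL or read a local file.
    return {"html": "<h1>Hello</h1><p>World</p>"}


def parse(state: State) -> State:
    # Strip tags crudely and collapse whitespace.
    text = re.sub(r"<[^>]+>", " ", str(state["html"]))
    return {"text": " ".join(text.split())}


def extract(state: State) -> State:
    # Stand-in for the LLM call that would answer state["prompt"].
    words = str(state["text"]).split()
    return {"answer": words[0] if words else ""}


NODES: Dict[str, Node] = {"fetch": fetch, "parse": parse, "extract": extract}
DEPS: Dict[str, Set[str]] = {"fetch": set(), "parse": {"fetch"}, "extract": {"parse"}}

result = run_pipeline(NODES, DEPS, {"prompt": "What is the page heading?"})
```

Adding a step (say, a Markdown converter between `parse` and `extract`) only means registering the node and declaring its dependencies; the runner works out the order.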
  • 3
    Python Code Tutorials

    The Python Code Tutorials

    Python Code Tutorials is a large educational repository that aggregates programming tutorials from “The Python Code” website into a structured collection of Python projects and learning materials. The repository covers a wide range of programming topics including cybersecurity, networking, web scraping, machine learning, GUI development, and automation scripts. Each tutorial typically includes complete Python code examples and explanations that demonstrate how to build real tools and applications step by step. Many tutorials focus on practical implementations such as building network scanners, web scraping tools, object detection systems, and automation utilities using Python libraries. ...
    Downloads: 0 This Week
  • 4
    Actors MCP Server

    Model Context Protocol (MCP) Server for Apify's Actors

    The Apify Actors MCP Server is a Model Context Protocol (MCP) server that enables AI assistants to interact with Apify Actors. This integration lets AI models use the web scraping and automation tools provided by Apify for tasks such as data extraction and web automation.
    Downloads: 2 This Week
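MCP messages are JSON-RPC 2.0: a client lists a server's tools with the `tools/list` method and invokes one with `tools/call`, passing the tool `name` and its `arguments`. The sketch below only builds those two request messages. The tool name `example-actor` and its `url` argument are hypothetical (the real Apify Actor tools are not listed here), and a real session also begins with an `initialize` exchange and needs a transport such as stdio or HTTP, both omitted.

```python
import json
from itertools import count
from typing import Any, Dict, Optional

_request_ids = count(1)  # a request id must not be reused within one session


def jsonrpc_request(method: str, params: Optional[Dict[str, Any]] = None) -> str:
    """Serialize one JSON-RPC 2.0 request."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


list_tools = jsonrpc_request("tools/list")
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "example-actor", "arguments": {"url": "https://example.com"}},  # hypothetical tool
)
```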
  • 5
    Firecrawl MCP Server

    Adds powerful web scraping and search to Cursor and Claude

    firecrawl-mcp-server is the official MCP integration for Firecrawl that brings high-recall web scraping, crawling, and search into IDEs and agent runtimes. It exposes tools for single-page scrape, multi-URL batch jobs, site discovery, and search enrichment, returning cleaned, structured content suitable for downstream LLM reasoning. The server is designed to run with Firecrawl’s hosted API or self-hosted deployments, making it flexible for enterprise data-governance requirements. ...
    Downloads: 1 This Week
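Multi-URL batch jobs come down to fetching many pages concurrently while recording failures per URL instead of aborting the whole batch. A minimal sketch with `concurrent.futures`, assuming a caller-supplied `fetch` function; `batch_scrape` and `fake_fetch` are hypothetical and do not reflect Firecrawl's API or its hosted batch endpoints.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Tuple


def batch_scrape(
    urls: List[str], fetch: Callable[[str], str], max_workers: int = 4
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Fetch all URLs concurrently; return (results, errors) keyed by URL."""
    results: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # one failed page must not abort the batch
                errors[url] = repr(exc)
    return results, errors


def fake_fetch(url: str) -> str:
    # Hypothetical fetcher: pretend any URL containing "blocked" fails.
    if "blocked" in url:
        raise RuntimeError("request blocked")
    return f"content of {url}"


ok, failed = batch_scrape(["https://a.example", "https://blocked.example"], fake_fetch)
```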
  • 6
    Browserbase Skills

    Claude Agent SDK with a web browsing tool

    ...The design emphasizes reliability and repeatability, reducing the complexity of handling dynamic web interfaces. It is particularly useful for building AI agents that perform tasks like scraping, testing, or workflow automation. Overall, it turns browser interaction into a modular and programmable skill system.
    Downloads: 3 This Week
  • 7
    Obscura

    The headless browser for AI agents and web scraping

    Obscura is a security-focused project aimed at providing tools and techniques for enhancing privacy, anonymity, and operational security in digital environments. It is designed for users who need to obscure their digital footprint and reduce traceability across systems. The project typically includes utilities for masking identity, managing secure communication, and mitigating surveillance risks. It emphasizes practical implementations of privacy-preserving workflows rather than purely...
    Downloads: 56 This Week
  • 8
    Fli

    Google Flights MCP and Python Library

    Fli is a powerful Python library and command-line tool that provides direct programmatic access to Google Flights data through reverse-engineered API interactions rather than traditional web scraping. This approach enables faster, more reliable, and more stable access to flight information, avoiding the fragility associated with HTML parsing and UI changes. The library supports a wide range of flight search capabilities, including filtering by airline, departure time, number of stops, cabin class, and sorting by price or duration, making it suitable for both casual queries and advanced travel analysis. ...
    Downloads: 1 This Week
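The search options described above (airline, number of stops, cabin class, sorting by price or duration) amount to filtering and sorting a list of results. The sketch below shows that step over made-up data. `Flight`, `search`, and the sample airlines are hypothetical and are not Fli's API, which retrieves live data from Google Flights.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Flight:
    airline: str
    stops: int
    cabin: str
    price: float
    duration_min: int


SORT_KEYS = {
    "price": lambda f: f.price,
    "duration": lambda f: f.duration_min,
}


def search(flights: Sequence[Flight], airline: Optional[str] = None,
           max_stops: Optional[int] = None, cabin: Optional[str] = None,
           sort_by: str = "price") -> List[Flight]:
    """Keep flights that pass every filter that was given, then sort them."""
    matches = [
        f for f in flights
        if (airline is None or f.airline == airline)
        and (max_stops is None or f.stops <= max_stops)
        and (cabin is None or f.cabin == cabin)
    ]
    return sorted(matches, key=SORT_KEYS[sort_by])


# Made-up sample data for illustration only.
FLIGHTS = [
    Flight("Alpha Air", 1, "economy", 320.0, 410),
    Flight("Beta Jet", 0, "economy", 450.0, 300),
    Flight("Alpha Air", 0, "business", 1200.0, 305),
    Flight("Gamma Wings", 2, "economy", 280.0, 540),
]
```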
  • 9
    Deep Research Web UI

    AI-powered research assistant that performs iterative, deep research

    Deep Research Web UI is an AI-powered research assistant interface designed to automate complex, multi-step information gathering workflows through a combination of search engines, web scraping, and large language models. It operates as a front-end system for deep research agents that iteratively refine queries, retrieve information from multiple sources, and synthesize structured outputs into coherent reports. The platform emphasizes long-horizon reasoning, allowing users to explore topics in depth rather than receiving shallow, single-response answers. ...
    Downloads: 0 This Week
  • 10
    Open Deep Research

    An AI-powered research assistant that performs iterative research

    Deep Research is a lightweight AI research agent designed to autonomously investigate complex topics through iterative web exploration and reasoning. The project combines search engines, web scraping, and large language models to progressively refine its understanding of a user’s query and dive deeper over multiple cycles. Its core goal is to provide the simplest possible implementation of a deep research workflow so developers can study and extend agent behavior without dealing with large, opaque codebases. The system exposes parameters such as breadth and depth to control how widely and how deeply the agent explores information sources. ...
    Downloads: 0 This Week
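The breadth and depth parameters can be read as a bounded recursion: each research step runs a search, records what it learned, and then spawns at most `breadth` follow-up queries until `depth` levels have been explored. A minimal sketch, assuming caller-supplied `search` and `follow_ups` functions; both are hypothetical, and in a real agent they would wrap a search engine and an LLM.

```python
from typing import Callable, List, Optional, Set


def deep_research(
    query: str,
    breadth: int,
    depth: int,
    search: Callable[[str], List[str]],
    follow_ups: Callable[[str, List[str]], List[str]],
    learnings: Optional[Set[str]] = None,
) -> Set[str]:
    """Search, record findings, then recurse into at most `breadth` follow-up queries."""
    if learnings is None:
        learnings = set()
    findings = search(query)
    learnings.update(findings)
    if depth > 1:
        for next_query in follow_ups(query, findings)[:breadth]:
            deep_research(next_query, breadth, depth - 1, search, follow_ups, learnings)
    return learnings


# Hypothetical stand-ins for a search engine and an LLM that proposes follow-up queries.
def fake_search(query: str) -> List[str]:
    return [f"fact about {query}"]


def fake_follow_ups(query: str, findings: List[str]) -> List[str]:
    return [f"{query}/{i}" for i in range(3)]
```

With breadth 2 and depth 3, this sketch runs 1 + 2 + 4 = 7 searches, which shows why both knobs directly control cost.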
  • 11
    webclaw

    Fast, local-first web content extraction for LLMs

    webclaw is a high-performance web content extraction tool designed specifically for AI agents and large language models, focusing on delivering clean, structured data instead of raw HTML. It is built in Rust and operates without a headless browser, using advanced techniques such as TLS fingerprinting to bypass common scraping barriers and mimic real browser behavior. The tool addresses a major inefficiency in AI workflows by removing irrelevant elements like navigation menus, ads, and scripts, significantly reducing token usage when feeding data into language models. It supports multiple modes of operation, including CLI usage, REST API access, and an MCP server for direct integration with agent-based systems. ...
    Downloads: 0 This Week
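Removing navigation menus, scripts, and similar page chrome before handing text to a model can be approximated with the standard library's `html.parser`. This is a simplified Python sketch, not webclaw's Rust implementation. The set of skipped tags is an assumption, and production extractors typically rely on richer heuristics than tag names alone.

```python
from html.parser import HTMLParser
from typing import List

# Assumed list of elements that usually hold page chrome rather than content.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "noscript"}


class MainTextExtractor(HTMLParser):
    """Collect text nodes that are not nested inside any SKIP_TAGS element."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    "<html><head><style>p{}</style></head><body>"
    "<nav>Home | About</nav><h1>Title</h1><script>track()</script>"
    "<p>Body text.</p><footer>Copyright</footer></body></html>"
)
cleaned = extract_text(page)
```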
  • 12
    Browserbase MCP Server

    Allow LLMs to control a browser with Browserbase and Stagehand

    ...It leverages Browserbase infrastructure along with Stagehand to deliver high-performance browser automation with improved speed and efficiency through caching and optimized execution pipelines. The system supports multiple AI models and integrates seamlessly into agent workflows, making it suitable for applications such as web scraping, testing, and intelligent automation. It also includes advanced capabilities such as screenshot capture, DOM analysis, and session persistence, enabling complex interactions across multiple browsing sessions.
    Downloads: 0 This Week
  • 13
    Amazing-Python-Scripts

    Curated collection of Amazing Python scripts

    ...The repository encourages community contributions, allowing developers to add their own scripts and improve existing ones through pull requests. Examples include scripts for sentiment analysis, data scraping, web automation, log analysis, and interactive applications such as games or voice-controlled tools. The project also provides contribution guidelines and documentation so that developers can easily collaborate and expand the collection of scripts.
    Downloads: 0 This Week
  • 14
    AI-Crawler

    Crawl a website starting from a URL, find relevant pages

    ...The tool supports output formats such as JSON and Markdown, and it can generate or accept schemas to ensure that extracted data is structured according to application needs. It is designed as a low-code solution, reducing the complexity of building and maintaining custom scraping pipelines.
    Downloads: 4 This Week
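Crawling from a start URL and keeping the relevant pages can be sketched as a breadth-first traversal with a visited set and a page budget. In this sketch, `get_page` is a hypothetical callback returning a page's text and outgoing links, the in-memory `SITE` stands in for real HTTP fetches, and a keyword match stands in for whatever relevance scoring the tool actually applies.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

Page = Tuple[str, List[str]]  # (page text, outgoing links)


def crawl(start: str, get_page: Callable[[str], Page], keyword: str,
          max_pages: int = 50) -> List[str]:
    """Breadth-first crawl from `start`; return URLs whose text mentions `keyword`."""
    seen = {start}
    queue = deque([start])
    relevant: List[str] = []
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        visited += 1
        text, links = get_page(url)
        if keyword.lower() in text.lower():  # stand-in for a smarter relevance check
            relevant.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return relevant


# In-memory site used instead of real HTTP requests.
SITE: Dict[str, Page] = {
    "/": ("Welcome", ["/pricing", "/blog"]),
    "/pricing": ("Pricing plans and API limits", ["/"]),
    "/blog": ("Company news", ["/blog/api"]),
    "/blog/api": ("New API release", []),
}
```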
  • 15
    Lightpanda Browser

    Lightpanda: the headless browser designed for AI and automation

    Lightpanda is an open-source headless browser designed specifically for automation, artificial intelligence workflows, and large-scale web interaction tasks. Unlike traditional browsers that include full graphical rendering engines meant for human users, Lightpanda is built from scratch to operate entirely in headless mode, focusing only on the components required for programmatic web interaction. This design allows it to execute JavaScript and interact with web pages while avoiding the...
    Downloads: 11 This Week
  • 16
    n8n-MCP

    An MCP server for Claude Desktop, Claude Code, Windsurf, and Cursor

    n8n-mcp is a Model Context Protocol (MCP) server that turns the n8n workflow platform into a set of first-class, typed tools an AI assistant can understand and operate. It exposes structured knowledge of n8n nodes and operations so an agent can reason about workflows, parameters, and executions without scraping docs or guessing API shapes. The server focuses on making Claude Desktop (and other MCP-capable clients) “n8n-literate,” enabling tasks such as inspecting existing workflows, proposing node chains, and validating configuration before runs. It ships with organized resources and tool definitions that map cleanly to n8n’s ecosystem, improving reliability compared with ad-hoc prompt patterns. ...
    Downloads: 17 This Week
  • 17
    Xianyu Intelligent Monitor Bot

    AI tool for real-time monitoring and analysis of Goofish listings

    ai-goofish-monitor is an open source automation tool designed to monitor listings on the Goofish second-hand marketplace and analyze them using artificial intelligence. It combines browser automation with AI-based analysis to automatically search, collect, and evaluate newly posted items that match a user’s purchase criteria. It uses Playwright to simulate real user interactions with the marketplace, allowing the system to retrieve product data and track updates in near real time. ...
    Downloads: 7 This Week
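Tracking newly posted items that match purchase criteria needs two pieces: remembering which listings have already been seen, and testing each new one against the criteria. A minimal sketch with hypothetical `Listing` fields and a keyword-plus-price rule. The real tool collects listings with Playwright and evaluates them with an AI model, neither of which is shown here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Set


@dataclass(frozen=True)
class Listing:
    item_id: str
    title: str
    price: float


def new_matches(listings: Iterable[Listing], seen: Set[str],
                keywords: Sequence[str], max_price: float) -> List[Listing]:
    """Return listings not seen before that mention a keyword and fit the budget."""
    hits: List[Listing] = []
    for item in listings:
        if item.item_id in seen:
            continue  # already evaluated on an earlier poll
        seen.add(item.item_id)
        title = item.title.lower()
        if item.price <= max_price and any(k.lower() in title for k in keywords):
            hits.append(item)
    return hits


# Two simulated polls of the marketplace (made-up data).
seen_ids: Set[str] = set()
first_poll = [Listing("1", "Used camera body", 300.0), Listing("2", "Phone case", 20.0)]
second_poll = first_poll + [Listing("3", "Camera lens", 900.0), Listing("4", "Compact camera", 150.0)]
first_hits = new_matches(first_poll, seen_ids, ["camera"], 400.0)
second_hits = new_matches(second_poll, seen_ids, ["camera"], 400.0)
```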
  • 18
    Browser Agent

    AI Browser Agent is an advanced Browser AI tool

    ...It also provides structured output formats such as JSON, HTML, Markdown, or screenshots, making it easy to integrate results into other systems or pipelines. Because it can interact with dynamic, JavaScript-heavy websites, it is suitable for modern web scraping and automation tasks.
    Downloads: 0 This Week
  • 19
    yt-fts

    Search all of YouTube from the command line

    yt-fts, short for YouTube Full Text Search, is an open-source command-line tool that enables users to search the spoken content of YouTube videos by indexing their subtitles. The program automatically downloads subtitles from a specified YouTube channel using the yt-dlp utility and stores them in a local SQLite database. Once indexed, users can perform full-text searches across all transcripts to quickly locate keywords or phrases mentioned within the videos. The tool returns search results...
    Downloads: 0 This Week
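Full-text search over subtitles maps naturally onto SQLite's FTS5 extension, which most Python builds expose through the built-in `sqlite3` module. The table layout below (`video_id`, `start_time`, `caption`) is an assumption for illustration, not yt-fts's actual schema.

```python
import sqlite3
from typing import Iterable, List, Tuple

Row = Tuple[str, str, str]  # (video_id, start_time, caption text)


def build_index(rows: Iterable[Row]) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # UNINDEXED columns are stored but not tokenized, so only `caption` is searchable.
    conn.execute(
        "CREATE VIRTUAL TABLE subtitles USING fts5(video_id UNINDEXED, start_time UNINDEXED, caption)"
    )
    conn.executemany(
        "INSERT INTO subtitles (video_id, start_time, caption) VALUES (?, ?, ?)", rows
    )
    return conn


def search(conn: sqlite3.Connection, query: str) -> List[Tuple[str, str]]:
    # FTS5 query syntax: bare words match individually, "double quotes" match a phrase.
    return conn.execute(
        "SELECT video_id, start_time FROM subtitles WHERE subtitles MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()


ROWS = [
    ("vid1", "00:01:05", "today we talk about full text search"),
    ("vid1", "00:03:10", "indexes make queries fast"),
    ("vid2", "00:00:30", "subtitles are plain text"),
]
conn = build_index(ROWS)
```

Returning the stored timestamp with each hit is what lets a tool like this point the user to the exact moment a phrase is spoken.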
  • 20
    The Web MCP

    A powerful Model Context Protocol (MCP) server

    Bright Data’s Web MCP server gives AI assistants robust, real-time web capabilities through an MCP interface designed to avoid blocks, rate limits, and CAPTCHAs. It presents search, crawl, navigate, and extraction tools that agents can call directly, replacing brittle scraping prompts with typed operations. The README markets it as a “gateway” to the live web so assistants don’t fall back to stale training data. Bright Data also advertises a getting-started tier with a free monthly allotment, plus options for remote or self-hosted operation depending on governance needs. Ecosystem materials and examples show how it plugs into MCP-capable runtimes and agent frameworks. ...
    Downloads: 0 This Week
  • 21
    chrome-cdp

    Give your AI agent access to your live Chrome session

    ...Its architecture likely abstracts CDP commands into higher-level operations that are easier for agents to use. This makes it particularly useful for automation tasks such as web scraping, testing, and dynamic data retrieval. The system emphasizes precision and control, allowing agents to operate browsers programmatically with fine-grained actions. Overall, chrome-cdp-skill acts as a bridge between AI reasoning and real-world web interaction.
    Downloads: 0 This Week
  • 22
    ArXiv MCP Server

    A Model Context Protocol server for searching and analyzing arXiv

    arxiv-mcp-server bridges AI assistants and the arXiv repository through a clean MCP interface, enabling search, metadata retrieval, and content access without bespoke scraping. With simple tools like “search” and “fetch,” an agent can find papers, pull abstracts, and download PDFs for downstream summarization or analysis. The project includes packaging and CI to publish to PyPI, plus tests and linting for reliability. Issue threads show feature requests such as extracting embedded LaTeX and improving markdown conversion, reflecting active community use in research flows. ...
    Downloads: 0 This Week
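One way to back a `search` tool like this is the public arXiv API, which takes a `search_query` plus paging and sorting parameters. Whether arxiv-mcp-server calls this endpoint directly is an assumption. The sketch only builds a query URL with the standard library and does not fetch or parse the Atom feed the API returns.

```python
from typing import Iterable
from urllib.parse import urlencode

ARXIV_API = "https://export.arxiv.org/api/query"


def arxiv_query_url(terms: Iterable[str], max_results: int = 5, start: int = 0) -> str:
    # "all:" searches every field; terms are combined with the Boolean AND operator.
    search_query = " AND ".join(f"all:{term}" for term in terms)
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
        "sortBy": "relevance",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"


url = arxiv_query_url(["transformer", "scraping"])
```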
  • 23
    ChatGPT Proxy

    Simple Cloudflare bypass for ChatGPT

    ...This tool works by accepting requests in a defined format, forwarding them through the proxy to ChatGPT’s backend services, and returning responses to the caller, abstracting away direct browser automation or scraping concerns from the application layer. By consolidating the traffic through a proxy, developers can centralize logging, throttling, authentication, and caching in one place, making it easier to build consistent and controlled AI workflows. The proxy can also be customized to enforce usage policies, attach additional metadata, or translate request/response formats for compatibility with other tools.
    Downloads: 4 This Week
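Centralized throttling in a proxy is commonly done with a token bucket: each client gets a bucket that refills at a fixed rate, and a request is forwarded only if a token is available. A minimal sketch, not the project's code. `TokenBucket`, its `rate` and `capacity` parameters, and the injectable clock (used so the behaviour can be tested deterministically) are hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilling `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A proxy would keep one bucket per API key or client and answer with HTTP 429 (Too Many Requests) when `allow()` returns false.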
  • 24
    Pattern

    Web mining module for Python, with tools for scraping

    ...The project integrates multiple capabilities into a single framework that allows developers to collect, process, and analyze textual data from the web. It includes modules for web scraping and crawling that can retrieve information from sources such as social media platforms, search engines, and online knowledge bases. In addition to data mining features, the library offers natural language processing functionality including part-of-speech tagging, sentiment analysis, and n-gram extraction. The framework also includes machine learning algorithms that support classification, clustering, and vector space modeling for text analysis tasks. ...
    Downloads: 1 This Week
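N-gram extraction slides a window of `n` consecutive words over a tokenized text. This standard-library sketch is for illustration and is not Pattern's own API. The regex tokenizer is a simplifying assumption that lowercases the input and drops punctuation.

```python
import re
from collections import Counter
from typing import List, Tuple


def ngrams(text: str, n: int = 2) -> List[Tuple[str, ...]]:
    # Lowercase and keep runs of letters, digits, and apostrophes as tokens.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


bigrams = ngrams("The cat sat on the mat", 2)
counts = Counter(ngrams("to be or not to be", 2))
```

Counting n-grams this way is the usual first step before frequency-based features or collocation analysis.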
  • 25
    Huginn

    Create agents that monitor and act on your behalf

    Huginn is an open-source system for building agents that perform automated tasks by monitoring websites, APIs, emails, and more. Inspired by IFTTT, Huginn lets users create complex workflows and conditional logic to react to events and manage data. It’s self-hosted, highly customizable, and suitable for developers who want full control over automation without relying on third-party platforms.
    Downloads: 0 This Week
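A website-monitoring agent of this kind can be reduced to fingerprinting each fetched page and emitting an event when the fingerprint changes. Huginn itself is written in Ruby; this Python sketch, with its hypothetical `ChangeWatcher` class and caller-supplied `fetch` function, only illustrates the idea.

```python
import hashlib
from typing import Callable, Dict, Optional


class ChangeWatcher:
    """Emit an event dict when a watched page's content hash changes."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self.fetch = fetch
        self.fingerprints: Dict[str, str] = {}

    def check(self, url: str) -> Optional[dict]:
        digest = hashlib.sha256(self.fetch(url).encode("utf-8")).hexdigest()
        previous = self.fingerprints.get(url)
        self.fingerprints[url] = digest
        # The first check only records a baseline; later checks compare against it.
        if previous is not None and previous != digest:
            return {"url": url, "event": "changed"}
        return None
```

Downstream steps such as filters or notifications would consume the returned event dicts, which is where conditional workflows come in.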