Showing 49 open source projects for "extraction"

  • 1
    Markdownify MCP Server

    Convert files and web content into clean, usable Markdown easily

    ...It supports formats such as PDFs, images, audio with transcription, DOCX, XLSX, and PPTX, along with web sources like YouTube transcripts, Bing results, and general webpages. Markdownify MCP is designed to simplify content extraction and make data easier to read, share, and reuse in structured workflows. Developers can install dependencies, build, and run the server locally, then extend functionality by modifying its TypeScript-based tools and server logic. It also allows retrieval of existing Markdown files, making it useful for documentation, research, and AI-assisted workflows. ...
    Downloads: 2 This Week
  • 2
    GPT Crawler

    Crawl a site to generate knowledge files to create your own custom GPT

    GPT Crawler is an open-source tool designed to automatically crawl websites and generate structured knowledge that can be used to build AI assistants and retrieval systems. It focuses on extracting high-quality textual content from web pages and preparing it in formats suitable for embedding, indexing, or fine-tuning workflows. The project is especially useful for teams that want to turn documentation sites or knowledge bases into conversational AI backends without building custom scrapers...
    Downloads: 2 This Week
  • 3
    Read Frog

    Open Source Immersive Translate

    Read Frog is an open-source browser extension designed to transform everyday web reading into an immersive language learning experience powered by artificial intelligence. The tool integrates translation, contextual explanations, and content analysis directly into the browsing workflow so users can learn languages naturally while reading authentic online content. Instead of forcing learners to switch between translation tools and the original text, the extension displays translations...
    Downloads: 3 This Week
  • 4
    Eko

    Build Production-ready Agentic Workflow with Natural Language

    Eko (Eko Keeps Operating) is a JavaScript framework designed for building production-ready agent-based workflows using natural language commands. It allows developers to create automated agents that can handle complex workflows in both computer and browser environments. With a focus on high development efficiency, Eko simplifies the creation of multi-step workflows, enabling users to integrate and automate tasks across platforms. It provides a unified interface for managing agents, offering...
    Downloads: 3 This Week
  • 5
    Passmark

    The open-source Playwright library for AI browser regression testing

    The Passmark project is an open-source AI-powered regression testing framework built on top of Playwright that enables developers to write end-to-end browser tests using natural language instead of traditional scripting. It is designed to simplify and accelerate testing workflows by allowing AI models to interpret human-readable instructions and translate them into executable browser actions. One of its defining features is a cache-first execution model, where AI is used initially to...
    Downloads: 2 This Week
  • 6
    nw_wrld

    nw_wrld is an event-driven sequencer for triggering visuals

    nw_wrld is a procedurally generated world-building engine tailored for game developers and interactive storytellers who want to craft rich, random yet coherent environments without hand-crafting every detail. It uses noise functions and modular terrain algorithms to generate expansive maps, diverse biomes, and layered features like rivers, mountain ranges, forests, and resource nodes. The system is designed to be extensible, letting developers plug in new generation rules or tweak parameters...
    Downloads: 2 This Week
  • 7
    HyperAgent

    AI Browser Automation

    ...This approach reduces the brittleness commonly associated with traditional automation scripts that break when the DOM structure changes. HyperAgent includes APIs such as page.ai() and page.extract() that allow structured data extraction and dynamic task execution through AI reasoning.
    Downloads: 1 This Week
  • 8
    GetProfile

    User profile and long-term memory for your AI agent

    GetProfile is a drop-in proxy layer that sits in front of your LLM provider to turn otherwise stateless chat requests into a system with persistent user profiles and long-term memory. Instead of forcing you to redesign your application, you route your model calls through GetProfile and it captures conversation context automatically as traffic flows. It then extracts structured traits and “memories” from those conversations, stores them, and injects the most relevant profile context back into...
    Downloads: 1 This Week
  • 9
    Gooo

    Toolkit for developing web applications in Vue, Templ, and Go

    ...The project emphasizes simplicity and flexibility, enabling users to integrate its components into scripts or larger systems. While not as feature-heavy as enterprise frameworks, it serves as a foundation for experimentation and rapid prototyping in data extraction or automation tasks. Its design reflects a developer-centric approach, prioritizing extensibility and ease of modification over polished interfaces.
    Downloads: 0 This Week
  • 10
    BrowserNode

    Make websites accessible for AI agents. Automate tasks online

    Browsernode is an open-source TypeScript framework that allows AI agents to interact directly with web browsers in order to automate tasks and gather information from websites. The project acts as a bridge between AI models and browser automation tools, enabling language models to control web pages programmatically. Built as an implementation compatible with the Browser-use ecosystem, Browsernode allows agents to perform actions such as navigating pages, extracting information, filling...
    Downloads: 0 This Week
  • 11
    agentation

    The visual feedback tool for agents

    Agentation is a visual annotation and feedback tool designed to make interacting with AI coding agents more intuitive and precise by letting developers visually click on frontend elements in a browser and annotate them with context before sending structured feedback to an agent. Instead of describing UI elements in text — like “the blue button in the sidebar” — users click directly on elements to automatically capture selectors, positions, and contextual metadata that can be consumed by AI...
    Downloads: 0 This Week
  • 12
    Browserbase MCP Server

    Allow LLMs to control a browser with Browserbase and Stagehand

    Browserbase MCP Server is a server implementation of the Model Context Protocol (MCP) that enables large language models to interact with web browsers programmatically through cloud-based automation. The project provides a standardized interface for connecting AI systems to real-world web environments, allowing them to navigate pages, extract structured data, and perform user-like actions such as clicking, typing, and form submission. It leverages Browserbase infrastructure along with...
    Downloads: 0 This Week
  • 13
    Kuma UI

    A Headless, Utility-First, and Zero-Runtime UI Component Library

    Kuma UI is an open-source styling and component library that focuses on providing a headless, utility-first approach to building modern web interfaces. The framework emphasizes performance by extracting CSS at build time, allowing developers to create fast websites without requiring runtime styling engines in the browser. By combining utility-first styling with headless component patterns, Kuma UI allows developers to fully customize visual appearance while relying on reusable component...
    Downloads: 0 This Week
  • 14
    Figma Code Connect

    A tool for connecting your design system components

    Figma Code Connect is an open-source tool that enhances collaboration between designers and developers by synchronizing design components with source code in real time. Instead of treating design files and codebases as separate artifacts, it creates a continuous link so when a designer updates a UI element in Figma, developers see corresponding code changes or annotations immediately, making handoffs more precise and frictionless. The system supports multiple frameworks and languages,...
    Downloads: 0 This Week
  • 15
    react-docgen

    A CLI and toolbox to extract information from React component files

    react-docgen is a CLI and toolbox for extracting information from React components and generating documentation from it. It uses @babel/parser to parse the source into an AST and provides methods to process this AST to extract the desired information. The output (or return value) is a JSON blob (or JavaScript object). It provides a default implementation for React components defined via React.createClass, ES2015 class definitions, or functions (stateless components). These component definitions...
    Downloads: 0 This Week
  • 16
    LLM Scraper

    Extract structured data from webpages using LLM-powered scraping

    LLM Scraper is a TypeScript library designed to extract structured data from webpages using large language models. Instead of relying on fragile HTML selectors or manual parsing rules, the tool interprets webpage content with language models and converts it into structured data according to a defined schema. Developers can specify the data structure using tools such as Zod or JSON Schema, enabling the model to extract relevant information directly into typed objects. LLM Scraper integrates...
    Downloads: 0 This Week
  • 17
    AeroFTP

    AeroFTP is a Cross-platform desktop client for FTP, SFTP, WebDAV, S3

    AeroFTP is a cross-platform file transfer client that goes beyond traditional FTP. Connect to 25+ protocols, FTP/FTPS, SFTP, WebDAV, S3, Google Drive, Dropbox, OneDrive, MEGA, Box, pCloud, Azure, Filen, and more from a single interface. Security-first: AeroVault v2 encrypted containers (AES-256-GCM-SIV), Cryptomator support, and zero telemetry. Built-in AeroAgent AI assistant with 19 providers and 47 tools for file operations and workflow automation. Includes Monaco editor,...
    Downloads: 421 This Week
  • 18
    Ayakashi

    The next generation web scraping framework

    ...Directly inspired by the relational database world (and SQL), domQL makes DOM access easy and readable no matter how obscure the page's structure is. Props package domQL expressions as reusable structures, which can then be passed to actions or used as models for data extraction.
    Downloads: 0 This Week
  • 19
    Botpress

    Dev tools to reliably understand text and automate conversations

    ...We offer a complete dev-friendly platform that ships with all the tools you need to build, deploy, and manage production-grade chatbots in record time. Built-in Natural Language Processing tasks include intent recognition, spell checking, entity extraction, and slot tagging, among many others. A visual conversation studio lets you design multi-turn conversations and workflows. An emulator and a debugger let you simulate conversations and debug your chatbot. It supports popular messaging channels such as Slack, Telegram, MS Teams, and Facebook Messenger, plus an embeddable web chat. An SDK and code editor let you extend its capabilities. ...
    Downloads: 6 This Week
  • 20
    NuxtJS

    The Intuitive Web Framework, based on Vue 3

    ...Utility, ease of use, and efficiency are key. Nuxt is built with a set of features that make this possible. It is optimized with code-splitting, tree-shaking, optimized cold-start, link prefetching, and payload extraction, to name a few. It is fast by default so you can focus on building. Choose a rendering strategy at the route level: SSR, SSG, CSR, ISR, ESR, or SWR. Build any kind of website or web application with optimized performance in mind. By leveraging server-side rendering, the ESM format, and optimized images, Nuxt websites are indexable by search engines while giving end users the feel of an app.
    Downloads: 0 This Week
  • 21
    Scylla

    Intelligent proxy pool for collecting and managing public proxies

    Scylla is an open source proxy pool system designed to collect, validate, and manage large numbers of public proxy servers for use in web scraping and data extraction workflows. It automatically crawls the internet to discover proxy IP addresses and evaluates their availability and reliability before adding them to a usable pool. It includes a JSON API that allows developers and applications to retrieve proxy information programmatically, making it easier to integrate proxy rotation into scraping tools or automation scripts. ...
    Downloads: 10 This Week
  • 22
    ttag

    Simple approach for javascript localization

    A modern JavaScript i18n localization library based on ES6 tagged templates and the good old GNU gettext. Just tag your strings to make them translatable. Use the simple ttag-cli tool for translation extraction. It integrates easily with almost any workflow because it uses a Babel plugin for string extraction, and it works with TypeScript. It also lets you place translations into the sources at build time. Gettext is a simple localization format with a rich ecosystem. ttag supports plurals, contexts, translator comments, and much more.
    Downloads: 1 This Week
  • 23
    Anyrow SDK

    TypeScript/JS SDK for Anyrow AI document extraction API

    Anyrow SDK (@anyrow/sdk) is the official TypeScript/JavaScript client for the Anyrow API, an AI-native document extraction service with built-in structured storage. Extract clean rows from PDFs, images, emails, calls, and websites. Manage tables, rows, batches, live extraction streams, and exports via a single typed client. Runs in browsers and on Node, Bun, Deno, and Cloudflare Workers: any runtime with `fetch`.

    Install:
        npm install @anyrow/sdk

    Quick start:
        import { AnyrowSDK } from "@anyrow/sdk"
        const client = new AnyrowSDK({ baseURL: "https://api.anyrow.ai", headers: { Authorization: "Bearer KEY" }})
        await client.extract.once({ params: { project_id: "proj_123" }, json: { url: "https://example.com/doc.pdf" }})

    Open source (MIT). ...
    Downloads: 0 This Week
  • 24
    pulsorclip

    Download videos from almost any website

    ...Built around the yt-dlp and ffmpeg engines, it supports over a thousand websites, including major platforms like YouTube, TikTok, Instagram, and others, enabling media extraction in multiple formats. The application focuses on a controlled workflow instead of instant downloads. Users first provide a media URL, then select format, quality, and container before processing the file. It includes both a web interface built with Next.js and a Telegram bot that offers the same guided experience through chat. ...
    Downloads: 0 This Week