Showing 47 open source projects for "extraction"

  • 1
    Firecrawl

    Turn entire websites into LLM-ready markdown or structured data

    Crawl and convert any website into LLM-ready markdown or structured data. Built by Mendable.ai and the Firecrawl community. Includes powerful scraping, crawling, and data extraction capabilities. Firecrawl is an API service that takes a URL, crawls it, and converts it into clean markdown or structured data. We crawl all accessible subpages and give you clean data for each. No sitemap is required.
    Downloads: 17 This Week
  • 2
    DocStrange

    Extract and convert data from any document: images, PDFs, Word docs

    DocStrange is an open-source document understanding and extraction library designed to convert complex files into structured, LLM-ready outputs such as Markdown, JSON, CSV, and HTML. Developed by Nanonets, the project combines OCR, layout detection, table understanding, and structured extraction into one end-to-end pipeline, which reduces the need to stitch together multiple separate services.
    Downloads: 0 This Week
  • 3
    X-Crawl

    Flexible Node.js AI-assisted crawler library

    A high-performance web crawling and scraping framework for Node.js, designed for large-scale data extraction.
    Downloads: 7 This Week
  • 4
    OCRBase

    MD/.JSON Document OCR and structured data extraction API

    OCRBase is a self-hostable document OCR and structured extraction system built to turn PDFs into machine-usable outputs at scale, aiming to bridge the gap between raw text extraction and production-ready pipelines. Instead of treating OCR as a one-off script, it presents an API-driven workflow where documents are submitted as jobs and processed through a queue-based architecture that can handle high throughput.
    Downloads: 0 This Week
  • 5
    watercrawl

    AI-ready web crawler that extracts and structures website content

    WaterCrawl is an open source web crawling and data extraction platform designed to transform website content into structured data suitable for machine learning and AI workflows. It enables developers and researchers to crawl web pages, extract meaningful information, and convert it into formats that are easier to process and analyze. It provides a modern crawling system that can automatically navigate links, control crawl depth, and collect content from targeted sections of a website. ...
    Downloads: 7 This Week
  • 6
    JavaScript Obfuscator

    A powerful obfuscator for JavaScript and Node.js

    JavaScript Obfuscator is a Node.js library and CLI that transforms readable JavaScript into hardened, difficult-to-reverse code. It applies techniques such as identifier mangling, string array extraction/encoding, control-flow flattening, dead-code injection, and numeric literal transformations to disguise intent. Advanced options include self-defending code, domain locking, debug/console protection, and property key transformation, allowing you to tailor defenses to your threat model. The tool supports source maps and granular “threshold”/whitelist settings so you can balance protection with performance and debuggability. ...
    Downloads: 17 This Week
  • 7
    LiteParse

    A fast, helpful, and open-source document parser

    ...It also includes mechanisms for validation and error handling, ensuring that outputs conform to expected schemas and reducing the need for manual postprocessing. The library is particularly useful for tasks such as data extraction, document processing, and building pipelines that require structured outputs from natural language input.
    Downloads: 6 This Week
  • 8
    Tamagui

    Style React fast with 100% parity on React Native

    ...Tamagui also includes a full UI kit with both styled and unstyled components, enabling flexible design system creation. Its compiler performs advanced optimizations such as CSS extraction, tree flattening, and dead code elimination, reducing bundle size and improving rendering speed. The system includes robust theming capabilities with support for design tokens, responsive props, and dynamic themes like dark mode.
    Downloads: 10 This Week
  • 9
    Magnitude

    Vision AI browser agent for automation, testing, and extraction

    ...This approach allows the agent to generalize better across complex and modern websites, making it more robust than traditional selector-based automation tools. Browser Agent by Magnitude supports a wide range of capabilities including navigation, interaction, data extraction, and automated verification through built-in testing features. Developers can use it to automate repetitive web tasks, integrate services without APIs, or build advanced browser-based agents. It also provides flexible abstraction levels, allowing both high-level task execution and precise low-level control of actions like mouse movements and keyboard input.
    Downloads: 0 This Week
  • 10
    HeadlessX

    The undetected self-hosted browser automation platform

    ...One of the platform’s goals is to bypass common bot-detection systems by implementing advanced fingerprint spoofing and stealth techniques. The tool can perform tasks such as HTML extraction, screenshot generation, content parsing, and search result scraping while appearing to be a normal user's browser. Because it is self-hosted, organizations can run the platform on their own infrastructure to maintain privacy and control over automation workflows.
    Downloads: 5 This Week
  • 11
    Search1API MCP

    A Model Context Protocol (MCP) server

    The Search1API MCP Server is a Model Context Protocol server that provides search and crawl functionality using Search1API. It enables web and news searches, content extraction, and sitemap retrieval, integrating seamlessly with MCP clients.
    Downloads: 8 This Week
  • 12
    imagefap-dl

    ImageFap gallery downloader

    imagefap-dl is a command-line downloader designed to automate the retrieval of galleries and media from ImageFap, focusing on efficiency, reliability, and structured output. The tool enables users to download entire galleries or specific content collections by parsing URLs and systematically fetching associated media files. It is optimized for batch downloading scenarios, allowing users to archive large sets of images with minimal manual intervention. The program typically includes...
    Downloads: 26 This Week
  • 13
    KaraKeep

    A self-hostable bookmark-everything app

    ...Automatic fetching of link titles, descriptions, and images streamlines saving content without manual edits, while rule-based management lets users define customized workflows. With support for image OCR and structured data extraction, Karakeep functions as a flexible personal knowledge base for researchers, content creators, and heavy bookmarkers.
    Downloads: 0 This Week
  • 14
    nhentai

    A library for interacting with the nhentai API

    nhentai is a JavaScript and TypeScript library designed to interact with the nhentai API and retrieve doujinshi metadata and content information. It enables developers to programmatically access galleries, titles, tags, covers, and page URLs from the nhentai platform. The library supports both CommonJS and ES6 module imports, making it easy to integrate into different Node.js projects. Developers can use it to fetch specific doujin entries, explore associated metadata, and process gallery...
    Downloads: 58 This Week
  • 15
    Documind

    Open-source platform for extracting structured data from documents

    Documind is an advanced document processing tool that leverages AI to extract structured data from PDFs. It is built to handle PDF conversions, extract relevant information, and format results as specified by customizable schemas.
    Downloads: 3 This Week
  • 16
    Actors MCP Server

    Model Context Protocol (MCP) Server for Apify's Actors

    The Apify Actors MCP Server is a Model Context Protocol (MCP) server that enables AI assistants to interact with Apify Actors. This integration allows AI models to utilize various web scraping and automation tools provided by Apify, facilitating tasks such as data extraction and web automation.
    Downloads: 9 This Week
  • 17
    Compiled

    A familiar and performant compile time CSS-in-JS library for React

    ...Using APIs and behavior you may already be familiar with, write your styles in JavaScript with the full power of CSS, leveraging the language to create expressive & dynamic experiences. Build with your bundler of choice or just Babel, resulting in very performant components that have their styles built ahead of time. Turn on extraction and all components styled in your app and sourced through NPM will have their runtime stripped and styles extracted to an atomic style sheet. Compiled brings distributed styles from the platform, product, and the wider ecosystem together. Add the loader to your Webpack config. Make sure this is defined after other loaders so it runs first. ...
    Downloads: 0 This Week
  • 18
    Mobile Next

    Model Context Protocol Server for Mobile Automation and Scraping

    ...It abstracts away platform-specific complexities, allowing developers and AI agents to interact with mobile devices using a consistent set of commands regardless of operating system. The system supports real devices, emulators, and simulators, making it suitable for testing, automation, and data extraction workflows in diverse development setups. One of its key innovations is its hybrid interaction model, which combines structured accessibility data with fallback screenshot-based analysis to ensure reliable automation even in complex UI scenarios. It is built to integrate seamlessly with modern AI agents, enabling multi-step workflows such as automated testing, form filling, and user journey simulation.
    Downloads: 0 This Week
  • 19
    GenAIScript

    Automatable GenAI Scripting

    GenAIScript is a JavaScript-ish environment with convenient tooling for file ingestion, prompt development, and structured data extraction. It is a Microsoft tool that generates AI-powered text from prompts, useful for content creation and automation.
    Downloads: 0 This Week
  • 20
    TTime

    Screenshots, word marking, OCR, AI, translation software

    TTime is a desktop productivity tool that combines translation, OCR, and screen capture capabilities into a unified application designed for fast and efficient text processing workflows. It allows users to translate text through multiple methods, including direct input, screenshot-based capture, and real-time word selection, making it versatile for both casual use and professional tasks. The software integrates a wide range of translation engines and OCR services, including cloud-based...
    Downloads: 8 This Week
  • 21
    Markdownify MCP Server

    Convert files and web content into clean, usable Markdown easily

    ...It supports formats such as PDFs, images, audio with transcription, DOCX, XLSX, and PPTX, along with web sources like YouTube transcripts, Bing results, and general webpages. Markdownify MCP is designed to simplify content extraction and make data easier to read, share, and reuse in structured workflows. Developers can install dependencies, build, and run the server locally, then extend functionality by modifying its TypeScript-based tools and server logic. It also allows retrieval of existing Markdown files, making it useful for documentation, research, and AI-assisted workflows. ...
    Downloads: 12 This Week
  • 22
    Read Frog

    Open Source Immersive Translate

    Read Frog is an open-source browser extension designed to transform everyday web reading into an immersive language learning experience powered by artificial intelligence. The tool integrates translation, contextual explanations, and content analysis directly into the browsing workflow so users can learn languages naturally while reading authentic online content. Instead of forcing learners to switch between translation tools and the original text, the extension displays translations...
    Downloads: 15 This Week
  • 23
    AI Website Cloner Template

    Clone any website with one command using AI coding agents

    AI Website Cloner Template is a reusable project template that enables AI coding agents to reverse-engineer and recreate existing websites as modern, production-ready applications. It automates the process of analyzing a target website, extracting its design system, and rebuilding it using technologies such as Next.js, React, and Tailwind CSS. The system operates through a multi-stage pipeline that includes reconnaissance, component specification, parallel generation, and final assembly,...
    Downloads: 8 This Week
  • 24
    Kuma UI

    A Headless, Utility-First, and Zero-Runtime UI Component Library

    Kuma UI is an open-source styling and component library that focuses on providing a headless, utility-first approach to building modern web interfaces. The framework emphasizes performance by extracting CSS at build time, allowing developers to create fast websites without requiring runtime styling engines in the browser. By combining utility-first styling with headless component patterns, Kuma UI allows developers to fully customize visual appearance while relying on reusable component...
    Downloads: 10 This Week
  • 25
    LibPDF

    A modern PDF library for TypeScript

    ...The library offers full read and write manipulation, including support for encryption with RC4 and modern AES cipher suites, form filling and flattening, digital signature creation and verification, page merging/splitting, rich text extraction with layout information, and font embedding with subsetting.
    Downloads: 4 This Week