Showing 8 open source projects for "extraction"

View related business solutions
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 1
    Firecrawl

    Firecrawl

    Turn entire websites into LLM-ready markdown or structured data

    Crawl and convert any website into LLM-ready markdown or structured data. Built by Mendable.ai and the Firecrawl community. Includes powerful scraping, crawling, and data extraction capabilities. Firecrawl is an API service that takes a URL, crawls it, and converts it into clean markdown or structured data. We crawl all accessible subpages and give you clean data for each. No sitemap is required.
    Downloads: 17 This Week
    Last Update:
    See Project
  • 2
    watercrawl

    watercrawl

    AI-ready web crawler that extracts and structures website content

    WaterCrawl is an open source web crawling and data extraction platform designed to transform website content into structured data suitable for machine learning and AI workflows. It enables developers and researchers to crawl web pages, extract meaningful information, and convert it into formats that are easier to process and analyze. It provides a modern crawling system that can automatically navigate links, control crawl depth, and collect content from targeted sections of a website. ...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 3
    HeadlessX

    HeadlessX

    The undetected self-hosted browser automation platform

    ...One of the platform’s goals is to bypass common bot-detection systems by implementing advanced fingerprint spoofing and stealth techniques. The tool can perform tasks such as HTML extraction, screenshot generation, content parsing, and search result scraping while appearing like a normal user browser. Because it is self-hosted, organizations can run the platform on their own infrastructure to maintain privacy and control over automation workflows.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 4
    KaraKeep

    KaraKeep

    A self-hostable bookmark-everything app

    ...Automatic fetching of link titles, descriptions, and images streamlines saving content without manual edits, while rule-based management lets users define customized workflows. With support for image OCR and structured data extraction, Karakeep functions as a flexible personal knowledge base for researchers, content creators, and heavy bookmarkers.
    Downloads: 0 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 5
    Kuma UI

    Kuma UI

    A Headless, Utility-First, and Zero-Runtime UI Component Library

    Kuma UI is an open-source styling and component library that focuses on providing a headless, utility-first approach to building modern web interfaces. The framework emphasizes performance by extracting CSS at build time, allowing developers to create fast websites without requiring runtime styling engines in the browser. By combining utility-first styling with headless component patterns, Kuma UI allows developers to fully customize visual appearance while relying on reusable component...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 6
    AeroFTP

    AeroFTP

    AeroFTP is a Cross-platform desktop client for FTP, SFTP, WebDAV, S3

    AeroFTP is a cross-platform file transfer client that goes beyond traditional FTP. Connect to 25+ protocols, FTP/FTPS, SFTP, WebDAV, S3, Google Drive, Dropbox, OneDrive, MEGA, Box, pCloud, Azure, Filen, and more from a single interface. Security-first: AeroVault v2 encrypted containers (AES-256-GCM-SIV), Cryptomator support, and zero telemetry. Built-in AeroAgent AI assistant with 19 providers and 47 tools for file operations and workflow automation. Includes Monaco editor,...
    Downloads: 111 This Week
    Last Update:
    See Project
  • 7
    Ayakashi

    Ayakashi

    The next generation web scraping framework

    ...Directly inspired by the relational database world (and SQL), domQL makes DOM access easy and readable no matter how obscure the page's structure is. Props are the way to package domQL expressions as re-usable structures which can then be passed around to actions or to be used as models for data extraction.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 8
    Scylla

    Scylla

    Intelligent proxy pool for collecting and managing public proxies

    Scylla is an open source proxy pool system designed to collect, validate, and manage large numbers of public proxy servers for use in web scraping and data extraction workflows. It automatically crawls the internet to discover proxy IP addresses and evaluates their availability and reliability before adding them to a usable pool. It includes a JSON API that allows developers and applications to retrieve proxy information programmatically, making it easier to integrate proxy rotation into scraping tools or automation scripts. ...
    Downloads: 6 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB