Showing 826 open source projects for "extraction"

  • 1
    X-Crawl

    Flexible Node.js AI-assisted crawler library

    A high-performance web crawling and scraping framework for Node.js, designed for large-scale data extraction.
    Downloads: 0 This Week
  • 2
    Wiseflow

    Enhance any agent's browser use skill

    Wiseflow is an open-source information extraction and knowledge discovery system designed to collect, filter, and organize valuable information from large volumes of online content. The platform continuously monitors specified sources such as websites, social platforms, and other digital channels to identify relevant data according to user-defined interests or topics. By combining web crawling, content parsing, and large language model analysis, the system extracts concise insights from raw information streams and converts them into structured data that can be stored or analyzed. ...
    Downloads: 16 This Week
  • 3
    MiroFish

    A Simple and Universal Swarm Intelligence Engine

    MiroFish is a next-generation artificial intelligence prediction engine that leverages multi-agent technology and swarm-intelligence simulation to model, simulate, and forecast complex real-world scenarios. The system extracts “seed” information from sources such as breaking news, policy documents, and market signals to construct a high-fidelity digital parallel world populated by thousands of virtual agents with independent memory and behavior rules. Users can inject variables or conditions...
    Downloads: 569 This Week
  • 4
    fireworks-tech-graph

    Claude Code skill for generating production-quality SVG+PNG technical

    ...It aims to transform unstructured information into interconnected graphs that can be queried and analyzed for insights, making it easier to understand complex ecosystems such as software stacks or research fields. The system likely leverages AI techniques for entity extraction, relationship mapping, and graph construction, enabling automated knowledge organization. It can be used to power recommendation systems, research tools, or intelligent assistants that require contextual understanding of technical topics. The project emphasizes scalability and adaptability, allowing it to handle large datasets and evolving knowledge bases. ...
    Downloads: 12 This Week
  • 5
    video2robot

    End-to-end pipeline converting generative videos

    video2robot is an end-to-end open-source pipeline that converts generative video or prompt-driven motion content into executable humanoid robot motion sequences, enabling researchers and developers to go from high-level action descriptions or videos to robot-ready motion data. The pipeline supports both prompt-to-video generation using models like Veo/Sora and video upload processing, followed by human pose extraction through a 3D pose model and retargeting of that motion to robot joints using a general motion retargeting system. This workflow allows users to generate robot motion files that specify joint angles, root positions, and orientations that can be deployed on supported robot platforms (e.g., Unitree models). Video2robot includes scripts for each stage of the pipeline (generation, extraction, conversion, visualization) and can run as a CLI or through a basic web UI.
    Downloads: 0 This Week
  • 6
    AgentQL MCP

    Model Context Protocol server that integrates AgentQL's data

    The AgentQL MCP Server is a Model Context Protocol (MCP) server that integrates AgentQL's data extraction capabilities, enabling users to extract structured data from web pages using natural language prompts.
    Downloads: 0 This Week
  • 7
    LiteParse

    A fast, helpful, and open-source document parser

    ...It also includes mechanisms for validation and error handling, ensuring that outputs conform to expected schemas and reducing the need for manual postprocessing. The library is particularly useful for tasks such as data extraction, document processing, and building pipelines that require structured outputs from natural language input.
    Downloads: 4 This Week
  • 8
    warp

    A super-easy, composable, web server framework for warp speeds

    The fundamental building block of warp is the Filter. Filters can be combined and composed to express rich requirements on requests. A Filter in warp is essentially a function that operates on some input, either something from a request or something from a previous Filter, and returns some output, which could be an app-specific type you wish to pass around or a reply to send back as an HTTP response. That might sound simple, but the exciting part is the combinators that exist...
    Downloads: 7 This Week
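
    The Filter idea described above can be illustrated with a small conceptual sketch. It is written in Python rather than Rust and does not use warp's API: the helpers `path`, `header`, `and_`, `or_`, and `map_` are hypothetical names chosen for this illustration, and modelling a request as a plain dict is likewise an assumption.

```python
from typing import Callable, Optional, Tuple

# Assumption: a request is a plain dict, and a filter returns a tuple of
# extracted values on a match or None when the request is rejected.
Request = dict
Filter = Callable[[Request], Optional[Tuple]]


def path(expected: str) -> Filter:
    """Match the request path; extracts nothing."""
    return lambda req: () if req.get("path") == expected else None


def header(name: str) -> Filter:
    """Extract one header value, rejecting the request if it is absent."""
    def run(req: Request) -> Optional[Tuple]:
        value = req.get("headers", {}).get(name)
        return (value,) if value is not None else None
    return run


def and_(first: Filter, second: Filter) -> Filter:
    """Both filters must match; their extracted values are concatenated."""
    def run(req: Request) -> Optional[Tuple]:
        a = first(req)
        if a is None:
            return None
        b = second(req)
        return None if b is None else a + b
    return run


def or_(first: Filter, second: Filter) -> Filter:
    """Try the first filter and fall back to the second on rejection."""
    def run(req: Request) -> Optional[Tuple]:
        a = first(req)
        return a if a is not None else second(req)
    return run


def map_(flt: Filter, fn: Callable) -> Filter:
    """Turn the extracted values into a reply by calling fn on them."""
    def run(req: Request) -> Optional[Tuple]:
        extracted = flt(req)
        return None if extracted is None else (fn(*extracted),)
    return run


hello = map_(and_(path("/hello"), header("user-agent")),
             lambda agent: f"Hello, {agent}!")
health = map_(path("/health"), lambda: "ok")
route = or_(hello, health)

reply = route({"path": "/hello", "headers": {"user-agent": "curl"}})
```

    Each combinator returns another filter, so small pieces can be chained into a full route without any of them knowing about the others.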
  • 9
    MatImage

    Image Processing library for Matlab

    matImage is an open-source MATLAB library for image processing and analysis. It provides a variety of tools for image enhancement, segmentation, and feature extraction. It’s especially useful for users working on biomedical images or those needing detailed image analysis in MATLAB.
    Downloads: 0 This Week
  • 10
    tsfresh

    Automatic extraction of relevant features from time series

    tsfresh is a Python package. It automatically calculates a large number of time series characteristics, the so-called features. tsfresh is used to extract characteristics from time series. Without tsfresh, you would have to calculate all characteristics by hand. With tsfresh, this process is automated and all your features are calculated for you. Further, tsfresh is compatible with Python's pandas and scikit-learn APIs, two important packages for data science endeavours in Python. ...
    Downloads: 0 This Week
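
    To make "calculating characteristics by hand" concrete, here is a minimal sketch that computes a few summary features of one series using only the Python standard library. It does not call tsfresh; the function name `basic_features` and the particular features chosen are assumptions made for illustration.

```python
import statistics


def basic_features(values):
    """Compute a few hand-written characteristics of one time series."""
    # A peak is counted where a point is strictly greater than both
    # of its immediate neighbours.
    peaks = sum(
        1
        for prev, cur, nxt in zip(values, values[1:], values[2:])
        if cur > prev and cur > nxt
    )
    return {
        "mean": statistics.fmean(values),
        "minimum": min(values),
        "maximum": max(values),
        "number_of_peaks": peaks,
    }


series = [1.0, 3.0, 2.0, 5.0, 4.0, 4.0, 6.0]
features = basic_features(series)
```

    Every additional characteristic needs another hand-written function like this one, which is the repetitive work the description says the package automates.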
  • 11
    Scanopy

    Clean network diagrams. One-time setup, zero upkeep.

    Scanopy is a powerful multi-modal data capture and analysis toolkit that enables users to collect, process, and visualize structured and unstructured information from a variety of sources in a flexible pipeline. It is built to handle complex scanning tasks — such as OCR, document analysis, audio transcription, network data capture, and image extraction — while providing unified APIs and workflows that make managing heterogeneous data sources seamless. Developers can compose custom pipelines that chain together transforms, filters, and exporters, enabling automation of tedious data preparation steps and accelerating insights with minimal code. The system places a premium on extensibility, allowing contributors to add new extractors or analysis modules tailored to specific industries or datasets. ...
    Downloads: 5 This Week
  • 12
    yt-dlp-gui

    A cross-platform GUI wrapper for yt-dlp written in PySide6

    ...Written in PySide6 (Python with Qt bindings), it wraps the powerful yt-dlp engine in a visual application that lets users paste video URLs, choose formats, apply presets, and start downloads with a click, while still exposing options for advanced tweaks via configuration files. The project supports preset definitions and global arguments through a config file, so users can customize their most common download workflows—like audio extraction, quality ranking, or embedding thumbnails—without retyping arguments each time. Downloads can be initiated from a portable app bundle or run manually with Python, making it flexible across platforms including Windows and Linux.
    Downloads: 274 This Week
  • 13
    Auto-Deep-Research

    Your Fully-Automated Personal AI Assistant

    ...Users provide a research topic or multifaceted goal, and the system autonomously breaks the objective down into subtasks like literature collection, critical summarization, cross-comparison, citation extraction, metric evaluation, and structured writing. Auto-Deep-Research integrates retrieval from academic and web sources, processes document corpora for relevance and key insights, and organizes outputs into coherent chapters or sections according to research standards. It also embeds validation loops, where intermediate drafts are self-checked for consistency, coverage, and alignment with sound reasoning practices, reducing reliance on raw generation alone.
    Downloads: 2 This Week
  • 14
    Symfony DomCrawler

    Eases DOM navigation for HTML and XML documents

    Symfony DomCrawler is a PHP component that provides powerful tools for navigating and extracting data from HTML and XML documents. It allows developers to parse, filter, and manipulate web pages using CSS selectors and XPath expressions. DomCrawler is widely used for web scraping, testing, and processing structured content, and integrates well with other Symfony components like BrowserKit.
    Downloads: 0 This Week
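
    The kind of XPath-based extraction described above can be sketched as follows. DomCrawler itself is a PHP component, so this is only an analogous illustration using Python's standard `xml.etree.ElementTree` module, which supports a limited XPath subset; the sample document and its class names are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup used only for this illustration.
DOCUMENT = """
<html>
  <body>
    <ul id="projects">
      <li class="project"><a href="/p/ferret">Ferret</a></li>
      <li class="project"><a href="/p/warp">warp</a></li>
      <li class="ad"><a href="/ads/1">Sponsored</a></li>
    </ul>
  </body>
</html>
"""

root = ET.fromstring(DOCUMENT)

# Select the links inside list items whose class attribute is "project",
# skipping the sponsored entry.
links = [
    (anchor.text, anchor.get("href"))
    for anchor in root.findall(".//li[@class='project']/a")
]
```

    ElementTree only parses well-formed XML, so real-world HTML would need a more forgiving parser before a query like this could run.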
  • 15
    Magnitude

    Vision AI browser agent for automation, testing, and extraction

    ...This approach allows the agent to generalize better across complex and modern websites, making it more robust than traditional selector-based automation tools. Browser Agent by Magnitude supports a wide range of capabilities including navigation, interaction, data extraction, and automated verification through built-in testing features. Developers can use it to automate repetitive web tasks, integrate services without APIs, or build advanced browser-based agents. It also provides flexible abstraction levels, allowing both high-level task execution and precise low-level control of actions like mouse movements and keyboard input.
    Downloads: 0 This Week
  • 16
    watercrawl

    AI-ready web crawler that extracts and structures website content

    WaterCrawl is an open source web crawling and data extraction platform designed to transform website content into structured data suitable for machine learning and AI workflows. It enables developers and researchers to crawl web pages, extract meaningful information, and convert it into formats that are easier to process and analyze. It provides a modern crawling system that can automatically navigate links, control crawl depth, and collect content from targeted sections of a website. ...
    Downloads: 0 This Week
  • 17
    JavaScript Obfuscator

    A powerful obfuscator for JavaScript and Node.js

    JavaScript Obfuscator is a Node.js library and CLI that transforms readable JavaScript into hardened, difficult-to-reverse code. It applies techniques such as identifier mangling, string array extraction/encoding, control-flow flattening, dead-code injection, and numeric literal transformations to disguise intent. Advanced options include self-defending code, domain locking, debug/console protection, and property key transformation, allowing you to tailor defenses to your threat model. The tool supports source maps and granular “threshold”/whitelist settings so you can balance protection with performance and debuggability. ...
    Downloads: 3 This Week
  • 18
    Hacks

    A collection of hacks and one-off scripts

    ...Rather than being a single cohesive application, it serves as a repository of practical command-line tools that can be used independently or combined into workflows. The scripts cover a wide range of tasks, including URL manipulation, parameter replacement, data extraction, and reconnaissance automation. Many of the tools in the repository are designed for efficiency and simplicity, enabling users to perform complex operations with minimal overhead. It is particularly popular among security researchers and developers who need quick, flexible solutions for niche problems.
    Downloads: 0 This Week
  • 19
    MemProcFS Analyzer

    Automated Forensic Analysis of Windows Memory Dumps for DFIR

    ...By exposing process memory, kernel objects, and derived artifacts as regular files, the framework lets analysts use familiar filesystem operations and standard tools (editors, grep, diff) to explore memory snapshots. The Analyzer layer adds higher-level parsing and extraction routines—for example, carving strings, locating injected modules, enumerating handles, or reconstructing network sockets—so investigators can go from raw memory to actionable evidence more quickly. It emphasizes automation and reproducibility: parsers can be chained, results exported, and reports templated to fit incident workflows. ...
    Downloads: 15 This Week
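
    As a rough illustration of what "carving strings" from raw memory means, the sketch below pulls runs of printable ASCII out of a byte buffer. It is not taken from MemProcFS Analyzer; the function name `carve_strings`, the minimum length of 4, and the sample bytes are all assumptions.

```python
import re


def carve_strings(data: bytes, min_length: int = 4):
    """Return runs of printable ASCII at least min_length bytes long."""
    # 0x20-0x7e is the printable ASCII range (space through tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]


# Made-up buffer standing in for a slice of a memory dump.
dump = b"\x00\x01cmd.exe\x00\xff\xfehttp://example.com\x00ab\x00"
found = carve_strings(dump)
```

    Short runs such as `ab` are dropped because they fall below the minimum length, which keeps random byte noise out of the results.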
  • 20
    Ferret

    Declarative web scraping

    A web scraping system aiming to simplify data extraction from the web. Ferret has an easy-to-learn declarative query language that makes it easy to focus on the data you need. It can scrape JS-rendered pages, handle all page events, and emulate user interactions. Ferret was designed as a library from the ground up, so it can be easily embedded into any Go application. It uses Chrome/Chromium via the Chrome DevTools Protocol to handle dynamically rendered web pages. Ferret is extremely extensible, and creating custom functions and types is straightforward. ...
    Downloads: 1 This Week
  • 21
    PDFMathTranslate

    PDF scientific paper translation with preserved formats

    PDFMathTranslate is a Python-based tool that uses AI translation to convert academic PDFs into bilingual (e.g. Chinese-English) documents while preserving formatting, including math notation. It supports OCR-enhanced content and offers CLI, GUI, Docker, and Zotero integration under AGPL v3.
    Downloads: 28 This Week
  • 22
    Bespoke Curator

    Synthetic data curation for post-training and data extraction

    ...It supports workflows where models are used to produce synthetic examples that can later be refined into reliable training datasets for reasoning, question answering, or structured information extraction tasks. Curator includes tools for monitoring data generation processes and managing dataset quality while large batches of examples are being created. The framework also integrates with multiple inference systems and APIs, allowing users to generate data using different model providers or open-source inference engines.
    Downloads: 2 This Week
  • 23
    CL4R1T4S

    Archive of leaked AI system prompts and internal instruction sets

    ...According to its description, the initiative is motivated by the idea that understanding the input instructions behind an AI system helps users better interpret its outputs. Contributors are encouraged to add newly discovered or extracted prompts, along with information such as the model version and extraction date.
    Downloads: 3 This Week
  • 24
    JS Analyzer

    Burp Suite extension for JavaScript static analysis

    JS Analyzer is a powerful static analysis tool implemented as a Burp Suite extension that helps security researchers and web developers automatically uncover important artifacts in JavaScript files during web application testing. It parses JavaScript responses intercepted by Burp Suite and intelligently extracts API endpoints, full URLs (including cloud storage links), secrets like API keys or tokens, and email addresses while filtering out noise from irrelevant code patterns. The extension...
    Downloads: 3 This Week
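
    The artifact extraction described above can be approximated with a few regular expressions. This is a simplified sketch, not the extension's actual logic: the patterns, the `/api/` prefix convention, the function name `extract_artifacts`, and the sample script are all assumptions, and it does no noise filtering or secret detection.

```python
import re

# Illustrative patterns only; real-world JavaScript needs broader rules.
URL_RE = re.compile(r"https?://[^\s'\"<>]+")
ENDPOINT_RE = re.compile(r"""['"](/api/[A-Za-z0-9_\-/{}.]*)['"]""")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_artifacts(source: str) -> dict:
    """Collect de-duplicated, sorted URLs, API paths, and email addresses."""
    return {
        "urls": sorted(set(URL_RE.findall(source))),
        "endpoints": sorted(set(ENDPOINT_RE.findall(source))),
        "emails": sorted(set(EMAIL_RE.findall(source))),
    }


# Made-up JavaScript response body.
SAMPLE_JS = '''
const base = "https://cdn.example.com/assets/app.js";
fetch("/api/v1/users");
fetch('/api/v1/orders/{id}');
const contact = "support@example.com";
'''

artifacts = extract_artifacts(SAMPLE_JS)
```

    Keeping each artifact type in its own pattern makes it easy to tighten one rule, for example to cut false positives, without touching the others.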
  • 25
    imagefap-dl

    ImageFap gallery downloader

    imagefap-dl is a command-line downloader designed to automate the retrieval of galleries and media from ImageFap, focusing on efficiency, reliability, and structured output. The tool enables users to download entire galleries or specific content collections by parsing URLs and systematically fetching associated media files. It is optimized for batch downloading scenarios, allowing users to archive large sets of images with minimal manual intervention. The program typically includes...
    Downloads: 29 This Week