Showing 774 open source projects for "extraction"

View related business solutions
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 1
    MCiSEE

    MCiSEE

    All of Minecraft, EASILY get Minecraft resources

    MCiSEE is an open-source project designed to integrate Minecraft with computer vision and artificial intelligence experiments. The system focuses on capturing visual information from the game environment and exposing it to external programs for analysis or machine learning research. By converting gameplay data into visual or structured formats, MCiSEE enables researchers and developers to build AI agents capable of interacting with the Minecraft environment. The project can be used as a...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    BrowserNode

    BrowserNode

    Make websites accessible for AI agents. Automate tasks online

    Browsernode is an open-source TypeScript framework that allows AI agents to interact directly with web browsers in order to automate tasks and gather information from websites. The project acts as a bridge between AI models and browser automation tools, enabling language models to control web pages programmatically. Built as an implementation compatible with the Browser-use ecosystem, Browsernode allows agents to perform actions such as navigating pages, extracting information, filling...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    browserable

    browserable

    Open source and self-hostable browser automation library for AI agents

    Browserable is an open-source browser automation framework designed specifically for AI agents that need to interact with web interfaces in a human-like way. The project provides tools that allow automated agents to navigate websites, click buttons, fill out forms, and extract information from pages without manual scripting of each step. Built primarily in JavaScript, the framework offers both a developer-friendly SDK and a REST API that allow integration with AI applications and automation...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    agentation

    agentation

    The visual feedback tool for agents

    Agentation is a visual annotation and feedback tool designed to make interacting with AI coding agents more intuitive and precise by letting developers visually click on frontend elements in a browser and annotate them with context before sending structured feedback to an agent. Instead of describing UI elements in text — like “the blue button in the sidebar” — users click directly on elements to automatically capture selectors, positions, and contextual metadata that can be consumed by AI...
    Downloads: 0 This Week
    Last Update:
    See Project
  • AI-generated apps that pass security review Icon
    AI-generated apps that pass security review

    Stop waiting on engineering. Build production-ready internal tools with AI—on your company data, in your cloud.

    Retool lets you generate dashboards, admin panels, and workflows directly on your data. Type something like “Build me a revenue dashboard on my Stripe data” and get a working app with security, permissions, and compliance built in from day one. Whether on our cloud or self-hosted, create the internal software your team needs without compromising enterprise standards or control.
    Try Retool free
  • 5
    Papers We Love

    Papers We Love

    Papers from the computer science community to read and discuss

    Papers We Love (PWL) is a global open source community dedicated to reading, discussing, and sharing influential computer science research papers. The repository serves as a curated directory of academic papers that have shaped the field of computing, providing a centralized location for documents that were previously scattered across various online sources. While licensing restrictions prevent hosting all papers directly, PWL offers links to their original sources and clearly marks hosted...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Scalatra

    Scalatra

    Tiny Scala high-performance, async web framework

    Scalatra is a lightweight, high-performance micro web framework written in Scala, inspired by the Ruby framework Sinatra. Its goal is to provide a minimal but expressive foundation for building web applications or REST APIs in Scala without the verbosity or steep learning curve of larger frameworks. It supports asynchronous request handling, routing, filters, content negotiation, and easy integration with templating, JSON libraries, and other web middleware. Being unopinionated, it lets...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    NeMo Curator

    NeMo Curator

    Scalable data pre processing and curation toolkit for LLMs

    NeMo Curator is a Python library specifically designed for fast and scalable dataset preparation and curation for large language model (LLM) use-cases such as foundation model pretraining, domain-adaptive pretraining (DAPT), supervised fine-tuning (SFT) and paramter-efficient fine-tuning (PEFT). It greatly accelerates data curation by leveraging GPUs with Dask and RAPIDS, resulting in significant time savings. The library provides a customizable and modular interface, simplifying pipeline...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    AudioMuse-AI

    AudioMuse-AI

    AudioMuse-AI is an Open Source Dockerized environment

    AudioMuse-AI is an open-source system designed to automatically generate playlists and analyze music libraries using artificial intelligence and audio signal processing techniques. The platform runs locally in a Dockerized environment and performs detailed sonic analysis on audio files to understand characteristics such as tempo, mood, and acoustic similarity. By analyzing the underlying audio content rather than relying on external metadata services, the system can organize large personal...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 9
    OWASP Maryam

    OWASP Maryam

    Modular OSINT framework for automated open-source intelligence gatheri

    ...Maryam helps security researchers and analysts streamline routine data-gathering tasks that typically involve searching multiple sources such as Google, Bing, or other online platforms. Maryam organizes its functionality into several modules that focus on different aspects of intelligence gathering, including footprint analysis, OSINT data extraction, and general search operations.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Fully Managed MySQL, PostgreSQL, and SQL Server Icon
    Fully Managed MySQL, PostgreSQL, and SQL Server

    Automatic backups, patching, replication, and failover. Focus on your app, not your database.

    Cloud SQL handles your database ops end to end, so you can focus on your app.
    Try Free
  • 10
    FlexLLMGen

    FlexLLMGen

    Running large language models on a single GPU

    FlexLLMGen is an open-source inference engine designed to run large language models efficiently on limited hardware resources such as a single GPU. The system focuses on high-throughput generation workloads where large batches of text must be processed quickly, such as large-scale data extraction or document analysis tasks. Instead of requiring expensive multi-GPU systems, the framework uses techniques such as memory offloading, compression, and optimized batching to run large models on commodity hardware. The architecture distributes computation and memory usage across the GPU, CPU, and disk in order to maximize the number of tokens processed during inference. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    QueryList

    QueryList

    Progressive PHP web crawler framework with jQuery-like DOM parsing

    ...It is built on top of phpQuery and uses CSS3 selectors similar to those found in jQuery, making it easy for developers to query and manipulate page elements during scraping tasks. QueryList supports common data extraction scenarios such as retrieving lists of titles, links, images, and other page elements from structured or semi-structured content. It also includes a powerful HTTP request system that enables complex operations such as simulated logins, proxy usage, and customized request headers. QueryList is designed with a modular architecture that allows developers to extend its capabilities through plugins for tasks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Reader LLM

    Reader LLM

    Convert any URL to an LLM-friendly input with a simple prefix

    Reader LLM is an open-source tool designed to convert web content into formats that are easier for large language models to process. The system works by transforming a webpage into a clean text or Markdown representation that removes unnecessary formatting and highlights the core information within the page. Developers can use a simple URL prefix to retrieve a version of a webpage that has been optimized for machine consumption, making it suitable for use in AI agents or retrieval-augmented...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Figma Code Connect

    Figma Code Connect

    A tool for connecting your design system components

    Figma Code Connect is an open-source tool that enhances collaboration between designers and developers by synchronizing design components with source code in real time. Instead of treating design files and codebases as separate artifacts, it creates a continuous link so when a designer updates a UI element in Figma, developers see corresponding code changes or annotations immediately, making handoffs more precise and frictionless. The system supports multiple frameworks and languages,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    MiniCPM-o

    MiniCPM-o

    A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming

    MiniCPM-o 2.6 is a cutting-edge multimodal large language model (MLLM) designed for high-performance tasks across vision, speech, and video. Capable of running on end-side devices such as smartphones and tablets, it provides powerful features like real-time speech conversation, video understanding, and multimodal live streaming. With 8 billion parameters, MiniCPM-o 2.6 surpasses its predecessors in versatility and efficiency, making it one of the most robust models available. It supports...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    react-docgen

    react-docgen

    A CLI and toolbox to extract information from React component files

    react-docgen is a CLI and toolbox to help extracting information from React components, and generate documentation from it. It uses @babel/parser to parse the source into an AST and provides methods to process this AST to extract the desired information. The output / return value is a JSON blob / JavaScript object. It provides a default implementation for React components defined via React.createClass, ES2015 class definitions or functions (stateless components). These component definitions...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    ffsend

    ffsend

    Easily and securely share files from the command line

    Easily and securely share files and directories from the command line through a safe, private and encrypted link using a single simple command. Files are shared using the Send service and may be up to 1GB. Others are able to download these files with this tool, or through their web browser. All files are always encrypted on the client, and secrets are never shared with the remote host. An optional password may be specified, and a default file lifetime of 1 (up to 20) download or 24 hours is...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    PokeeResearch-7B

    PokeeResearch-7B

    Pokee Deep Research Model Open Source Repo

    PokeeResearchOSS provides an open-source, agentic “deep research” model centered on a 7B backbone that can browse, read, and synthesize current information from the web. Instead of relying only on static training data, the agent performs searches, visits pages, and extracts evidence before forming answers to complex queries. It is built to operate end-to-end: planning a research strategy, gathering sources, reasoning over conflicting claims, and writing a grounded response. The repository...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    UCO3D

    UCO3D

    Uncommon Objects in 3D dataset

    ...The repository includes automated downloaders with checksum verification, fine-grained controls to fetch only selected modalities or super-categories, and a lightweight Python API for loading frames, geometry, and splats on demand. Metadata is indexed in SQLite for quick queries at scale, and helper builders handle alignment, undistortion, frame extraction from videos, and cropping around the object.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Perception Models

    Perception Models

    State-of-the-art Image & Video CLIP, Multimodal Large Language Models

    Perception Models is a state-of-the-art framework developed by Facebook Research for advanced image and video perception tasks. It introduces two primary components: the Perception Encoder (PE) for visual feature extraction and the Perception Language Model (PLM) for multimodal decoding and reasoning. The PE module is a family of vision encoders designed to excel in image and video understanding, surpassing models like SigLIP2, InternVideo2, and DINOv2 across multiple benchmarks. Meanwhile, PLM integrates with PE to power vision-language modeling, achieving results competitive with leading multimodal systems such as QwenVL2.5 and InternVL3, all while being fully reproducible with open data. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Down

    Down

    Streaming downloads using Net::HTTP, http.rb or HTTPX

    Down is a small, reliable Ruby library for downloading files that favors correctness, streaming, and clear error handling. It follows redirects safely, supports timeouts and retries, and streams responses to disk to keep memory usage low—ideal for large downloads or server environments. The API returns file-like objects (often Tempfile) with helpful metadata such as original filename and content type, which plays nicely with file-attachment libraries and background jobs. Multiple HTTP...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    tika-python

    tika-python

    Python binding to the Apache Tika™ REST services

    A Python port of the Apache Tika library that makes Tika available using the Tika REST Server. This makes Apache Tika available as a Python library, installable via Setuptools, Pip and easy to install. To use this library, you need to have Java 7+ installed on your system as tika-python starts up the Tika REST server in the background. To get this working in a disconnected environment, download a tika server file (both tika-server.jar and tika-server.jar.md5, which can be found here) and set...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Patch-NetVLAD

    Patch-NetVLAD

    Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition

    This repository contains code for the CVPR2021 paper "Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition".
    Downloads: 6 This Week
    Last Update:
    See Project
  • 23
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well proved XML and text processing techologies in order to easely extract useful data from arbitrary web pages.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 24
    X-Ray of Death
    A professional PE (Portable Executable) analysis and modification tool for Windows executables and DLLs.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 25
    gMKVExtractGUI

    gMKVExtractGUI

    A GUI in C# .NET 4 for mkvextract (MKVToolNix)

    The project has been moved in Github since v2.9.0 https://github.com/Gpower2/gMKVExtractGUI A GUI for mkvextract utility (part of MKVToolNix) which incorporates most (if not all) functionality of mkvextract and mkvinfo utilities. Written in C# .NET 4.0, in order to attain high compatibility with Windows OS (WinXP and newer Windows), as well as Linux through Mono (v1.6.4 and newer), and perhaps OSX (not tested).
    Leader badge
    Downloads: 1,213 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB