Showing 803 open source projects for "aosp-project-mido"

View related business solutions
  • Fully Managed MySQL, PostgreSQL, and SQL Server Icon
    Fully Managed MySQL, PostgreSQL, and SQL Server

    Automatic backups, patching, replication, and failover. Focus on your app, not your database.

    Cloud SQL handles your database ops end to end, so you can focus on your app.
    Try Free
  • Ship Agents Faster Icon
    Ship Agents Faster

    Transform your applications and workflows into powerful agentic systems at global scale.

    Gemini Enterprise Agent Platform lets you rapidly build, scale, govern and optimize production-ready agents grounded in your organization's data. The platform enables developers to build custom or pre-built agents for virtually any use case. New customers get $300 in free credits.
    Get Started Free
  • 1
    promptmap2

    promptmap2

    A security scanner for custom LLM applications

    promptmap is an automated security scanner for custom LLM applications that focuses on prompt injection and related attack classes. The project supports both white-box and black-box testing, which means it can either run tests directly against a known model and system prompt configuration or attack an external HTTP endpoint without internal access. Its scanning workflow uses a dual-LLM architecture in which one model acts as the target being tested and another acts as a controller that evaluates whether an attack succeeded. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    NVIDIA Generative AI Examples

    NVIDIA Generative AI Examples

    Generative AI reference workflows

    NVIDIA GenerativeAIExamples is an open-source repository that provides practical reference implementations and example workflows for building generative AI applications using NVIDIA’s software ecosystem. The project is designed to help developers accelerate the development of AI applications by providing ready-to-run pipelines, notebooks, and tools that demonstrate how to integrate large language models into real-world systems. The repository includes examples covering topics such as retrieval-augmented generation pipelines, agent-based workflows, and multimodal AI applications that combine text, vision, and data processing. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    ERNIE

    ERNIE

    The official repository for ERNIE 4.5 and ERNIEKit

    ...It supports both full-parameter training and parameter-efficient approaches so teams can choose between maximum quality and lower-cost adaptation depending on their constraints. The project also emphasizes optimization techniques for large-scale training, including mixed-precision and hybrid-parallel strategies that are commonly needed for multi-node GPU clusters. In addition to training, it includes guidance and example materials intended to help developers adopt ERNIE models for real product scenarios rather than only research demonstrations.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Continuous Claude v3

    Continuous Claude v3

    Context management for Claude Code. Hooks maintain state via ledgers

    ...Rather than relying on a single session’s context, Continuous Claude uses mechanisms like ledgers, YAML handoffs, and a memory system to preserve and recall state across multiple sessions, ensuring that learned insights and plans are not lost when context compaction occurs. The project orchestrates many specialized agents and skills—109 skills and 32 agents—so that complex coding tasks can be broken down, analyzed, and executed collaboratively by different components. It also includes a layered code analysis pipeline to reduce token usage and maintain relevant context efficiently. This continuous learning environment enables workflows such as bug fixing, refactoring, planning, and exploratory investigation while minimizing the need to re-explain context manually.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 5
    MAI-UI

    MAI-UI

    Real-World Centric Foundation GUI Agents

    MAI-UI is a cutting-edge open-source project that implements a family of foundation GUI (Graphical User Interface) agent models capable of interpreting natural language and performing real-world GUI navigation and control tasks across mobile and desktop environments. Developed by Tongyi-MAI (Alibaba’s research initiative), the MAI-UI models are multimodal agents trained to understand user instructions and corresponding screenshots, grounding those instructions to on-screen elements and generating sequences of GUI actions such as taps, swipes, text input, and system commands. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    rLLM

    rLLM

    Democratizing Reinforcement Learning for LLMs

    ...With rLLM, developers can define custom “agents” and “environments,” and then train those agents via reinforcement learning workflows, possibly surpassing what vanilla fine-tuning or supervised learning might provide. The project is designed to support large-scale language models (including support for big models via integrated training backends), making it relevant for state-of-the-art research and production use. The framework includes tools for defining workflows, specifying objectives or reward functions, and managing training/policy updates across possibly distributed settings.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    BlogWizard

    BlogWizard

    Generate blog articles from video or audio

    BlogWizard is a demo/utility project built on top of Groq’s LLM infrastructure that converts video or audio content into well-structured blog posts, enabling creators to repurpose multimedia content into text — useful for SEO, accessibility, or reaching audiences that prefer reading. The tool uses transcription (e.g. via Whisper) to extract text from audio/video, then runs an LLM-based generation pipeline to transform that content into coherent, readable blog-format posts — with sections, formatting, and possibly metadata. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    USO

    USO

    Open-sourced unified customization model

    ...By decoupling style and subject, USO enables reuse of learned style/style-embeddings across different subjects, or vice versa, which makes generation more modular and controllable. The project provides tooling (in Python) including inference and workflow scripts, example configurations, and support for generation pipelines; and as of 2025, USO is also natively supported in some mainstream generative-art UI pipelines (e.g. ComfyUI) to ease adoption.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Poetiq

    Poetiq

    Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1

    poetiq-arc-agi-solver is the open-source codebase from Poetiq that replicates their record-breaking submission to the challenging benchmark suite ARC-AGI (both ARC-AGI-1 and ARC-AGI-2). The project demonstrates a system that orchestrates large language models (LLMs) — like those from major providers — with carefully engineered prompting, reasoning workflows, and dynamic strategies, to tackle the abstract, logic-heavy problems in ARC-AGI. Instead of relying on a single prompt or fixed strategy, their solver dynamically adapts the reasoning path, selecting what to ask or analyze next depending on intermediate results — effectively compositing reasoning, perception, and program synthesis (or symbolic manipulation) in a loop. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Custom VMs From 1 to 96 vCPUs With 99.95% Uptime Icon
    Custom VMs From 1 to 96 vCPUs With 99.95% Uptime

    General-purpose, compute-optimized, or GPU/TPU-accelerated. Built to your exact specs.

    Live migration and automatic failover keep workloads online through maintenance. One free e2-micro VM every month.
    Try Free
  • 10
    Meta-World

    Meta-World

    Collections of robotics environments

    ...The environments adhere to the Gymnasium API, which makes them easy to plug into existing RL pipelines, and they support both synchronous and asynchronous vectorized execution for running many environments in parallel. Installation is done via pip, with official support for Python versions 3.8 through 3.11 on Linux and macOS, and the project is licensed under MIT to encourage broad academic and industry use.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Tracking Any Point (TAP)

    Tracking Any Point (TAP)

    DeepMind model for tracking arbitrary points across videos & robotics

    TAPNet is the official Google DeepMind repository for Tracking Any Point (TAP), bundling datasets, models, benchmarks, and demos for precise point tracking in videos. The project includes the TAP-Vid and TAPVid-3D benchmarks, which evaluate long-range tracking of arbitrary points in 2D and 3D across diverse real and synthetic videos. Its flagship models—TAPIR, BootsTAPIR, and the latest TAPNext—use matching plus temporal refinement or next-token style propagation to achieve state-of-the-art accuracy and speed on TAP-Vid. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Perception Models

    Perception Models

    State-of-the-art Image & Video CLIP, Multimodal Large Language Models

    ...Meanwhile, PLM integrates with PE to power vision-language modeling, achieving results competitive with leading multimodal systems such as QwenVL2.5 and InternVL3, all while being fully reproducible with open data. The project supports a wide range of research applications, from visual recognition and dense prediction to fine-grained multimodal understanding. Additionally, it includes several large-scale open datasets for both image and video perception.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    SuggestArr

    SuggestArr

    Request recommended movies, TV shows and anime to Jellyseer/Overseer

    SuggestArr is an open-source automation platform designed to recommend and automatically request movies, TV shows, and anime based on a user’s viewing history in self-hosted media servers. The project integrates with popular media management systems such as Jellyfin, Plex, and Emby, allowing it to analyze recently watched content and identify similar titles using metadata from the TMDb database. Once potential recommendations are identified, SuggestArr can automatically send download or request instructions to services like Jellyseer or Overseerr, which then coordinate with media download tools and libraries. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    1D Visual Tokenization and Generation

    1D Visual Tokenization and Generation

    This repo contains the code for 1D tokenizer and generator

    The 1D Visual Tokenization and Generation project from ByteDance introduces a novel “one-dimensional” tokenizer designed for images: instead of representing images with large grids of 2D tokens (as in many prior generative/image-modeling systems), it compresses images into as few as 32 discrete tokens (or more, optionally) — thereby achieving a very compact, efficient representation that drastically speeds up generation and reconstruction while retaining strong fidelity.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Bailing

    Bailing

    Bailing is a voice dialogue robot similar to GPT-4o

    Bailing is an open-source voice-dialogue assistant designed to deliver natural voice-based conversations by combining automatic speech recognition (ASR), voice activity detection (VAD), a large language model (LLM), and text-to-speech (TTS) in a single pipeline. Its goal is to offer a “voice-first” chat experience similar to what one might expect from a system like GPT-4o, but fully open and deployable by users. The project is modular: each core function — ASR, VAD, LLM, TTS — exists as a separately replaceable component, which allows flexibility in picking your preferred models depending on resources or languages. It aims to be light enough to run without a GPU, making it usable on modest hardware or edge devices, while still maintaining low latency and smooth interaction. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    shuyuan

    shuyuan

    Reading book source

    shuyuan is a project oriented around reading and knowledge consumption, especially targeting large-scale text content such as books, articles, or educational material. The name suggests “academy” or “study hall,” and the tool aims to help users ingest, organize, and manage reading content — possibly offering features like text parsing, annotation, metadata generation, translation, or storage for later reference.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    OuteTTS

    OuteTTS

    Interface for OuteTTS models

    ...It provides a high-level Interface API that wraps model configuration, speaker handling, and audio generation so you can focus on integrating speech into your application rather than wiring up low-level engines. The project supports multiple backends including llama.cpp (Python bindings and server), Hugging Face Transformers, ExLlamaV2, VLLM and a JavaScript interface via Transformers.js, allowing it to run on CPUs, NVIDIA CUDA GPUs, AMD ROCm, Vulkan-capable GPUs, and Apple Metal. It also includes a notion of speaker profiles: you can create a speaker from a short audio sample, save it as JSON, and reuse it for consistent voice identity across generations and sessions. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Code-Mode

    Code-Mode

    Plug-and-play library to enable agents to call MCP and UTCP tools

    Code-Mode is a plug-and-play library that lets AI agents call tools by executing TypeScript (or via a Python wrapper) instead of making many individual function calls. Its core philosophy is that language models are very good at writing code, so rather than exposing hundreds of separate tool endpoints, you give the model a single “code execution” tool that has access to your full toolkit through code. This approach can dramatically reduce the number of tool-call iterations needed in complex...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Gemini Fullstack LangGraph Quickstart

    Gemini Fullstack LangGraph Quickstart

    Get started w/ building Fullstack Agents using Gemini 2.5 & LangGraph

    gemini-fullstack-langgraph-quickstart is a fullstack reference application from Google DeepMind’s Gemini team that demonstrates how to build a research-augmented conversational AI system using LangGraph and Google Gemini models. The project features a React (Vite) frontend and a LangGraph/FastAPI backend designed to work together seamlessly for real-time research and reasoning tasks. The backend agent dynamically generates search queries based on user input, retrieves information via the Google Search API, and performs reflective reasoning to identify knowledge gaps. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    GLM-4.1V

    GLM-4.1V

    GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

    ...It represents a trade-off: somewhat reduced capacity compared to 4.5V or 4.6V, but with benefits in terms of speed, deployability, and lower hardware requirements — making it especially useful for developers experimenting locally, building lightweight agents, or deploying on limited infrastructure. Given its open-source availability under the same project repository, it provides an accessible entry point for testing multimodal reasoning and building proof-of-concept applications.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Step1X-3D

    Step1X-3D

    High-Fidelity and Controllable Generation of Textured 3D Assets

    ...The result is fully 3D assets — meshes + textures — which can be rendered from any viewpoint, textured consistently, and used in 3D applications. To achieve this, the project includes a massive curated dataset: among more than 5 million candidate 3D assets, it filters and standardizes to produce a high-quality 2 million–asset subset suitable for training.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    WhisperSpeech

    WhisperSpeech

    An Open Source text-to-speech system built by inverting Whisper

    WhisperSpeech is an open-source text-to-speech system created by “inverting” OpenAI’s Whisper, reusing its strengths as a semantic audio model to generate speech instead of only transcribing it. The project aims to be for speech what Stable Diffusion is for images: powerful, hackable, and safe for commercial use, with code under Apache-2.0/MIT and models trained only on properly licensed data. Its architecture follows a token-based, multi-stage pipeline inspired by AudioLM and SPEAR-TTS: Whisper is used to produce semantic tokens, EnCodec compresses the waveform into acoustic tokens, and Vocos reconstructs high-fidelity audio from those tokens. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Orpheus TTS

    Orpheus TTS

    Towards Human-Sounding Speech

    ...It is designed to produce human-like speech with natural intonation, emotion, and rhythm, targeting quality comparable to or better than many closed-source systems. The project ships both pretrained and finetuned English models, as well as a family of multilingual models released as a research preview, and includes data-processing scripts so users can train or finetune their own variants. Inference is provided through a Python package that uses vLLM under the hood for high-throughput, low-latency generation, including streaming examples that show how to generate audio chunks in real time. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    VibeVoice ComfyUI

    VibeVoice ComfyUI

    ComfyUI integration for Microsoft's VibeVoice text-to-speech model

    ...It includes advanced control over generation parameters like attention backend, diffusion steps, sampling temperature, guidance scale, and quantization settings, allowing users to tune the trade-offs between quality, VRAM usage, and speed. The project also introduces first-class LoRA support, making it possible to fine-tune and load custom LoRA adapters that modify voice identity or style while keeping the base VibeVoice model intact.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    GELab-Zero

    GELab-Zero

    GUI Exploration Lab. One of the best GUI agent solutions

    ...The idea is to let developers or users harness an AI agent that can simulate clicking, typing, reading UI elements, and interacting with apps in a human-like way via the GUI, which can enable tasks like automated testing, scriptable workflows, or even autonomous usage of GUI-based applications. Because GELab-Zero is fully open-source and doesn’t require external services, it offers privacy and control: everything runs locally under your control. The project provides a lightweight base model (4B parameters in its public release) that can run on modest hardware (depending on quantization), making it more accessible than many large-scale AI solutions.
    Downloads: 0 This Week
    Last Update:
    See Project