Showing 202 open source projects for "input-output model"

  • 1
    MCP Shell Server

    Shell command execution server implementing the Model Context Protocol

    A secure shell command execution server implementing the Model Context Protocol (MCP), allowing remote execution of whitelisted shell commands with support for standard input.
    Downloads: 0 This Week
  • 2
    EduCDM

    The Model Zoo of cognitive diagnosis models

    The Model Zoo of Cognitive Diagnosis Models, including classic Item Response Theory (IRT), Multidimensional Item Response Theory (MIRT), and the Deterministic Input, Noisy "And" model (DINA), as well as the more advanced Fuzzy Cognitive Diagnosis Framework (FuzzyCDF), Neural Cognitive Diagnosis Model (NCDM), Item Response Ranking framework (IRR), Incremental Cognitive Diagnosis (ICD), and Knowledge-association based extension of NeuralCD (KaNCD).
    Downloads: 0 This Week
  • 3
    Bytewax

    Python Stream Processing

    ...Connect data sources, run stateful transformations, and write to various downstream systems with built-in connectors or existing Python libraries. Bytewax is a Python framework and Rust distributed processing engine that uses a dataflow computational model to provide parallelizable stream processing and event processing capabilities similar to Flink, Spark, and Kafka Streams. You can use Bytewax for a variety of workloads, from moving data in the style of Kafka Connect all the way to advanced online machine learning workloads. Bytewax is not limited to streaming applications; it excels anywhere that data can be distributed at the input and output.
    Downloads: 1 This Week
  • 4
    Qwen-Audio

    Chat & pretrained large audio language model proposed by Alibaba Cloud

    Qwen-Audio is a large audio-language model developed by Alibaba Cloud, built to accept various types of audio input (speech, natural sounds, music, singing) along with text input, and output text. There is also an instruction-tuned version called Qwen-Audio-Chat which supports conversational interaction (multi-round), audio + text input, creative tasks and reasoning over audio.
    Downloads: 1 This Week
  • 5
    Hunyuan3D 2.0

    High-Resolution 3D Assets Generation with Large Scale Diffusion Models

    ...Hunyuan3D-2 supports various enhancements and is available for deployment through tools like Blender and Hugging Face. It includes a user-friendly production/studio tool (Hunyuan3D-Studio) to manipulate and animate meshes. Shape generation is condition-aligned via the DiT model, so the generated mesh is influenced by input images or prompts.
    Downloads: 39 This Week
  • 6
    LatentSync

    Taming Stable Diffusion for Lip Sync

    LatentSync is an open-source framework from ByteDance that produces high-quality lip-synchronization for video by using an audio-conditioned latent diffusion model, bypassing traditional intermediate motion representations. In effect, given a source video (with masked or reference frames) and an audio track, LatentSync directly generates frames whose lip motions and expressions align with the audio, producing convincing talking-head or animated lip-sync output. The system leverages a U-Net diffusion backbone, with cross-attention of audio embeddings (via an audio encoder) and reference video frames to guide generation, and applies a set of loss functions (temporal, perceptual, sync-net based) to enforce lip-sync accuracy, visual fidelity, and temporal consistency. ...
    Downloads: 1 This Week
  • 7
    SageMaker TensorFlow Training Toolkit

    Toolkit for running TensorFlow training scripts on SageMaker

    ...A Batch Transform job runs an offline-inference job using your TensorFlow Serving model. Input data in S3 is converted to HTTP requests, and responses are saved to an output bucket in S3.
    Downloads: 0 This Week
  • 8
    FLUX.2

    Official inference repo for FLUX.2 models

    FLUX.2 is a state-of-the-art open-weight image generation and editing model released by Black Forest Labs aimed at bridging the gap between research-grade capabilities and production-ready workflows. The model offers both text-to-image generation and powerful image editing, including editing of multiple reference images, with fidelity, consistency, and realism that push the limits of what open-source generative models have achieved.
    Downloads: 44 This Week
  • 9
    GLM-4-Voice

    GLM-4-Voice | End-to-End Chinese-English Conversational Model

    GLM-4-Voice is an open-source speech-enabled model from ZhipuAI, extending the GLM-4 family into the audio domain. It integrates advanced voice recognition and generation with the multimodal reasoning capabilities of GLM-4, enabling smooth natural interaction via spoken input and output. The model supports real-time speech-to-text transcription, spoken dialogue understanding, and text-to-speech synthesis, making it suitable for conversational AI, virtual assistants, and accessibility applications. ...
    Downloads: 0 This Week
  • 10
    AlphaFold 3

    AlphaFold 3 inference pipeline

    AlphaFold 3, developed by Google DeepMind, is an advanced deep learning system for predicting biomolecular structures and interactions with exceptional accuracy. This repository provides the complete inference pipeline for running AlphaFold 3, though access to the model parameters is restricted and must be obtained directly from Google under specific terms of use. The system is designed for scientific research applications in structural biology, biochemistry, and bioinformatics, enabling accurate modeling of proteins, ligands, and covalent modifications. Users can perform local predictions via Docker containers, integrating AlphaFold 3’s inference process with provided JSON input configurations. ...
    Downloads: 2 This Week
  • 11
    Qwen3-Omni

    Qwen3-Omni is a natively end-to-end, omni-modal LLM

    ...The model supports 119 text languages, 19 speech input languages, and 10 speech output languages. It achieves state-of-the-art results: across 36 audio and audio-visual benchmarks, it hits open-source SOTA on 32 and overall SOTA on 22, outperforming or matching strong closed-source models such as Gemini-2.5 Pro and GPT-4o. To reduce latency, especially in audio/video streaming, Talker predicts discrete speech codecs via a multi-codebook scheme and replaces heavier diffusion approaches.
    Downloads: 1 This Week
  • 12
    Step-Audio

    Open-source framework for intelligent speech interaction

    Step-Audio is a unified, open-source framework aimed at building intelligent speech systems that combine both comprehension and generation: it integrates large language models (LLMs) with speech input/output to handle not only semantic understanding but also rich vocal characteristics like tone, style, dialect, emotion, and prosody. The design moves beyond traditional separate-component pipelines (ASR → text model → TTS), instead offering a multimodal model that ingests speech or audio and produces speech accordingly, enabling natural dialogue, voice cloning, and expressive speech synthesis. ...
    Downloads: 0 This Week
  • 13
    PersonaLive

    Expressive Portrait Image Animation for Live Streaming

    PersonaLive is an open-source diffusion-based portrait animation framework focused on generating expressive, long-duration animated sequences in real time, primarily for live streaming or interactive applications. It leverages deep generative models that condition on a static reference image and a driving input (such as motion or expression cues) to produce a seamless animated portrait sequence that can run indefinitely without segmentation artifacts. The framework prioritizes low-latency...
    Downloads: 6 This Week
  • 14
    WhisperLive

    A nearly-live implementation of OpenAI's Whisper

    ...It can handle microphone input, pre-recorded audio files, and network streams such as RTSP and HLS, making it flexible for live events, monitoring, or accessibility workflows. Configuration options let you control the number of clients, maximum connection time, and threading behavior so the server can be tuned for different deployment environments. On the client side, you can set the language, whether to translate into English, model size, voice activity detection, and output recording behavior.
    Downloads: 6 This Week
  • 15
    DeepSeek-V3.2-Exp

    An experimental version of DeepSeek model

    DeepSeek-V3.2-Exp is an experimental release of the DeepSeek model family, intended as a stepping stone toward the next generation architecture. The key innovation in this version is DeepSeek Sparse Attention (DSA), a sparse attention mechanism that aims to optimize training and inference efficiency in long-context settings without degrading output quality. According to the authors, they aligned the training setup of V3.2-Exp with V3.1-Terminus so that benchmark results remain largely comparable, even though the internal attention mechanism changes. ...
    Downloads: 11 This Week
  • 16
    SHAP

    A game theoretic approach to explain the output of ml models

    ...To understand how a single feature affects the output of the model, we can plot the SHAP value of that feature vs. the value of the feature for all the examples in a dataset. Since SHAP values represent a feature's responsibility for a change in the model output, such a plot shows the change in predicted house price as RM (the average number of rooms per house in an area) changes.
    Downloads: 0 This Week
  • 17
    Text Generation Web UI

    Oobabooga - The definitive Web UI for local AI, with powerful features

    ...LLaMA model, including 4-bit GPTQ; RWKV model; LoRA (loading and training); softprompts; and extensions.
    Downloads: 21 This Week
  • 18
    PersonaPlex

    PersonaPlex code

    ...PersonaPlex also supports persona and voice control, allowing developers to define the role and speaking style of the agent using text prompts and voice conditioning, making it suitable for applications like customized voice assistants, interactive character agents, or domain-specific conversational tools. Internally, it processes continuous audio streams in a hybrid input format so that speech understanding and generation occur jointly.
    Downloads: 7 This Week
  • 19
    Step-Audio 2

    Multi-modal large language model designed for audio understanding

    Step-Audio 2 is an advanced, end-to-end multimodal large language model designed for high-fidelity audio understanding and natural speech conversation. Unlike many pipelines that separate speech recognition, processing, and synthesis, Step-Audio 2 processes raw audio, reasons about semantic and paralinguistic content (such as emotion, speaker characteristics, and non-verbal cues), and can generate contextually appropriate responses, potentially including generated or transformed audio output. ...
    Downloads: 0 This Week
  • 20
    Fun Audio Chat

    Large Audio Language Model built for natural interactions

    Fun Audio Chat is an interactive voice-first conversational AI platform designed to let users engage in natural spoken dialogue with large language models in real time, turning speech into context-aware responses while maintaining a smooth back-and-forth experience. It combines speech recognition, audio processing, and AI generation so users can speak simply and receive spoken replies, enabling applications such as virtual assistants, voice bots, and hands-free chat interfaces. The system...
    Downloads: 0 This Week
  • 21
    Raster Vision

    Open source framework for deep learning satellite and aerial imagery

    ...Raster Vision allows engineers to quickly and repeatably configure pipelines that go through core components of a machine learning workflow: analyzing training data, creating training chips, training models, creating predictions, evaluating models, and bundling the model files and configuration for easy deployment. The input to a Raster Vision pipeline is a set of images and training data, optionally with Areas of Interest (AOIs) that describe where the images are labeled. The output of a Raster Vision pipeline is a model bundle that allows you to easily utilize models in various deployment scenarios.
    Downloads: 1 This Week
  • 22
    Weights and Biases

    Tool for visualizing and tracking your machine learning experiments

    ...Set wandb.config once at the beginning of your script to save your hyperparameters, input settings (like dataset name or model type), and any other independent variables for your experiments. This is useful for analyzing your experiments and reproducing your work in the future. Setting configs also allows you to visualize the relationships between features of your model architecture or data pipeline and model performance.
    Downloads: 1 This Week
  • 23
    CogAgent

    An open sourced end-to-end VLM-based GUI Agent

    CogAgent is a 9B-parameter bilingual vision-language GUI agent model based on GLM-4V-9B, trained with staged data curation, optimization, and strategy upgrades to improve perception, action prediction, and generalization across tasks. It focuses on operating real user interfaces from screenshots plus text, and follows a strict input-output format that returns structured actions, grounded operations, and optional sensitivity annotations.
    Downloads: 0 This Week
  • 24
    GLM-TTS

    Controllable & emotion-expressive zero-shot TTS

    GLM-TTS is an advanced text-to-speech synthesis system built on large language model technologies that focuses on producing high-quality, expressive, and controllable spoken output, including features like emotion modulation and zero-shot voice cloning. It uses a two-stage architecture where a generative LLM first converts text into intermediate speech token sequences and then a Flow-based neural model converts those tokens into natural audio waveforms, enabling rich prosody and voice character even for unseen speakers. ...
    Downloads: 0 This Week
  • 25
    Qwen-VL

    Chat & pretrained large vision language model

    Qwen-VL is Alibaba Cloud’s vision-language large model family, designed to integrate visual and linguistic modalities. It accepts image inputs (with optional bounding boxes) and text, and produces text (and sometimes bounding boxes) as output. The model variants (VL-Plus, VL-Max, etc.) have been upgraded for better visual reasoning, text recognition from images, fine-grained understanding, and support for high image resolutions / extreme aspect ratios.
    Downloads: 1 This Week