Showing 58 open source projects for "output"

View related business solutions
  • Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure Icon
    Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure

    Native application identity and user-based security for your Azure cloud

    Gain integrated visibility across all traffic in a single pass. Deploy Palo Alto Networks VM-Series to determine application identity and content while automating security policy updates via rich APIs.
    Get a free trial
  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • 1
    FLUX.2

    FLUX.2

    Official inference repo for FLUX.2 models

    ...The model offers both text-to-image generation and powerful image editing, including editing of multiple reference images, with fidelity, consistency, and realism that push the limits of what open-source generative models have achieved. It supports high-resolution output (up to ~4 megapixels), which allows for photography-quality images, detailed product shots, infographics or UI mockups rather than just low-resolution drafts. FLUX.2 is built with a modern architecture (a flow-matching transformer + a revamped VAE + a strong vision-language encoder), enabling strong prompt adherence, correct rendering of text/typography in images, reliable lighting, layout, and physical realism, and consistent style/character/product identity across multiple generations or edits.
    Downloads: 33 This Week
    Last Update:
    See Project
  • 2
    Qwen3-Omni

    Qwen3-Omni

    Qwen3-omni is a natively end-to-end, omni-modal LLM

    ...It uses a Thinker-Talker architecture with a Mixture-of-Experts (MoE) design, early text-first pretraining, and mixed multimodal training to support strong performance across all modalities without sacrificing text or image quality. The model supports 119 text languages, 19 speech input languages, and 10 speech output languages. It achieves state-of-the-art results: across 36 audio and audio-visual benchmarks, it hits open-source SOTA on 32 and overall SOTA on 22, outperforming or matching strong closed-source models such as Gemini-2.5 Pro and GPT-4o. To reduce latency, especially in audio/video streaming, Talker predicts discrete speech codecs via a multi-codebook scheme and replaces heavier diffusion approaches.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 3
    HunyuanVideo-Foley

    HunyuanVideo-Foley

    Multimodal Diffusion with Representation Alignment

    ...It is designed to generate audio that matches both visual content and textual semantic cues, for use in video production, film, advertising, games, etc. The model architecture aligns audio, video, and text representations to produce realistic synchronized soundtracks. Produces high-quality 48 kHz audio output suitable for professional use. Hybrid architecture combining multimodal transformer blocks and unimodal refinement blocks. Temporal alignment via frame-level synchronization modules (e.g. Synchformer).
    Downloads: 2 This Week
    Last Update:
    See Project
  • 4
    gpt-oss

    gpt-oss

    gpt-oss-120b and gpt-oss-20b are two open-weight language models

    gpt-oss is OpenAI’s open-weight family of large language models designed for powerful reasoning, agentic workflows, and versatile developer use cases. The series includes two main models: gpt-oss-120b, a 117-billion parameter model optimized for general-purpose, high-reasoning tasks that can run on a single H100 GPU, and gpt-oss-20b, a lighter 21-billion parameter model ideal for low-latency or specialized applications on smaller hardware. Both models use a native MXFP4 quantization for...
    Downloads: 97 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 5
    OpenMythos

    OpenMythos

    A theoretical reconstruction of the Claude Mythos architecture

    ...The project explores the idea that instead of stacking hundreds of unique transformer layers, a smaller set of layers can be reused iteratively during inference to achieve deeper reasoning without increasing parameter count. It divides computation into three main stages, including a pre-processing phase, a looped recurrent reasoning block, and a final output refinement stage, creating a structured pipeline for inference. The architecture incorporates advanced techniques such as mixture-of-experts routing, adaptive computation time, and multiple attention mechanisms to dynamically allocate compute where needed. It is highly configurable through a centralized configuration system, allowing experimentation with different architectural parameters such as loop depth, attention type.
    Downloads: 31 This Week
    Last Update:
    See Project
  • 6
    Easy Diffusion

    Easy Diffusion

    An easy 1-click way to create beautiful artwork on your PC using AI

    ...The project abstracts away environment setup, dependencies, and model installation — tasks that can be daunting to beginners — and instead lets users focus on creative experimentation with prompt phrasing, model parameters, and image output settings. Because it’s designed to be easy to install and use, EasyDiffusion’s interface includes options for queuing multiple jobs, applying modifiers like upscaling or face correction, and adjusting generation parameters like guidance scale and resolution.
    Downloads: 36 This Week
    Last Update:
    See Project
  • 7
    OpenAI Harmony

    OpenAI Harmony

    Renderer for the harmony response format to be used with gpt-oss

    ...It defines a structured way for language models to produce outputs, including regular text, reasoning traces, tool calls, and structured data. By mimicking the OpenAI Responses API, Harmony provides developers with a familiar interface while enabling more advanced capabilities such as multiple output channels, instruction hierarchies, and tool namespaces. The format is essential for ensuring gpt-oss models operate correctly, as they are trained to rely on this structure for generating and organizing their responses. For users accessing gpt-oss through third-party providers like HuggingFace, Ollama, or vLLM, Harmony formatting is handled automatically, but developers building custom inference setups must implement it directly. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 8
    FireRedTTS-2

    FireRedTTS-2

    Long-form streaming TTS system for multi-speaker dialogue generation

    ...It features a specialized streaming speech tokenizer and a dual-transformer architecture that enables low latency and high-quality synthesis, making it suitable for interactive systems like chatbots, podcasts, and applications where dynamic turn-taking between speakers is essential. FireRedTTS2 supports multilingual output and speaker flexibility, enabling scenarios that involve language switching, cross-lingual voice cloning, and expressive dialogue generation that maintains consistency over longer utterances.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Qwen3-TTS

    Qwen3-TTS

    Qwen3-TTS is an open-source series of TTS models

    ...Because it’s part of the broader Qwen ecosystem, it benefits from the model’s understanding of linguistic nuances, enabling more accurate pronunciation, prosody, and contextual delivery than many traditional TTS systems. Developers can customize voice output parameters like speed, pitch, and volume, and combine the TTS stack with other AI components.
    Downloads: 15 This Week
    Last Update:
    See Project
  • Secure File Transfer for Windows with Cerberus by Redwood Icon
    Secure File Transfer for Windows with Cerberus by Redwood

    Protect and share files over FTP/S, SFTP, HTTPS and SCP with the #1 rated Windows file transfer server.

    Cerberus supports unlimited users and connections on a single IP, with built-in encryption, 2FA, and a browser-based web client — all deployable in under 15 minutes with a 25-day free trial.
    Try for Free
  • 10
    Qwen3-Coder

    Qwen3-Coder

    Qwen3-Coder is the code version of Qwen3

    Qwen3-Coder is the latest and most powerful agentic code model developed by the Qwen team at Alibaba Cloud. Its flagship version, Qwen3-Coder-480B-A35B-Instruct, features a massive 480 billion-parameter Mixture-of-Experts architecture with 35 billion active parameters, delivering top-tier performance on coding and agentic tasks. This model sets new state-of-the-art benchmarks among open models for agentic coding, browser-use, and tool-use, matching performance comparable to leading models...
    Downloads: 34 This Week
    Last Update:
    See Project
  • 11
    GLM-4-Voice

    GLM-4-Voice

    GLM-4-Voice | End-to-End Chinese-English Conversational Model

    ...It integrates advanced voice recognition and generation with the multimodal reasoning capabilities of GLM-4, enabling smooth natural interaction via spoken input and output. The model supports real-time speech-to-text transcription, spoken dialogue understanding, and text-to-speech synthesis, making it suitable for conversational AI, virtual assistants, and accessibility applications. GLM-4-Voice builds upon the bilingual strengths of the GLM architecture, supporting both Chinese and English, and is designed to handle long-form conversations with context retention. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    GLM-TTS

    GLM-TTS

    Controllable & emotion-expressive zero-shot TTS

    ...The system introduces a multi-reward reinforcement learning framework that jointly optimizes for voice similarity, emotional expressiveness, pronunciation, and intelligibility, yielding output that can rival commercial options in naturalness and expressiveness. GLM-TTS also supports phoneme-level control and hybrid text + phoneme input, giving developers precise control over pronunciation critical for multilingual or polyphone­-rich languages.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    DeepSeek-V3.2-Exp

    DeepSeek-V3.2-Exp

    An experimental version of DeepSeek model

    ...The key innovation in this version is DeepSeek Sparse Attention (DSA), a sparse attention mechanism that aims to optimize training and inference efficiency in long-context settings without degrading output quality. According to the authors, they aligned the training setup of V3.2-Exp with V3.1-Terminus so that benchmark results remain largely comparable, even though the internal attention mechanism changes. In public evaluations across a variety of reasoning, code, and question-answering benchmarks (e.g. MMLU, LiveCodeBench, AIME, Codeforces, etc.), V3.2-Exp shows performance very close to or in some cases matching that of V3.1-Terminus. ...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 14
    IndexTTS2

    IndexTTS2

    Industrial-level controllable zero-shot text-to-speech system

    ...It builds on state-of-the-art models such as XTTS and other modern neural TTS backbones, improving them with a conformer-based speech conditional encoder and upgrading the decoder to a high-quality vocoder (BigVGAN2), leading to clearer and more natural audio output. The system supports zero-shot voice cloning — meaning it can mimic a target speaker’s voice from a short reference sample — making it versatile for multi-voice uses. Compared to many open-source TTS tools, IndexTTS emphasizes efficiency and controllability: it offers faster inference, simpler training pipelines, and controllable speech parameters (like duration, pitch, and prosody), which is critical for production use.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 15
    DeepSeek-OCR

    DeepSeek-OCR

    Contexts Optical Compression

    DeepSeek-OCR is an open-source optical character recognition solution built as part of the broader DeepSeek AI vision-language ecosystem. It is designed to extract text from images, PDFs, and scanned documents, and integrates with multimodal capabilities that understand layout, context, and visual elements beyond raw character recognition. The system treats OCR not simply as “read the text” but as “understand what the text is doing in the image”—for example distinguishing captions from body...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 16
    LingBot-World

    LingBot-World

    Advancing Open-source World Models

    LingBot-World is an open-source, high-fidelity world simulator designed to advance the state of world models through video generation. Built on top of Wan2.2, it enables realistic, dynamic environment simulation across diverse styles, including real-world, scientific, and stylized domains. LingBot-World supports long-term temporal consistency, maintaining coherent scenes and interactions over minute-level horizons. With real-time interactivity and sub-second latency at 16 FPS, it is...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 17
    Stable Diffusion WebUI Docker

    Stable Diffusion WebUI Docker

    Easy Docker setup for Stable Diffusion with user-friendly UI

    ...Users can choose which UI profile they want to run — for example, full feature AUTOMATIC1111, CPU-only automatic builds, or ComfyUI workflows — and launch them in a consistent, isolated container environment with automatic model and data caching. The project supports mounting data and output directories so generated images and configurations persist outside the container, and it lets developers customize UI behavior through Docker Compose override files.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 18
    Qwen3-ASR

    Qwen3-ASR

    Qwen3-ASR is an open-source series of ASR models

    Qwen3-ASR is an automatic speech recognition system in the QwenLM family, developed to convert spoken language into text with strong accuracy and real-time performance. As a specialized ASR variant of the broader Qwen language model ecosystem, it focuses on capturing reliable transcriptions from audio sources such as recordings, live streams, or conversational inputs while supporting low latency use cases. The architecture combines advanced neural acoustic modeling with context-aware...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 19
    HY-World 2.0

    HY-World 2.0

    A Multi-Modal World Model for Reconstructing, Generating, Simulation

    HY-World 2.0 is a multi-modal world model framework for reconstructing, generating, and simulating navigable 3D worlds from diverse inputs. It accepts text prompts, single-view images, multi-view images, and videos, and produces 3D world representations rather than limiting output to flat video generation. For text and single-image inputs, it generates high-fidelity 3D Gaussian Splatting scenes through a multi-stage pipeline that includes panorama generation, trajectory planning, world expansion, and world composition. The system also improves reconstruction from multi-view images and video by upgrading its feed-forward 3D prediction components and its memory-aware view generation process. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 20
    Anthropic SDK Python

    Anthropic SDK Python

    Provides convenient access to the Anthropic REST API from any Python 3

    The anthropic-sdk-python repository is the official Python client library for interacting with the Anthropic (Claude) REST API. It is designed to provide a user-friendly, type-safe, and asynchronous/synchronous capable interface for making chat/completion requests to models like Claude. The library includes definitions for all request and response parameters using Python typed objects, automatically handles serialization and deserialization, and wraps HTTP logic (timeouts, retries, error...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 21
    MiniMind-O

    MiniMind-O

    A 0.1B Omni model trained from scratch

    MiniMind-O is an educational open-source project for building a small end-to-end Omni model from scratch. It extends the MiniMind family by exploring a model that can handle text, audio, and image inputs while producing text and streaming speech outputs. The project is designed to make multimodal AI training more accessible by keeping the model size small enough for ordinary personal hardware. It includes both mini and full training data paths, allowing learners to run a complete workflow...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 22
    GLM-4.6V

    GLM-4.6V

    GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

    ...Unlike many vision-language models that treat images and text separately or require intermediate conversions, GLM-4.6V allows inputs such as images, screenshots or document pages directly as part of its reasoning pipeline — and can output or act via tools seamlessly, bridging perception and execution. Its architecture supports a very large context window (on the order of 128K tokens during training), which lets it handle complex multimodal inputs like long documents, multi-page reports, or video transcripts, while maintaining coherence across extended content. In benchmarks and internal evaluations, GLM-4.6V achieves state-of-the-art (SoTA) performance among models of comparable parameter scale on multimodal reasoning.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 23
    HY-MT

    HY-MT

    Hunyuan Translation Model Version 1.5

    ...It ships with both an 1.8 B parameter model and a larger 7 B model, the latter optimized not only for direct translation but also for formatted and contextualized output, allowing better handling of terminology and mixed-language content. The project emphasizes both speed and quality, with the smaller model able to be quantized and deployed on edge devices for real-time translation tasks without requiring large server infrastructure. Terminology intervention and contextual translation features give users control over how specific terms or styles are rendered, which is important for technical or domain-specific content.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    DeepSeekMath-V2

    DeepSeekMath-V2

    Towards self-verifiable mathematical reasoning

    DeepSeekMath-V2 is a large-scale open-source AI model designed specifically for advanced mathematical reasoning, theorem proving, and rigorous proof verification. It’s built by DeepSeek as a successor to their earlier math-specialist models. Unlike general-purpose LLMs that might generate plausible-looking math but sometimes hallucinate or mishandle rigorous logic, Math-V2 is engineered to not only generate solutions but also self-verify them, meaning it examines the derivations, checks...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 25
    HY-Motion 1.0

    HY-Motion 1.0

    HY-Motion model for 3D character animation generation

    HY-Motion 1.0 is an open-source, large-scale AI model suite developed by Tencent’s Hunyuan team that generates high-quality 3D human motion from simple text prompts, enabling the automatic production of fluid, diverse, and semantically accurate animations without manual keyframing or rigging. Built on advanced deep learning architectures that combine Diffusion Transformer (DiT) and flow matching techniques, HY-Motion scales these approaches to the billion-parameter level, resulting in strong...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • Next
MongoDB Logo MongoDB