Showing 19 open source projects for "throughput"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Full-stack observability with actually useful AI | Grafana Cloud Icon
    Full-stack observability with actually useful AI | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 1
    MiMo-V2-Flash

    MiMo-V2-Flash

    MiMo-V2-Flash: Efficient Reasoning, Coding, and Agentic Foundation

    ...It uses an MoE setup where a very large total parameter count is available, but only a smaller subset is activated per token, which helps balance capability with runtime efficiency. The project positions the model for workflows that require tool use, multi-step planning, and higher throughput, rather than only single-turn chat. Architecturally, it highlights attention and prediction choices aimed at accelerating generation while preserving instruction-following quality in complex prompts. The repository typically serves as a launch point for running the model, understanding its intended use cases, and reproducing or extending its evaluation on reasoning and agent-style tasks. ...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 2
    FramePack

    FramePack

    Lets make video diffusion practical

    FramePack explores compact representations for sequences of image frames, targeting tasks where many near-duplicate frames carry redundant information. The idea is to “pack” frames by detecting shared structure and storing differences efficiently, which can accelerate training or inference on video-like data. By reducing I/O and memory bandwidth, datasets become lighter to load while models still see the essential temporal variation. The repository demonstrates both packing and unpacking...
    Downloads: 44 This Week
    Last Update:
    See Project
  • 3
    FlashMLA

    FlashMLA

    FlashMLA: Efficient Multi-head Latent Attention Kernels

    FlashMLA is a high-performance decoding kernel library designed especially for Multi-Head Latent Attention (MLA) workloads, targeting NVIDIA Hopper GPU architectures. It provides optimized kernels for MLA decoding, including support for variable-length sequences, helping reduce latency and increase throughput in model inference systems using that attention style. The library supports both BF16 and FP16 data types, and includes a paged KV cache implementation with a block size of 64 to efficiently manage memory during decoding. On very compute-bound settings, it can reach up to ~660 TFLOPS on H800 SXM5 hardware, while in memory-bound configurations it can push memory throughput to ~3000 GB/s. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Tencent-Hunyuan-Large

    Tencent-Hunyuan-Large

    Open-source large language model family from Tencent Hunyuan

    Tencent-Hunyuan-Large is the flagship open-source large language model family from Tencent Hunyuan, offering both pre-trained and instruct (fine-tuned) variants. It is designed with long-context capabilities, quantization support, and high performance on benchmarks across general reasoning, mathematics, language understanding, and Chinese / multilingual tasks. It aims to provide competitive capability with efficient deployment and inference. FP8 quantization support to reduce memory usage...
    Downloads: 6 This Week
    Last Update:
    See Project
  • Fully Managed MySQL, PostgreSQL, and SQL Server Icon
    Fully Managed MySQL, PostgreSQL, and SQL Server

    Automatic backups, patching, replication, and failover. Focus on your app, not your database.

    Cloud SQL handles your database ops end to end, so you can focus on your app.
    Try Free
  • 5
    DFlash

    DFlash

    Block Diffusion for Ultra-Fast Speculative Decoding

    DFlash is an open-source framework for ultra-fast speculative decoding using a lightweight block diffusion model to draft text in parallel with a target large language model, dramatically improving inference speed without sacrificing generation quality. It acts as a “drafter” that proposes likely continuations which the main model then verifies, enabling significant throughput gains compared to traditional autoregressive decoding methods that generate token by token. This approach has been shown to deliver lossless acceleration on models like Qwen3-8B by combining block diffusion techniques with efficient batching, making it ideal for applications where latency and cost matter. The project includes support for multiple draft models, example integration code, and scripts to benchmark performance, and it is structured to work with popular model serving stacks like SGLang and the Hugging Face Transformers ecosystem.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 6
    Stable Diffusion WebUI Forge

    Stable Diffusion WebUI Forge

    Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion

    Stable Diffusion WebUI Forge is a performance- and feature-oriented fork of the popular AUTOMATIC1111 interface that experiments with new backends, memory optimizations, and UX improvements. It targets heavy users and researchers who push large models, control nets, and high-resolution pipelines where default settings can become bottlenecks. The fork typically introduces toggles for scheduler behavior, attention implementations, caching, and precision modes to reach better speed or quality...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 7
    GLM-OCR

    GLM-OCR

    Accurate × Fast × Comprehensive

    GLM-OCR is an open-source multimodal optical character recognition (OCR) model built on a GLM-V encoder–decoder foundation that brings robust, accurate document understanding to complex real-world layouts and modalities. Designed to handle text recognition, table parsing, formula extraction, and general information retrieval from documents containing mixed content, GLM-OCR excels across major benchmarks while remaining highly efficient with a relatively compact parameter size (~0.9B),...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 8
    fairseq2

    fairseq2

    FAIR Sequence Modeling Toolkit 2

    fairseq2 is a modern, modular sequence modeling framework developed by Meta AI Research as a complete redesign of the original fairseq library. Built from the ground up for scalability, composability, and research flexibility, fairseq2 supports a broad range of language, speech, and multimodal content generation tasks, including instruction fine-tuning, reinforcement learning from human feedback (RLHF), and large-scale multilingual modeling. Unlike the original fairseq—which evolved into a...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    xFormers

    xFormers

    Hackable and optimized Transformers building blocks

    xformers is a modular, performance-oriented library of transformer building blocks, designed to allow researchers and engineers to compose, experiment, and optimize transformer architectures more flexibly than monolithic frameworks. It abstracts components like attention layers, feedforward modules, normalization, and positional encoding, so you can mix and match or swap optimized kernels easily. One of its key goals is efficient attention: it supports dense, sparse, low-rank, and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • 10
    Ring

    Ring

    Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI

    Ring is a reasoning Mixture-of-Experts (MoE) large language model (LLM) developed by inclusionAI. It is built from or derived from Ling. Its design emphasizes reasoning, efficiency, and modular expert activation. In its “flash” variant (Ring-flash-2.0), it optimizes inference by activating only a subset of experts. It applies reinforcement learning/reasoning optimization techniques. Its architectures and training approaches are tuned to enable efficient and capable reasoning performance....
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    MiniMax-01

    MiniMax-01

    Large-language-model & vision-language-model based on Linear Attention

    ...MiniMax-Text-01 uses a hybrid attention architecture that blends Lightning Attention, standard softmax attention, and Mixture-of-Experts (MoE) routing to achieve both high throughput and long-context reasoning. It has 456 billion total parameters with 45.9 billion activated per token and is trained with advanced parallel strategies such as LASP+, varlen ring attention, and Expert Tensor Parallelism, enabling a training context of 1 million tokens and up to 4 million tokens at inference. MiniMax-VL-01 extends this core by adding a 303M-parameter Vision Transformer and a two-layer MLP projector in a ViT–MLP–LLM framework, allowing the model to process images at dynamic resolutions up to 2016×2016.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    DeiT (Data-efficient Image Transformers)
    ...Its key idea is a specialized distillation strategy—including a learnable “distillation token”—that lets a transformer learn effectively from a CNN or transformer teacher on modest-scale datasets. The project provides compact ViT variants (Tiny/Small/Base) that achieve excellent accuracy–throughput trade-offs, making transformers practical beyond massive pretraining regimes. Training involves carefully tuned augmentations, regularization, and optimization schedules to stabilize learning and improve sample efficiency. The repo offers pretrained checkpoints, reference scripts, and ablation studies that clarify which ingredients matter most for data-efficient ViT training.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    FastViT

    FastViT

    This repository contains the official implementation of research

    FastViT is an efficient vision backbone family that blends convolutional inductive biases with transformer capacity to deliver strong accuracy at mobile and real-time inference budgets. Its design pursues a favorable latency-accuracy Pareto curve, targeting edge devices and server scenarios where throughput and tail latency matter. The models use lightweight attention and carefully engineered blocks to minimize token mixing costs while preserving representation power. Training and inference recipes highlight straightforward integration into common vision tasks such as classification, detection, and segmentation. The codebase provides reference implementations and checkpoints that make it easy to evaluate or fine-tune on downstream datasets. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Apple Neural Engine (ANE) Transformers

    Apple Neural Engine (ANE) Transformers

    Reference implementation of the Transformer architecture optimized

    ...The repository targets practitioners who want to keep familiar PyTorch modeling while preparing models for Core ML/ANE execution paths. Documentation highlights reported improvements in throughput and memory residency, while releases track incremental fixes and packaging updates. The project sits alongside related Apple ML repos that focus on deploying attention-based models efficiently to ANE-equipped hardware. In short, it’s a practical blueprint for adapting Transformers to Apple’s dedicated ML accelerator without rewriting entire model stacks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    PyTorch-BigGraph

    PyTorch-BigGraph

    Generate embeddings from large-scale graph-structured data

    ...PBG supports multi-relation graphs (knowledge graphs) with relation-specific scoring functions, negative sampling strategies, and typed entities, making it suitable for link prediction and retrieval. Its training loop is built for throughput: asynchronous I/O, memory-mapped tensors, and lock-free updates keep GPUs and CPUs fed even at extreme scale. The toolkit includes evaluation metrics and export tools so learned embeddings can be used in downstream nearest-neighbor search, recommendation, or analytics. In practice, PBG’s design lets practitioners train high-quality graph embeddings.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Mistral Small 4

    Mistral Small 4

    Model that fuses instruct, reasoning and agentic skills

    The Mistral Small 4 collection is a set of open-weight large language models developed by Mistral AI that aim to unify multiple capabilities, including instruction following, reasoning, and coding, within a single efficient architecture. These models are part of the broader Mistral Small family, which is designed to deliver strong performance across a wide range of everyday AI tasks while maintaining relatively low latency and efficient deployment requirements. The collection reflects an...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Nemotron 3 Nano

    Nemotron 3 Nano

    LL model providing reasoning and conversational capabilities

    ...It is trained from scratch and built using a hybrid architecture that integrates Transformer attention layers with Mamba-style sequence modeling components inside a Mixture-of-Experts framework. This architecture allows the system to maintain strong reasoning capabilities while improving throughput and reducing the computational cost associated with large context processing. The model is designed as a general-purpose language system capable of handling tasks such as chat interaction, coding assistance, document analysis, and instruction following.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Nemotron 3 Super

    Nemotron 3 Super

    Open language model developed by NVIDIA as part of Nemotron-3 family

    NVIDIA-Nemotron-3-Super-120B-A12B-FP8 is a large-scale open language model developed by NVIDIA as part of the Nemotron-3 family of generative AI systems designed for advanced reasoning, conversational interaction, and agent-based workflows. The model contains approximately 120 billion parameters, but employs a Mixture-of-Experts architecture that activates only a smaller subset of parameters during inference, improving computational efficiency while maintaining high capability. Its...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    GigaChat 3 Ultra

    GigaChat 3 Ultra

    High-performance MoE model with MLA, MTP, and multilingual reasoning

    GigaChat 3 Ultra is a flagship instruct-model built on a custom Mixture-of-Experts architecture with 702B total and 36B active parameters. It leverages Multi-head Latent Attention to compress the KV cache into latent vectors, dramatically reducing memory demand and improving inference speed at scale. The model also employs Multi-Token Prediction, enabling multi-step token generation in a single pass for up to 40% faster output through speculative and parallel decoding techniques. Its...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB