Showing 114 open source projects for "inference"

  • 1
    Metaseq

    Repo for external large-scale work

    ...It supports both pretraining and fine-tuning workflows with data pipelines for text, multilingual corpora, and custom tokenization schemes. Metaseq also includes APIs for evaluation, generation, and model serving, enabling seamless transitions from training to inference.
    Downloads: 0 This Week
  • 2
    ToMe (Token Merging)

    A method to increase the speed and lower the memory footprint of vision transformer models

    ...This approach differs from token pruning, which removes background tokens entirely; instead, ToMe merges tokens based on feature similarity, allowing it to compress both foreground and background information efficiently. ToMe integrates seamlessly into existing transformer models such as DeiT, MAE, SWAG, and timm ViTs, offering 2–3x speedups during inference and substantial efficiency gains during training. The method can be applied dynamically at inference time or incorporated into training for improved performance.
    Downloads: 0 This Week
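
    The patch is designed to be a drop-in change. A minimal sketch following the usage documented in the project's README (the tome.patch.timm entry point and the model.r knob come from that documentation; the timm model name is just an example):

        import timm
        import tome  # the ToMe package from this repository

        # Load a standard ViT from timm, then patch it in place so each
        # transformer block merges the r most similar token pairs.
        model = timm.create_model("vit_base_patch16_224", pretrained=True)
        tome.patch.timm(model)
        model.r = 16  # tokens merged per layer; larger r trades accuracy for speed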
  • 3
    GPT-NeoX

    Implementation of model parallel autoregressive transformers on GPUs

    ...For those looking for a TPU-centric codebase, we recommend Mesh Transformer JAX. If you are not looking to train models with billions of parameters from scratch, this is likely the wrong library to use. For generic inference needs, we recommend the Hugging Face transformers library instead, which supports GPT-NeoX models.
    Downloads: 0 This Week
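
    Since the project itself points generic inference users to Hugging Face transformers, a minimal sketch of that route (EleutherAI/gpt-neox-20b is the publicly released 20B checkpoint; loading it needs tens of gigabytes of memory, so treat this as illustrative):

        from transformers import AutoModelForCausalLM, AutoTokenizer

        # GPT-NeoX checkpoints load through the standard causal-LM classes.
        tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
        model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b")

        inputs = tok("GPT-NeoX is", return_tensors="pt")
        out = model.generate(**inputs, max_new_tokens=20)
        print(tok.decode(out[0]))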
  • 4
    minGPT

    A minimal PyTorch re-implementation of the OpenAI GPT

    ...It strips away extraneous bells and whistles, aiming to show how a sequence of token indices is fed into a stack of transformer blocks and then decoded into next-token probabilities, with both training and inference supported. Because the whole model is around 300 lines of code, users can follow each step, from embedding lookup and positional encodings through multi-head attention and feed-forward layers to the output head, and thus demystify how GPT-style models work beneath the surface. It provides a practical sandbox for experimentation, letting learners tweak the architecture, dataset, or training loop without being overwhelmed by framework abstraction.
    Downloads: 0 This Week
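
    To give a flavor of what those ~300 lines spell out, here is a self-contained toy version of the causal self-attention step. It is written in the same spirit rather than copied from minGPT, and all sizes are made up:

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class CausalSelfAttention(nn.Module):
            """One attention block of the kind minGPT walks through explicitly."""
            def __init__(self, n_embd=64, n_head=4, block_size=128):
                super().__init__()
                self.n_head = n_head
                self.qkv = nn.Linear(n_embd, 3 * n_embd)  # fused q, k, v projection
                self.proj = nn.Linear(n_embd, n_embd)
                # Lower-triangular mask: each position may attend only to the past.
                self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

            def forward(self, x):
                B, T, C = x.shape
                q, k, v = self.qkv(x).split(C, dim=2)
                # Reshape to (batch, heads, time, head_dim).
                q, k, v = (t.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
                           for t in (q, k, v))
                att = (q @ k.transpose(-2, -1)) / (k.size(-1) ** 0.5)
                att = att.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
                y = F.softmax(att, dim=-1) @ v
                return self.proj(y.transpose(1, 2).reshape(B, T, C))

        block = CausalSelfAttention()
        out = block(torch.randn(2, 16, 64))  # (batch=2, seq=16, embed=64)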
  • 5
    Stable Diffusion

    A latent text-to-image diffusion model

    Stable Diffusion is a widely used open-source latent text-to-image diffusion model developed by the CompVis group for generating high-quality images from natural language prompts. The model operates by conditioning a diffusion process on text embeddings produced by a CLIP text encoder, enabling detailed and controllable image synthesis. It was trained on large-scale image datasets and later fine-tuned to produce 512×512 images with strong visual fidelity. Because the system runs efficiently...
    Downloads: 17 This Week
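
    For quick experimentation, the released weights can also be driven through the Hugging Face diffusers wrapper rather than the repository's own sampling scripts. A minimal sketch, assuming the CompVis/stable-diffusion-v1-4 checkpoint and a CUDA GPU:

        import torch
        from diffusers import StableDiffusionPipeline

        # Load the CompVis weights through the diffusers pipeline.
        pipe = StableDiffusionPipeline.from_pretrained(
            "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
        ).to("cuda")

        image = pipe("a watercolor painting of a lighthouse at dusk").images[0]
        image.save("lighthouse.png")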
  • 6
    Video Pre-Training

    Learning to Act by Watching Unlabeled Online Videos

    ...The idea is to learn general priors of control from large-scale, unlabeled video data, and then optionally fine-tune those priors for more goal-directed behavior via environment interaction. The repository contains demonstration models of different widths, fine-tuned variants (e.g. for building houses or early-game tasks), and inference scripts that instantiate agents from pretrained weights. Key modules include the behavioral cloning logic, the agent wrapper, and data loading pipelines (with an accessible skeleton for loading Minecraft demonstration data). The repo also includes a run_agent.py script for testing an agent interactively, and an agent.py module encapsulating the control logic.
    Downloads: 0 This Week
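
    The training idea at the core (behavioral cloning on actions labeled by an inverse-dynamics model) reduces to an ordinary supervised step. A toy sketch of that step, not VPT's actual code; every module and dimension here is invented for illustration:

        import torch
        import torch.nn as nn

        # Stand-in policy: map an encoded video frame to logits over discrete actions.
        policy = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 8))
        opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
        loss_fn = nn.CrossEntropyLoss()

        obs = torch.randn(32, 512)            # fake batch of frame embeddings
        actions = torch.randint(0, 8, (32,))  # pseudo-labels from an inverse-dynamics model

        # One behavioral-cloning step: imitate the labeled action.
        loss = loss_fn(policy(obs), actions)
        opt.zero_grad()
        loss.backward()
        opt.step()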
  • 7
    Mask2Former

    Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"

    ...The project provides extensive configurations and pretrained models across popular benchmarks like COCO, ADE20K, and Cityscapes. Built on top of Detectron2, it includes training scripts, inference tools, and visualization utilities that make experimentation straightforward.
    Downloads: 0 This Week
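
    A minimal inference sketch in the Detectron2 style; the add_maskformer2_config helper and the config and weight paths are assumptions based on the repository's demo layout, so verify them against the repo before use:

        import cv2
        from detectron2.config import get_cfg
        from detectron2.engine import DefaultPredictor
        from detectron2.projects.deeplab import add_deeplab_config
        from mask2former import add_maskformer2_config  # assumed repo helper

        cfg = get_cfg()
        add_deeplab_config(cfg)
        add_maskformer2_config(cfg)
        cfg.merge_from_file("configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml")
        cfg.MODEL.WEIGHTS = "model_final.pkl"  # path to downloaded weights

        predictor = DefaultPredictor(cfg)
        outputs = predictor(cv2.imread("input.jpg"))  # instance/panoptic predictions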
  • 8
    GPT Neo

    An implementation of model parallel GPT-2 and GPT-3-style models

    An implementation of model- and data-parallel GPT-3-like models using the mesh-tensorflow library. If you're just here to play with our pre-trained models, we strongly recommend you try out the Hugging Face Transformers integration. Training and inference are officially supported on TPU and should work on GPU as well. This repository will be (mostly) archived as we move focus to our GPU-specific repo, GPT-NeoX. NB: while GPT-Neo can technically run a training step at 200B+ parameters, it is very inefficient at those scales. This, along with the fact that many GPUs became available to us, prompted us to move development over to GPT-NeoX. ...
    Downloads: 1 This Week
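
    The recommended Hugging Face route is essentially a one-liner. A minimal sketch with the released 1.3B checkpoint:

        from transformers import pipeline

        generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")
        print(generator("EleutherAI has", max_new_tokens=20)[0]["generated_text"])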
  • 9
    Denoiser

    Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)

    ...The implementation includes data augmentation techniques applied to the raw waveforms (e.g. noise mixing, reverberation) to improve model robustness and generalization to diverse noise types. The project supports both offline denoising (batch inference) and live audio processing (e.g. via loopback audio interfaces), making it practical for real-time use in calls or recording. The codebase includes training and evaluation scripts, configuration management via Hydra, and pretrained models on standard noise datasets.
    Downloads: 1 This Week
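
    A minimal offline (batch) denoising sketch, assuming the pretrained.dns48 loader and convert_audio helper described in the project's README:

        import torch
        import torchaudio
        from denoiser import pretrained
        from denoiser.dsp import convert_audio

        model = pretrained.dns48()  # model pretrained on the DNS noise dataset
        wav, sr = torchaudio.load("noisy.wav")
        wav = convert_audio(wav, sr, model.sample_rate, model.chin)
        with torch.no_grad():
            denoised = model(wav[None])[0]  # add, then drop, the batch dimension
        torchaudio.save("clean.wav", denoised, model.sample_rate)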
  • 10
    Nemotron 3 Super

    Open language model developed by NVIDIA as part of the Nemotron-3 family

    NVIDIA-Nemotron-3-Super-120B-A12B-FP8 is a large-scale open language model developed by NVIDIA as part of the Nemotron-3 family of generative AI systems designed for advanced reasoning, conversational interaction, and agent-based workflows. The model contains approximately 120 billion parameters, but employs a Mixture-of-Experts architecture that activates only a smaller subset of parameters during inference, improving computational efficiency while maintaining high capability. Its architecture combines Transformer attention layers with Mamba state-space components to balance long-context reasoning, memory efficiency, and high-quality language generation. The model is optimized for building AI agents that must perform complex tasks such as planning, tool usage, coding assistance, and multi-step reasoning.
    Downloads: 0 This Week
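
    To make the Mixture-of-Experts point concrete, here is a generic toy router in PyTorch that activates only the top-2 of 8 experts per token. It illustrates the general mechanism of conditional computation, not Nemotron's actual architecture:

        import torch
        import torch.nn as nn

        class ToyMoE(nn.Module):
            """Route each token to its top-k experts; the rest stay idle."""
            def __init__(self, d=64, n_experts=8, k=2):
                super().__init__()
                self.router = nn.Linear(d, n_experts)
                self.experts = nn.ModuleList(nn.Linear(d, d) for _ in range(n_experts))
                self.k = k

            def forward(self, x):  # x: (tokens, d)
                weights, idx = self.router(x).softmax(-1).topk(self.k, dim=-1)
                out = torch.zeros_like(x)
                for i, expert in enumerate(self.experts):
                    hit = idx == i              # (tokens, k) routing mask
                    if hit.any():
                        rows = hit.any(-1)
                        w = (weights * hit).sum(-1, keepdim=True)[rows]
                        out[rows] += w * expert(x[rows])
                return out

        moe = ToyMoE()
        y = moe(torch.randn(10, 64))  # only ~2 of 8 expert weights touched per token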
  • 11
    Mistral Small 4

    Model that fuses instruct, reasoning, and agentic skills

    ...These models are part of the broader Mistral Small family, which is designed to deliver strong performance across a wide range of everyday AI tasks while maintaining relatively low latency and efficient deployment requirements. The collection reflects an evolution toward hybrid mixture-of-experts architectures that dynamically activate subsets of parameters during inference, allowing large models to remain computationally efficient. Mistral Small 4 models are built to handle tasks such as conversational AI, software development assistance, and reasoning-heavy problem solving, making them versatile tools for both developers and enterprise applications.
    Downloads: 0 This Week
  • 12
    DeepSeek-V3.2

    High-efficiency reasoning and agentic intelligence model

    DeepSeek-V3.2 is a cutting-edge large language model developed by DeepSeek-AI, focused on achieving high reasoning accuracy and computational efficiency for agentic tasks. It introduces DeepSeek Sparse Attention (DSA), a new attention mechanism that dramatically reduces computational overhead while maintaining strong long-context performance. Built with a scalable reinforcement learning framework, it reaches near-GPT-5 levels of reasoning and outperforms comparable models like DeepSeek-V3.1...
    Downloads: 0 This Week
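
    DSA itself is not reproduced here, but the general idea behind sparse attention (each query attends to only a subset of keys) can be shown with a generic top-k toy; this illustrates the family of techniques, not DeepSeek's algorithm:

        import torch
        import torch.nn.functional as F

        def topk_sparse_attention(q, k, v, keep=64):
            """Each query keeps only its `keep` highest-scoring keys."""
            scores = q @ k.transpose(-2, -1) / k.size(-1) ** 0.5  # (T, T) logits
            keep = min(keep, scores.size(-1))
            thresh = scores.topk(keep, dim=-1).values[..., -1:]   # k-th best score per row
            scores = scores.masked_fill(scores < thresh, float("-inf"))
            return F.softmax(scores, dim=-1) @ v

        q = k = v = torch.randn(128, 32)  # 128 tokens, head dim 32
        out = topk_sparse_attention(q, k, v)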
  • 13
    Mellum-4b-base

    JetBrains’ 4B parameter code model for completions

    ...With a context window of 8,192 tokens, it excels at code completion, fill-in-the-middle tasks, and intelligent code suggestions for professional developer tools and IDEs. The model is efficient for both cloud inference with vLLM and local deployment using llama.cpp or Ollama, thanks to its bf16 precision and AMP training. While the base model is not fine-tuned for downstream tasks, it is designed to be easily adapted through supervised fine-tuning (SFT) or reinforcement learning (RL). Benchmarks on RepoBench, SAFIM, and HumanEval demonstrate its competitive performance, with specialized fine-tuned versions for Python already showing strong improvements.
    Downloads: 0 This Week
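
    A minimal cloud-inference sketch with vLLM, assuming the checkpoint is published under the JetBrains/Mellum-4b-base id:

        from vllm import LLM, SamplingParams

        llm = LLM(model="JetBrains/Mellum-4b-base")
        params = SamplingParams(temperature=0.2, max_tokens=64)

        # Plain prefix completion; the base model is not instruction-tuned.
        out = llm.generate(["def binary_search(arr, target):"], params)
        print(out[0].outputs[0].text)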
  • 14
    Hunyuan-MT-7B

    Tencent’s 36-language state-of-the-art translation model

    Hunyuan-MT-7B is a large-scale multilingual translation model developed by Tencent, designed to deliver state-of-the-art translation quality across 36 languages, including several Chinese ethnic minority languages. It forms part of the Hunyuan Translation Model family, alongside Hunyuan-MT-Chimera, which ensembles outputs for even higher accuracy. Trained with a comprehensive framework spanning pretraining, cross-lingual pretraining, supervised fine-tuning, enhancement, and ensemble...
    Downloads: 0 This Week