Search Results for "gpu max performance" - Page 4

Showing 145 open source projects for "gpu max performance"

View related business solutions
  • Go from Code to Production URL in Seconds Icon
    Go from Code to Production URL in Seconds

    Cloud Run deploys apps in any language instantly. Scales to zero. Pay only when code runs.

    Skip the Kubernetes configs. Cloud Run handles HTTPS, scaling, and infrastructure automatically. Two million requests free per month.
    Try it free
  • AI-powered service management for IT and enterprise teams Icon
    AI-powered service management for IT and enterprise teams

    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity. Maximize operational efficiency with refreshingly simple, AI-powered Freshservice.
    Try it Free
  • 1
    Diffrax

    Diffrax

    Numerical differential equation solvers in JAX

    Diffrax is a numerical differential equation solving library built for the JAX ecosystem, with a strong focus on composability, differentiability, and high-performance scientific computing. The project provides tools for solving ordinary differential equations, stochastic differential equations, controlled differential equations, and related systems in a way that fits naturally into modern machine learning and differentiable programming workflows. Because it is written to work closely with...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Recursive Language Models

    Recursive Language Models

    General plug-and-play inference library for Recursive Language Models

    RLM (short for Reinforcement Learning Models) is a modular framework that makes it easier to build, train, evaluate, and deploy reinforcement learning (RL) agents across a wide range of environments and tasks. It provides a consistent API that abstracts away many of the repetitive engineering patterns in RL research and application work, letting developers focus on modeling, experimentation, and fine-tuning rather than infrastructure plumbing. Within the framework, you can define custom...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    EPLB

    EPLB

    Expert Parallelism Load Balancer

    EPLB is DeepSeek’s open implementation of a load balancing algorithm designed for expert parallelism (EP) settings in MoE architectures. In EP, different “experts” are mapped to different GPUs or nodes, so load imbalance becomes a performance bottleneck if certain experts are invoked much more often. EPLB solves this by duplicating heavily used experts (redundancy) and then placing those duplicates across GPUs to even out computational load. It uses policies like hierarchical load balancing (grouped experts placed at node and then GPU level) and global load balancing depending on configuration. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Core ML Tools

    Core ML Tools

    Core ML tools contain supporting tools for Core ML model conversion

    ...Core ML provides a unified representation for all models. Your app uses Core ML APIs and user data to make predictions, and to fine-tune models, all on the user’s device. Core ML optimizes on-device performance by leveraging the CPU, GPU, and Neural Engine while minimizing its memory footprint and power consumption. Running a model strictly on the user’s device removes any need for a network connection, which helps keep the user’s data private and your app responsive.
    Downloads: 0 This Week
    Last Update:
    See Project
  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • 5
    Covalent workflow

    Covalent workflow

    Pythonic tool for running machine-learning/high performance workflows

    Covalent is a Pythonic workflow tool for computational scientists, AI/ML software engineers, and anyone who needs to run experiments on limited or expensive computing resources including quantum computers, HPC clusters, GPU arrays, and cloud services. Covalent enables a researcher to run computation tasks on an advanced hardware platform – such as a quantum computer or serverless HPC cluster – using a single line of code. Covalent overcomes computational and operational challenges inherent...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    BentoML

    BentoML

    Unified Model Serving Framework

    ...Parallelize compute-intense model inference workloads to scale separately from the serving logic. Adaptive batching dynamically groups inference requests for optimal performance. Orchestrate distributed inference graph with multiple models via Yatai on Kubernetes. Easily configure CUDA dependencies for running inference with GPU. Automatically generate docker images for production deployment.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Numba

    Numba

    NumPy aware dynamic Python compiler using LLVM

    Numba is an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code. Numba translates Python functions to optimized machine code at runtime using the industry-standard LLVM compiler library. Numba-compiled numerical algorithms in Python can approach the speeds of C or FORTRAN. You don't need to replace the Python interpreter, run a separate compilation step, or even have a C/C++ compiler installed. Just apply one of the Numba decorators to your...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Qwen-VL

    Qwen-VL

    Chat & pretrained large vision language model

    Qwen-VL is Alibaba Cloud’s vision-language large model family, designed to integrate visual and linguistic modalities. It accepts image inputs (with optional bounding boxes) and text, and produces text (and sometimes bounding boxes) as output. The model variants (VL-Plus, VL-Max, etc.) have been upgraded for better visual reasoning, text recognition from images, fine-grained understanding, and support for high image resolutions / extreme aspect ratios. Qwen-VL supports multilingual inputs...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Diplomacy Cicero

    Diplomacy Cicero

    Code for Cicero, an AI agent that plays the game of Diplomacy

    ...It supports two variants: Cicero (which handles full “press” negotiation) and Diplodocus (a variant focused on no-press diplomacy) as described in the README. The codebase is implemented primarily in Python with performance-critical components in C++ (via pybind11 bindings) and is configured to run in a high‐GPU cluster environment. Configuration is managed via protobuf files to define tasks such as self-play, benchmark agent comparisons, and RL training. The project is now archived and read-only, reflecting that it is no longer actively developed but remains publicly available for research use.
    Downloads: 3 This Week
    Last Update:
    See Project
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build, govern, and optimize agents and models with Gemini Enterprise Agent Platform.
    Start Free
  • 10
    TorchRec

    TorchRec

    Pytorch domain library for recommendation systems

    ...The TorchRec sharder can shard embedding tables with different sharding strategies including data-parallel, table-wise, row-wise, table-wise-row-wise, and column-wise sharding. The TorchRec planner can automatically generate optimized sharding plans for models. Pipelined training overlaps dataloading device transfer (copy to GPU), inter-device communications (input_dist), and computation (forward, backward) for increased performance. Optimized kernels for RecSys powered by FBGEMM. Quantization support for reduced precision training and inference. Common modules for RecSys.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 11
    The SpeechBrain Toolkit

    The SpeechBrain Toolkit

    A PyTorch-based Speech Toolkit

    ...Spectral masking, spectral mapping, and time-domain enhancement are different methods already available within SpeechBrain. Separation methods such as Conv-TasNet, DualPath RNN, and SepFormer are implemented as well. SpeechBrain provides efficient and GPU-friendly speech augmentation pipelines and acoustic features extraction.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    Colab-MCP

    Colab-MCP

    An MCP server for interacting with Google Colab

    ...Instead of relying on manual notebook usage, the system allows MCP-compatible agents to execute code, manage files, install dependencies, and orchestrate entire development workflows within Colab’s cloud infrastructure. This approach bridges the gap between local AI agents and remote high-performance compute environments, allowing users to offload heavy workloads such as machine learning training, data analysis, and dependency-heavy tasks to Colab’s GPU and TPU resources. By exposing Colab as an MCP server, the tool enables seamless integration with a wide range of AI assistants and agent frameworks, creating a standardized interface for tool use and execution.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    TAME LLM

    TAME LLM

    Traditional Mandarin LLMs for Taiwan

    TAME LLM is an open-source initiative focused on building and releasing large language models optimized for Traditional Mandarin and the linguistic context of Taiwan. The project includes models such as Llama-3-Taiwan-70B, which are fine-tuned versions of large transformer architectures trained on extensive corpora containing both Traditional Mandarin and English text. These models are designed to support applications such as conversational AI, knowledge retrieval, and domain-specific...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Transformer Engine

    Transformer Engine

    A library for accelerating Transformer models on NVIDIA GPUs

    Transformer Engine (TE) is a library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower memory utilization in both training and inference. TE provides a collection of highly optimized building blocks for popular Transformer architectures and an automatic mixed precision-like API that can be used seamlessly with your framework-specific code. TE also includes a framework-agnostic C++...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    TensorFlow Model Garden

    TensorFlow Model Garden

    Models and examples built with TensorFlow

    The TensorFlow Model Garden is a repository with a number of different implementations of state-of-the-art (SOTA) models and modeling solutions for TensorFlow users. We aim to demonstrate the best practices for modeling so that TensorFlow users can take full advantage of TensorFlow for their research and product development. To improve the transparency and reproducibility of our models, training logs on TensorBoard.dev are also provided for models to the extent possible though not all models...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    tvm

    tvm

    Open deep learning compiler stack for cpu, gpu, etc.

    Apache TVM is an open source machine learning compiler framework for CPUs, GPUs, and machine learning accelerators. It aims to enable machine learning engineers to optimize and run computations efficiently on any hardware backend. The vision of the Apache TVM Project is to host a diverse community of experts and practitioners in machine learning, compilers, and systems architecture to build an accessible, extensible, and automated open-source framework that optimizes current and emerging...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Tracking Any Point (TAP)

    Tracking Any Point (TAP)

    DeepMind model for tracking arbitrary points across videos & robotics

    TAPNet is the official Google DeepMind repository for Tracking Any Point (TAP), bundling datasets, models, benchmarks, and demos for precise point tracking in videos. The project includes the TAP-Vid and TAPVid-3D benchmarks, which evaluate long-range tracking of arbitrary points in 2D and 3D across diverse real and synthetic videos. Its flagship models—TAPIR, BootsTAPIR, and the latest TAPNext—use matching plus temporal refinement or next-token style propagation to achieve state-of-the-art...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    VCClient

    VCClient

    Software that uses AI to perform real-time voice conversion

    ...It provides both a graphical user interface and API access, making it suitable for casual users as well as developers who want to integrate voice transformation into their own applications. The project also supports GPU acceleration, enabling faster inference and smoother real-time performance on compatible hardware. Additionally, it includes tools for training and managing voice models, giving users the ability to create personalized voice profiles.
    Downloads: 25 This Week
    Last Update:
    See Project
  • 19
    fastai

    fastai

    Deep learning library

    fastai is a deep learning library which provides practitioners with high-level components that can quickly and easily provide state-of-the-art results in standard deep learning domains, and provides researchers with low-level components that can be mixed and matched to build new approaches. It aims to do both things without substantial compromises in ease of use, flexibility, or performance. This is possible thanks to a carefully layered architecture, which expresses common underlying...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    DeepSpeed

    DeepSpeed

    Deep learning optimization library: makes distributed training easy

    DeepSpeed is an easy-to-use deep learning optimization software suite that enables unprecedented scale and speed for Deep Learning Training and Inference. With DeepSpeed you can: 1. Train/Inference dense or sparse models with billions or trillions of parameters 2. Achieve excellent system throughput and efficiently scale to thousands of GPUs 3. Train/Inference on resource constrained GPU systems 4. Achieve unprecedented low latency and high throughput for inference 5. Achieve extreme...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    HunyuanVideo-I2V

    HunyuanVideo-I2V

    A Customizable Image-to-Video Model based on HunyuanVideo

    HunyuanVideo-I2V is a customizable image-to-video generation framework developed by Tencent, extending the capabilities of HunyuanVideo. It allows for high-quality video creation from still images, using PyTorch and providing pre-trained model weights, inference code, and customizable training options. The system includes a LoRA training code for adding special effects and enhancing video realism, aiming to offer versatile and scalable solutions for generating videos from static image inputs.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 22
    Warlock-Studio

    Warlock-Studio

    AI Suite for upscaling, interpolating & restoring images/videos

    v6.0. Warlock-Studio is a Windows application that uses Real-ESRGAN, BSRGAN, IRCNN, GFPGAN, RealESRNet, RealESRAnime and RIFE Artificial Intelligence models to upscale, restore faces, interpolate frames and reduce noise in images and videos. the application supports GPU acceleration (including multi-GPU setups) and offers batch processing for large workloads. It includes drag-and-drop handling for single or multiple files, optional pre-resize functions, and an automatic tiling system...
    Downloads: 28 This Week
    Last Update:
    See Project
  • 23
    Grok-1

    Grok-1

    Open-source, high-performance Mixture-of-Experts large language model

    Grok-1 is a 314-billion-parameter Mixture-of-Experts (MoE) large language model developed by xAI. Designed to optimize computational efficiency, it activates only 25% of its weights for each input token. In March 2024, xAI released Grok-1's model weights and architecture under the Apache 2.0 license, making them openly accessible to developers. The accompanying GitHub repository provides JAX example code for loading and running the model. Due to its substantial size, utilizing Grok-1...
    Leader badge
    Downloads: 43 This Week
    Last Update:
    See Project
  • 24
    Xfl

    Xfl

    An Efficient and Easy-to-use Federated Learning Framework

    XFL is a lightweight, high-performance federated learning framework supporting both horizontal and vertical FL. It integrates homomorphic encryption, DP, secure MPC, and optimizes network resilience. Compatible with major ML libraries and deployable via Docker or Conda.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    PC_Workman_HCK

    PC_Workman_HCK

    AI-powered PC monitoring that explains. Not shows numbers/spikes.

    PC_Workman is what 680 hours of coding after warehouse shifts looks like. Built on a laptop hitting 94°C, this AI-powered monitoring tool does what Task Manager can't: it understands your system, not just measures it. Features: - Time travel monitoring - debug issues from hours ago - AI diagnostics with HCK_GPT - Custom fan curves with profiles - Floating always-on-top widget - 2D system map - Cross-GPU support (NVIDIA/AMD/Intel) Four complete rebuilds. 29 features killed....
    Downloads: 6 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB