Search Results for "gpu max performance" - Page 9

Showing 458 open source projects for "gpu max performance"

1

tt-metal

TT-NN operator library, and TT-Metalium low level kernel programming

tt-metal, also referred to in its documentation as TT-Metalium, is Tenstorrent’s low-level software development kit for programming applications on Tenstorrent AI accelerators. The project is designed for developers who need direct access to the company’s Tensix processor architecture, exposing a programming model that is closer to hardware control than high-level inference frameworks. Instead of following a traditional GPU model centered on massive thread parallelism, the platform is built...

Downloads: 2 This Week

Last Update: 4 days ago
See Project
2

CogAgent

An open sourced end-to-end VLM-based GUI Agent

CogAgent is a 9B-parameter bilingual vision-language GUI agent model based on GLM-4V-9B, trained with staged data curation, optimization, and strategy upgrades to improve perception, action prediction, and generalization across tasks. It focuses on operating real user interfaces from screenshots plus text, and follows a strict input–output format that returns structured actions, grounded operations, and optional sensitivity annotations. The model is designed for agent-style execution rather...

Downloads: 4 This Week

Last Update: 5 days ago
See Project
3

Transformer Engine

A library for accelerating Transformer models on NVIDIA GPUs

Transformer Engine (TE) is a library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower memory utilization in both training and inference. TE provides a collection of highly optimized building blocks for popular Transformer architectures and an automatic mixed precision-like API that can be used seamlessly with your framework-specific code. TE also includes a framework-agnostic C++...

Downloads: 2 This Week

Last Update: 2026-04-24
See Project
4

Isaac ROS Visual SLAM

Visual SLAM/odometry package based on NVIDIA-accelerated cuVSLAM

Discover a faster, easier way to build advanced AI robotics applications with the NVIDIA Isaac™ ROS collection of accelerated computing packages and AI models, bringing NVIDIA acceleration to ROS developers everywhere. Isaac ROS Visual SLAM provides a high-performance, best-in-class ROS 2 package for VSLAM (visual simultaneous localization and mapping). This package uses one or more stereo cameras and optionally an IMU to estimate odometry as an input to navigation. It is GPU-accelerated to provide real-time, low-latency results in a robotics application. VSLAM provides an additional odometry source for mobile robots (ground-based) and can be the primary odometry source for drones. ...

Downloads: 2 This Week

Last Update: 2026-05-01
See Project
5

Qwen-VL

Chat & pretrained large vision language model

Qwen-VL is Alibaba Cloud’s vision-language large model family, designed to integrate visual and linguistic modalities. It accepts image inputs (with optional bounding boxes) and text, and produces text (and sometimes bounding boxes) as output. The model variants (VL-Plus, VL-Max, etc.) have been upgraded for better visual reasoning, text recognition from images, fine-grained understanding, and support for high image resolutions / extreme aspect ratios. Qwen-VL supports multilingual inputs...

Downloads: 0 This Week

Last Update: 2025-09-23
See Project
6

Diffrax

Numerical differential equation solvers in JAX

Diffrax is a numerical differential equation solving library built for the JAX ecosystem, with a strong focus on composability, differentiability, and high-performance scientific computing. The project provides tools for solving ordinary differential equations, stochastic differential equations, controlled differential equations, and related systems in a way that fits naturally into modern machine learning and differentiable programming workflows. Because it is written to work closely with...

Downloads: 0 This Week

Last Update: 2026-03-12
See Project
7

Recursive Language Models

General plug-and-play inference library for Recursive Language Models

RLM (short for Reinforcement Learning Models) is a modular framework that makes it easier to build, train, evaluate, and deploy reinforcement learning (RL) agents across a wide range of environments and tasks. It provides a consistent API that abstracts away many of the repetitive engineering patterns in RL research and application work, letting developers focus on modeling, experimentation, and fine-tuning rather than infrastructure plumbing. Within the framework, you can define custom...

Downloads: 0 This Week

Last Update: 2026-02-18
See Project
8

EPLB

Expert Parallelism Load Balancer

EPLB is DeepSeek’s open implementation of a load balancing algorithm designed for expert parallelism (EP) settings in MoE architectures. In EP, different “experts” are mapped to different GPUs or nodes, so load imbalance becomes a performance bottleneck if certain experts are invoked much more often. EPLB solves this by duplicating heavily used experts (redundancy) and then placing those duplicates across GPUs to even out computational load. It uses policies like hierarchical load balancing (grouped experts placed at node and then GPU level) and global load balancing depending on configuration. ...

Downloads: 0 This Week

Last Update: 2025-10-03
See Project
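
The EPLB summary above describes two steps: duplicate heavily used experts, then spread the duplicates across GPUs. A minimal sketch of that idea follows. It is a simple greedy heuristic, not DeepSeek's actual algorithm, and it ignores the hierarchical (node-level) policy. The function name `replicate_and_place` and its parameters are hypothetical, chosen only for illustration.

```python
import heapq


def replicate_and_place(loads, num_gpus, num_slots):
    """Greedy sketch: replicate hot experts, then balance replicas over GPUs.

    loads     -- observed load per expert (hypothetical input)
    num_gpus  -- number of GPUs
    num_slots -- total replica slots; must cover every expert once and
                 divide evenly across GPUs
    """
    if num_slots < len(loads) or num_slots % num_gpus != 0:
        raise ValueError("num_slots must be >= experts and divisible by num_gpus")

    # Step 1: hand each spare slot to the expert whose per-replica load
    # is currently the highest (max-heap via negated keys).
    replicas = [1] * len(loads)
    heap = [(-load, e) for e, load in enumerate(loads)]
    heapq.heapify(heap)
    for _ in range(num_slots - len(loads)):
        _, e = heapq.heappop(heap)
        replicas[e] += 1
        heapq.heappush(heap, (-loads[e] / replicas[e], e))

    # Step 2: place replicas, heaviest first, on the least-loaded GPU
    # that still has a free slot.
    items = sorted(
        ((loads[e] / replicas[e], e)
         for e in range(len(loads))
         for _ in range(replicas[e])),
        reverse=True,
    )
    slots_per_gpu = num_slots // num_gpus
    gpu_load = [0.0] * num_gpus
    placement = [[] for _ in range(num_gpus)]
    for share, e in items:
        free = [g for g in range(num_gpus) if len(placement[g]) < slots_per_gpu]
        g = min(free, key=lambda i: gpu_load[i])
        placement[g].append(e)
        gpu_load[g] += share
    return replicas, placement, gpu_load


if __name__ == "__main__":
    replicas, placement, gpu_load = replicate_and_place([90, 30, 20, 20], 2, 6)
    print(replicas, placement, gpu_load)
```

This sketch may place two replicas of the same expert on one GPU; a real balancer would likely forbid that and would also account for node topology.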
9

Core ML Tools

Core ML tools contain supporting tools for Core ML model conversion

...Core ML provides a unified representation for all models. Your app uses Core ML APIs and user data to make predictions, and to fine-tune models, all on the user’s device. Core ML optimizes on-device performance by leveraging the CPU, GPU, and Neural Engine while minimizing its memory footprint and power consumption. Running a model strictly on the user’s device removes any need for a network connection, which helps keep the user’s data private and your app responsive.

Downloads: 0 This Week

Last Update: 2025-11-10
See Project
10

Covalent workflow

Pythonic tool for running machine-learning/high performance workflows

Covalent is a Pythonic workflow tool for computational scientists, AI/ML software engineers, and anyone who needs to run experiments on limited or expensive computing resources including quantum computers, HPC clusters, GPU arrays, and cloud services. Covalent enables a researcher to run computation tasks on an advanced hardware platform – such as a quantum computer or serverless HPC cluster – using a single line of code. Covalent overcomes computational and operational challenges inherent...

Downloads: 0 This Week

Last Update: 2026-04-23
See Project
11

Numba

NumPy aware dynamic Python compiler using LLVM

Numba is an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code. Numba translates Python functions to optimized machine code at runtime using the industry-standard LLVM compiler library. Numba-compiled numerical algorithms in Python can approach the speeds of C or FORTRAN. You don't need to replace the Python interpreter, run a separate compilation step, or even have a C/C++ compiler installed. Just apply one of the Numba decorators to your...

Downloads: 0 This Week

Last Update: 2026-04-23
See Project
12

Diplomacy Cicero

Code for Cicero, an AI agent that plays the game of Diplomacy

...It supports two variants: Cicero (which handles full “press” negotiation) and Diplodocus (a variant focused on no-press diplomacy) as described in the README. The codebase is implemented primarily in Python with performance-critical components in C++ (via pybind11 bindings) and is configured to run in a high‐GPU cluster environment. Configuration is managed via protobuf files to define tasks such as self-play, benchmark agent comparisons, and RL training. The project is now archived and read-only, reflecting that it is no longer actively developed but remains publicly available for research use.

Downloads: 3 This Week

Last Update: 3 days ago
See Project
13

MSI Kombustor

Advanced OpenGL and Vulkan graphics card stress testing utility

MSI Kombustor is a dedicated GPU stress-testing and benchmarking tool built on top of the popular FurMark engine. It is designed to push graphics cards to their thermal and stability limits, helping users verify cooling performance and overclocking reliability. With support for advanced 3D APIs like OpenGL and Vulkan, Kombustor can generate demanding rendering workloads that simulate real-world GPU pressure.

1 Review

Downloads: 84 This Week

Last Update: 2025-11-22
See Project
14

Tracking Any Point (TAP)

DeepMind model for tracking arbitrary points across videos & robotics

TAPNet is the official Google DeepMind repository for Tracking Any Point (TAP), bundling datasets, models, benchmarks, and demos for precise point tracking in videos. The project includes the TAP-Vid and TAPVid-3D benchmarks, which evaluate long-range tracking of arbitrary points in 2D and 3D across diverse real and synthetic videos. Its flagship models—TAPIR, BootsTAPIR, and the latest TAPNext—use matching plus temporal refinement or next-token style propagation to achieve state-of-the-art...

Downloads: 1 This Week

Last Update: 6 days ago
See Project
15

TorchRec

PyTorch domain library for recommendation systems

...The TorchRec sharder can shard embedding tables with different sharding strategies including data-parallel, table-wise, row-wise, table-wise-row-wise, and column-wise sharding. The TorchRec planner can automatically generate optimized sharding plans for models. Pipelined training overlaps dataloading device transfer (copy to GPU), inter-device communications (input_dist), and computation (forward, backward) for increased performance. Optimized kernels for RecSys powered by FBGEMM. Quantization support for reduced precision training and inference. Common modules for RecSys.

Downloads: 1 This Week

Last Update: 2026-03-15
See Project
16

xLUA

xLua is a Lua programming solution for C#

...Keep this directory structure and put it in your Unity project. A complete example only requires 3 lines of code. It is recommended to bind once and reuse it. If the code is generated, calling max will not generate gc alloc.

Downloads: 0 This Week

Last Update: 2025-09-11
See Project
17

Colab-MCP

An MCP server for interacting with Google Colab

...Instead of relying on manual notebook usage, the system allows MCP-compatible agents to execute code, manage files, install dependencies, and orchestrate entire development workflows within Colab’s cloud infrastructure. This approach bridges the gap between local AI agents and remote high-performance compute environments, allowing users to offload heavy workloads such as machine learning training, data analysis, and dependency-heavy tasks to Colab’s GPU and TPU resources. By exposing Colab as an MCP server, the tool enables seamless integration with a wide range of AI assistants and agent frameworks, creating a standardized interface for tool use and execution.

Downloads: 0 This Week

Last Update: 2026-03-27
See Project
18

TAME LLM

Traditional Mandarin LLMs for Taiwan

TAME LLM is an open-source initiative focused on building and releasing large language models optimized for Traditional Mandarin and the linguistic context of Taiwan. The project includes models such as Llama-3-Taiwan-70B, which are fine-tuned versions of large transformer architectures trained on extensive corpora containing both Traditional Mandarin and English text. These models are designed to support applications such as conversational AI, knowledge retrieval, and domain-specific...

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
19

Profile Data

Analyze computation-communication overlap in V3/R1

profile-data is a repository that publishes profiling traces and metrics from DeepSeek’s training and inference infrastructure (especially during DeepSeek-V3 / R1 experiments). The profiling data targets insights into computation-communication overlap, pipeline scheduling (e.g. DualPipe), and how MoE / EP / parallelism strategies interact in real systems. The repository contains JSON trace files like train.json, prefill.json, decode.json, and associated assets. Users can load them into tools...

Downloads: 0 This Week

Last Update: 2025-10-03
See Project
20

TensorFlow Model Garden

Models and examples built with TensorFlow

The TensorFlow Model Garden is a repository with a number of different implementations of state-of-the-art (SOTA) models and modeling solutions for TensorFlow users. We aim to demonstrate the best practices for modeling so that TensorFlow users can take full advantage of TensorFlow for their research and product development. To improve the transparency and reproducibility of our models, training logs on TensorBoard.dev are also provided for models to the extent possible though not all models...

Downloads: 0 This Week

Last Update: 2026-02-11
See Project
21

tvm

Open deep learning compiler stack for CPU, GPU, etc.

Apache TVM is an open source machine learning compiler framework for CPUs, GPUs, and machine learning accelerators. It aims to enable machine learning engineers to optimize and run computations efficiently on any hardware backend. The vision of the Apache TVM Project is to host a diverse community of experts and practitioners in machine learning, compilers, and systems architecture to build an accessible, extensible, and automated open-source framework that optimizes current and emerging...

Downloads: 0 This Week

Last Update: 2026-02-01
See Project
22

MSI Afterburner

MSI Afterburner: Overclock, monitor, and optimize your GPU.

MSI Afterburner is the world's most recognized and widely used graphics card overclocking utility. It provides a comprehensive suite of tools for gamers and PC enthusiasts to enhance their graphics card's performance and monitor its health. With MSI Afterburner, users can overclock their GPU to achieve higher frame rates in games, undervolt to reduce power consumption and heat, and customize fan curves for optimal cooling. The software also includes an on-screen display (OSD) feature that overlays real-time statistics directly onto games, allowing users to keep track of their system's performance without minimizing their applications. ...

Downloads: 52 This Week

Last Update: 2025-07-13
See Project
23

LÖVR

Lua Virtual Reality engine

An open-source framework for rapidly building immersive 3D experiences. You can use LÖVR to easily create VR experiences without much setup or programming experience. The framework is tiny, fast, open-source, and supports lots of different platforms and devices. Runs on Windows, Mac, Linux, Android, WebXR. Supports Vive/Index, Oculus Rift/Quest, Pico, Windows MR, and has a VR simulator. Simple VR scenes can be created in just a few lines of Lua. Written in C99 and scripted with LuaJIT,...

Downloads: 1 This Week

Last Update: 2025-02-15
See Project
24

Deep Java Library (DJL)

An engine-agnostic deep learning framework in Java

...Because DJL is deep learning engine agnostic, you don't have to make a choice between engines when creating your projects. You can switch engines at any point. To ensure the best performance, DJL also provides automatic CPU/GPU choice based on hardware configuration.

1 Review

Downloads: 2 This Week

Last Update: 2025-12-15
See Project
25

The SpeechBrain Toolkit

A PyTorch-based Speech Toolkit

...Spectral masking, spectral mapping, and time-domain enhancement are different methods already available within SpeechBrain. Separation methods such as Conv-TasNet, DualPath RNN, and SepFormer are implemented as well. SpeechBrain provides efficient and GPU-friendly speech augmentation pipelines and acoustic feature extraction.

Downloads: 0 This Week

Last Update: 2026-03-30
See Project