Search Results for "gpu max performance" - Page 4

Showing 458 open source projects for "gpu max performance"

1

clip-retrieval

Easily compute clip embeddings and build a clip retrieval system

...It allows developers to compute embeddings for both images and text efficiently and then index them for fast similarity search across massive datasets. The system is optimized for performance and scalability, capable of processing tens or even hundreds of millions of embeddings using GPU acceleration. It includes components for inference, indexing, filtering, and serving results through APIs, making it a complete pipeline for building production-ready retrieval systems. The framework also supports querying by image, text, or embedding, enabling flexible use cases such as reverse image search or multimodal content discovery. ...

Downloads: 1 This Week

Last Update: 2026-03-18
2

Insanely Fast Whisper

An opinionated CLI to transcribe Audio files w/ Whisper on-device

Insanely Fast Whisper is a high-performance command-line tool designed to dramatically accelerate speech-to-text transcription using OpenAI’s Whisper models on local hardware. It leverages modern optimizations such as batch processing, mixed precision, and advanced attention mechanisms like Flash Attention to significantly reduce inference time while maintaining high transcription accuracy.

Downloads: 3 This Week

Last Update: 2026-03-26
3

LibreHardwareMonitor

Monitor temperature sensors, fan speed, voltage, load & clock speeds

Libre Hardware Monitor is a free, open-source system monitoring tool that provides detailed insights into your computer’s hardware health and performance. It tracks real-time metrics such as temperatures, fan speeds, voltages, clock speeds, and load across a wide range of components. The project includes both a Windows Forms application for visual monitoring and a reusable library for developers who want to integrate hardware monitoring into their own software. LibreHardwareMonitor supports modern Intel and AMD CPUs, major GPU vendors, storage devices, and network adapters. ...

Downloads: 258 This Week

Last Update: 2026-02-14
4

Ultralight

Lightweight, high-performance HTML renderer for game developers

...Available for desktop apps, game consoles, TVs, embedded device displays, servers, and more. Official API for C and C++, with bindings for more. Render web-content on the GPU via Direct3D, Metal, OpenGL, or your own engine for unmatched visual performance. Render web-content on the CPU via SIMD/parallel for incredibly easy integration with any environment (including server-side!). Ultralight is engineered for peak performance, ensuring minimal CPU and memory usage. Customize low-level platform functionality, integrate JavaScript directly with native code, dive deep into performance tuning, and more. ...

Downloads: 2 This Week

Last Update: 2024-06-12
5

DXVK

Vulkan-based implementation of D3D9, D3D10 and D3D11 for Linux / Wine

...Direct3D is a graphics application programming interface built for Windows and is used for rendering three-dimensional graphics in applications. It is typically useful in applications where performance is vital, such as three-dimensional games. This project aims to provide support for Direct3D11, feature level 11_1, and Direct3D10, feature level 10_1. Currently, however, a few features remain unsupported, such as shared resources, predication, class linkage and target-independent rasterization. For best results, use an esync-enabled Wine build to reduce CPU overhead in some games, and disable desktop effects on your compositor, as these can cause stuttering when games are GPU-bound.

Downloads: 393 This Week

Last Update: 2025-10-11
6

OpenVINO AI Plugins for Audacity

A set of AI-enabled effects, generators, and analyzers for Audacity

These AI features run 100% locally on your PC; no internet connection is necessary. OpenVINO™ is used to run AI models on supported accelerators found on the user's system, such as CPU, GPU, and NPU.

Downloads: 111 This Week

Last Update: 2024-12-20
7

Faster Whisper

Faster Whisper transcription with CTranslate2

Faster Whisper is an optimized implementation of the Whisper speech recognition model designed to deliver significantly faster inference while maintaining comparable accuracy. It leverages efficient inference engines and optimized computation strategies to reduce latency and resource consumption. The system is particularly useful for real-time or large-scale transcription tasks where performance is critical. It supports multiple model sizes, allowing users to balance speed and accuracy based...

Downloads: 28 This Week

Last Update: 2026-04-06
8

EvoTrees.jl

Boosted trees in Julia

A Julia implementation of boosted trees with CPU and GPU support. It provides efficient histogram-based algorithms with support for multiple loss functions, including various regressions, multi-classification and Gaussian maximum likelihood.

Downloads: 0 This Week

Last Update: 2026-02-24
9

Kubeflow Trainer

Distributed AI Model Training and LLM Fine-Tuning on Kubernetes

...One of its key innovations is the integration of MPI-based distributed computing within Kubernetes, allowing efficient communication between nodes for high-performance training. It also includes advanced scheduling capabilities through integrations with tools like Kueue and Volcano, enabling topology-aware resource allocation and multi-cluster job orchestration.

Downloads: 2 This Week

Last Update: 2026-03-20
10

TensorRT

C++ library for high performance inference on NVIDIA GPUs

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. TensorRT-based applications perform up to 40X faster than CPU-only platforms during inference. With TensorRT, you can optimize neural network models trained in all major frameworks, calibrate for lower precision with high accuracy, and deploy to hyperscale data centers,...

Downloads: 18 This Week

Last Update: 2026-03-25
11

Skiko

Kotlin Multiplatform bindings to Skia

...By leveraging Skia’s proven performance and cross-platform consistency, Skiko helps developers write a single graphics pipeline that behaves predictably across environments, simplifying maintenance and reducing platform fragmentation.

Downloads: 26 This Week

Last Update: 4 days ago
12

G-Helper

Lightweight Armoury Crate alternative for Asus laptops and ROG Ally

Small and lightweight Armoury Crate alternative for Asus laptops, offering almost the same functionality without extra load and unnecessary services. Works with all popular models, such as ROG Zephyrus G14, G15, G16, M16, Flow X13, Flow X16, Flow Z13, DUO, TUF Series, Strix or Scar Series, ProArt, Vivobook, Zenbook, ROG Ally or Ally X and many more.

Downloads: 176 This Week

Last Update: 2026-04-22
13

UIforETW

User interface for recording and managing ETW traces

UIforETW is a Windows performance tracing companion that wraps the Event Tracing for Windows (ETW) toolchain in an approachable GUI. It standardizes trace collection profiles, launches WPR/xperf with the right providers, and organizes the resulting .etl files for repeatable investigations. The tool streamlines the entire loop—record, annotate, open in WPA/XperfView—so engineers can focus on finding scheduling stalls, I/O bottlenecks, GC pauses, or GPU hitches instead of memorizing command-line incantations. ...

Downloads: 0 This Week

Last Update: 2025-10-10
14

FlashMLA

FlashMLA: Efficient Multi-head Latent Attention Kernels

FlashMLA is a high-performance decoding kernel library designed especially for Multi-Head Latent Attention (MLA) workloads, targeting NVIDIA Hopper GPU architectures. It provides optimized kernels for MLA decoding, including support for variable-length sequences, helping reduce latency and increase throughput in model inference systems using that attention style.

Downloads: 0 This Week

Last Update: 2026-04-29
15

three-d

Makes it simple to draw stuff across platforms (including web)

three-d is a lightweight and modern 3D rendering library written in Rust that targets both native and WebAssembly environments, providing a simple yet powerful abstraction over GPU-based graphics APIs. It is designed to make 3D graphics programming accessible while still offering fine-grained control over rendering pipelines, materials, lighting, and camera systems. The library leverages modern graphics standards such as OpenGL and WebGL to deliver high-performance rendering across platforms, including browsers and desktop applications. ...

Downloads: 0 This Week

Last Update: 2026-04-17
16

Habitat-Sim

A flexible, high-performance 3D simulator for Embodied AI research

Habitat-Sim is a high-performance 3D simulator for embodied AI research, designed to run photorealistic indoor environments at thousands of frames per second. It offers GPU-accelerated rendering and a flexible sensor suite—RGB, depth, semantic segmentation, and more—so agents can perceive and act in realistic scenes. The engine is written in C++ with Python bindings and integrates physics, navigation meshes, and shortest-path planners to support tasks like point-goal navigation, rearrangement, and interactive manipulation. ...

Downloads: 1 This Week

Last Update: 2025-10-07
17

Colossal-AI

Making large AI models cheaper, faster and more accessible

The Transformer architecture has improved the performance of deep learning models in domains such as Computer Vision and Natural Language Processing. Better performance has come with larger model sizes, which run up against the memory wall of current accelerator hardware such as GPUs. It is never ideal to train large models such as Vision Transformer, BERT, and GPT on a single GPU or a single machine.

Downloads: 1 This Week

Last Update: 2025-05-28
18

Zoo Design Studio

The Zoo Design Studio app

...Users can interact with the system through a familiar point-and-click interface, but every action is translated into code in the underlying modeling language, ensuring consistency between visual and programmatic representations. The application is powered by a GPU-first geometry engine that streams rendered output as video frames, enabling high-performance modeling even when heavy computation is offloaded to remote infrastructure. It uses WebSockets for real-time communication between the client and the modeling engine, allowing immediate feedback and interactive design updates.

Downloads: 9 This Week

Last Update: 3 days ago
19

Lemonade

Lemonade helps users run local LLMs with the highest performance

Lemonade is a local LLM runtime that aims to deliver the highest possible performance on your own hardware by auto-configuring state-of-the-art inference engines for both NPUs and GPUs. The project positions itself as a “local LLM server” you can run on laptops and workstations, abstracting away backend differences while giving you a single place to serve and manage models. Its README emphasizes real-world adoption across startups, research groups, and large companies, signaling a focus on...

Downloads: 13 This Week

Last Update: 15 hours ago
20

RL Games

RL implementations

rl_games is a high-performance reinforcement learning framework optimized for GPU-based training, particularly in environments like robotics and continuous control tasks. It supports advanced algorithms and is built with PyTorch.

Downloads: 0 This Week

Last Update: 2026-02-20
21

DALI

A GPU-accelerated library containing highly optimized building blocks

...Deep learning applications require complex, multi-stage data processing pipelines that include loading, decoding, cropping, resizing, and many other augmentations. These data processing pipelines, which are currently executed on the CPU, have become a bottleneck, limiting the performance and scalability of training and inference. DALI addresses the problem of the CPU bottleneck by offloading data preprocessing to the GPU. Additionally, DALI relies on its own execution engine, built to maximize the throughput of the input pipeline.

Downloads: 1 This Week

Last Update: 2026-04-16
22

Infinity

Low-latency REST API for serving text-embeddings

Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting all sentence-transformer models and frameworks. Infinity is developed under the MIT License. Infinity powers inference behind Gradient.ai and other Embedding API providers.

Downloads: 0 This Week

Last Update: 2025-08-22
23

ChatGLM-6B

ChatGLM-6B: An Open Bilingual Dialogue Language Model

...The project provides inference code, demos (command line, web, API), quantization support for lower memory deployment, and tools for finetuning (e.g., via P-Tuning v2). It is optimized for dialogue and question answering with a balance between performance and deployability in consumer hardware settings. It supports quantized inference (INT4, INT8) to reduce GPU memory requirements, and automatic mode switching between precision/memory tradeoffs (full/quantized).

Downloads: 7 This Week

Last Update: 2025-09-26
24

Spartan Engine

A game engine with an emphasis on real-time cutting-edge solutions

...The engine implements a wide range of advanced graphics features, such as atmospheric scattering, physically based shading, screen-space shadows and ambient occlusion, screen-space reflections, sophisticated shadow mapping, volumetric fog, and HDR output. It supports next-gen performance and image quality technologies including variable rate shading, dynamic resolution scaling, temporal anti-aliasing, and upscaling via XeSS 2 and FSR 3. Beyond rendering, SpartanEngine offers PhysX-powered physics, CPU and GPU profiling, and a thread pool for parallel workloads.

Downloads: 7 This Week

Last Update: 1 day ago
25

DirectX-Graphics-Samples

Samples that demonstrate how to build graphics intensive applications

This repo contains the DirectX 12 Graphics samples that demonstrate how to build graphics-intensive applications for Windows 10. In the Samples directory, you will find samples that attempt to break off specific features and specific usage scenarios into bite-sized chunks. For example, the ExecuteIndirect sample will show you just enough about execute indirect to get started with that feature without diving too deep into multiengine, whereas the nBodyGravity sample will delve into multiengine...

Downloads: 44 This Week

Last Update: 2026-01-22