Running large language models on a single GPU
Frame profiler
Lightweight generic ring buffer manager library
Real-time NVIDIA GPU dashboard
A high-quality rapid TTS voice cloning model
A process for exposing JMX Beans via HTTP for Prometheus consumption
Personal Information “Leakage” Detection Interface
Python-free Rust inference server
AirLLM 70B inference with a single 4GB GPU
Lightweight Java library developed by Alibaba for reading and writing
Building an Intelligent Agent from Scratch
An implementation of OpenGL 3.x-ish in clean C
A cycle-accurate Nintendo Game Boy Advance emulator
Provides a way to profile code
A TTS that fits in your CPU (and pocket)
Realtime log viewer for containers. Supports Docker, Swarm and K8s
Official inference framework for 1-bit LLMs
MemU is an open-source memory framework for AI companions
Library for the numerical simulation of closed as well as open quantum
Unified KV Cache Compression Methods for Auto-Regressive Models
Neural Network architecture based on ideas of the original LSTM
Run a 1-billion parameter LLM on a $10 board with 256MB RAM
Docker management you will like
Open-source large language model family from Tencent Hunyuan
Demo of a customer service use case implemented with the OpenAI Agents