LLM training in simple, raw C/CUDA
Implementation of "MobileCLIP" CVPR 2024
Expert Parallelism Load Balancer
Instructions on how to use the Realtime API on Microcontrollers
Open-Source Low-Latency Accelerated Linux WebRTC HTML5 Remote Desktop
Distributed parallelization of stencil-based GPU and CPU applications
Component for React
A fast and robust web server and application server for Ruby
A painless self-hosted Git service
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles
Open-source Python framework for hybrid quantum-classical machine learning
Local RAG engine for private multimodal knowledge search on devices
UCCL is an efficient communication library for GPUs
Real-time NVIDIA GPU dashboard
A simple, performant and scalable JAX LLM
Implementation for MatMul-free LM
Run PyTorch LLMs locally on servers, desktop and mobile
Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
The official MongoDB Rust Driver
New set of lightweight state-of-the-art, open foundation models
Official implementation of DreamCraft3D
The book "Performance Analysis and Tuning on Modern CPUs"
An open-source & self-hostable Heroku / Netlify / Vercel alternative
CUDA programming in Julia
Pipy is a programmable proxy for the cloud, edge and IoT