A high-performance ML model serving framework offering dynamic batching
Models and examples built with TensorFlow
Running large language models on a single GPU
Chinese Llama-3 LLMs developed from Meta Llama 3
Supercharge Your LLM with the Fastest KV Cache Layer
A set of Docker images for training and serving models in TensorFlow
Easily turn large sets of image URLs into an image dataset
AlphaFold 3 inference pipeline
Tensor Learning in Python
Official inference framework for 1-bit LLMs
Convert TensorFlow, Keras, TensorFlow.js and TFLite models to ONNX
OpenAI-style API for open large language models
Sparsity-aware deep learning inference runtime for CPUs
Claude Code skill that researches any topic across Reddit + X
Open source AI VTuber platform with voice chat and Live2D avatars
Library for OCR-related tasks powered by Deep Learning
Fast State-of-the-Art Static Embeddings
SGLang is a fast serving framework for large language models
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Z80-μLM is a 2-bit quantized language model
The largest collection of PyTorch image encoders / backbones
A simple native web interface that uses ChatTTS to synthesize text
Standardized Serverless ML Inference Platform on Kubernetes
Multilingual Automatic Speech Recognition with word-level timestamps