Run any Llama 2 model locally with a Gradio UI on GPU or CPU from anywhere
Chinese LLaMA & Alpaca large language models + local CPU/GPU training
Tensor search for humans
INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model
A high-performance ML model-serving framework with dynamic batching
Private OpenAI on Kubernetes
Low-latency REST API for serving text embeddings
Chinese LLaMA-2 & Alpaca-2 large language models (phase 2 project)