Port of Facebook's LLaMA model in C/C++
FlashMLA: Efficient Multi-head Latent Attention Kernels
Foundational Models for State-of-the-Art Speech and Text Translation
Open-source large language model family from Tencent Hunyuan
Clean and efficient FP8 GEMM kernels with fine-grained scaling