Port of Facebook's LLaMA model in C/C++
Clean and efficient FP8 GEMM kernels with fine-grained scaling
Sharp Monocular Metric Depth in Less Than a Second
Open-source large language model family from Tencent Hunyuan
Foundational Models for State-of-the-Art Speech and Text Translation
FlashMLA: Efficient Multi-head Latent Attention Kernels
Code release for "Masked-attention Mask Transformer