CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
Clean and efficient FP8 GEMM kernels with fine-grained scaling
INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model
FAIR Sequence Modeling Toolkit 2
FlashMLA: Efficient Multi-head Latent Attention Kernels
Hackable and optimized Transformers building blocks
Learning embeddings for classification, retrieval and ranking