FlashMLA: Efficient Multi-head Latent Attention Kernels
An experimental version of the DeepSeek model
A Powerful Native Multimodal Model for Image Generation
Clean and efficient FP8 GEMM kernels with fine-grained scaling
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)