Official DeiT repository
Let's make video diffusion practical
PyTorch code and models for the DINOv2 self-supervised learning method
FlashMLA: Efficient Multi-head Latent Attention Kernels
Encoder of greater-than-word length text trained on a variety of data
High-performance MoE model with MLA, MTP, and multilingual reasoning
Tiny pre-trained IBM model for multivariate time series forecasting