26M function-calling model that runs on incredibly small devices
Multimodal-Driven Architecture for Customized Video Generation
Video understanding codebase from FAIR for reproducing video models
Encoder for greater-than-word-length text, trained on a variety of data
Let us control diffusion models
Code release for ConvNeXt V2 model
Code for reproducing key results in the paper