Multi-Modal Neural Networks for Semantic Search, based on Mid-Fusion
Director, Screenwriter, Producer, and Video Generator All-in-One
Multimodal-Driven Architecture for Customized Video Generation
3D reconstruction software
Advanced techniques for RAG systems
14-stage Fusion Pipeline for LLM token compression
Marrying Grounding DINO with Segment Anything & Stable Diffusion
VMZ: Model Zoo for Video Modeling
Omnilingual ASR: Open-Source Multilingual Speech Recognition
An implementation of a deep learning recommendation model (DLRM)
Implementation of Make-A-Video, a new SOTA text-to-video generator
Lightweight anchor-free object detection model
Code release for Masked-attention Mask Transformer
U-Net for RFI Detection based on @jakeret's implementation
Pattern recognition for ADL events