RGBD video generation model conditioned on camera input
A Powerful Native Multimodal Model for Image Generation
A Unified Framework for Text-to-3D and Image-to-3D Generation
A Customizable Image-to-Video Model Based on HunyuanVideo
Multimodal-Driven Architecture for Customized Video Generation
Implementation of model-parallel autoregressive transformers on GPUs
Tencent’s 36-language state-of-the-art translation model
Mirror of Ultralytics YOLO-World model weights for object detection
Speaker segmentation model for 10-second audio chunks with powerset labels
Detects speech activity in audio using the pyannote.audio 2.1 pipeline