Wan2.2: Open and Advanced Large-Scale Video Generative Model
RGBD video generation model conditioned on camera input
Generating Immersive, Explorable, and Interactive 3D Worlds
Chat & pretrained large audio language model proposed by Alibaba Cloud
A Powerful Native Multimodal Model for Image Generation
A Unified Framework for Text-to-3D and Image-to-3D Generation
Inference framework for 1-bit LLMs
A Customizable Image-to-Video Model based on HunyuanVideo
Multimodal-Driven Architecture for Customized Video Generation
Personalize Any Characters with a Scalable Diffusion Transformer
Implementation of model-parallel autoregressive transformers on GPUs
Tencent’s 36-language state-of-the-art translation model
Speaker segmentation model for 10s audio chunks with powerset labels