RGBD video generation model conditioned on camera input
LTX-Video Support for ComfyUI
Python inference and LoRA trainer package for the LTX-2 audio–video
Let's make video diffusion practical
Recovering the Visual Space from Any Views
GPT-4V-level open-source multimodal model based on Llama3-8B
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
OCR expert VLM powered by Hunyuan's native multimodal architecture