RGBD video generation model conditioned on camera input
Capable of understanding text, audio, vision, video
Code for running inference and fine-tuning with the SAM 3 model
Benchmark LLMs by fighting in Street Fighter 3
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
OCR expert VLM powered by Hunyuan's native multimodal architecture
A Pioneering Open-Source Alternative to GPT-4o
OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
Winning Solution in NTIRE19 Challenges on Video Restoration
Starter code for working with the YouTube-8M dataset
Convolutional neural network model for video classification
Non-local Neural Networks for Video Classification