From Images to High-Fidelity 3D Assets
Awesome multilingual OCR toolkits based on PaddlePaddle
A Systematic Framework for Interactive World Modeling
DeepMind model for tracking arbitrary points across videos & robotics
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Qwen3-omni is a natively end-to-end, omni-modal LLM
Di♪♪Rhythm: Blazingly Fast & Simple End-to-End Song Generation
Learning to Act by Watching Unlabeled Online Videos
An implementation of model-parallel GPT-2 and GPT-3-style models