Wan2.2: Open and Advanced Large-Scale Video Generative Model
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
A Multi-Modal World Model for Reconstructing, Generating, and Simulating
Open-source deep-learning framework
Collection of Gemma 3 variants that are trained for performance
Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI
Large language model & vision-language model based on Linear Attention
Large Multimodal Models for Video Understanding and Editing
Capable of understanding text, audio, vision, video
Towards Real-World Vision-Language Understanding
Qwen2.5-Coder is the code version of Qwen2.5, the large language model
AI Suite for upscaling, interpolating & restoring images/videos
Code for the paper Hybrid Spectrogram and Waveform Source Separation
Official PyTorch Implementation of "Scalable Diffusion Models"
800,000 step-level correctness labels on LLM solutions to MATH problems
A latent text-to-image diffusion model
Multimodal Transformer for document image understanding and layout
Self-evolving AI model for agents, coding, and complex workflows
Reasoning-powered OCR VLM for converting complex documents to Markdown
Jan-v1-edge: efficient 1.7B reasoning model optimized for edge devices
Efficient 8B multimodal model tuned for advanced reasoning tasks
Versatile 8B-base multimodal LLM, flexible foundation for custom AI