A Multi-Modal World Model for Reconstruction, Generation, and Simulation
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Official code base for LeWorldModel: Stable End-to-End Joint-Embedding
Pretrained time-series foundation model developed by Google Research
DeepMind model for tracking arbitrary points across videos & robotics
OCR expert VLM powered by Hunyuan's native multimodal architecture
Open-source image generative foundation model
Convert Google Gemini web into an OpenAI-compatible API
Open Source Speech Language Model
Open-source industrial-grade ASR models
Hunyuan Translation Model Version 1.5
Block Diffusion for Ultra-Fast Speculative Decoding
Implementation of "MobileCLIP" CVPR 2024
High-resolution models for human tasks
Ling is a MoE LLM provided and open-sourced by InclusionAI
Genome modeling and design across all domains of life
Generate Any 3D Scene in Seconds
Fast and Universal 3D reconstruction model for versatile tasks
Memory-efficient and performant finetuning of Mistral's models
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1
Unified Multimodal Understanding and Generation Models
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
High-Fidelity and Controllable Generation of Textured 3D Assets
ChatGPT interface with a better UI
High-Resolution Image Synthesis with Latent Diffusion Models