CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Phi-3.5 for Mac: Locally-run Vision and Language Models
Open-source framework for intelligent speech interaction
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Designed for text embedding and ranking tasks
Generating Immersive, Explorable, and Interactive 3D Worlds
Diversity-driven optimization and large-model reasoning ability
Open-source large language model family from Tencent Hunyuan
Repo for SeedVR2 & SeedVR
Pokee Deep Research Model Open Source Repo
Implementation of the Surya Foundation Model for Heliophysics
Long-form streaming TTS system for multi-speaker dialogue generation
OpenTinker is an RL-as-a-Service infrastructure for foundation models
Official code base for LeWorldModel: Stable End-to-End Joint-Embedding
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
A trainable PyTorch reproduction of AlphaFold 3
Multi-modal large language model designed for audio understanding
Large Multimodal Models for Video Understanding and Editing
The official PyTorch implementation of Google's Gemma models
Generate Any 3D Scene in Seconds
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
LLM-based Reinforcement Learning audio edit model
Multimodal embedding and reranking models built on Qwen3-VL
An Efficient Agentic Model for Computer Use
New family of code large language models (LLMs)