GLM-4 series: Open Multilingual Multimodal Chat LMs
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Python SDK for Claude Agent
Phi-3.5 for Mac: Locally-run Vision and Language Models
From Images to High-Fidelity 3D Assets
General-purpose image editing model that delivers high-fidelity
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Netease Youdao's open-source embedding and reranker models
MOSS‑TTS Family: open‑source speech and sound generation models
Open-source deep-learning framework
Sharp Monocular Metric Depth in Less Than a Second
Provides convenient access to the Anthropic REST API from any Python 3
Designed for text embedding and ranking tasks
Generating Immersive, Explorable, and Interactive 3D Worlds
A 0.1B Omni model trained from scratch
State-of-the-art (SoTA) text-to-video pre-trained model
Unified Multimodal Understanding and Generation Models
Robust Speech Recognition Across Languages, Dialects
A Powerful Native Multimodal Model for Image Generation
Video Object and Interaction Deletion
A Systematic Framework for Interactive World Modeling
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Open-source framework for intelligent speech interaction
Repo for SeedVR2 & SeedVR
Implementation of the Surya Foundation Model for Heliophysics