Designed for text embedding and ranking tasks
gpt-oss-120b and gpt-oss-20b are two open-weight language models
PyTorch code and models for the DINOv2 self-supervised learning method
Easy Docker setup for Stable Diffusion with user-friendly UI
GLM-4.5: Open-source LLM for intelligent agents by Z.ai
Controllable & emotion-expressive zero-shot TTS
A Family of Open Sourced Music Foundation Models
GLM-4 series: Open Multilingual Multimodal Chat LMs
Implementation of the Surya Foundation Model for Heliophysics
CLIP: Predict the most relevant text snippet given an image
Repo of Qwen2-Audio chat & pretrained large audio language model
A Systematic Framework for Interactive World Modeling
Industrial-level controllable zero-shot text-to-speech system
Towards Real-World Vision-Language Understanding
Code for Mesh R-CNN, ICCV 2019
Large Multimodal Models for Video Understanding and Editing
Video Object and Interaction Deletion
Generating Immersive, Explorable, and Interactive 3D Worlds
Inference script for Oasis 500M
Open-source multi-speaker long-form text-to-speech model
4M: Massively Multimodal Masked Modeling
A SOTA open-source image editing model
LLM-based Reinforcement Learning audio edit model
Open-source industrial-grade ASR models
The official PyTorch implementation of Google's Gemma models