ChatGPT interface with better UI
DeepSeek Coder: Let the Code Write Itself
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Long-form streaming TTS system for multi-speaker dialogue generation
Open-source industrial-grade ASR models
Foundation model for image generation
Hunyuan Translation Model Version 1.5
CLIP: predict the most relevant text snippet given an image
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
PyTorch code and models for the DINOv2 self-supervised learning
Pushing the Limits of Mathematical Reasoning in Open Language Models
Tiny vision language model
The official PyTorch implementation of Google's Gemma models
General-purpose image editing model that delivers high-fidelity
Inference script for Oasis 500M
Diversity-driven optimization and large-model reasoning ability
Fast and Universal 3D reconstruction model for versatile tasks
Global weather forecasting model using graph neural networks and JAX
Renderer for the harmony response format to be used with gpt-oss
Fast-stable-diffusion + DreamBooth
Multimodal embedding and reranking models built on Qwen3-VL
Implementation of "MobileCLIP" CVPR 2024
Ling is a MoE LLM provided and open-sourced by InclusionAI