Audio foundation model excelling in audio understanding
Hackable and optimized Transformers building blocks
Capable of understanding text, audio, vision, and video
1B text generation model based on the HRM architecture
The official PyTorch implementation of Google's Gemma models
Repo of Qwen2-Audio chat & pretrained large audio language model
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Unified Multimodal Understanding and Generation Models
Global weather forecasting model using graph neural networks and JAX
State-of-the-art (SoTA) text-to-video pre-trained model
Open-source framework for intelligent speech interaction
Large Multimodal Models for Video Understanding and Editing
OCR expert VLM powered by Hunyuan's native multimodal architecture
RGBD video generation model conditioned on camera input
Convert Google Gemini web into an OpenAI-compatible API
A 0.1B Omni model trained from scratch
26M function-calling model that runs on incredibly small devices
Qwen3-ASR is an open-source series of ASR models
A Pragmatic VLA Foundation Model
OpenTinker is an RL-as-a-Service infrastructure for foundation models
Hunyuan Translation Model Version 1.5
Block Diffusion for Ultra-Fast Speculative Decoding
Multimodal embedding and reranking models built on Qwen3-VL
Z80-μLM is a 2-bit quantized language model
Collection of Gemma 3 variants that are trained for performance