DeepMind model for tracking arbitrary points in videos and robotics
Diversity-driven optimization and large-model reasoning ability
OCR expert VLM powered by Hunyuan's native multimodal architecture
RGBD video generation model conditioned on camera input
Miso TTS is an 8-billion-parameter, highly emotive text-to-speech model
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Inference code for scalable emulation of protein equilibrium ensembles
The Clay Foundation Model: an open-source AI model and interface
AI PPT Track Terminator, the strongest PPT Skill ever
NetEase Youdao's open-source embedding and reranker models
Audio foundation model excelling in audio understanding
1B text generation model based on the HRM architecture
Tiny vision language model
The official PyTorch implementation of Google's Gemma models
Open-source image generative foundation model
Convert the Google Gemini web app into an OpenAI-compatible API
A 0.1B Omni model trained from scratch
26M-parameter function-calling model that runs on extremely small devices
Open-source speech language model
Open-source industrial-grade ASR models
Qwen3-ASR is an open-source series of ASR models
A Pragmatic VLA Foundation Model
OpenTinker is an RL-as-a-Service infrastructure for foundation models
Hunyuan Translation Model Version 1.5
Block Diffusion for Ultra-Fast Speculative Decoding