Pretrained time-series foundation model developed by Google Research
Bidirectional token-classification model for identifiable info
Foundation Models for Time Series
NetEase Youdao's open-source embedding and reranker models
Audio foundation model excelling in audio understanding
Phi-3.5 for Mac: Locally-run Vision and Language Models
Programmatic access to the AlphaGenome model
A 0.1B Omni model trained from scratch
Open Source Speech Language Model
Qwen3-ASR is an open-source series of ASR models
Foundation model for image generation
Fast-stable-diffusion + DreamBooth
OpenTinker is an RL-as-a-Service infrastructure for foundation models
Hunyuan Translation Model Version 1.5
Block Diffusion for Ultra-Fast Speculative Decoding
Multimodal embedding and reranking models built on Qwen3-VL
Collection of Gemma 3 variants that are trained for performance
Implementation of "MobileCLIP" (CVPR 2024)
High-resolution models for human tasks
Video understanding codebase from FAIR for reproducing video models
CLIP: predict the most relevant text snippet given an image
Ling is a MoE LLM provided and open-sourced by InclusionAI
A Unified Framework for Text-to-3D and Image-to-3D Generation
Multimodal-Driven Architecture for Customized Video Generation
Stable Virtual Camera: Generative View Synthesis with Diffusion Models