Powerful AI language model (MoE) optimized for efficiency/performance
The React for Voice and Chat: build apps for Alexa, Google Assistant
Robust Speech Recognition via Large-Scale Weak Supervision
Code for running inference with the SAM 3D Body Model 3DB
Image generation model with single-stream diffusion transformer
A robust, efficient, low-latency speech-to-text library
An HTML5 video player with a parser that saves traffic
Document Image Parsing via Heterogeneous Anchor Prompting
Models for object and human mesh reconstruction
Node.js client for the official ChatGPT API. 🔥
Context data platform for building observable, self-learning AI agents
This SDK is now deprecated; use the new unified Google GenAI SDK
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Photorealistic Synthetic Dataset for Holistic Indoor Scene
PyTorch code and models for V-JEPA self-supervised learning from video
Platform-only control plane for autonomous AI Agents
Lightning-fast, on-device TTS, running natively via ONNX
Build GenAI applications quickly and easily
[CVPR 2025 Best Paper Award] VGGT
End-to-end speech processing toolkit
LLM powered fuzzing via OSS-Fuzz
A Unified Framework for Image Customization
Optimized Workforce Learning for General Multi-Agent Assistance
A Strong and Easy-to-use Single View 3D Hand+Body Pose Estimator
Code release for "Masked-attention Mask Transformer