Port of OpenAI's Whisper model in C/C++
CLIP: predict the most relevant text snippet given an image
PyTorch code and models for VJEPA2 self-supervised learning from video
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Text-space optimizer that trains reusable natural-language skills
PyTorch code and models for V-JEPA self-supervised learning from video
A Family of Open-Source Music Foundation Models
Recovering the Visual Space from Any Views
1B text generation model based on the HRM architecture
Generate high-definition short story videos with one click using AI
Multimodal Agents as Smartphone Users, an LLM-based multimodal agent
Official code base for LeWorldModel: Stable End-to-End Joint-Embedding
Experimental Ant Design extensions for advanced UI patterns
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Implementation of the Surya Foundation Model for Heliophysics
Shared context board for teams and agents
Semi-Structured Agentic Framework. Workflows build themselves
Motion-controllable Video Generation via Latent Trajectory Guidance
A tool to use the Ai2 Open Coding Agents Soft-Verified Agents
Multimodal embedding and reranking models built on Qwen3-VL
AI-Driven Exploration in the Space of Code
Generate Any 3D Scene in Seconds
Diffusion Transformer with Fine-Grained Chinese Understanding
Audit, track usage, and compare your Claude Code skills
Context data platform for building observable, self-learning AI agents