CLIP: Predict the most relevant text snippet given an image
ImageBind: One Embedding Space to Bind Them All
PyTorch code and models for V-JEPA 2 self-supervised learning from video
Create videos with Stable Diffusion
Physical Symbolic Optimization
A Hyperparameter Tuning Library for Keras
Multi-Modal Neural Networks for Semantic Search, based on Mid-Fusion
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Topic Modelling for Humans
PyTorch code and models for V-JEPA self-supervised learning from video
Synchronized Translation for Videos
Generate Any 3D Scene in Seconds
An open-source toolkit for monitoring Large Language Models (LLMs)
Implementation of Video Diffusion Models
Library of self-supervised methods for visual representation
Medical imaging toolkit for deep learning
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
InvokeAI is a leading creative engine for Stable Diffusion models
Large Multimodal Models for Video Understanding and Editing
Superfast AI decision making and processing of multi-modal data
A fast library for AutoML and tuning
PyTorch version of Stable Baselines
A toolkit to optimize ML models for deployment for Keras & TensorFlow
SOTA discrete acoustic codec models with 40/75 tokens per second
Implementation of the Surya Foundation Model for Heliophysics