TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
CLIP: Predict the most relevant text snippet given an image
PyTorch code and models for VJEPA2 self-supervised learning from video
Training Large Language Models to Reason in a Continuous Latent Space
Create videos with Stable Diffusion
Topic Modelling for Humans
Physical Symbolic Optimization
State-of-the-art (SoTA) text-to-video pre-trained model
Multi-Modal Neural Networks for Semantic Search, based on Mid-Fusion
A Hyperparameter Tuning Library for Keras
A Family of Open Sourced Music Foundation Models
PyTorch code and models for V-JEPA self-supervised learning from video
Recovering the Visual Space from Any Views
Synchronized Translation for Videos
Implementation of Video Diffusion Models
An open-source toolkit for monitoring Large Language Models (LLMs)
AI-Driven Exploration in the Space of Code
Medical imaging toolkit for deep learning
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Multi-modal large language model designed for audio understanding
PyTorch version of Stable Baselines
Official code base for LeWorldModel: Stable End-to-End Joint-Embedding
Generate Any 3D Scene in Seconds
InvokeAI is a leading creative engine for Stable Diffusion models
A fast library for AutoML and tuning