Port of OpenAI's Whisper model in C/C++
CLIP: predict the most relevant text snippet given an image
PyTorch code and models for VJEPA2 self-supervised learning from video
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
State-of-the-art (SoTA) text-to-video pre-trained model
Training Large Language Models to Reason in a Continuous Latent Space
Create videos with Stable Diffusion
Multi-Modal Neural Networks for Semantic Search, based on Mid-Fusion
Text-space optimizer that trains reusable natural-language skills
PyTorch code and models for V-JEPA self-supervised learning from video
A Family of Open-Source Music Foundation Models
Recovering the Visual Space from Any Views
UI-TARS-desktop version that can operate on your local personal device
Multi-modal large language model designed for audio understanding
Integrate cutting-edge LLM technology quickly and easily into your app
A Hyperparameter Tuning Library for Keras
A Rust machine learning framework
Physical Symbolic Optimization
Topic Modelling for Humans
An open-source toolkit for monitoring Large Language Models (LLMs)
1B text generation model based on the HRM architecture
Generate high-definition story short videos with one click using AI
InvokeAI is a leading creative engine for Stable Diffusion models
ESP32 Camera motion capture application to record JPEGs to SD card
Multimodal Agents as Smartphone Users, an LLM-based multimodal agent