CLIP: predict the most relevant text snippet given an image
CLIP Tool Kit (CTK)
Embed images and sentences into fixed-length vectors
An open source implementation of CLIP
ICLR 2024 Spotlight: curation/training code, metadata, distribution
Automatically translates the text of a video based on a subtitle file
Instant voice cloning by MIT and MyShell. Audio foundation model
Chat client for Twitch
TorchMultimodal is a PyTorch library
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences
A Wayland compositor inspired by Window Maker
The most powerful and modular diffusion model GUI, API, and backend
Stable Diffusion web UI
LTX-Video Support for ComfyUI
Tensor search for humans
Generating Immersive, Explorable, and Interactive 3D Worlds
Implementation of Imagen, Google's Text-to-Image Neural Network
Animation Engine for Dear ImGui
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Workflow and speech recognition app
Interface for OuteTTS models
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
A Python library for audio data augmentation
Large Multimodal Models for Video Understanding and Editing
MARS5 speech model (TTS) from CAMB.AI