A simple, high-quality voice conversion tool focused on ease of use
Stable Diffusion web UI
High-Resolution Image Synthesis with Latent Diffusion Models
Code for the paper Language Models are Unsupervised Multitask Learners
Research code artifacts for Code World Model (CWM)
The largest collection of PyTorch image encoders / backbones
Generate short videos with one click using an AI LLM
Release for Improved Denoising Diffusion Probabilistic Models
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Contexts Optical Compression
Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods
Reverse-engineered Python API for Google Gemini web app
The official gpt4free repository
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction
TTS with kokoro and onnx runtime
VITS2 backbone with multilingual-bert
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Create videos with Stable Diffusion
Towards Human-Level Text-to-Speech through Style Diffusion
A reactive notebook for Python
A neural network that transforms a design mock-up into static websites
Let's make video diffusion practical
Ready-to-use OCR with 80+ supported languages
PyTorch code and models for VJEPA2 self-supervised learning from video
Audiocraft is a library for audio processing and generation