An Open Source text-to-speech system built by inverting Whisper
Towards Human-Sounding Speech
Open source personal AI Assistant for Linux, Windows and Mac
Open Source Document Management System for Digital Archives
A Unified Framework for Text-to-3D and Image-to-3D Generation
Multimodal-Driven Architecture for Customized Video Generation
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences
Implementation of the AudioLM audio generation model in PyTorch
Synchronized Translation for Videos
Generate Any 3D Scene in Seconds
21 Lessons, Get Started Building with Generative AI
Simple, Pythonic building blocks to evaluate LLM applications
A Model Context Protocol (MCP) server
Open-Sora: Democratizing Efficient Video Production for All
TextWorld is a sandbox learning environment for the training
Generate audiobooks from e-books
A Python library that makes AMR parsing, generation and visualization
ImageBind: One Embedding Space to Bind Them All
Han Language Processing
A Repo For Document AI
User toolkit for analyzing and interfacing with Large Language Models
A sound cloning tool with a web interface, using your voice
Text-to-video and image-to-video generation: CogVideoX (2024) and CogVideo
Interface for OuteTTS models
End-to-end speech processing toolkit