Multimodal-Driven Architecture for Customized Video Generation
Capable of understanding text, audio, vision, video
Get free HTTPS certificates forever from Let's Encrypt
A community-supported supercharged version of paperless
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences
LLM abstractions that aren't obstructions
A deep learning toolkit for Text-to-Speech, battle-tested in research
Implementation of Make-A-Video, new SOTA text to video generator
A Model Context Protocol (MCP) server
TextWorld is a sandbox learning environment for the training
Turn words into chords
Chat & pretrained large audio language model proposed by Alibaba Cloud
Tools for manipulating datasets
Simple, Pythonic building blocks to evaluate LLM applications
An easy-to-use backup tool for GNU/Linux using rsync in the back
Personal mini-web in text
SOTA open-source TTS
Open source personal AI Assistant for Linux, Windows and Mac
Code for running inference and finetuning with SAM 3 model
Text- and image-to-video generation: CogVideoX (2024) and CogVideo
Open-Sora: Democratizing Efficient Video Production for All
go1pylib is a Python library designed to control the Go1 robot
Qwen3-omni is a natively end-to-end, omni-modal LLM
A Unified Framework for Text-to-3D and Image-to-3D Generation
Textual is a TUI (Text User Interface) framework for Python