Text-to-video and image-to-video generation: CogVideoX and CogVideo
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Implementation of Imagen, Google's Text-to-Image Neural Network
Claude Code skill implementing Manus-style persistent planning
A simple, high-quality voice conversion tool focused on ease of use
JupyterLab computational environment
Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Easy-to-use and powerful NLP library with an awesome model zoo
CLIP: predict the most relevant text snippet given an image
Transforming Multimodal Content into Captivating Multilingual Audio
Tools to ease the creation of snippets, syntax definitions, etc.
Statusline plugin for Vim that also provides statuslines and prompts for several other applications
A simple native web interface that uses ChatTTS to synthesize speech from text
Label Studio is a multi-type data labeling and annotation tool
A nearly live implementation of OpenAI's Whisper
Framework for building real-time, multimodal voice AI agents and apps
A fast, high-quality TTS voice cloning model
Math OCR model that outputs LaTeX and Markdown
Free, high-quality text-to-speech API endpoint to replace OpenAI's TTS API
Full Git and GitHub integration with Sublime Text
A Powerful Native Multimodal Model for Image Generation
Easy-to-use Python library for creating 2D arcade games
Industrial-level controllable zero-shot text-to-speech system
A general purpose syntax highlighter in pure Go
Deep Research framework, combining language models with tools