Speakr is a personal, self-hosted web application
Spark-TTS Inference Code
Framework for building real-time voice and multimodal AI agents
21 Lessons, Get Started Building with Generative AI
Qwen2.5-VL is the multimodal large language model series
A full spaCy pipeline and models for scientific/biomedical documents
Math OCR model that outputs LaTeX and markdown
Extract audio and video content and organize it into a Markdown note
RAG-Anything: All-in-One RAG Framework
Multimodal embedding and reranking models built on Qwen3-VL
Designed for text embedding and ranking tasks
NLP Cloud serves high-performance pre-trained or custom models for NER
A community-sourced database of game controller mappings
Foundational model for human-like, expressive TTS
Sample code and notebooks for Generative AI on Google Cloud
Controllable & emotion-expressive zero-shot TTS
Controllable and fast Text-to-Speech for over 7000 languages
Accurate × Fast × Comprehensive
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles
Agent harness to make your slop code well-engineered and beautiful
A theme for Sublime Text 3 by Mattia Astorino
A Multi-Modal World Model for Reconstructing, Generating, and Simulating
The most powerful local music generation model
Code and models for ICML 2024 paper, NExT-GPT
Underthesea - Vietnamese NLP Toolkit