Repo of Qwen2-Audio chat & pretrained large audio language model
Capable of understanding text, audio, vision, and video
GLM-4-Voice | End-to-End Chinese-English Conversational Model
A very simple framework for state-of-the-art NLP
Real-time voice interactive digital human
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Integrating LLMs into structured NLP pipelines
Stanford NLP Python library for many human languages
Toolkit for audio, music, and speech generation
Models for the spaCy Natural Language Processing (NLP) library
High-Resolution Image Synthesis with Latent Diffusion Models
The open-source data curation platform for LLMs
Jittor is a high-performance deep learning framework
Text and image to video generation: CogVideoX (2024) and CogVideo
End-to-end speech processing toolkit
Framework for building neural networks
Bailing is a voice dialogue robot similar to GPT-4o
A deep learning toolkit for Text-to-Speech, battle-tested in research
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Aider is AI pair programming in your terminal
Mice speech to text with MX Cinnamon OS ISO
Seamlessly integrate LLMs into scikit-learn
Data loaders and abstractions for text and NLP
Refer and Ground Anything Anywhere at Any Granularity