Bailing is a voice dialogue robot similar to GPT-4o
Scalable generative AI framework built for researchers and developers
Interface for OuteTTS models
One-click deployment (including offline integration package)
A TTS model capable of generating ultra-realistic dialogue
This repository provides an advanced RAG
An MCP server that autonomously evaluates web applications
Agent framework and applications built upon Qwen>=3.0
Repository for Qwen2-Audio, a chat and pretrained large audio language model
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Multi-Modal Neural Networks for Semantic Search, based on Mid-Fusion
Tensor search for humans
The data structure for multimodal data
Implementation of Imagen, Google's Text-to-Image Neural Network
Fast image augmentation library and an easy-to-use wrapper
Build cross-modal and multimodal applications on the cloud
Python binding to the Apache Tika™ REST services
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Experimental, AI/ML-powered, open-source Marketing Mix Modeling
LLM-based agent for general-purpose software engineering tasks
High-Fidelity and Controllable Generation of Textured 3D Assets
Multi-modal large language model designed for audio understanding
Open-source framework for intelligent speech interaction
Large Multimodal Models for Video Understanding and Editing
A minimal yet professional single-agent demo project