MARS5 speech model (TTS) from CAMB.AI
A TTS model capable of generating ultra-realistic dialogue
Plug-and-play library to enable agents to call MCP and UTCP tools
Diversity-driven optimization and large-model reasoning ability
This repository provides an advanced RAG
An MCP server that autonomously evaluates web applications
Get started with building Fullstack Agents using Gemini 2.5 & LangGraph
Chinese and English multimodal conversational language model
Agent framework and applications built upon Qwen>=3.0
Repo of Qwen2-Audio chat & pretrained large audio language model
Tongyi Deep Research, the Leading Open-source Deep Research Agent
A solution to build and deploy MCP agents and applications
Lightweight Python library for adding real-time multi-object tracking
Multi-Modal Neural Networks for Semantic Search, based on Mid-Fusion
Tensor search for humans
The data structure for multimodal data
Open source framework for deep learning satellite and aerial imagery
A fast library for AutoML and tuning
Jittor is a high-performance deep learning framework
Fast image augmentation library and an easy-to-use wrapper
Build cross-modal and multimodal applications on the cloud
A multi-function Discord bot
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Official Repo For "Sa2VA: Marrying SAM2 with LLaVA
LLM-based agent for general purpose software engineering tasks