MARS5 speech model (TTS) from CAMB.AI
This repository provides an advanced RAG
Model Context Protocol server that integrates AgentQL's data
Open Source Differentiable Computer Vision Library
Images to inference with no labeling
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Official Repo For "Sa2VA: Marrying SAM2 with LLaVA
LLM-based agent for general purpose software engineering tasks
Multi-modal large language model designed for audio understanding
Integrate cutting-edge LLM technology quickly and easily into your app
GUI Exploration Lab. One of the best GUI agent solutions
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Deploy and share agents with open infrastructure
An MCP server that autonomously evaluates web applications
The leading agent orchestration platform for Claude
Get started with building Fullstack Agents using Gemini 2.5 & LangGraph
Repo of Qwen2-Audio chat & pretrained large audio language model
The AI-powered coding wizard
No-code multi-agent framework to build LLM Agents, workflows
The Operator Splitting QP Solver
Refactoring ChatBot+LLM, GPT-3.5-turbo, ChatGPT Bot/Voice Assistant
Multi-Modal Neural Networks for Semantic Search, based on Mid-Fusion
The data structure for multimodal data
Lightning fast C++/CUDA neural network framework