Build Vision Agents quickly with any model or video provider
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Speech-AI-Forge is a project developed around TTS generation models
Get started with building Fullstack Agents using Gemini 2.5 & LangGraph
A state-of-the-art open visual language model
Framework and no-code GUI for fine-tuning LLMs
A template for development with the open-autonomy framework
Document Image Parsing via Heterogeneous Anchor Prompting
Open-weight, large-scale hybrid-attention reasoning model
Framework for building neural networks
StreamSpeech is a seamless model for offline speech recognition
Implementation of Vision Transformer, a simple way to achieve SOTA
The best ChatGPT that $100 can buy
This repository contains the official implementation of FastVLM
Supercharge Your LLM with the Fastest KV Cache Layer
PyTorch code and models for the DINOv2 self-supervised learning
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Towards Ultimate Expert Specialization in Mixture-of-Experts Language
Official implementation of DreamCraft3D
A neural network that transforms a design mock-up into static websites
A solution to build and deploy MCP agents and applications
20+ high-performance LLMs with recipes to pretrain, finetune at scale
The fastest way to bring multi-agent workflows to production
CRAB: Cross-environment Agent Benchmark for Multimodal Language Model
A code-first agent framework for seamlessly planning analytics tasks