Audiocraft is a library for audio processing and generation
airda (Air Data Agent)
CRAB: Cross-environment Agent Benchmark for Multimodal Language Model
End-to-end speech processing toolkit
Video understanding codebase from FAIR for reproducing video models
Multimodal Diffusion with Representation Alignment
Open source platform for the machine learning lifecycle
Reading book source
An MCP server that autonomously evaluates web applications
Helping you get the most out of AWS, wherever you use MCP
A collection of reference Jupyter notebooks and demo AI/ML application
Open-source tools for prompt testing and experimentation
InvokeAI is a leading creative engine for Stable Diffusion models
OCR expert VLM powered by Hunyuan's native multimodal architecture
Meta Agents Research Environments is a comprehensive platform
Inference code for scalable emulation of protein equilibrium ensembles
SwarmZero's SDK for building AI agents, swarms of agents and much more
An easy-to-use & supercharged open-source experiment tracker
Official MiniMax Model Context Protocol (MCP) server
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Code for the "Language models can explain neurons in language models" paper
Training data (data labeling, annotation, workflow) for all data types
Omnilingual ASR: Open-Source Multilingual Speech Recognition
Data science on data without acquiring a copy
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning