Build Vision Agents quickly with any model or video provider
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Speech-AI-Forge is a project developed around TTS generation models
Supercharge Your LLM Application Evaluations
Database system for building simpler and faster AI-powered applications
Lightweight framework for evaluating large language model performance
Scalable machine learning for time series forecasting
Bailing is a voice dialogue robot similar to GPT-4o
An open-source text-to-speech system built by inverting Whisper
Generate blog articles from video or audio
Unified Multimodal Understanding and Generation Models
LLM powered fuzzing via OSS-Fuzz
Beyond the Imitation Game collaborative benchmark for measuring
DeepMind model for tracking arbitrary points across videos & robotics
Global weather forecasting model using graph neural networks and JAX
Expose your FastAPI endpoints as Model Context Protocol (MCP) tools
Tooling for the Common Objects In 3D dataset
Code for Mesh R-CNN, ICCV 2019
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
PyTorch code and models for VJEPA2 self-supervised learning from video
Language modeling in a sentence representation space
Code for the "Language models can explain neurons in language models" paper
Evals is a framework for evaluating LLMs and LLM systems
The ChatGPT Retrieval Plugin lets you easily find personal documents