End-to-end speech processing toolkit
AI discovers 520,000 stable inorganic crystal structures for research
Code for Mesh R-CNN, ICCV 2019
PyTorch code and models for VJEPA2 self-supervised learning from video
An AI-powered security review GitHub Action using Claude
GPT-4V-level open-source multi-modal model based on Llama3-8B
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Minimal Deep Q-Learning (DQN & DDQN) implementations in Keras
Taming Stable Diffusion for Lip Sync
Build Vision Agents quickly with any model or video provider
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
A state-of-the-art open visual language model
Chinese and English multimodal conversational language model
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Helping you get the most out of AWS, wherever you use MCP
MII makes low-latency and high-throughput inference possible
Multi-Modal Neural Networks for Semantic Search, based on Mid-Fusion
Toolkit for conversational AI
A library for deep learning end-to-end dialog systems and chatbots
A Python toolbox for scalable outlier detection
Stanford NLP Python library for many human languages
High-Fidelity and Controllable Generation of Textured 3D Assets
State-of-the-art (SoTA) text-to-video pre-trained model
OCR expert VLM powered by Hunyuan's native multimodal architecture
Benchmarking synthetic data generation methods