SOTA discrete acoustic codec models with 40/75 tokens per second
Controllable and fast Text-to-Speech for over 7000 languages
DeepMind model for tracking arbitrary points across videos & robotics
Code for Mesh R-CNN, ICCV 2019
kaldi-asr/kaldi is the official location of the Kaldi project
Best practices on recommendation systems
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Utilities intended for use with Llama models
Official repo for Sa2VA: Marrying SAM2 with LLaVA
Towards Human-Sounding Speech
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
FAIR Sequence Modeling Toolkit 2
Uplift modeling and causal inference with machine learning algorithms
CodeGeeX2: A More Powerful Multilingual Code Generation Model
Lightweight Python library for adding real-time multi-object tracking
Capable of understanding text, audio, vision, video
PyTorch library of curated Transformer models and their components
ktrain is a Python library that makes deep learning and AI more accessible
Large Multimodal Models for Video Understanding and Editing
A state-of-the-art open visual language model
LLM-based Reinforcement Learning audio edit model
Virtual AI anchor that combines state-of-the-art technology
Hyperparameter Optimization for TensorFlow, Keras and PyTorch
Open Source Computer Vision Library
Build high-performance AI models with modular building blocks