Multi-modal large language model designed for audio understanding
Large Multimodal Models for Video Understanding and Editing
The official PyTorch implementation of Google's Gemma models
Real-time behaviour synthesis with MuJoCo, using Predictive Control
Instructions on how to use the Realtime API on Microcontrollers
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
Visual Causal Flow
Open-weight, large-scale hybrid-attention reasoning model
Inference script for Oasis 500M
This repository contains the official implementation of FastVLM
Memory-efficient and performant finetuning of Mistral's models
Pushing the Limits of Mathematical Reasoning in Open Language Models
Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI
Research code artifacts for Code World Model (CWM)
Hunyuan Translation Model Version 1.5
VMZ: Model Zoo for Video Modeling
High-resolution models for human tasks
CLIP: predict the most relevant text snippet given an image
Repository for Qwen2-Audio, a chat and pretrained large audio language model
Continuous Autonomy for the AI SDK
Fast and Universal 3D reconstruction model for versatile tasks
4M: Massively Multimodal Masked Modeling
FAIR Sequence Modeling Toolkit 2
A Production-ready Reinforcement Learning AI Agent Library
A PyTorch library for implementing flow matching algorithms