State-of-the-art LLM and coding model
RGBD video generation model conditioned on camera input
Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI
Fast and Universal 3D reconstruction model for versatile tasks
Foundational Models for State-of-the-Art Speech and Text Translation
Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Global weather forecasting model using graph neural networks and JAX
Tooling for the Common Objects In 3D dataset
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Multi-modal large language model designed for audio understanding
Chat & pretrained large audio language model proposed by Alibaba Cloud
Pushing the Limits of Mathematical Reasoning in Open Language Models
Chat & pretrained large vision language model
Real-time behaviour synthesis with MuJoCo, using Predictive Control
Example Discord bot written in Python that uses the completions API
Official code for Style Aligned Image Generation via Shared Attention
Code for the paper Hybrid Spectrogram and Waveform Source Separation
800,000 step-level correctness labels on LLM solutions to MATH problems
llama.go is like llama.cpp in pure Golang
Text-to-3D & Image-to-3D & Mesh Exportation with NeRF + Diffusion
A minimal PyTorch re-implementation of the OpenAI GPT
Code release for "Masked-attention Mask Transformer
Facebook AI Research Sequence-to-Sequence Toolkit