Structure-from-Motion and Multi-View Stereo
Models for object and human mesh reconstruction
⚡ Building applications with LLMs through composability ⚡
ICLR2024 Spotlight: curation/training code, metadata, distribution
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
Open-source large language model family from Tencent Hunyuan
Documentation for Google's Gen AI site - including Gemini API & Gemma
PPTAgent: Generating and Evaluating Presentations
Foundation Models for Time Series
tiktoken is a fast BPE tokeniser for use with OpenAI's models
High-resolution models for human tasks
Implementation of Make-A-Video, new SOTA text-to-video generator
The official PyTorch implementation of Google's Gemma models
Photorealistic Synthetic Dataset for Holistic Indoor Scene
Machine Learning Systems: Design and Implementation
Pushing the Limits of Mathematical Reasoning in Open Language Models
In-App assistant SDK to build a multimodal conversational UX for websites
Advanced techniques for RAG systems
Implementation of Vision Transformer, a simple way to achieve SOTA
Set of tools to assess and improve LLM security
Open-source platform for building enterprise-grade agents
An alignment auditing agent capable of exploring alignment hypotheses
Discover pretrained models for deep learning in MATLAB
Audiocraft is a library for audio processing and generation
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model