Multimodal Diffusion with Representation Alignment
Multimodal-Driven Architecture for Customized Video Generation
From Images to High-Fidelity 3D Assets
ICLR 2024 Spotlight: curation/training code, metadata, distribution
Hackable and optimized Transformers building blocks
Official implementation of DreamCraft3D
An Efficient Agentic Model for Computer Use
An experimental version of the DeepSeek model
Open-Source Financial Large Language Models
Pokee Deep Research Model Open Source Repo
My personal Claude Code configuration
The official PyTorch implementation of Google's Gemma models
A Production-ready Reinforcement Learning AI Agent Library
Diffusion Transformer with Fine-Grained Chinese Understanding
Code for Mesh R-CNN, ICCV 2019
FlashMLA: Efficient Multi-head Latent Attention Kernels
The ChatGPT Retrieval Plugin lets you easily find personal documents
Pushing the Limits of Mathematical Reasoning in Open Language Models
Reference implementation of the Transformer architecture optimized
Learning to Act by Watching Unlabeled Online Videos
Flagship MoE model for long-context agents and complex coding
Omnimodal AI model for agents, coding, and long-context tasks