A theoretical reconstruction of the Claude Mythos architecture
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Hackable and optimized Transformers building blocks
Generating Immersive, Explorable, and Interactive 3D Worlds
gpt-oss-120b and gpt-oss-20b are two open-weight language models
Z80-μLM is a 2-bit quantized language model
tiktoken is a fast BPE tokeniser for use with OpenAI's models
GLM-4.5: Open-source LLM for intelligent agents by Z.ai
From Images to High-Fidelity 3D Assets
ICLR2024 Spotlight: curation/training code, metadata, distribution
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
RGBD video generation model conditioned on camera input
Open-weight, large-scale hybrid-attention reasoning model
High-Fidelity and Controllable Generation of Textured 3D Assets
Code for the paper Hybrid Spectrogram and Waveform Source Separation
Fine-tuning ChatGLM-6B with PEFT
A method to increase the speed and lower the memory footprint
A minimal PyTorch re-implementation of the OpenAI GPT
Reference implementation of the Transformer architecture optimized
Reproduces results of "Fixing the train-test resolution discrepancy"
Generate embeddings from large-scale graph-structured data
LLM providing reasoning and conversational capabilities
Open language model developed by NVIDIA as part of Nemotron-3 family
Robust BERT-based model for English with improved MLM training
Multimodal Transformer for document image understanding and layout