Tool for visualizing and tracking your machine learning experiments
HunyuanVideo: A Systematic Framework For Large Video Generation Model
An elegant PyTorch implementation of transformers
A simple screen parsing tool for pure vision-based GUI agents
DSPy: The framework for programming—not prompting—language models
From Images to High-Fidelity 3D Assets
The largest collection of PyTorch image encoders / backbones
The repository provides code for running inference with SAM 2
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
One minute of voice data is enough to train a good TTS model
AirLLM: 70B inference with a single 4GB GPU
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction
Global weather forecasting model using graph neural networks and JAX
BitNet: Scaling 1-bit Transformers for Large Language Models
Inference script for Oasis 500M
Research code artifacts for Code World Model (CWM)
A TTS model capable of generating ultra-realistic dialogue
Ready-to-use OCR with 80+ supported languages
OCR expert VLM powered by Hunyuan's native multimodal architecture
The official Meta Llama 3 GitHub site
Towards Real-World Vision-Language Understanding
Multi-class confusion matrix library in Python
Train multi-step agents for real-world tasks using GRPO
Official inference repo for FLUX.2 models
Gemma open-weight LLM library, from Google DeepMind