VaultGemma is a 1B-parameter variant of Google’s Gemma family that is pre-trained from scratch with Differential Privacy (DP), providing a mathematically backed guarantee that bounds how much any single training example can influence the model. It was trained with DP-SGD under a formal (ε, δ) privacy budget on a large English-language corpus of web documents, code, and mathematics, prioritizing privacy over raw utility. The model follows a Gemma-2-style architecture, accepts up to 1,024 input tokens, and generates text output; it is intended to be instruction-tuned for downstream language understanding and generation tasks. Training ran on TPU v6e using JAX and Pathways, with privacy-preserving algorithms (DP-SGD with truncated Poisson subsampling) and DP scaling laws used to balance compute against the privacy budget. Benchmarks on the 1B pre-trained checkpoint show the expected utility trade-off (e.g., HellaSwag 10-shot 39.09, BoolQ 0-shot 62.04, PIQA 0-shot 68.00), reflecting its privacy-first design.
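The core DP-SGD mechanism referenced above is per-example gradient clipping followed by calibrated Gaussian noise. The snippet below is a minimal NumPy sketch of a single DP-SGD step on a toy linear model; it is illustrative only, not the actual training setup (which used JAX/Pathways on TPU v6e with truncated Poisson subsampling), and the clip norm, noise multiplier, and learning rate are arbitrary placeholders.

```python
# Minimal DP-SGD sketch: per-example clipping + Gaussian noise on a toy
# linear model. NOT VaultGemma's training code; hyperparameters are
# illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def per_example_grads(w, X, y):
    """Gradient of 0.5 * (x.w - y)^2 for each example, shape (n, d)."""
    residuals = X @ w - y              # (n,)
    return residuals[:, None] * X      # (n, d)

def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.1):
    grads = per_example_grads(w, X, y)
    # Clip each example's gradient to L2 norm <= clip_norm.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    clipped = grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Sum the clipped gradients, add noise calibrated to the clip norm,
    # then average over the batch before the update.
    noisy_sum = clipped.sum(axis=0) + rng.normal(
        scale=noise_multiplier * clip_norm, size=w.shape)
    return w - lr * noisy_sum / len(X)

# Toy data: 32 examples, 4 features, known ground-truth weights.
X = rng.normal(size=(32, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.normal(size=32)
w = np.zeros(4)
for _ in range(100):
    w = dp_sgd_step(w, X, y)
print("learned weights:", np.round(w, 2))
```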
Features
- Pre-trained with Differential Privacy (DP-SGD) at ε≤2.0, δ≤1.1e-10
- ~1.04B parameters; Gemma-2–style architecture focused on efficiency
- 1,024-token input context; text-in/text-out for summarization, Q&A, chat
- Trained on web, code, and math to improve reasoning and code understanding
- No detectable memorization of training data in empirical tests (exact and approximate memorization probes)
- Trained on TPU v6e with JAX/Pathways and DP-oriented scaling/algorithms
- Available on Hugging Face under the Gemma license (terms acceptance required)
- Intended for fine-tuning in privacy-critical domains (e.g., healthcare, finance) and for DP research; see the usage sketch after this list
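Because the released checkpoint is pre-trained rather than instruction-tuned, it can be prompted like a base causal LM. The sketch below assumes the Hugging Face repo id `google/vaultgemma-1b` (an assumption; check the hub for the exact id) and that the Gemma license terms have been accepted and the environment authenticated (e.g., via `huggingface-cli login`).

```python
# Hedged usage sketch: load the pre-trained checkpoint with Hugging Face
# transformers. The repo id below is assumed, not confirmed by this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/vaultgemma-1b"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Base (non-instruction-tuned) model, so use plain completion-style prompts.
inputs = tokenizer("Differential privacy is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```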