VaultGemma is a roughly 1B-parameter variant of Google’s Gemma family that is pre-trained from scratch with Differential Privacy (DP), providing a mathematically backed bound on how much its outputs can reveal about any single training example. It was trained with DP-SGD under a fixed privacy budget (ε≤2.0, δ≤1.1e-10) on a large English-language corpus (web documents, code, mathematics), and it prioritizes privacy over raw utility. The model follows a Gemma-2–style architecture, takes up to 1,024 input tokens and outputs text, and is intended to be instruction-tuned for downstream language understanding and generation tasks. Training ran on TPU v6e using JAX and Pathways, with privacy-preserving algorithms (DP-SGD, truncated Poisson subsampling) and DP scaling laws used to balance compute and privacy budgets. Benchmarks on the 1B pre-trained checkpoint show the expected utility trade-offs (e.g., HellaSwag 10-shot 39.09, BoolQ 0-shot 62.04, PIQA 0-shot 68.00), reflecting its privacy-first design.
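
To make the DP-SGD idea concrete, the following is a minimal pure-Python sketch of one update step: each example's gradient is clipped to a fixed L2 norm, the clipped gradients are summed, Gaussian noise scaled to the clip norm is added, and the result is normalized by a fixed nominal batch size. It is not VaultGemma's training code. The function names, the parameters (clip_norm, noise_multiplier, nominal_batch_size), and the simple "cap the batch size" reading of truncated Poisson subsampling are all illustrative assumptions.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale one per-example gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0.0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, clip_norm,
                noise_multiplier, nominal_batch_size, rng):
    """One DP-SGD update (sketch): clip, sum, add Gaussian noise, normalize."""
    clipped = [clip_gradient(g, clip_norm) for g in per_example_grads]
    # Sum coordinate-wise; an empty batch contributes a zero sum.
    summed = [sum(g[j] for g in clipped) for j in range(len(params))]
    # Noise standard deviation is tied to the clipping norm (the sensitivity).
    sigma = noise_multiplier * clip_norm
    noisy = [(s + rng.gauss(0.0, sigma)) / nominal_batch_size for s in summed]
    return [p - lr * g for p, g in zip(params, noisy)]


def poisson_sample(num_examples, sampling_rate, rng, max_batch=None):
    """Include each index independently with probability sampling_rate.

    Assumption: "truncation" is modeled here as simply capping the batch
    at max_batch examples.
    """
    batch = [i for i in range(num_examples) if rng.random() < sampling_rate]
    return batch if max_batch is None else batch[:max_batch]
```

Clipping bounds how much any single example can move the parameters, and the noise masks what remains of that contribution. Both steps cost accuracy, which is the utility trade-off the benchmark numbers reflect.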

Features

  • Pre-trained with Differential Privacy (DP-SGD) at ε≤2.0, δ≤1.1e-10
  • ~1.04B parameters; Gemma-2–style architecture focused on efficiency
  • 1,024-token input context; text-in/text-out for summarization, Q&A, chat
  • Trained on web, code, and math to improve reasoning and code understanding
  • No detectable data memorization in empirical tests (exact/approximate)
  • Trained on TPU v6e with JAX/Pathways and DP-oriented scaling/algorithms
  • Gemma license; access via Hugging Face (terms acceptance required)
  • Intended for privacy-critical fine-tuning (healthcare, finance) and DP research
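
For reference, the ε and δ values in the feature list can be read against the standard definition of (ε, δ)-differential privacy, stated below. The symbols M (the randomized training mechanism), D and D′ (two training sets differing in a single example), and S (any set of possible outputs) are notation introduced here for illustration. What counts as a "single example" for VaultGemma is not specified in this summary.

```latex
\Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon}\,\Pr[\,M(D') \in S\,] + \delta
\qquad \text{for all neighboring } D, D' \text{ and all output sets } S .
```

With ε≤2.0, the probability of any outcome can change by at most a factor of e² (about 7.4) when one example is added or removed, apart from a failure probability of at most δ≤1.1e-10.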

Categories

AI Models

Additional Project Details

Registered

2025-09-17