Gizzard is a Scala framework, originally developed at Twitter, for building scalable, fault-tolerant distributed datastores whose data is sharded and replicated across many backend stores. Gizzard is middleware. The data itself lives in underlying storage shards (for example, SQL databases or other stores), and Gizzard routes each request to the correct shard through its forwarding tables and shard trees, keeping that routing correct as the cluster topology changes. It also provides infrastructure for splitting or rebalancing shards, failing over between replicas, and migrating data.

Gizzard's architecture is designed for operational flexibility. You can change the shard layout over time, reassign replicas, and migrate data between nodes while requests are redirected during the transition. It also supports secondary indexing and provides hooks for custom migration and consistency logic. Because Gizzard absorbs much of the complexity of shard routing and cluster transitions, Twitter used it in production for large, evolving storage backends such as FlockDB.
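To make the routing idea concrete, here is a minimal Scala sketch of a range-based forwarding table. It is a simplified, hypothetical model, not Gizzard's actual API. The names `ForwardingTable`, `shardFor`, `reassign`, and `of` are illustrative. The sketch also assumes that keys are already 64-bit values (for example, hashes) and that shard ids are plain strings. Each key is routed to the range with the greatest base key less than or equal to it. Reassigning a range returns a new table, which loosely mirrors how routing can be flipped to a new shard once a migration has copied the data.

```scala
import scala.collection.immutable.TreeMap

// Hypothetical, simplified forwarding table: each entry maps the lowest key of a
// range (its "base" key) to the id of the shard that owns that range.
final case class ForwardingTable(ranges: TreeMap[Long, String]) {
  require(ranges.headOption.exists(_._1 == Long.MinValue),
    "the first range must start at Long.MinValue so every key is covered")

  // The owning range is the one with the greatest base key <= key.
  // rangeTo is inclusive, and the Long.MinValue entry guarantees it is non-empty.
  def shardFor(key: Long): String = ranges.rangeTo(key).last._2

  // Point an existing range at a different shard (e.g. once a migration has
  // copied its data). Returns a new table; the old one is left untouched.
  def reassign(baseKey: Long, newShardId: String): ForwardingTable = {
    require(ranges.contains(baseKey), s"no range starts at $baseKey")
    ForwardingTable(ranges.updated(baseKey, newShardId))
  }
}

object ForwardingTable {
  // Convenience constructor from (baseKey, shardId) pairs.
  def of(entries: (Long, String)*): ForwardingTable =
    ForwardingTable(TreeMap.from(entries))
}
```

Suppose ranges start at `Long.MinValue`, `0`, and `1000`. Key `999` then routes to the shard registered at base `0`, and any key from `1000` upward routes to the last shard.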
Features
- Flexible sharding (partitioning) of data: forwarding tables map key ranges to storage shards, spreading data and load across them.
- Replication trees: ability to replicate data across multiple backend shards for fault tolerance and availability.
- Pluggable storage backends: various stores can sit underneath Gizzard (SQL databases, Lucene, Redis, etc.).
- Graceful handling of shard migrations (adding machines, rebalancing shards) with minimal disruption.
- Idempotent, commutative writes: Gizzard requires write operations to be idempotent and commutative so that it can tolerate failures, retries, and out-of-order writes.
- Stateless frontends: Gizzard instances (middleware nodes) are stateless, so they can be scaled out by adding more instances; most state resides in the shards and the configuration.
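Several of the features above work together. Replication trees fan each write out to several backends, and the idempotent, commutative write requirement lets those writes be retried or applied out of order safely. The following is a minimal Scala sketch of that combination. It assumes last-write-wins by timestamp as the conflict rule, which is one common way to get both properties. Real Gizzard-based services define their own write semantics. `Write`, `Shard`, `LeafShard`, and `ReplicatingShard` are hypothetical names, not Gizzard's API.

```scala
import scala.collection.mutable

// Hypothetical write record; `timestamp` orders conflicting writes to one key.
final case class Write(key: Long, value: String, timestamp: Long)

// A node in a (hypothetical) shard tree: either a storage leaf or a replicating node.
sealed trait Shard {
  def write(w: Write): Unit
  def read(key: Long): Option[String]
}

// Leaf shard with last-write-wins semantics (an assumed conflict rule).
// A write is applied only if it is newer than the stored one, ordering by
// (timestamp, value). The stored row is simply the maximum write seen, so
// re-applying a write is a no-op (idempotent) and arrival order does not
// change the result (commutative).
final class LeafShard extends Shard {
  private val rows = mutable.Map.empty[Long, Write]

  private def newer(a: Write, b: Write): Boolean =
    a.timestamp > b.timestamp ||
      (a.timestamp == b.timestamp && a.value.compareTo(b.value) > 0)

  def write(w: Write): Unit = rows.get(w.key) match {
    case Some(existing) if !newer(w, existing) => () // duplicate or stale: ignore
    case _                                     => rows(w.key) = w
  }

  def read(key: Long): Option[String] = rows.get(key).map(_.value)
}

// Replicating node: fans every write out to all children and serves reads
// from the first child that has the key (a crude stand-in for failover).
final class ReplicatingShard(children: Seq[Shard]) extends Shard {
  require(children.nonEmpty, "a replicating shard needs at least one child")

  def write(w: Write): Unit = children.foreach(_.write(w))

  def read(key: Long): Option[String] =
    children.iterator.map(_.read(key)).collectFirst { case Some(v) => v }
}
```

Consider sending a newer write, then an older one, then the newer one again. This simulates a reordering plus a retry. Every replica still ends up holding the newer value, which is the property that makes replication and retries safe under this assumed rule.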