node-chaos-monkey brings chaos engineering to Node.js by injecting controlled failures into running services to validate resilience. It lets teams rehearse real-world incidents (latency spikes, random process exits, resource exhaustion) and observe how circuit breakers, retries, and backoff strategies behave. The tool is designed to be safe and configurable, so experiments can be limited to a narrow blast radius and scheduled during non-critical windows. It fits staging environments, as well as carefully guarded production environments where you want confidence instead of assumptions. Findings feed back into reliability work: hardening timeouts, rethinking concurrency limits, and improving fallbacks at both the code and infrastructure levels. By turning failure into a planned exercise, teams can surface weak spots before customers do.
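To make the idea of controlled fault injection concrete, here is a minimal sketch of a wrapper that adds latency or errors to a fraction of calls. It is not node-chaos-monkey's actual API: the `withChaos` name and its options (`probability`, `latencyMs`, `errorRate`, `random`) are assumptions made up for illustration.

```javascript
// Hypothetical helper for illustration; not part of node-chaos-monkey's API.
// Wraps a function so that a configurable share of calls is delayed or fails.
function withChaos(fn, options = {}) {
  const {
    probability = 0.1,    // share of calls that receive a fault (blast radius)
    latencyMs = 0,        // delay added to affected calls
    errorRate = 0,        // share of affected calls that fail outright
    random = Math.random, // injectable so experiments can be made deterministic
  } = options;

  return async function chaotic(...args) {
    if (random() < probability) {
      if (latencyMs > 0) {
        // Simulate a latency spike before the real work starts.
        await new Promise((resolve) => setTimeout(resolve, latencyMs));
      }
      if (random() < errorRate) {
        // Simulate a hard failure instead of calling the wrapped function.
        throw new Error('chaos: injected failure');
      }
    }
    return fn(...args);
  };
}

// Example: delay roughly 5% of lookups by 200 ms and fail a tenth of those.
const fetchUser = withChaos(async (id) => ({ id }), {
  probability: 0.05,
  latencyMs: 200,
  errorRate: 0.1,
});
```

Passing `random` in as an option is a deliberate choice in this sketch: it lets a test force or suppress faults without relying on chance.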
Features
- Fault injection for latency, errors, and process disruptions in Node services
- Configurable blast radius, schedules, and experiment scopes
- Hooks for metrics and tracing to correlate chaos with system behavior
- Support for testing resilience features like retries, timeouts, and circuit breakers
- Safe-by-default controls suited to staging and guarded production drills
- Actionable reports that inform reliability and capacity improvements
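
The blast-radius, schedule, and safe-by-default controls listed above could be expressed as a single gate that every fault passes through. The sketch below is an assumption about how such a gate might look, not the tool's real configuration format: `defaultConfig`, `shouldInject`, and all field names are hypothetical.

```javascript
// Hypothetical configuration shape; field names are illustrative only.
const defaultConfig = {
  enabled: false,                          // safe by default: nothing runs unless switched on
  allowedEnvironments: ['staging'],        // production must be opted into explicitly
  windowUtc: { startHour: 2, endHour: 4 }, // faults allowed from 02:00 up to 04:00 UTC
  targetRoutes: ['/api/recommendations'],  // blast radius: only these routes are eligible
};

// Returns true only when every guard agrees that a fault may be injected.
// Note: this simple window check does not handle windows that wrap past midnight.
function shouldInject(config, context) {
  const { enabled, allowedEnvironments, windowUtc, targetRoutes } = {
    ...defaultConfig,
    ...config,
  };
  if (!enabled) return false;
  if (!allowedEnvironments.includes(context.environment)) return false;

  const hour = context.now.getUTCHours();
  if (hour < windowUtc.startHour || hour >= windowUtc.endHour) return false;

  return targetRoutes.includes(context.route);
}
```

A caller would check `shouldInject({ enabled: true }, { environment, route, now: new Date() })` before applying any fault, so that a missing or partial configuration always resolves to "do nothing".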