Perf Book
The book "Performance Analysis and Tuning on Modern CPUs"
... walks through vectorization, memory layout, data-oriented design, and algorithmic choices, illustrating when compiler flags, intrinsics, or hand-rolled assembly make sense. It also demonstrates tool-driven workflows, built on profilers and performance monitoring unit (PMU) events, for locating true bottlenecks and validating that changes actually help. Throughout, the emphasis is on a methodical loop of hypothesize → measure → change → re-measure, rather than folklore or premature micro-optimizations.
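As a rough illustration of the hypothesize → measure → change → re-measure loop, here is a minimal Python sketch. It is not taken from the book: the names `measure`, `sum_x_aos`, `sum_x_soa`, and `compare` are hypothetical, and wall-clock timing stands in for the profiler and PMU data a real investigation would rely on. The hypothesis being tested is that storing one field in its own list (a structure-of-arrays layout) is faster to sum than walking a list of records (array-of-structures).

```python
import statistics
import time


def measure(fn, *args, repeats=5):
    """Run fn(*args) `repeats` times; return (result, median seconds)."""
    timings = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args)
        timings.append(time.perf_counter() - start)
    # The median is less sensitive to one-off noise than a single run.
    return result, statistics.median(timings)


def sum_x_aos(points):
    # Baseline: array-of-structures, one dict per point.
    total = 0.0
    for p in points:
        total += p["x"]
    return total


def sum_x_soa(xs):
    # Candidate change: structure-of-arrays, x values kept in their own list.
    return sum(xs)


def compare(n=100_000):
    # Hypothetical workload: n points with integer-valued coordinates.
    points = [{"x": float(i), "y": 0.0, "z": 0.0} for i in range(n)]
    xs = [p["x"] for p in points]

    base_result, base_time = measure(sum_x_aos, points)  # measure
    new_result, new_time = measure(sum_x_soa, xs)        # change, re-measure

    # A speedup only counts if the answer is unchanged.
    assert base_result == new_result
    return base_result, base_time, new_time


if __name__ == "__main__":
    _, base_time, new_time = compare()
    print(f"baseline: {base_time:.6f}s  candidate: {new_time:.6f}s")
```

The timings will vary by machine and interpreter, so the sketch prints them rather than asserting that one version wins. In compiled code the same loop would typically be driven by hardware counters (cache misses, branch mispredictions) rather than wall-clock time alone.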