How many times have you deployed code that fell apart in production?
For me, the answer is zero.
That may sound surprising, but it is true.
The key to avoiding production problems is that we use real production workloads for all kinds of testing, such as smoke testing, load testing, and performance testing.
Extracting user requests from access logs may be fine for some applications, but it has drawbacks, such as losing the IP-layer and transport-layer characteristics of the real workload.
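
To make that drawback concrete, here is a minimal sketch of the log-extraction approach, assuming the server writes the common/combined access log format. The names `LOG_LINE`, `ReplayRequest`, and `request_from_log_line`, as well as the sample log line, are hypothetical and only for illustration.

```python
import re
from typing import NamedTuple, Optional

# Assumption: logs use the common/combined access log layout:
#   client ident user [time] "METHOD path PROTO" status size ...
LOG_LINE = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


class ReplayRequest(NamedTuple):
    """Everything a log-based replayer can reconstruct about one request."""
    method: str
    path: str
    protocol: str


def request_from_log_line(line: str) -> Optional[ReplayRequest]:
    """Rebuild a replayable request from one log line, or None if it does not parse."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    return ReplayRequest(m["method"], m["path"], m["proto"])


# Hypothetical sample line, not taken from any real log.
sample = '203.0.113.7 - - [10/Oct/2010:13:55:36 +0800] "GET /ad?slot=3 HTTP/1.1" 200 2326'
req = request_from_log_line(sample)
print(req)
```

Only the method, path, and protocol version survive. Request bodies, connection reuse, packet timing, and the original source addresses at the IP layer are not in the log, so a replay built this way cannot reproduce them.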
Using a workload that is the same as, or similar to, the production workload is very useful for finding potential problems before code is deployed. For example, when developing the NetEase ad delivery system in 2010 (NetEase is a well-known IT company in China, and the system is similar to Google’s DoubleClick), we adopted an effective method, replaying traffic from production to a test environment in real time using tcpcopy, and successfully avoided all production problems (the program did not crash even under several DDoS attacks).