Check out this optimization manual. It is aimed at Pentium processors, but it has useful information for other CPUs as well.
It covers organizing data for the best cache usage, avoiding dependency chains, and cache associativity and what it means for your memory addresses.
http://www.agner.org/assem/pentopt.pdf
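To give a rough idea of the associativity point, here is a small sketch of how a set-associative cache picks a set from an address. This is not taken from the manual. The cache geometry (8 KB, 2-way, 32-byte lines) and the names `set_index` and `CRITICAL_STRIDE` are assumptions for illustration only, so check the manual for the real figures of your CPU.

```python
# Hypothetical cache geometry, chosen only for illustration.
CACHE_SIZE = 8 * 1024   # total capacity in bytes
WAYS = 2                # lines per set (2-way set associative)
LINE_SIZE = 32          # bytes per cache line
NUM_SETS = CACHE_SIZE // (WAYS * LINE_SIZE)  # 128 sets

def set_index(addr: int) -> int:
    """Return the cache set that a byte address maps to."""
    # Drop the byte-within-line bits, then wrap the line number
    # around the number of sets.
    return (addr // LINE_SIZE) % NUM_SETS

# Addresses this many bytes apart map to the same set again.
CRITICAL_STRIDE = NUM_SETS * LINE_SIZE  # 4096 bytes

if __name__ == "__main__":
    base = 0x10000
    for i in range(3):
        addr = base + i * CRITICAL_STRIDE
        print(hex(addr), "-> set", set_index(addr))   # all three land in set 0
    print(hex(base + LINE_SIZE), "-> set", set_index(base + LINE_SIZE))  # set 1
```

With only two ways per set, a loop that keeps touching those three addresses evicts its own data over and over, even though the rest of the cache sits empty. Shifting one of the arrays by a line or so breaks up the conflict.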