As far as I can see, normalization is not multi-threaded. My dual-core processor (which many people have nowadays) is only used at half capacity, not to mention newer four-core processors, which would be used at only 25%.
A solution would be to spawn one normalization thread per processor. The only problem is how to split the work queue among the threads.
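
One way around the splitting problem might be not to split the queue at all: every thread pulls its next item from a single shared queue, so the load balances itself even when items take different amounts of time. Below is a rough sketch in Python, not the program's actual code. `normalize` and `normalize_all` are hypothetical names, and the body of `normalize` is only a placeholder for the real normalization step. It also assumes the real work is done in native code or an external process, since pure-Python CPU-bound work would not run in parallel on threads.

```python
import os
import queue
import threading


def normalize(item):
    # Hypothetical placeholder for the real normalization step.
    return item.strip().lower()


def normalize_all(items, workers=None):
    """Normalize items using one worker thread per processor by default."""
    items = list(items)
    workers = workers or os.cpu_count() or 1

    # A single shared queue: no need to divide the work up front.
    tasks = queue.Queue()
    for index, item in enumerate(items):
        tasks.put((index, item))

    # Each result is stored at its original position, so order is preserved.
    results = [None] * len(items)

    def worker():
        while True:
            try:
                index, item = tasks.get_nowait()
            except queue.Empty:
                return  # queue drained, this thread is done
            results[index] = normalize(item)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Calling `normalize_all(files)` would then start as many threads as `os.cpu_count()` reports, and each thread simply stops once the queue is empty.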