Sorry for butting in (I am not affiliated with Judy).
1) Compilers that can do IPO (interprocedural optimization),
such as Intel's icc, obviate the need for a header-only
implementation. Implementing everything in headers tends to
create a mess (look at the Linux kernel headers for an example).
2) Are you sure the random number sequences are comparable?
If each benchmark is fed a different key sequence, you may be
comparing apples to oranges.
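
To illustrate point 2: one way to rule this out is to generate the keys
once, from a fixed seed, with a generator you control, and feed the same
array to every implementation under test. Below is a minimal sketch in C.
The names next_key and fill_keys are made up for this example, and the
xorshift64 generator is just a convenient stand-in (not what the benchmark
in question uses); the point is only that the sequence depends on the seed
alone and not on the platform's rand().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: one step of a xorshift64 generator.
 * The state is explicit, so nothing is shared between callers. */
static uint64_t next_key(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Hypothetical helper: fill keys[0..n-1] with the sequence for `seed`.
 * The same seed always yields the same keys, so every data structure
 * being benchmarked can be given an identical workload. */
static void fill_keys(uint64_t *keys, size_t n, uint64_t seed)
{
    assert(keys != NULL);
    uint64_t state = seed ? seed : 1; /* xorshift state must be nonzero */
    for (size_t i = 0; i < n; i++)
        keys[i] = next_key(&state);
}
```

Call fill_keys once before timing anything, then insert the same keys
array into each structure. That way any difference in the results comes
from the implementations and not from the input.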