I suggest a profiling interval smaller than 0.01 sec. I tried changing the usleep(10000) in subp.c to usleep(100), and the elapsed time was almost the same (it was also almost the same as just running under time/1).
This was on Linux; I haven't had a chance to try this on any other system.
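
For anyone who wants to repeat the comparison on another system, here is a minimal sketch of the kind of check described above. It is not the code from subp.c: sample_for() and report() are hypothetical stand-ins for a profiler's polling loop, and the 0.2 sec duration is an arbitrary choice. It only counts how many times the loop wakes up for a given usleep() interval. Note that usleep() may sleep somewhat longer than requested, so the sample count for a 100 microsecond interval is usually lower than the nominal value.

```c
#define _DEFAULT_SOURCE   /* exposes usleep() and clock_gettime() on glibc */
#include <assert.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Monotonic wall-clock time in seconds. */
double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Hypothetical stand-in for a profiler's sampling loop: wake up every
 * interval_us microseconds until duration_sec has elapsed, and return
 * the number of wakeups. A real profiler would record a sample at
 * each wakeup instead of just counting.
 */
long sample_for(double duration_sec, useconds_t interval_us)
{
    assert(duration_sec > 0.0);

    double end = now_sec() + duration_sec;
    long samples = 0;

    while (now_sec() < end) {
        usleep(interval_us);
        samples++;
    }
    return samples;
}

/* Print the sample count and the elapsed time for one interval. */
void report(useconds_t interval_us)
{
    double start = now_sec();
    long n = sample_for(0.2, interval_us);

    printf("interval %6lu us: %5ld samples in %.3f s\n",
           (unsigned long)interval_us, n, now_sec() - start);
}
```

Calling report(10000) and then report(100) from main() prints the two sample counts side by side. The elapsed time should be about the same in both cases, while the number of samples differs by a large factor.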