I agree we should group the threads under the main process. First we have to define what thread information should be presented to end users, and how, so that we can decide what has to be stored. I don't know much about the user-space oprofile code yet. I will look into it and come back to the discussion soon.
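To make the grouping idea concrete, here is a rough sketch of the kind of data we might keep. It is not based on the oprofile code. It assumes each sample can be tagged with a thread group ID and a thread ID, and the tuple layout and the name `group_samples_by_process` are made up for illustration.

```python
from collections import defaultdict

def group_samples_by_process(samples):
    """Fold per-thread sample counts into one entry per process.

    samples: iterable of (tgid, tid, count) tuples. This layout is a
    hypothetical one, not something oprofile stores today.
    """
    grouped = defaultdict(lambda: {"total": 0, "threads": defaultdict(int)})
    for tgid, tid, count in samples:
        entry = grouped[tgid]
        entry["total"] += count          # process-wide total
        entry["threads"][tid] += count   # keep the per-thread breakdown
    return grouped

# Made-up sample data: one process with three threads, one single-threaded.
samples = [(100, 100, 40), (100, 101, 25), (100, 102, 5), (200, 200, 30)]
report = group_samples_by_process(samples)

for tgid, entry in sorted(report.items()):
    print(f"process {tgid}: {entry['total']} samples")
    for tid, count in sorted(entry["threads"].items()):
        print(f"  thread {tid}: {count}")
```

Keeping both the process total and the per-thread counts leaves the presentation question open: the report can show only the main process, or expand it into threads on request.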
I am doing a lot of profiling at work right now, using the painfully slow Rational Quantify, and I am now trying VTune. For a multithreaded application (or maybe even a single-threaded program), it is also very useful to see why a process is blocked. I am thinking we could add some bookkeeping code to the Linux scheduler (or is someone already working on that, or has it been done already?). What do you guys think?
BTW, what is the status of the P4 port? Is it already done, or is some code available? I just got a P4 machine and I want to test it out :D