Re: [Algorithms] General purpose task parallel threading approach
Brought to you by:
vexxed72
|
From: Jon W. <jw...@gm...> - 2009-04-04 16:12:52
|
asy...@gm... wrote: > My target goal for a task switch cost is about 1-2 hundred of cycles, > which is about 10-15 millions of task switches per second on a 3ghz cpu. > You should also compare thread spawning cost (unless you preallocate > all of them) and a memory overhead (for windows threads it is minimum > 4k for stack size), but if you preallocate, you'd do it for the > maximum stack size. Creation cost in my system is just a malloc (which > you can also override with a faster allocator), and minimum allowed > task is 32 bytes (on x86). > If you're using exception handling, or stack buffer overflow, or even certain standard C library functions, a 32-byte stack will get you into trouble. Also, you want your thread/task stacks to be page aligned, so that a thread overflow will actually be caught by a guard page. And, yes, you'd either pre-allocate your maximum number of blocking threads, or you would allocate more threads on demand when you started blocking and ran out of runnable threads. Note: in a system with guard page security for each thread stack, you need 8 kB/thread minimum, plus the register state (which is about 512 bytes if you care about SSE). This is no matter whether you're using fibers or threads. Trying to find out whether an innocuous library call you're making is going to overwrite your malloc-ed stack or not is too unpredictable to rely on IMO -- I prefer the safety of a guard page. I don't think a full thread context (stack pointer, registers, pc) can be done in a hundred cycles on x86; at least not if they aren't already all in L1 on the local core. The amount of data you need to shuffle and the memory latencies involved seem to contra-indicate such a low switch time. Getting as close as you can is not a bad idea, though :-) Sincerely, jw |