Re: [Algorithms] General purpose task parallel threading approach
Brought to you by:
vexxed72
|
From: Sebastian S. <seb...@gm...> - 2009-04-07 19:55:21
|
On Tue, Apr 7, 2009 at 7:35 PM, Jarkko Lempiainen <al...@gm...> wrote: > Yes, your prioritization would still function properly if you would > assign a thread per priority of course, but it wouldn’t result in optimal > throughput, what’s the point after all. You need all your worker threads to > work on the top priority tasks in order to have optimal throughput, which > requires that all the threads work on the single prioritized queue of jobs. > Job stealing doesn’t achieve this because each worker thread has its own > queue. I also never said that tasks should be interrupted prematurely, but > you should let them finish and start the next high priority task after. > I didn't say an individual task would be interrupted, but if you have 100 tasks working on something and you kick off 20 tasks to do your animation at a high priority when only 5 of those 100 tasks have completed, you'll evict a bunch of data from the cache and then you still have 95 tasks to go, then you do another 5 and something else pops up to the top evicting the data again, and so on. It's better for the entire batch of 100 tasks to run to completion. With work stealing large bathches of tasks that would benefit from sharing data across cores would naturally start to fill up the worker threads as the previous tasks finish - and if you really want to make sure that's the case, just call sync() before. > Having fine granularity tasks doesn’t help regardless how many threads you > have. Only thing you really care is that your jobs take let say <~10% of > your frame time (1-3ms), which isn’t very high granularity. Some might take > longer, but they are also prioritized higher to avoid stalling other threads > in the end of the frame. So if you can animate all your on-screen characters > in 1-3ms you shouldn’t bother multi-threading the code EXCEPT if you want to > gain shared cache coherence. However, with work stealing you wouldn’t even > gain this because each thread works on its own set of jobs. > I'm not sure I follow you? If I can take something that takes 3ms to finish and get it done in 0.5ms because I can use six threads instead of one then that's clearly a win. Even if I get an unreasonable amount of overhead and it takes 1ms (50% overhead!) it's still a 3x boost as far as the overall frame time. What you care about is that your set of tasks that you need to complete in a frame is done as quickly as possible, which means that it should waste as few cycles as possible, which means you should avoid contention, bubbles due to coarse granularity, and cache thrashing. Fine task granularity is clearly extremely important in order to actually be able to loads of threads. Of course you need to make sure the overhead of starting the thread isn't higher than the cost of just doing the work directly, but as the overhead is even slightly lower than the cost of running the task, it's a win to get some other thread that would otherwise be idle to do it "for free" while you only pay the fixed overhead cost. > > -- Sebastian Sylvan +44(0)7857-300802 UIN: 44640862 |