Re: [Algorithms] General purpose task parallel threading approach
From: Jarkko L. <al...@gm...> - 2009-04-07 21:30:18
Okay, yes, you would evict the data from the cache if a higher-priority task kicks in, assuming the data doesn't fit into the cache simultaneously before you return to work on the original task. Now, if you think of an 8-core system with job stealing, you must fit 8 times the amount of data into the cache, because you have 8 threads working on a total of 8 unrelated tasks simultaneously. That is much worse from a data-eviction point of view than switching to a higher-priority task and returning to the original task afterwards. Now think about how this scales to 16, 32, or 64 cores.

> If I can take something that takes 3ms to finish and get it done in 0.5ms
> because I can use six threads instead of one then that's clearly a win.

No, it's not a win. See, all you care about is that ALL the work for a frame is completed as soon as possible, not minimizing the time to execute a single task. If you had 50% overhead like you say, then you would have 50% less time to do any useful work. Now, you say it doesn't matter because you have threads sitting idle anyway, but doesn't that suggest you have some serious work-starvation issues in your engine, which you are trying to patch over with this approach and then concluding it's not a lost resource?

Cheers, Jarkko

_____

From: Sebastian Sylvan [mailto:seb...@gm...]
Sent: Tuesday, April 07, 2009 10:55 PM
To: Game Development Algorithms
Subject: Re: [Algorithms] General purpose task parallel threading approach

On Tue, Apr 7, 2009 at 7:35 PM, Jarkko Lempiainen <al...@gm...> wrote:

> Yes, your prioritization would still function properly if you assigned a
> thread per priority, of course, but it wouldn't result in optimal
> throughput, so what would be the point? You need all your worker threads
> working on the top-priority tasks in order to get optimal throughput,
> which requires that all the threads work from a single prioritized queue
> of jobs. Job stealing doesn't achieve this, because each worker thread
> has its own queue.
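[Editor's sketch] To make the single-prioritized-queue model above concrete, here is a minimal C++ sketch. The `Task` and `TaskQueue` names are invented for illustration and are not from any engine discussed in this thread; a production scheduler would need finer-grained or lock-free synchronization rather than one mutex.

```cpp
// Minimal sketch of a single shared prioritized task queue: every worker
// pops the highest-priority ready task under one lock, so all cores
// converge on the most urgent work first. Illustrative only.
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <vector>

struct Task {
    int priority;                    // higher value = more urgent
    std::function<void()> run;
};

struct ByPriority {
    bool operator()(const Task& a, const Task& b) const {
        return a.priority < b.priority;  // makes std::priority_queue a max-heap
    }
};

class TaskQueue {
public:
    void push(Task t) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(t));
        }
        cv_.notify_one();
    }

    // Blocks until a task is available or shutdown() is called.
    // Returns false only when shutting down with an empty queue.
    bool pop(Task& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [&] { return !queue_.empty() || done_; });
        if (queue_.empty()) return false;
        out = queue_.top();          // always the highest-priority task
        queue_.pop();
        return true;
    }

    void shutdown() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_all();
    }

private:
    std::priority_queue<Task, std::vector<Task>, ByPriority> queue_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool done_ = false;
};
```

Worker threads would loop on pop() and execute task.run(); because they all drain one heap, a burst of high-priority tasks is picked up by every idle core rather than sitting in one thread's private deque.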
> I also never said that tasks should be interrupted prematurely; you
> should let them finish and start the next high-priority task after.

I didn't say an individual task would be interrupted. But if you have 100 tasks working on something and you kick off 20 tasks to do your animation at a high priority when only 5 of those 100 tasks have completed, you'll evict a bunch of data from the cache, and you still have 95 tasks to go; then you do another 5 and something else pops up to the top, evicting the data again, and so on. It's better for the entire batch of 100 tasks to run to completion. With work stealing, large batches of tasks that would benefit from sharing data across cores naturally fill up the worker threads as the previous tasks finish — and if you really want to make sure that's the case, just call sync() before.

> Having fine-granularity tasks doesn't help, regardless of how many
> threads you have. The only thing you really care about is that your jobs
> take, let's say, <~10% of your frame time (1-3ms), which isn't very high
> granularity. Some might take longer, but those are also prioritized
> higher to avoid stalling other threads at the end of the frame. So if you
> can animate all your on-screen characters in 1-3ms, you shouldn't bother
> multi-threading that code EXCEPT if you want to gain shared cache
> coherence. However, with work stealing you wouldn't even gain this,
> because each thread works on its own set of jobs.

I'm not sure I follow you. If I can take something that takes 3ms to finish and get it done in 0.5ms because I can use six threads instead of one, then that's clearly a win. Even if I get an unreasonable amount of overhead and it takes 1ms (50% overhead!), it's still a 3x improvement in overall frame time.
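[Editor's sketch] The 3ms-to-0.5ms example above amounts to a plain fork-join split of independent per-character work across worker threads. A minimal sketch follows; `animate_character` and the `pose` buffer are hypothetical stand-ins for real animation work:

```cpp
// Fork-join sketch: split a batch of independent per-character updates
// across N threads. With six threads the wall-clock time is roughly
// (total work)/6 plus spawn/join overhead.
#include <thread>
#include <vector>

void animate_character(std::vector<int>& pose, int i) {
    pose[i] = i * 2;  // stand-in for evaluating one character's skeleton
}

void animate_all(std::vector<int>& pose, int num_threads) {
    const int n = static_cast<int>(pose.size());
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&pose, t, n, num_threads] {
            // Strided slices touch disjoint indices, so no locking is needed.
            for (int i = t; i < n; i += num_threads)
                animate_character(pose, i);
        });
    }
    for (auto& w : workers) w.join();  // the whole batch completes here
}
```

Whether this is a net win is exactly the dispute in this thread: it cuts the latency of this one batch, but the extra threads may evict whatever the other cores had cached.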
What you care about is that the set of tasks you need to complete in a frame is done as quickly as possible, which means wasting as few cycles as possible, which in turn means avoiding contention, bubbles due to coarse granularity, and cache thrashing. Fine task granularity is clearly extremely important in order to actually be able to feed loads of threads. Of course you need to make sure the overhead of starting the task isn't higher than the cost of just doing the work directly, but as long as the overhead is even slightly lower than the cost of running the task, it's a win to get some other thread that would otherwise be idle to do it "for free", while you only pay the fixed overhead cost.

--
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862