Re: [Algorithms] General purpose task parallel threading approach
From: Sebastian S. <seb...@gm...> - 2009-04-05 22:43:59
On Sun, Apr 5, 2009 at 9:40 PM, Jarkko Lempiainen <al...@gm...> wrote:
> There isn't a "whole bunch of magic" outside the task you would need to do
> with the first option. The task just adds another function to the job queue
> which does the work after the wait. For the second option it is the job
> queue's task to schedule the tasks for optimal throughput, so for a wait it
> could for example favor tasks blocking the task from proceeding.

That's a whole bunch of magic to me. If you want pervasive parallelism (and
Mr Amdahl says we do), you're going to need sync points all over the place.
Currently we're not doing that because we don't *need* to do it yet, but if
you have 16-32 threads to keep busy, you bet you're going to need to spawn
off tasks all over the place (and usually sync immediately afterwards).
Having to manually split all of these things up into lots of tiny tasks is a
big issue IMO - it's tedious, error-prone, and adds more overhead (the cost
of starting a task still isn't free - so doing it, say, 5-10x as often is a
big deal).

> With job stealing from other threads you don't get global prioritization,
> so one thread may be executing lower-priority tasks and not work towards
> optimal throughput.

Why do you actually need global prioritization? After all, you need to
finish everything by the end of the frame anyway, so why would you care
whether task A gets executed before task B, so long as you never get any
bubbles where a core is idle? All you care about is maximal throughput,
maximal locality (especially true for in-order CPUs) and minimal
contention/overhead, and for that a giant shared queue is sub-optimal for
two reasons:

* It's inherently non-scalable. If you have lots of threads reading from the
same queue you'll get contention. Even with a relatively modest number of
threads this starts becoming an issue. Work stealing has negligible
contention, because a thief has to be unlucky enough to try to steal from
the bottom of your stack at the same time as you're running out of tasks to
do (so you're fighting over the very last item in the stack). That's
extremely unlikely to happen, and it's even more unlikely for two threads to
try to steal from the same stack - whereas a shared work queue is constantly
being hit by multiple threads. (A rough sketch of such a per-thread deque
follows below.)

* You lose locality. With a work-stealing approach you get good locality
because a task usually runs on the same thread that spawned it, and tasks
spawned close together in time get run close together in time, meaning good
cache locality. If tasks are instead spread out over multiple cores, and
over time, you'll get less locality.

--
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862
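To make the stealing protocol above concrete, here is a minimal C++ sketch,
illustrative only - it is not code from this thread or from any particular
scheduler, and the names (WorkStealingDeque, worker) are made up for the
example. Each worker owns a deque of tasks, pushes and pops at the bottom of
its own deque, and idle workers steal from the top, so the owner and a thief
only ever contend over the very last item. A coarse mutex keeps the sketch
short; a production deque (e.g. Chase-Lev) would be lock-free.

#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <thread>
#include <vector>

using Task = std::function<void()>;

// One deque per worker thread. The owner treats it as a stack (back);
// thieves take the oldest work from the other end (front).
class WorkStealingDeque {
public:
    // Owner: push a newly spawned task at the bottom.
    void push(Task t) {
        std::lock_guard<std::mutex> lock(m_);
        tasks_.push_back(std::move(t));
    }

    // Owner: pop the most recently pushed task (LIFO keeps its data hot
    // in this core's cache).
    std::optional<Task> pop() {
        std::lock_guard<std::mutex> lock(m_);
        if (tasks_.empty()) return std::nullopt;
        Task t = std::move(tasks_.back());
        tasks_.pop_back();
        return t;
    }

    // Thief: steal the oldest task from the top, so owner and thief only
    // collide when a single task is left in the deque.
    std::optional<Task> steal() {
        std::lock_guard<std::mutex> lock(m_);
        if (tasks_.empty()) return std::nullopt;
        Task t = std::move(tasks_.front());
        tasks_.pop_front();
        return t;
    }

private:
    std::mutex m_;            // coarse lock for brevity; real deques are lock-free
    std::deque<Task> tasks_;
};

// Worker loop sketch: drain your own deque first, then try to steal from
// the others. Runs forever here; a real scheduler would add a shutdown flag.
void worker(std::size_t self, std::vector<WorkStealingDeque>& deques) {
    for (;;) {
        if (auto t = deques[self].pop()) { (*t)(); continue; }
        bool stole = false;
        for (std::size_t victim = 0; victim < deques.size(); ++victim) {
            if (victim == self) continue;
            if (auto t = deques[victim].steal()) { (*t)(); stole = true; break; }
        }
        if (!stole) std::this_thread::yield();  // no work anywhere right now
    }
}

The LIFO pop on the owner side is what gives the locality argued for above:
the most recently spawned task, whose data is still warm in the spawning
core's cache, is the first one to run there, while thieves only ever take
the coldest work.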