Re: [Algorithms] General purpose task parallel threading approach
Brought to you by:
vexxed72
|
From: <asy...@gm...> - 2009-04-06 05:58:34
|
You are probably meaning some specific implementation of task-stealing approach? Task stealing can be implemented with and without global queue. Also, increasing number of cores trend results in improved inter-core communication mechanisms either (since otherwise it becomes a bottleneck), so I'm not sure that global queue approach (together with stealing or not) is that bad as you say. Alexander 2009/4/6 Sebastian Sylvan <seb...@gm...> > > > On Sun, Apr 5, 2009 at 9:40 PM, Jarkko Lempiainen <al...@gm...>wrote: > >> There isn’t “whole bunch of magic” outside the task you would need to do >> with the first option. The task just adds another function to the job queue >> which does the work after the wait. For second option it is job queues task >> to schedule the tasks for optimal throughput, so for wait it could for >> example favor tasks blocking the task from proceeding >> > > That's a whole bunch of magic to me. If you want pervasive parallelism (and > Mr Almdahl says we do), you're going to need sync points all over the place. > Currently we're not doing that because we don't *need* to do it yet, but if > you have 16-32 threads to keep busy, you bet you're going to need to spawn > of tasks all over the place (and usually sync immediately > afterwards). Having to manually split all of these things up into lots of > tiny tasks is a big issue IMO - it's tedious, error prone, and adds more > overhead (the cost of starting a task still isn't free - so doing it, say, > 5-10x as often is a big deal). > > With job stealing from other threads you don’t get global prioritization, >> so one task may be executing lower priority tasks and not to work towards >> optimal throughput. > > > Why do you actually need global prioritization? After all you need to > finish everything by the end of the frame anyway, so why would you care if > task A gets executed before task B, so long as you never get any bubbles > where a core is idle? > All you care about is maximal throughput, maximal locality (especially true > for in-order CPUs) and minimal contention/overhead, and for that a giant > shared queue is sub-optimal for two reasons: > * It's inherently non-scalable. If you have lots of threads reading from > the same queue you'll get contention. Even with relatively modest number of > threads this starts becoming an issue. Work stealing > has negligible coherency because you have to be unlucky enough to get > someone trying to steal from the bottom of your stack, at the same time as > you running out of tasks to do (so you're fighting for the very last item in > the stack). That's extremely unlikely to happen, and it's even more unlikely > for two threads to try to steal from the same stack - whereas a work queue > is constantly being hit by multiple threads. > * You loose locality. With a work-stealing approach you get good locality > because you run on the same thread as the task was spawned, and tasks > spawned close together in time get run close together in time, meaning good > cache coherency. If tasks are spread out over multiple cores, and over time, > you'll get less coherency. > > > -- > Sebastian Sylvan > +44(0)7857-300802 > UIN: 44640862 > > > ------------------------------------------------------------------------------ > > _______________________________________________ > GDAlgorithms-list mailing list > GDA...@li... > https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list > Archives: > http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list > -- Regards, Alexander Karnakov |