Re: [Algorithms] General purpose task parallel threading approach

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

You are probably meaning some specific implementation of task-stealing
approach? Task stealing can be implemented with and without global queue.
Also, increasing number of cores trend results in improved inter-core
communication mechanisms either (since otherwise it becomes a bottleneck),
so I'm not sure that global queue approach (together with stealing or not)
is that bad as you say.

Alexander

2009/4/6 Sebastian Sylvan <seb...@gm...>

>
>
> On Sun, Apr 5, 2009 at 9:40 PM, Jarkko Lempiainen <al...@gm...>wrote:
>
>>  There isn’t “whole bunch of magic” outside the task you would need to do
>> with the first option. The task just adds another function to the job queue
>> which does the work after the wait. For second option it is job queues task
>> to schedule the tasks for optimal throughput, so for wait it could for
>> example favor tasks blocking the task from proceeding
>>
>
> That's a whole bunch of magic to me. If you want pervasive parallelism (and
> Mr Almdahl says we do), you're going to need sync points all over the place.
> Currently we're not doing that because we don't *need* to do it yet, but if
> you have 16-32 threads to keep busy, you bet you're going to need to spawn
> of tasks all over the place (and usually sync immediately
> afterwards). Having to manually split all of these things up into lots of
> tiny tasks is a big issue IMO - it's tedious, error prone, and adds more
> overhead (the cost of starting a task still isn't free - so doing it, say,
> 5-10x as often is a big deal).
>
> With job stealing from other threads you don’t get global prioritization,
>> so one task may be executing lower priority tasks and not to work towards
>> optimal throughput.
>
>
> Why do you actually need global prioritization? After all you need to
> finish everything by the end of the frame anyway, so why would you care if
> task A gets executed before task B, so long as you never get any bubbles
> where a core is idle?
> All you care about is maximal throughput, maximal locality (especially true
> for in-order CPUs) and minimal contention/overhead, and for that a giant
> shared queue is sub-optimal for two reasons:
> * It's inherently non-scalable. If you have lots of threads reading from
> the same queue you'll get contention. Even with relatively modest number of
> threads this starts becoming an issue. Work stealing
> has negligible coherency because you have to be unlucky enough to get
> someone trying to steal from the bottom of your stack, at the same time as
> you running out of tasks to do (so you're fighting for the very last item in
> the stack). That's extremely unlikely to happen, and it's even more unlikely
> for two threads to try to steal from the same stack - whereas a work queue
> is constantly being hit by multiple threads.
> * You loose locality. With a work-stealing approach you get good locality
> because you run on the same thread as the task was spawned, and tasks
> spawned close together in time get run close together in time, meaning good
> cache coherency. If tasks are spread out over multiple cores, and over time,
> you'll get less coherency.
>
>
> --
> Sebastian Sylvan
> +44(0)7857-300802
> UIN: 44640862
>
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> GDAlgorithms-list mailing list
> GDA...@li...
> https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list
> Archives:
> http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list
>

-- 
Regards,
Alexander Karnakov