Re: [Algorithms] General purpose task parallel threading approach
From: Jon W. <jw...@gm...> - 2009-04-04 23:10:39
Nicholas "Indy" Ray wrote:
> I can't speak on UltraSparc, but neither Larrabee nor GPUs use
> hyperthreading as the main form of threading (or AFAIK at all). They
> are both primarily data-parallel architectures.

The latest UltraSparc has, I think, eight cores, each with eight hyperthreads. When one core blocks on a cache miss, it automatically schedules the next hyperthread. Larrabee has almost exactly the same set-up: many hyperthreads that it switches between to hide data-access latencies. NVIDIA calls the same thing warps, I think (one warp is 32 threads -- get it? :-)

However, in software you don't get the same fine-grained benefit; not by a long shot. When you wait for something, you either wait on a different task using a user-mode synchronization primitive, or you wait on something that has to come from the kernel (I/O, an interrupt, etc). Those are the only two options, and neither lets you switch tasks quickly enough, or with little enough overhead, to be compared to hyperthreading IMO.

> Additionally, hyperthreading still requires threads to be designed in
> a way that is low on data contention. And if you already have worker
> tasks that can do that, it's generally best to create OS threads and
> just let them run on separate cores, hyperthreaded or not. But still
> ensure that the main thread does no waiting. In a well-designed
> system, thread switches are a non-issue, as they just shouldn't
> happen very often (or at all, in the case of consoles), so I don't
> think it's worthwhile worrying too much about the amount of time a
> thread switch takes.

I think the idea of a thread per subsystem (particles, skinning, collision, simulation, audio, scene graph issue, etc.) will only scale so far. Once you have 64-way CPUs (like the latest Sparcs), and even more with Larrabee (if you include the hyperthreads), making your game (or any software) go fast will mean a task-oriented workload rather than a subsystem-oriented workload.
As long as your tasks are significantly heavier than a context switch, you're doing fine. That's what the fibers-within-threads approach tries to optimize. There probably exists a real-world workload where the efficiency of fibers yields measurable throughput gains even though the overhead of fibers or threads is not dramatically large compared to the work itself, but I think that slice is pretty thin. It all comes down to Amdahl's Law in the end.

Sincerely,

jw