Re: [Algorithms] General purpose task parallel threading approach
From: Jarkko L. <al...@gm...> - 2009-04-08 19:51:19
Pushing work to multiple queues doesn't guarantee that it also gets executed simultaneously, unless all your threads are idling, which isn't a very happy scenario. To me it sounds like you are in the process of converting a single-threaded engine to a multi-threaded one, where you start from 100%/number_of_threads efficiency, in which case what you say makes sense. I.e. you want the multi-threaded bits to be very local, where you dispatch work to multiple threads and sync right after, and any work you can parallelize is a win over the old design. However, if you have the advantage of designing an engine from scratch for multi-threading, I don't see this as a good approach.

Cheers,
Jarkko

_____
From: Sebastian Sylvan [mailto:seb...@gm...]
Sent: Wednesday, April 08, 2009 1:13 AM
To: Game Development Algorithms
Subject: Re: [Algorithms] General purpose task parallel threading approach

On Tue, Apr 7, 2009 at 10:30 PM, Jarkko Lempiainen <al...@gm...> wrote:

Okay, yes, you would evict the data from the cache if a higher-priority task kicks in, assuming the data doesn't fit into the cache simultaneously, before returning to work on the original task. Now, if you think of an 8-core system with job stealing, you must fit 8 times the amount of data into the cache, because you have 8 threads working on a total of 8 unrelated tasks simultaneously. This is much worse from a data-eviction point of view than switching to a higher-priority task and returning to work on the original task afterwards. Now think about how this scales to 16, 32, 64 cores.

You're assuming that work-stealing scheduling will always work on unrelated tasks, when this isn't the case. If you kick off a large batch of work it will spread out over all the workers; each worker just lets whatever's already there finish first, exploiting locality.

> If I can take something that takes 3ms to finish and get it done in 0.5ms because I can use six threads instead of one, then that's clearly a win.

No, it's not a win.
See, all you care about is that ALL the work for a frame is completed as soon as possible, not minimizing the time to execute a single task. If you had 50% overhead like you say, then you would have 50% less time to do any useful work. Now, you say it doesn't matter because you have threads sitting idle anyway, but don't you think you have some serious work-starvation issues in your engine instead, which you are trying to patch over with this approach by concluding that the idle time isn't a lost resource?

If I have 50% overhead, that means the other 50% is useful work that would otherwise have had to run sequentially! I'm doing 50% MORE useful work on my spare threads. The purpose of splitting stuff up into parallel tasks is to use idle cores to help speed up work. If you assume that you can keep those cores busy with useful work using Some Other Technique, then you're basically assuming that there is no problem and that we therefore need no solution. That would be great if it were the case, but it seems unrealistic.

The whole discussion is about "how do we use N threads to speed up a game engine". Saying that we shouldn't spawn lots of fine-grained tasks, because the overhead in doing so would cost cycles that we could use for "useful work", isn't really helping me, unless you can tell me what other technique you plan to use to actually do this "useful work" that doesn't have the overhead of just ramping up task parallelism so it scales to the number of threads.

Of course you wouldn't trade 100% useful work for 80% useful work and 20% overhead; that would be stupid. Nobody would run a task system with worker threads on a single-threaded machine. What I'm saying is that in the not-so-far-off future, when you start off with, say, 90% idle cycles and only 10% useful work, that 80/20 split looks a lot more appealing.

--
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862