Thread: Re: [Algorithms] General purpose task parallel threading approach (Page 5)
From: Sebastian S. <seb...@gm...> - 2009-04-11 11:55:28
On Sat, Apr 11, 2009 at 10:20 AM, Jarkko Lempiainen <al...@gm...> wrote:

> I was talking about generic task system before.

Later on you're saying that it's not a generic task system. By "generic" I mean "can handle all current kinds of control flow", which yours can't, as far as I understand you.

> Dependencies impact to the task priorities

Yes, I see what you mean now, but it's very confusing to use the word "priority" here. I was assuming you were talking about a general task system (which *requires* an on-line, as opposed to an ahead-of-time, scheduler - meaning that the word "priority" could only mean a dynamic priority that shuffles things around while tasks are already running).

> , because you want the longest chain of work (critical path) to start as early as possible in the frame.

I think it's worth mentioning yet again that work stealing, an on-line scheduler which doesn't have your limitations, still only depends on the critical path with a constant factor (which, empirically, turns out to be very close to 1 in practice). In contrast, your system *doesn't* have a running-time bound that depends on the critical path with a constant factor, as far as I can tell (this is back-of-a-napkin, feel free to do a rigorous analysis)! So I'd be careful claiming that your system is inherently more efficient if asymptotically it's actually worse. It may be better in practice (the constant involved in work stealing may be large), but you'd definitely need empirical data to show this...

> in games we are in the fortunate position that pretty much the same call graph is run from one frame to the next, so it's possible to exploit frame-to-frame coherency to prioritize tasks and have a pretty good idea what's going to happen ahead of time. It's not perfect of course (i.e. when a task is run for the first time there is no priority for it), but it's close and definitely better than not prioritizing at all.

Where's your evidence for this?
If you compare your *restricted* system running with and without priorities then I'm not surprised, but you have to compare it to a system that isn't as restricted, and actually won't consider something a dependency until it knows for a fact that it is, since this has the potential to drastically affect throughput. The way I see it, *loads* of functions start off by checking some internal value or parameter and basically touch different data based on the result. And these functions call *other* functions which do the same thing. If your task system has to pessimistically consider *any* data a task could possibly touch a dependency, rather than only the *actual* data it touches, then this would cause the dependency graph to explode exponentially, which could severely restrict your throughput. And like I pointed out before, this still assumes that your code is of the very restricted form that even allows pessimistic dependency tracking; some very simple and common control flow would simply be mathematically impossible to schedule ahead of time (I gave a few examples earlier).

> In my implementation I don't define dependencies beforehand like you seem to assume you would have to do, but I define them at the call site, so there isn't external maintenance to do but the usage is pretty much like calling a function (1-2 lines of code), so it's quite convenient to use. One motivation for the job queue I implemented was that it should be simple to use so that you could more easily take advantage of opportunities for parallelism.

How does that actually work? If I spawn a task in the function foo, then surely you must execute foo before you know what task, if any, I've spawned (it may depend on dynamic data, after all)? Now imagine foo is actually called from within another task; doesn't that mean you have to start executing tasks before you know what tasks are available, meaning you need at least some form of on-line scheduler?
-- Sebastian Sylvan +44(0)7857-300802 UIN: 44640862
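The scenario in Sylvan's closing question can be sketched concretely. The following is an illustrative, deterministic single-threaded model (all names such as `Scheduler::spawn` are made up for illustration, not any library's real API): an on-line scheduler discovers new tasks only as earlier tasks execute, because `foo` decides at runtime whether to spawn at all.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <vector>

// Tasks are just closures; the runnable set is discovered as we execute.
struct Scheduler {
    std::deque<std::function<void()>> ready;
    void spawn(std::function<void()> t) { ready.push_back(std::move(t)); }
    void run() {
        while (!ready.empty()) {
            auto t = std::move(ready.front());
            ready.pop_front();
            t();  // may call spawn() and grow the ready set mid-run
        }
    }
};

std::vector<std::string> trace;

// foo decides *at runtime* whether to spawn a subtask, so no ahead-of-time
// scheduler could have known the full task graph before execution began.
void foo(Scheduler& s, int dynamic_value) {
    trace.push_back("foo");
    if (dynamic_value > 0)  // data-dependent control flow
        s.spawn([&s] { trace.push_back("subtask"); });
}
```

The point of the sketch: whether "subtask" exists at all is unknowable until `foo` has already run, which is exactly why a scheduler for such code must be on-line.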
From: Jarkko L. <al...@gm...> - 2009-04-11 16:11:28
The word "priority" is commonly used in task scheduling (e.g. in the GDC talk I pointed out earlier) to define the preferred order of task execution. I'm sure I don't have to explain basic scheduling terminology to you, right? Anyway, the system I described is generic and in fact less restrictive than what you have been talking about: you are not forced to statically define dependencies in the code flow (you can do it, though), thus you can gain better throughput. But it seems like you have firmly set your mind on a specific way of doing multi-threading and adamantly believe it results in optimal throughput regardless of what people tell you, so good luck with that.

Cheers, Jarkko

From: Sebastian Sylvan [mailto:seb...@gm...] Sent: Saturday, April 11, 2009 2:55 PM To: Game Development Algorithms Subject: Re: [Algorithms] General purpose task parallel threading approach [...]
From: Sebastian S. <seb...@gm...> - 2009-04-11 16:49:53
On Sat, Apr 11, 2009 at 5:11 PM, Jarkko Lempiainen <al...@gm...> wrote:

> The word "priority" is commonly used in task scheduling (e.g. in the GDC talk I pointed out earlier) to define the preferred order of task execution. I'm sure I don't have to explain basic scheduling terminology to you, right? Anyway, the system I described is generic and in fact less restrictive than what you have been talking about: You are not forced to statically define dependencies in the code flow (you can do it though), thus you can gain better throughput.

The confusion came from the fact that this discussion was about general task parallelism, not this restricted form you're talking about, and in that context priority in the sense that you meant doesn't really make sense (since dependencies would be enforced "outside" the priority system). I'm just trying to explain why I misunderstood you the first time around. I know what you meant now, and we can stop arguing semantics.

I tried to be very clear about what I meant by static/dynamic. By "static" I meant that you need to know the dependencies before you start executing tasks. Is this incorrect? I understand that you can dynamically change the graph of tasks on a frame-by-frame basis, but you still can't have arbitrary dynamic code flow in the tasks themselves (because then it would be impossible to know the dependencies ahead of time, which you need for your scheduler).

> But seems like you have firmly set your mind on specific way of doing multi-threading and adamantly believe it results in optimal throughput regardless what people tell you, so good luck with that.

We're talking about code here, not personalities. I'm trying to point out that your system is, as a matter of mathematical fact (not "adamant beliefs"), not a general task system, because it imposes restrictions on the kinds of code you can run through it.
I'm not disregarding what people tell me; I'm happy to be shown, and have repeatedly asked for, facts, evidence or even just well-reasoned speculation. So if you think I'm wrong, please explain where I'm going wrong rather than bailing out with an ad-hominem attack. Here, I'll even summarize my position for you so you can do it more easily:

1. Your system is an ahead-of-time scheduler, in that the dependencies of your task system must be known before you start executing anything. They can still change from frame to frame, but dependencies must be known before execution.
2. By definition, certain kinds of control flow cannot be scheduled ahead of time (i.e. it's mathematically impossible).
3. Therefore your system is not a general system, because it does not allow dynamic control flow.
4. Dynamic control flow is very common in real code.
5. Your system must have "worst case" dependencies for each task, i.e. if a task has an if-else clause, that task must depend on the data used in both branches. Since this is a tree, this gives you exponentially more dependencies in your graph.
6. I haven't done the maths, but I *suspect* that (5) means that the asymptotic time complexity w.r.t. the critical path is not constant, but at least linear and probably exponential (it's a bit tricky to figure out exactly, maybe I'm just thick).
7. In order to schedule dynamic code flow you need an on-line scheduler, by definition.
8. Work stealing is an on-line scheduler with provable efficiency (it's asymptotically optimal).
9. (6) and (8) mean you can't claim your ahead-of-time scheduler is better than work stealing in any absolute sense, though I'd be happy to see benchmarks of specific instances where this is the case; so far every benchmark I've seen has had work stealing come out on top.
10. All these points taken together mean that an on-line scheduler based on work stealing gives me:

Where am I going wrong here? -- Sebastian Sylvan +44(0)7857-300802 UIN: 44640862
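Point 5 can be made concrete with a toy model (hypothetical code, not Lempiainen's system): a call tree of depth d in which every if/else branch touches its own datum. The path actually executed touches only d data items, but a pessimistic ahead-of-time analysis must union the data of every possible path, and that count grows as 2^(d+1) - 2.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Toy model: at each of `depth` levels, an if/else touches one of two
// distinct data items ("T"/"F" for the then/else branch). A pessimistic
// ahead-of-time scheduler must depend on every datum any path could touch.
static void collect(int depth, const std::string& path,
                    std::set<std::string>& deps) {
    if (depth == 0) return;
    deps.insert("data@" + path + "T");  // datum touched in the then-branch
    deps.insert("data@" + path + "F");  // datum touched in the else-branch
    collect(depth - 1, path + "T", deps);
    collect(depth - 1, path + "F", deps);
}

std::size_t pessimistic_deps(int depth) {
    std::set<std::string> deps;
    collect(depth, "", deps);
    return deps.size();  // 2 + 4 + ... + 2^depth = 2^(depth+1) - 2
}

// An on-line scheduler only ever waits on the data along the one path
// actually taken: one datum per level.
std::size_t actual_deps(int depth) {
    return static_cast<std::size_t>(depth);
}
```

At depth 10 the pessimistic set already holds 2046 items against 10 actually touched, which is the "explosion" the argument above refers to.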
From: Sebastian S. <seb...@gm...> - 2009-04-11 16:51:09
On Sat, Apr 11, 2009 at 5:49 PM, Sebastian Sylvan <seb...@gm...> wrote: [...]

Sorry, accidentally pressed "send" - the last point should finish with: provable efficiency, scalability, and generality.

-- Sebastian Sylvan +44(0)7857-300802 UIN: 44640862
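Since work stealing keeps coming up in this exchange, here is the discipline in miniature - a deterministic, single-threaded simulation in the Blumofe/Leiserson style (illustrative names, not Cilk's or any real library's API): each worker owns a deque, the owner pushes and pops at the bottom, and an idle worker steals the *oldest* task from the top of a victim's deque.

```cpp
#include <deque>
#include <vector>

struct Worker {
    std::deque<int> tasks;  // task ids; "bottom" of the deque is the back
    void push(int t) { tasks.push_back(t); }
    bool pop(int& t) {      // owner works LIFO from the bottom
        if (tasks.empty()) return false;
        t = tasks.back(); tasks.pop_back(); return true;
    }
    bool steal(int& t) {    // thief takes FIFO from the top
        if (tasks.empty()) return false;
        t = tasks.front(); tasks.pop_front(); return true;
    }
};

// Round-robin simulation of two workers: each prefers its own deque and
// falls back to stealing from the other.
std::vector<int> run_two_workers(Worker& a, Worker& b) {
    std::vector<int> order;
    int t;
    while (true) {
        if (a.pop(t) || b.steal(t)) { order.push_back(t); continue; }  // worker A
        if (b.pop(t) || a.steal(t)) { order.push_back(t); continue; }  // worker B
        break;  // both deques empty
    }
    return order;
}
```

Owners running their own work LIFO keeps caches warm, while thieves taking the oldest task tend to grab the largest remaining subtrees, which is part of why the scheduler's overhead stays a small constant factor in practice.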
From: <asy...@gm...> - 2009-04-11 17:24:11
But guys, I would really like to have your feedback on asynclib, which I'm trying to make as generic as I can. I need to know what functionality you want which I don't have. Would you like to be able to override the scheduling part of the library? Is there functionality you don't like the way it was presented, and so on?

Alexander.

2009/4/11 Sebastian Sylvan <seb...@gm...> [...]

> _______________________________________________
> GDAlgorithms-list mailing list
> GDA...@li...
> https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list
> Archives: http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list

-- Regards, Alexander Karnakov
From: Adrian B. <ad...@gm...> - 2009-04-12 23:22:43
It sounded like your mechanism has similar downsides to fibers (i.e. much higher memory overhead, slight efficiency overhead) vs. explicitly coded state machines and such. Unless I'm confused, that would probably be a deal breaker for the majority of our code.

Cheers, Adrian

On Sat, Apr 11, 2009 at 10:23 AM, <asy...@gm...> wrote: [...]
From: <asy...@gm...> - 2009-04-12 23:56:50
Explicitly coded state machines have less memory overhead indeed, but coding them and then supporting them takes a lot of time. And what slight efficiency overhead do you mean?

Alexander

2009/4/13 Adrian Bentley <ad...@gm...> [...]

-- Regards, Alexander Karnakov
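To make the comparison concrete, here is the kind of explicitly coded state machine being weighed against fibers (a hypothetical task, purely for illustration): the "local variables" and the resume point must be hoisted into a struct by hand, which is exactly the authoring and maintenance cost under discussion. A fiber keeps all of this implicitly on its own stack, at the price of that stack's memory.

```cpp
// Hand-rolled resumable task: a streaming load followed by processing.
struct StreamingTask {
    enum State { Loading, Processing, Done };
    State state = Loading;
    int bytes_loaded = 0;  // "local variable" hoisted into the struct by hand
    int result = 0;

    // Returns true while there is more work; each call is one resumption.
    bool step() {
        switch (state) {
        case Loading:
            bytes_loaded += 512;  // pretend async I/O made some progress
            if (bytes_loaded >= 1024) state = Processing;
            return true;
        case Processing:
            result = bytes_loaded * 2;  // pretend to consume the loaded data
            state = Done;
            return true;
        case Done:
            return false;
        }
        return false;
    }
};
```

Every suspension point needs its own enum value and every live local its own member, so the bookkeeping grows with the complexity of the task - which is the "takes a lot of time" trade-off against the fiber's per-task stack.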
From: Adrian B. <ad...@gm...> - 2009-04-13 17:24:26
It may be that the specific state machine solution is not any faster because you're operating directly in the restored general stack (i.e. you trade memory for perf). However, jumping around to lots of different task stacks seems like it might have a negative effect on cache behavior.

Do you really think the memory overhead required will scale to the number of tasks people are talking about using here?

Cheers, Adrian

On Sun, Apr 12, 2009 at 4:56 PM, <asy...@gm...> wrote: [...]
From: <asy...@gm...> - 2009-04-13 17:58:55
> However, jumping around to lots of different task stacks seems like it might have a negative effect on cache behavior.

The number of memory accesses is not very different from the manual state-based approach. The load on the cache you describe basically only depends on how much stack space your task actually touches between switches. And I repeat myself again - you can turn this feature off and use the thread's stack for any/all task processing if you want to.

> Do you really think the memory overhead required will scale to the number of tasks people are talking about using here?

4 megabytes of memory can handle 2,000 tasks with 2-kilobyte stacks in flight. Is that an ok number for you?

Alexander

2009/4/13 Adrian Bentley <ad...@gm...> [...]

-- Regards, Alexander Karnakov
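The arithmetic behind that budget is worth pinning down (a sketch using only the constants from the message above; the function name is illustrative): with a fixed stack per task, total overhead is just stack size times tasks in flight, and a 4 MB pool at 2 KB per stack gives exactly 2048 tasks, i.e. the "2,000" quoted.

```cpp
#include <cstddef>

// Constants from the message above; everything else is illustrative.
constexpr std::size_t kTaskStackBytes = 2 * 1024;         // 2 KB stack per task
constexpr std::size_t kPoolBytes      = 4 * 1024 * 1024;  // 4 MB total budget

// Tasks in flight = total pool / per-task stack.
constexpr std::size_t max_tasks_in_flight(std::size_t pool_bytes,
                                          std::size_t stack_bytes) {
    return pool_bytes / stack_bytes;
}

static_assert(max_tasks_in_flight(kPoolBytes, kTaskStackBytes) == 2048,
              "4 MB / 2 KB = 2048 tasks in flight");
```

The linear relationship is the whole point: halving the per-task stack doubles the number of tasks the same pool can keep suspended.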
From: <asy...@gm...> - 2009-04-13 18:08:51
>Do you really think the memory overhead required will scale to the
>number of tasks people are talking about using here?

Keep in mind that the memory occupied is the sum of the stacks of currently executing tasks and currently suspended tasks. If you want, you can limit the number of tasks which are live (i.e. suspended) at the same time. In that case, if you hit the limit, some number of tasks (how many, you control) will be processed to completion to free up memory for other tasks. There is a sample which demonstrates this approach (called "allocator").

Alexander.

2009/4/13 Adrian Bentley <ad...@gm...>
> It may be that the specific state machine solution is not any faster
> because you're operating directly in the restored general stack (e.g.
> trade memory for perf). However, jumping around to lots of different
> task stacks seems like it might have a negative effect on cache
> behavior.
>
> Do you really think the memory overhead required will scale to the
> number of tasks people are talking about using here?
>
> Cheers,
> Adrian
>
> On Sun, Apr 12, 2009 at 4:56 PM, <asy...@gm...> wrote:
> > Explicitly coded state machines have less memory overhead indeed. But
> > coding them and further supporting them takes a lot of time.
> > And what slight efficiency overhead do you mean ?
> >
> > Alexander
> >
> > 2009/4/13 Adrian Bentley <ad...@gm...>
> >>
> >> It sounded like your mechanism has similar downsides as fibers (i.e.
> >> much higher memory overhead, slight efficiency overhead) vs.
> >> explicitly coded state machines and such. Unless I'm confused, that
> >> would probably be a deal breaker for the majority of our code.
> >>
> >> Cheers,
> >> Adrian
> >>
> >> On Sat, Apr 11, 2009 at 10:23 AM, <asy...@gm...> wrote:
> >> > But guys, I would really like to have your feedback on asynclib which
> >> > I'm trying to make as generic as I can.
> >> > I need to know what functionality you want which I don't have. Would
> >> > you like to override the scheduling part of the library ? Is there a
> >> > functionality which you don't like how it was presented and so on?
> >> >
> >> > Alexander.

--
Regards,
Alexander Karnakov
|
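[Editor's note] The cap-plus-drain scheme described above (bound the number of live tasks; when the cap is hit, run some pending tasks to completion to free their memory) can be sketched roughly as follows. All names are invented for illustration — the thread does not show asynclib's real API — and a real implementation would hold suspended task stacks rather than closures:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>

// Hypothetical sketch: cap the number of live (pending/suspended) tasks.
// When the cap is hit, drain a batch of older tasks to completion before
// admitting a new one, so task memory stays bounded.
class BoundedTaskPool {
public:
    BoundedTaskPool(std::size_t max_live, std::size_t drain_batch)
        : max_live_(max_live), drain_batch_(drain_batch) {}

    void spawn(std::function<void()> task) {
        if (pending_.size() >= max_live_) {
            drain(drain_batch_);  // free memory held by older tasks first
        }
        pending_.push_back(std::move(task));
    }

    // Run up to n pending tasks to completion (oldest first).
    void drain(std::size_t n) {
        while (n-- && !pending_.empty()) {
            auto task = std::move(pending_.front());
            pending_.pop_front();
            task();
        }
    }

    std::size_t live() const { return pending_.size(); }

private:
    std::size_t max_live_;
    std::size_t drain_batch_;
    std::deque<std::function<void()>> pending_;
};
```

The point of the design is that admission control happens at spawn time, so the memory bound holds no matter how many tasks the caller creates.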
|
From: Adrian B. <ad...@gm...> - 2009-04-13 21:49:19
|
4 MB feels like a lot given console memory limits, but allowing 2000 tasks in flight sounds pedantic. This mechanism can't support SPUs, right?

Cheers,
Adrian
|
|
From: <asy...@gm...> - 2009-04-13 22:11:06
|
>4 MB feels like a lot given console memory limits, but allowing 2000
>tasks in flight sounds pedantic

You have full control over memory. There is no limit on the number of tasks (there is a limit on minimum and maximum task size: 32 bytes minimum and about 4 MB maximum per task).

>This mechanism can't support SPUs, right?

Do you mean task suspension or a complete library port? In general, only the Windows x86 and x64 platforms are supported right now. Support for other platforms will start appearing as soon as people show interest in it.

Alexander
|
|
From: Nicholas "Indy" R. <ar...@gm...> - 2009-04-13 22:15:08
|
On Mon, Apr 13, 2009 at 2:49 PM, Adrian Bentley <ad...@gm...> wrote:
> 4 MB feels like a lot given console memory limits, but allowing 2000
> tasks in flight sounds pedantic. This mechanism can't support SPUs,
> right?

Also keep in mind that a 2 KB stack is likely to get overflowed REALLY quickly with any real operations going on.

Nicholas "Indy" Ray
|
|
From: Gregory J. <gj...@da...> - 2009-04-13 22:23:11
|
> Also keep in mind that a 2 KB stack is likely to get overflowed REALLY
> quickly with any real operations going on.
> Nicholas "Indy" Ray

Depends on how you define "real". You can do an awful lot of "real" work within a 2 KB limit, especially if the data you are working on is owned by someone else (and you therefore don't need to copy it into the task's stack).

Greg
|
|
From: <asy...@gm...> - 2009-04-13 22:40:57
|
So far the experience I (and a few others) have is that 2 to 4 KB is enough to handle 95% of all cases. You can also use the same memory pool for tasks of different sizes (a bigger task size means fewer tasks in flight); for example, two contiguous 4 KB blocks of pool memory can be combined to create an 8 KB task.

Alexander
|
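[Editor's note] The pooling scheme described above — one arena of fixed 4 KB blocks, with larger task stacks built from runs of contiguous blocks — might be sketched like this. Names and the return-offset interface are invented for illustration; this is not asynclib's real allocator:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: an arena of fixed 4 KB blocks. A task needing a
// larger stack claims N *contiguous* blocks (e.g. two blocks = 8 KB task).
class StackPool {
public:
    static const std::size_t kBlock = 4096;

    explicit StackPool(std::size_t num_blocks)
        : used_(num_blocks, false), arena_(num_blocks * kBlock) {}

    // Returns the byte offset of a run of `blocks` contiguous free blocks,
    // or -1 if none is available (a caller could then drain tasks first).
    long acquire(std::size_t blocks) {
        std::size_t run = 0;
        for (std::size_t i = 0; i < used_.size(); ++i) {
            run = used_[i] ? 0 : run + 1;
            if (run == blocks) {
                std::size_t first = i + 1 - blocks;
                for (std::size_t j = first; j <= i; ++j) used_[j] = true;
                return static_cast<long>(first * kBlock);
            }
        }
        return -1;
    }

    void release(long offset, std::size_t blocks) {
        std::size_t first = static_cast<std::size_t>(offset) / kBlock;
        for (std::size_t j = first; j < first + blocks; ++j) used_[j] = false;
    }

private:
    std::vector<bool> used_;          // per-block occupancy bitmap
    std::vector<unsigned char> arena_; // backing memory for task stacks
};
```

A linear scan is fine at these sizes; a real allocator would likely track free runs explicitly to avoid fragmentation-induced scans.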
|
From: Jarkko L. <al...@gm...> - 2009-04-11 17:42:35
|
> Where am I going wrong here?
I already clearly told you that I don't define dependencies beforehand;
they are defined at the call site, so you are going wrong starting from your
1st assumption. I'm sorry, but why do I have to keep repeating myself and
correcting your false assumptions if you just disregard what people tell
you?
Cheers, Jarkko
_____
From: Sebastian Sylvan [mailto:seb...@gm...]
Sent: Saturday, April 11, 2009 7:50 PM
To: Game Development Algorithms
Subject: Re: [Algorithms] General purpose task parallel threading approach
On Sat, Apr 11, 2009 at 5:11 PM, Jarkko Lempiainen <
<mailto:al...@gm...> al...@gm...> wrote:
The word "priority" is commonly used in task scheduling (e.g. in the GDC
talk I pointed out earlier) to define preferred order of task execution. I'm
sure I don't have to explain basic scheduling terminology to you, right?
Anyway, the system I described is generic and in fact less restrictive than
what you have been talking about: You are not forced to statically define
dependencies in the code flow (you can do it though), thus you can gain
better throughput.
The confusion came from the fact that this discussion was about a general
task parallelism, not this restricted form you're talking about, and in that
context priority in the sense that you meant doesn't really make sense
(since dependencies would be enforced "outside" the priority system). I'm
just trying to explain why I misunderstood you the first time around. I know
what you meant now, and we can stop arguing semantics.
I tried to be very clear about what I meant by static/dynamic. By "static" I
meant that you need to know the dependencies before you start executing
tasks. Is this incorrect?
I understand that you can dynamically change the graph of tasks on a
frame-by-frame basis, but you still can't have arbitrary dynamic code flow
in the tasks themselves (because then it would be impossible to know the
dependencies ahead of time, which you need for your scheduler).
But it seems like you have firmly set your mind on a specific way of doing
multi-threading and adamantly believe it results in optimal throughput
regardless of what people tell you, so good luck with that.
We're talking about code here, not personalities.
I'm trying to point out that your system is, as a matter of mathematical
fact (not "adamant beliefs"), not a general task system, because it imposes
restrictions on the kinds of code you can run through it.
I'm not disregarding what people tell me, I'm happy to be shown, and have
repeatedly asked for, facts, evidence or even just well-reasoned
speculation. So if you think I'm wrong please explain where I'm going wrong
rather than bailing out with an ad-hominem attack.
Here, I'll even summarize my position for you so you can do it more easily:
1. Your system is an ahead-of-time scheduler, in that the dependencies of
your task system must be known before you start executing anything. It can
still change from frame-to-frame, but dependencies must be known before
execution.
2. By definition certain kinds of control flow can not be scheduled ahead of
time (i.e. it's mathematically impossible).
3. Therefore your system is not a general system, because it does not allow
dynamic control flow.
4. Dynamic control flow is very common in real code.
5. Your system must have "worst case" dependencies for each task, i.e. if a
task has an if-else clause that task must depend on the data used in both
branches. Since this is a tree, this gives you exponentially more
dependencies in your graph.
6. I haven't done the maths, but I *suspect* that (5) means that the
asymptotic time complexity w.r.t. the critical path is not constant, but at
least linear and probably exponential (it's a bit tricky to figure out
exactly, maybe I'm just thick).
7. In order to schedule dynamic code flow you need an on-line scheduler by
definition.
9. Work stealing is an on-line scheduler with provable efficiency (it's
asymptotically optimal).
10. (6) and (9) mean you can't claim your ahead-of-time scheduler is better
than work-stealing in any absolute sense, though I'd be happy to see
benchmarks of specific instances where this is the case, but so far every
benchmark I've seen has had work stealing come out on top.
11. All these points taken together mean that an on-line scheduler based on
work-stealing gives me:
Where am I going wrong here?
--
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862
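[Editor's note] Point (2) above is the crux of the argument, and a tiny hypothetical fragment makes it concrete: whether `refine` below ever becomes a task — and therefore what the task graph's dependencies are — is only knowable at run time, so no ahead-of-time schedule can cover it exactly. (Both function names are invented for illustration.)

```cpp
#include <cassert>

// A "task" whose spawning is data-dependent.
int refine(int x) { return x * 2; }

// Dynamic control flow: an ahead-of-time scheduler must assume the
// worst case (that refine runs and its data is depended on), which is
// what leads to the inflated dependency graphs of point (5).
int process(int workload) {
    int result = workload + 1;
    if (result > 10) {          // decided by run-time data...
        result = refine(result); // ...so this spawn may or may not happen
    }
    return result;
}
```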
|
|
From: Sebastian S. <seb...@gm...> - 2009-04-11 17:51:05
|
On Sat, Apr 11, 2009 at 6:42 PM, Jarkko Lempiainen <al...@gm...> wrote:
> > Where am I going wrong here?
>
> I already clearly told you that I don't define dependencies beforehand;
> they are defined at the call site, so you are going wrong starting from
> your 1st assumption. I'm sorry, but why do I have to keep repeating myself
> and correcting your false assumptions if you just disregard what people
> tell you?

Probably because you also say that you sort your tasks ahead of time, which is a contradiction. So are you now saying that your scheduler is *not* an ahead-of-time scheduler but actually an on-line one?

--
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862
|
|
From: Jarkko L. <al...@gm...> - 2009-04-11 20:59:38
|
Task priorities are updated once per frame, but there is no list of tasks to be sorted before execution of tasks begins. That's purely your own wrong conclusion drawn from… wherever. So, it's an online scheduler in the sense that tasks are spawned "randomly" within tasks, but their priorities are determined at the end of the frame, for the next execution, based on task dependencies and execution times. Does that clear things up? If not, please ask rather than draw further wrong conclusions.

Cheers, Jarkko
|
|
From: Gregory J. <gj...@da...> - 2009-04-11 21:07:06
|
Oh for the love of all that is good and holy… will you two STOP?!? At the least, take it to private email.
|
|
From: Sebastian S. <seb...@gm...> - 2009-04-11 21:43:57
|
On Sat, Apr 11, 2009 at 9:59 PM, Jarkko Lempiainen <al...@gm...> wrote:
> Task priorities are updated once per frame, but there is no list of tasks
> to be sorted before execution of tasks begins. That's purely your own
> wrong conclusion drawn from… wherever.

For example, the presentation you posted (as far as I can tell without seeing the talk) talked about static ordering of tasks ahead of time. And you also talked about global prioritization, and how you weren't waiting on other tasks within a task, etc.

> So, it's an online scheduler in the sense that tasks are spawned "randomly"
> within tasks, but their priorities are determined at the end of the frame
> for the next execution based on task dependencies and execution times.
> Does that clear things up?

Okay, that makes sense. You're not actually resolving dependencies by sorting the tasks up front; dependencies are still enforced dynamically, and you just sort things that do not have dependencies based on historical behaviour? I.e. you spawn a task and *if* that task existed last frame you prioritize it based on that?

Have you measured whether the added communication overhead of doing this actually buys you anything on real code? You'd have to make sure that the overhead of moving things around based on priority is counteracted by the benefits, and since the constant factor for work stealing is very close to 1 in empirical studies, there doesn't seem to be that much benefit to be gained from shuffling tasks around...

> If not, please ask rather than draw further wrong conclusions.

I usually try hard to stay on topic, but seriously, I asked you multiple concrete questions about how your system would handle specific scenarios; answering one of those would probably have been more illustrative than throwing ad-hominems around...

--
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862
|
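[Editor's note] For readers unfamiliar with the work-stealing discipline being referenced: each worker owns a double-ended queue, pushing and popping at the *bottom* (LIFO, cache-friendly, most recently spawned work) while idle thieves steal from the *top* (FIFO, the oldest and typically largest subtrees). The single-threaded toy below only shows that ordering policy; a real scheduler uses a lock-free deque (e.g. Chase–Lev) per worker thread.

```cpp
#include <cassert>
#include <deque>

// Toy illustration of the work-stealing deque policy. Tasks are ints here;
// the point is only the asymmetry between the owner's end and the thief's end.
class WorkStealingDeque {
public:
    void push_bottom(int task) { tasks_.push_back(task); }

    // Owner takes the most recently pushed task (depth-first, warm caches).
    bool pop_bottom(int* task) {
        if (tasks_.empty()) return false;
        *task = tasks_.back();
        tasks_.pop_back();
        return true;
    }

    // A thief takes the oldest task instead (breadth-first, big chunks).
    bool steal_top(int* task) {
        if (tasks_.empty()) return false;
        *task = tasks_.front();
        tasks_.pop_front();
        return true;
    }

private:
    std::deque<int> tasks_;
};
```

The owner/thief asymmetry is what keeps the per-spawn overhead near a plain function call in the common (no-steal) case, which is where the "constant factor close to 1" claim comes from.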
|
From: Jarkko L. <al...@gm...> - 2009-04-11 23:15:55
|
Yes, only tasks that have been previously executed are prioritized (if not, the task has some default priority). The job queue also processes tasks in groups (e.g. a group per function), so all tasks in a task group have the same priority and are scheduled sequentially for execution (e.g. multiple animate_character(character*) function tasks with different character* arguments form a task group). I profiled the overhead of adding and executing a task from the queue at around 500 cycles total on my test PC, so it's not bad, but I haven't used the queue in a real game engine yet, so I can't tell how well (or badly) it works in real-life scenarios. Anyway, it should be quite easy to change the scheduling strategy of the queue once it's integrated into an engine, to see how it compares to work stealing for example.

Cheers, Jarkko
|
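[Editor's note] The frame-coherent prioritization Jarkko describes — this frame's measured execution times become next frame's priorities, with a default for never-seen task groups — could be sketched as below. All names are invented, and this toy deliberately omits the dependency (critical-path) component he also folds into priorities:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hedged sketch: per-task-group priorities derived from the previous
// frame's timings, exploiting frame-to-frame coherency.
class FramePriorities {
public:
    explicit FramePriorities(double default_priority)
        : default_(default_priority) {}

    // Record how long a task group took this frame (accumulated).
    void record(const std::string& group, double ms) {
        measured_[group] += ms;
    }

    // Priority used when the group is spawned next frame: longer chains
    // of work get higher priority so they start earlier in the frame.
    double priority(const std::string& group) const {
        auto it = last_.find(group);
        return it == last_.end() ? default_ : it->second;
    }

    // Called once per frame: this frame's timings drive next frame's order.
    void end_frame() {
        last_.swap(measured_);
        measured_.clear();
    }

private:
    double default_;
    std::map<std::string, double> measured_;  // current frame
    std::map<std::string, double> last_;      // previous frame
};
```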
|
From: Sebastian S. <seb...@gm...> - 2009-04-12 02:07:18
|
On Sun, Apr 12, 2009 at 12:15 AM, Jarkko Lempiainen <al...@gm...> wrote:
> Anyway, it should be quite easy to change the scheduling strategy of the
> queue once it's integrated into an engine, to see how it compares to work
> stealing for example.

It would definitely be interesting to see some benchmark comparisons... The problem I have with it a priori is that plain old work stealing works so surprisingly well (performance is very close to the theoretical maximum for a given number of processors and a given length of critical path) that there doesn't seem to be much room for improvement as far as throughput is concerned. You'd need to test both on the same code, I guess, because the cost of these things varies, but if we guesstimate a bit: depending on the exact architecture and calling convention (and the number of parameters that need to be pushed onto the stack, etc.), Cilk's "4x the cost of a function call" equates to about 50 cycles give or take, so you're adding an order of magnitude of extra overhead, which seems quite steep to me.

--
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862
|
|
From: Sebastian S. <seb...@gm...> - 2009-04-11 17:39:58
|
On Sat, Apr 11, 2009 at 6:23 PM, <asy...@gm...> wrote:
> But guys, I would really like to have your feedback on asynclib which I'm
> trying to make as generic as I can.
> I need to know what functionality you want which I don't have. Would you
> like to override the scheduling part of the library ? Is there a
> functionality which you don't like how it was presented and so on?

I'm sorry, I guess we got side-tracked. I still haven't looked at the code (again, apologies), but if I understand this correctly each task must have its own stack, rather than just executing on the stack of the worker thread, is that right?

This seems expensive to me. My point very early on was that if you take the approach of having lots of latent parallelism (for scaling, and to sort of "smooth out" semi-random bubbles), the vast majority of tasks will not do any "real" blocking (at a sync point they'll usually just start executing the tasks on their queue, and even if another thread steals a task you're blocking on, it's not unlikely that it will be done by the time you need to wait). That means optimizing for efficient blocking shouldn't be the priority if it causes an additional cost on the spawning of the task itself (i.e. having to allocate and fill in a much bigger task frame than you'd otherwise need). I believe that the combined cost of spawning and running a task is *the* most crucial thing to optimize for, as it can never be parallelized and sits squarely in the bottleneck.

That said, it would obviously be beneficial if the underlying threading system running these tasks was as efficient as possible at switching for the rare cases when you do get completely blocked, but I think that should be handled at a different level (i.e. at the thread level). Perhaps using Windows 7's user-mode scheduling (which I believe ConcRT is doing), or even using your own system, but I don't think each task needs to be suspendable if it implies *any* amount of extra cost in spawning them (however slight).

--
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862
|
|
From: <asy...@gm...> - 2009-04-11 18:07:53
|
> but if I understand this correctly each task must have its own stack, rather than just executing on the stack of the worker thread, is that right?

That's true, each task by default runs in its own stack space. But you can always switch to the thread's stack and execute your code there; in this case you only need 32 bytes for the task, and you lose the magic task-suspending ability. There is a minimal task sample demonstrating this behavior.

> majority of tasks will not do any "real" blocking

By "blocking" I mean any kind of way to control task behavior with regard to other tasks and resources. This can be anything from shared resource access to task dependency resolving. There are also many exotic situations, for example where you would like to limit the number of tasks running simultaneously in some region of the code, or you want to explicitly switch a set of tasks on some code region, or you want to process the set of tasks which have reached some place in the code at some point in time. Even in games I have seen many situations where this proved to be really useful (especially in background asset streaming code) and improved task throughput by a significant amount.

> meaning that optimizing for efficient blocking shouldn't be the priority if it causes an additional cost on the spawning of the task itself (i.e. having to allocate and fill in a much bigger task frame than you'd otherwise need).

You should also know that there is no cost for a feature if you don't use it. If you don't use blocking and don't wait on other tasks to finish, there is no overhead. Task startup (after the scheduler decides what to execute) is a matter of several assembly instructions in itself.

> That said, it would obviously be beneficial if the underlying threading system running these tasks was as efficient as possible at switching for the rare cases

That's exactly what I target, except that I optimize the rare cases as well (since "rare" for games is not that rare for some other problems). 
If you decide to try the library out and suspect some part of it speed-wise, just let me know.

> but I don't think each task needs to be suspendable if it implies *any* amount of extra cost in spawning them (however slight).

No extra cost, really ;).

-- Regards, Alexander Karnakov |
|
From: Jarkko L. <al...@gm...> - 2009-04-12 08:39:24
|
Well, I see good task prioritization as a dominating factor in task scheduling: if you save even 1 ms shared across /all/ the threads due to better scheduling, you have already gained back the overhead of executing ~5000 tasks in a frame. But you seem to disagree that task prioritization gains you anything, am I right? And then there are of course issues with cache coherency and programmer productivity, so individual task overhead seems to pale in comparison when we're speaking of such small differences (in cycles), and doesn't look like something really worth optimizing further. But yes, a real-life benchmark would probably shine some light on this; then again, you could always continue arguing that the benchmark favors one approach or the other, so I doubt a single benchmark proves anything either way. I guess in the end it boils down to whichever approach is simplest for the majority to use and understand, yet scales reasonably well, and thus gets adopted as the best practice, even though it doesn't necessarily result in the absolute best throughput.

Cheers, Jarkko

_____
From: Sebastian Sylvan [mailto:seb...@gm...]
Sent: Sunday, April 12, 2009 5:07 AM
To: Game Development Algorithms
Subject: Re: [Algorithms] General purpose task parallel threading approach
|