Thread: Re: [Algorithms] General purpose task parallel threading approach (Page 12)
|
From: Tony C. <to...@mi...> - 2009-04-13 18:37:25
|
" that any parallel loop with a static call graph could be turned into a 16 wide vector version mechanically." "Vector complete" is also a more or less irrelevant property for real world problems, since the "static call graph" constraint is typically only satisfied by trivial code. Simple shader type things meet the requirement, complex ones with conditionals don't, and good luck with anything resembling game logic. Don't get me wrong, I think LRB is doing some really interesting things, but it's by no means anything like a silver bullet. - Tony From: Sebastian Sylvan [mailto:seb...@gm...] Sent: Monday, April 13, 2009 10:38 AM To: Game Development Algorithms Subject: Re: [Algorithms] General purpose task parallel threading approach The cool thing about LRB is that it's what they call "vector complete", meaning it has all of the machinery (scatter/gather etc.) that any parallel loop with a static call graph could be turned into a 16 wide vector version mechanically. This should hopefully allow the compiler to do most of this work for us if we just mark up the loops we think ought to be data parallel using some pragma. On Mon, Apr 13, 2009 at 6:28 PM, Adrian Bentley <ad...@gm...<mailto:ad...@gm...>> wrote: Daunting to be sure. You know... this would be a great place for a good macro system. Commence language war! Cheers, Adrian |
|
From: Sebastian S. <seb...@gm...> - 2009-04-13 19:00:36
|
On Mon, Apr 13, 2009 at 7:37 PM, Tony Cox <to...@mi...> wrote:
> "Vector complete" is also a more or less irrelevant property for real world
> problems, since the "static call graph" constraint is typically only
> satisfied by trivial code. Simple shader type things meet the requirement,
> complex ones with conditionals don't, and good luck with anything resembling
> game logic.
>
> Don't get me wrong, I think LRB is doing some really interesting things,
> but it's by no means anything like a silver bullet.

Yeah, definitely. You'll probably only find suitable data parallelism very close to the leaves of the call graph, not in the big bulky middle, so you clearly need an efficient general-purpose CPU to run most of the code. It is a lot more flexible than we're used to at dealing with those rare data-parallel bits, though. Of course, it remains unknown how it will stack up against whatever Nvidia/AMD will have out at the time of its release, as they're both getting more and more general purpose too. Whatever happens, I'm sure it'll be an interesting couple of years ahead!

--
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862 |
|
From: Jarkko L. <al...@gm...> - 2009-04-14 10:25:32
|
Personally, I try to do allocations from the stack whenever possible to avoid memory fragmentation issues, so a 2-4 KB stack sounds quite small for my purposes at least.

Cheers,
Jarkko

From: asy...@gm... [mailto:asy...@gm...]
Sent: Tuesday, April 14, 2009 1:41 AM
To: Game Development Algorithms
Subject: Re: [Algorithms] General purpose task parallel threading approach

So far the experience I (and a few others) have says that 2 to 4 kilobytes is a size which is enough to handle 95% of all cases. You could use the same memory pool for tasks of different sizes (a bigger size will result in fewer tasks in flight); for example, 2 contiguous pieces of 4k memory could be used to create an 8k task.

Alexander

2009/4/14 Nicholas "Indy" Ray <ar...@gm...>
On Mon, Apr 13, 2009 at 2:49 PM, Adrian Bentley <ad...@gm...> wrote:
> 4 MB feels like a lot given console memory limits, but allowing 2000
> tasks in flight sounds pedantic. This mechanism can't support SPUs,
> right?

Also keep in mind that a 2 KB stack is likely to get overflowed REALLY quickly with any real operations going on.

Nicholas "Indy" Ray

------------------------------------------------------------------------------
This SF.net email is sponsored by:
High Quality Requirements in a Collaborative Environment.
Download a free trial of Rational Requirements Composer Now!
http://p.sf.net/sfu/www-ibm-com
_______________________________________________
GDAlgorithms-list mailing list
GDA...@li...
https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list
Archives: http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list

--
Regards,
Alexander Karnakov |
|
From: <asy...@gm...> - 2009-04-14 12:51:52
|
> I try to do allocations from the stack whenever possible to avoid memory
> fragmentation issues

You can temporarily convert a task to be executed in the thread's context. I could also provide a function to access the worker thread's stack for temporary computations.

Alexander.

2009/4/14 Jarkko Lempiainen <al...@gm...>
> Personally, I try to do allocations from the stack whenever possible to
> avoid memory fragmentation issues, so a 2-4kb stack sounds quite small for
> my purposes at least.
>
> Cheers, Jarkko

--
Regards,
Alexander Karnakov |
|
From: Jon W. <jw...@gm...> - 2009-04-14 19:41:09
|
Sebastian Sylvan wrote:
> That's not the same as saying that flexibility isn't desirable,
> though. What I said was that if you evaluate a certain system you
> would put "flexible" and "general" in the "pros" column, and
> conversely you would put "rigid" and "specialized" in the "cons"
> column. There may be a whole host of other things going into the two
> columns which guide you to your final conclusion, but that doesn't
> mean that flexibility in itself isn't a desirable trait, nor that
> restrictions/rigidity is.

That's the "guns don't kill people" argument. It turns out that, in real life, you never get "generality" or "flexibility" without a cost, and often, those costs and benefits are evaluated by someone who has a very different set of goals than you do.

For example, the C++ STL is a very flexible, very general set of containers and algorithms. However, a number of assumptions made in the construction of that library make it, or its implementations, very poor in many circumstances. For example, various STL implementations use static buffers for intermediary storage, so you can't even use the containers in threaded code! As another example, the allocators involved often make different assumptions about memory than what will be true for applications that manage their own memory, such as games, embedded systems, kernels and database servers (among other examples).

It turns out that the standard C library, with malloc(), fread() and memcpy(), is actually a lot easier to re-use across a wider range of applications, even though it is, by pretty much all measures of the words, less flexible, and less general, than the STL.

To tie it back to asynclib: asynclib is written by a transaction server programmer, who solved a transaction server problem, and is now trying to expand it to solve more kinds of problems. However, there are some classes of problems (such as, in my argument, micro-tasks for games) where the fundamental assumptions are so different that it doesn't make sense to apply a "one size fits all" component to solve that problem. Smaller, more well-defined components are almost always a better match, as long as you can find those smaller, more well-defined components that match your needs.

Sincerely,

jw |
|
From: Sebastian S. <seb...@gm...> - 2009-04-14 20:04:30
|
On Tue, Apr 14, 2009 at 8:40 PM, Jon Watte <jw...@gm...> wrote:
> That's the "guns don't kill people" argument. It turns out that, in real
> life, you never get "generality" or "flexibility" without a cost, and
> often, those costs and benefits are evaluated by someone who has a very
> different set of goals than you do.

I didn't say that; I said they were data points worth considering. Also, "never" is a very strong word to use: there are plenty of cases where you can arbitrarily restrict an API for no real benefit because you just didn't think of a way of doing it better when you first wrote it. There are other cases where you restrict an API because you think it will buy you something, but then it turns out that the rigidity actually hurts you because people have to work around the restrictions in ways which end up costing more than the imagined benefits.

> To tie it back to asynclib: Asynclib is written by a transaction server
> programmer, who solved a transaction server problem, and is now trying
> to expand it to solve more kinds of problems. However, there are some
> classes of problems (such as, in my argument, micro-tasks for games)
> where the fundamental assumptions are so different that it doesn't make
> sense to apply a "one size fits all" component to solve that problem.
> Smaller, more well defined components are almost always a better match,
> as long as you can find those smaller, more well defined components that
> match your needs.

Well, I certainly disagree with that, and I don't think you've demonstrated the win, but I think we've had that argument to death already so there's no point going over it again.

--
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862 |
|
From: <asy...@gm...> - 2009-04-14 20:20:19
|
> and is now trying to expand it to solve more kinds of problems. However,
> there are some classes of problems (such as, in my argument, micro-tasks
> for games) where the fundamental assumptions are so different that it
> doesn't make sense to apply a "one size fits all" component

If I understood you correctly, you're saying that it is not really going to work for games (because it works for client-server apps?), but I don't really see why it can't succeed. I think I've explained pretty clearly how to achieve the specific details people were wondering about, but at this point, after 170 messages posted under this thread, I still don't have a clear picture of what I need to change to make asynclib an attractive threading library for games. Hopefully I didn't miss anything, but if I did, please point me at it again.

Alexander.

2009/4/14 Jon Watte <jw...@gm...>
> That's the "guns don't kill people" argument. ...
> Smaller, more well defined components are almost always a better match,
> as long as you can find those smaller, more well defined components that
> match your needs.
>
> Sincerely,
>
> jw

--
Regards,
Alexander Karnakov |
|
From: Jon W. <jw...@gm...> - 2009-04-14 20:37:20
|
asy...@gm... wrote:
> I still didn't get a clear picture of what I need to change to make
> asynclib an attractive threading library for games.

As seen from my point of view, the two top-level changes would be:

1) Don't allocate a stack per task. Instead, allocate a stack per worker thread. (The bonus is that these stacks can be large enough to be useful, without too much overhead.)

2) Let each task have a dependency tree, where a task will automatically become runnable when all its dependencies are complete (and a typical dependency will be the completion of some other task).

Now, if you really want to support arbitrary blocking within the execution of a micro-task (which I feel is a mistake), then you can make it so that you allocate a new stack for the next task only when some stack is "used" because there is a blocking task.

Sincerely,

jw |
|
From: <asy...@gm...> - 2009-04-14 21:42:45
|
Ok, regarding point 2, doesn't the following pseudo-code do what you want?

void TaskWhichWaitsForDepsToComplete()
{
    async::Task *TaskA, *TaskB, *TaskC;
    // ... do some work here
    async::Execute(TaskAProc, &TaskA, ...);
    // ... do some more work here
    async::Execute(TaskBProc, &TaskB, ...);
    // ... do some more work here
    async::Execute(TaskCProc, &TaskC, ...);
    // ... do even more work here
    // This call makes the task wait for tasks A, B and C. Optionally it can
    // either help out worker threads while waiting, or it can simply block.
    async::WaitAll(*TaskA, *TaskB, *TaskC);
    TaskA->Release(); TaskB->Release(); TaskC->Release();
    // ... finally do some work at the end of the task
    return; // done!
}

void main()
{
    async::Task* TheTask;
    async::Execute(&TaskWhichWaitsForDepsToComplete, &TheTask, ...);
    async::Wait(*TheTask);
    TheTask->Release();
}

You can additionally put any other primitive into async::Wait, and you can put as many of those as you want. As an example, you can even achieve point 2 in a different way:

void DependeeTask(void* pSemaphore)
{
    // ... do work
    ((async::Semaphore*) pSemaphore)->Release();
}

void MainTask()
{
    int nDependeesToSpawn = xxx;
    // Create a semaphore on the stack; it is actually just 8 bytes,
    // zero-initialized.
    async::Semaphore DependeesSemaphore;
    for (int i = 0; i < nDependeesToSpawn; ++i)
        async::Execute(&DependeeTask, &DependeesSemaphore, ...);
    // Waits until all tasks release the semaphore, which happens at the
    // end of each task.
    DependeesSemaphore.Acquire(nDependeesToSpawn);
}

Now regarding point 1:

void ThisTaskWillBeExecutedInsideWorkerThread()
{
    printf("hello");
}

void TaskWhichExecutesCodeInsideTheWorkerThreadStack(void* pUserData)
{
    // Assuming that pUserData holds a pointer to what actually needs to be
    // executed. ExecuteAsThread runs a function on the stack of the worker
    // thread -- this is a veeery lightweight function (check with the
    // debugger if you don't believe me).
    async::ExecuteAsThread((void* (*)(void*)) pUserData);
}

void main()
{
    async::Task* TheTask;
    async::Execute(&TaskWhichExecutesCodeInsideTheWorkerThreadStack,
                   &ThisTaskWillBeExecutedInsideWorkerThread, // user data pointer
                   &TheTask,
                   ...,
                   async::MINIMAL_TASK_SIZE); // 32 bytes for the task structure
    async::Wait(*TheTask);
    TheTask->Release();
}

This code does exactly what you require under point 1. I can create an option which will not require you to create a wrapper function (which is TaskWhichExecutesCodeInsideTheWorkerThreadStack in this example).

Would that solution suit you for both points 1 and 2?
Alexander.
2009/4/14 Jon Watte <jw...@gm...>
> asy...@gm... wrote:
> > I still didn't get a clear picture of what I need to change to make
> > asynclib an attractive threading library for games.
>
> As seen from my point of view, the two top-level changes would be:
>
> 1) Don't allocate a stack per task. Instead, allocate a stack per worker
> thread. (The bonus is that these stacks can be large enough to be
> useful, without too much overhead)
>
> 2) Let each task have a dependency tree, where a task will automatically
> become runnable when all its dependencies are complete (and a typical
> dependency will be the completion of some other task).
>
> Now, if you really want to support arbitrary blocking within the
> execution of a micro-task (which I feel is a mistake), then you can make
> it so that you allocate a new stack for the next task only when some
> stack is "used" because there is a blocking task.
>
> Sincerely,
>
> jw
>
--
Regards,
Alexander Karnakov
|
|
From: Nicholas \Indy\ R. <ar...@gm...> - 2009-04-14 22:08:52
|
On Tue, Apr 14, 2009 at 2:42 PM, <asy...@gm...> wrote:
> Now regarding point 1:

The problem, Alexander, is that once you address point one, you are disposing of the vast majority of the functionality that your library provides, and at that point, we have a lot of extra runtime sitting there being useless. I think this falls into the case of a component being too general for the job.

Indy |
|
From: <asy...@gm...> - 2009-04-14 22:26:45
|
> The problem, Alexander, is that once you address point one, you are
> disposing of the vast majority of the functionality that your
> library provides

That's not really true; you can still execute other tasks the same way in a recursive manner (but keep away from dependencies on anything up the worker thread's stack). Besides, even if it were fully true, then "the vast majority of the functionality" is actually ~2k of code which you never execute. Is this a "generalization" overhead you can't afford?

Alexander.

2009/4/15 Nicholas "Indy" Ray <ar...@gm...>
> The problem, Alexander, is that once you address point one, you are
> disposing of the vast majority of the functionality that your library
> provides, and at that point, we have a lot of extra runtime sitting
> there being useless. I think this falls into the case of a component
> being too general for the job.
>
> Indy

--
Regards,
Alexander Karnakov |
|
From: Nicholas \Indy\ R. <ar...@gm...> - 2009-04-15 00:37:06
|
On Tue, Apr 14, 2009 at 3:26 PM, <asy...@gm...> wrote:
> That's not really true, you can still execute other tasks the same way in a
> recursive manner (but keep away from dependencies on something up the worker
> thread's stack).

I'm sorry, that statement makes little sense to me.

> Besides, even if it were fully true, then "the vast majority of the
> functionality" is actually ~2k of code which you never execute; is this a
> "generalization" overhead you can't afford?

~2k LOC has a tendency to muck up performance, among other things. But alas, when I have to include 2k useless LOC for functionality better provided by a specific library that takes only a few hours to write/debug/test, I'd always go for the latter, so yes, that is *certainly* a generalization overhead I cannot afford.

Indy |
|
From: <asy...@gm...> - 2009-04-15 05:50:20
|
>> That's not really true, you can still execute other tasks the same way in a
>> recursive manner (but keep away from dependencies on something up the worker
>> thread's stack).
>
> I'm sorry, that statement makes little sense to me.

When a task gets blocked by a dependency, the worker thread can start executing other tasks recursively.

> ~2k LOC has a tendency to muck up performance, among other things.
> But alas, when I have to include 2k useless LOC for functionality
> better provided by a specific library that takes only a few hours to
> write/debug/test, I'd always go for the latter, so yes, that is
> *certainly* a generalization overhead I cannot afford.

I could always provide a special build without the stuff you are not going to use. But if you think that the rest of the functionality would really take you only a couple of hours to write/debug/test, then of course you'd better go with your own solution.

Alexander.

--
Regards,
Alexander Karnakov |
|
From: Jon W. <jw...@gm...> - 2009-04-15 15:17:22
|
asy...@gm... wrote:
> When a task gets blocked by a dependency, worker thread can start
> executing other tasks recursively.

So, in my world, I know what dependencies a task has before that task is started. Thus, I can prove that the task will not block, and thus no recursion (or additional stacks) is necessary. The trade-off is that new dependencies can be introduced, but not synchronized, until after the current task invocation ends. The benefit is that scheduling can be a lot smarter, and the requirements for running a task are known and can be bounded (very important in embedded and otherwise limited environments).

And, because tasks do not block while executing, this also means that there are no extra context switches -- each task gets switched in; that's the extent of switching. Thus, slightly less pressure on the context switcher (but optimizing switching is still worthwhile, up to a point).

Sincerely,

jw |
|
From: <asy...@gm...> - 2009-04-15 18:08:47
|
> So, in my world, I know what dependencies a task has before that task is
> started. ... The trade-off is that new dependencies can be introduced,
> but not synchronized

That's an ok tradeoff for your project, but for someone else these restrictions may seem too exotic because a problem might be more dynamic in nature. What I try to do is to provide enough possibilities for you guys to achieve what you want with less effort. What you say, Jon, could be achieved if you'd simply call async::Wait not inside the task but outside it. You could wait not directly on a task, but on a proxy synchronization object. You could make a lot fewer synchronization points (relative to a task) very far outside the tasks you work with, and resolve the dependency for a bigger number of tasks, etc. As I've shown with point 2, you can do the same thing in many ways, where each way gives you slightly different possibilities. Of course, you'll probably want to wrap some repetitive actions into a set of further primitives or functions to ease programming for your version of the multithreading model which you view as the best approach to your specific problem.

I actually have a question regarding point 1 (where you say a task needs to be executed inside the thread's stack and nowhere else). I think I know what you want to say here:
a. you need to have a growable stack
b. you need the cpu to access the same piece of memory from task to task
As I can see now, the only thing I can't provide is 'a.'. I'll work on 'b.' together with a new optimized scheduler which will reuse the same memory for tasks as much as possible to prevent cache thrashing on a particular cpu. Point a. is an OS feature which I can't provide in usermode, though I do provide a way to handle 3rd party library calls for which you don't know the stack requirements, and in the future I'll try to provide some automation for code running in a small-stack environment.

The whole purpose of the asynclib idea was to simplify programming. Anything asynclib does, you can do manually; for example, a sync point in the middle of a task can be achieved without asynclib by splitting the task into 2 functions - one does the work before the sync point, another does it after - and registering the 2nd part in some interlocked queue which will be traversed by some other code. Task suspension can be achieved by saving the data you need to restore in some structure and restoring it later, and so on. Can you squeeze out a bit more performance? Of course you can; since you have all the required information, you could save an instruction here and there. You can do all that, but for that you need to write code and spend programmer's time on it, and horrify the code by introducing a quadrillion callbacks, completion functions, etc., whereas I try to provide a good-enough solution which handles all that automatically. It seems like the only thing I can't handle is stack growing; any other aspect of behaviour is achievable one way or another with very acceptable performance imho (yet I've noted the performance advice you mentioned and will work on it).

Alexander.

2009/4/15 Jon Watte <jw...@gm...>
> So, in my world, I know what dependencies a task has before that task is
> started. Thus, I can prove that the task will not block, and thus no
> recursion (or additional stacks) is necessary. ...
>
> Sincerely,
>
> jw

--
Regards,
Alexander Karnakov |
|
From: Nicholas \Indy\ R. <ar...@gm...> - 2009-04-15 06:24:23
|
On 4/14/09, asy...@gm... <asy...@gm...> wrote:
> When a task gets blocked by a dependency, worker thread can start executing
> other tasks recursively.

But if you are doing a single stack per worker thread, you can't actually swap tasks, as the stack is already being used.

Indy |
|
From: <asy...@gm...> - 2009-04-15 06:46:11
|
A task is not swapped in this case; it just stays on the worker thread's stack, and the worker thread recursively starts doing the next task. When the dependee (is this the right word?) task is done, the dependent task will get resumed once the worker thread finishes the tasks it started working on during the wait period. This works fine for dependencies which are structured as a tree.

Alexander.

2009/4/15 Nicholas "Indy" Ray <ar...@gm...>
> But if you are doing a single stack per worker thread, you can't
> actually swap tasks, as the stack is already being used.
>
> Indy

--
Regards,
Alexander Karnakov |
|
From: Gregory J. <gj...@da...> - 2009-04-15 18:46:47
|
Honestly, Alexander, at this point you are largely re-implementing TBB. Not that this by itself should give you pause, but (and maybe this is lost on me over the past 180 or so posts), what are you offering that TBB doesn't? Given that a library such as asynclib will be considered by those starting a new project, and not likely by those already using a particular solution (whether commercial or in-house), what is(are) asynclib's differentiating factor(s)?

On a side note, if you want to know what would make asynclib more appealing in general, I'd recommend spending more time in the TBB forums, where plenty of feedback on this topic is available.

Regards
Greg

From: asy...@gm... [mailto:asy...@gm...] Sent: Wednesday, April 15, 2009 11:09 AM To: Game Development Algorithms Subject: Re: [Algorithms] General purpose task parallel threading approach

>So, in my world, I know what dependencies a task has before that task is started.
>The trade-off is that new dependencies can be introduced, but not synchronized

That's an OK tradeoff for your project, but for someone else these restrictions may seem too exotic, because a problem might be more dynamic in nature. What I try to do is to provide enough possibilities for you guys to achieve what you want with less effort. What you say, Jon, could be achieved if you'd simply call async::Wait not inside the task but outside it. You could wait not directly on a task, but on a proxy synchronization object. You could place far fewer synchronization points (relative to a task) well outside the tasks you work with, and resolve the dependency for a larger number of tasks, etc. As I said with point 2, you can do the same thing in many ways, where each way gives you slightly different possibilities. Of course you'll probably want to wrap some repetitive actions into a set of other primitives or functions to ease programming for your version of the multithreading model, which you view as the best approach to your specific problem.

I actually have a question regarding point 1 (where you say a task needs to be executed inside the thread's stack and nowhere else). I think I know what you want to say here:
a. - you need to have a growable stack
b. - you need the CPU to access the same piece of memory from task to task.

As I can see now, the only thing I can't provide is 'a'. I'll work on 'b' together with a new, optimized scheduler which will reuse the same memory for tasks as much as possible to prevent cache thrashing on a particular CPU. Point 'a' is an OS feature which I can't provide in usermode, though I do provide a way to handle 3rd-party library calls whose stack requirements you don't know, and in the future I'll try to provide some automation for code running in a small-stack environment.

The whole purpose of the asynclib idea was to simplify programming. Anything asynclib does, you can do manually. For example, a sync point in the middle of a task can be achieved without asynclib by splitting the task into 2 functions - one does the work before the sync point, the other does it after - and registering the 2nd part in some interlocked queue which will be traversed by some other code. Task suspension can be achieved by saving the data you need to restore in some structure and restoring it later, and so on. Can you squeeze out a bit more performance? Of course you can; since you have all the required information, you could save an instruction here and there. You can do all that, but then you need to write code and spend programmer time on it, and horrify the code by introducing a quadrillion callbacks, completion functions, etc., whereas I try to provide a good-enough solution which handles all that automatically. It seems the only thing I can't handle is stack growing; any other aspect of behaviour is achievable one way or another with very acceptable performance imho (and I've noted the performance advice you mentioned and will work on it).

Alexander.

2009/4/15 Jon Watte <jw...@gm...> asy...@gm... wrote: > When a task gets blocked by a dependency, worker thread can start > executing other tasks recursively. > So, in my world, I know what dependencies a task has before that task is started. Thus, I can prove that the task will not block, and thus no recursion (or additional stacks) is necessary. The trade-off is that new dependencies can be introduced, but not synchronized, until after the current task invocation ends. The benefit is that scheduling can be a lot smarter, and the requirements for running a task are known and can be bounded (very important in embedded and otherwise limited environments). And, because tasks do not block while executing, this also means that there are no extra context switches -- each task gets switched in; that's the extent of switching. Thus, slightly less pressure on the context switcher (but optimizing switching is still worthwhile, up to a point). Sincerely, jw -- Regards, Alexander Karnakov |
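The manual alternative Alexander describes — splitting a task at a sync point into two functions and registering the second half in a queue to run once the dependency completes — looks roughly like this. A single-threaded sketch: the plain deque standing in for an "interlocked queue", and all the names, are illustrative only.

```python
from collections import deque

continuations = deque()   # stand-in for the interlocked continuation queue
results = []

def task_part1():
    # Work done before the sync point.
    results.append("before sync point")
    # Instead of suspending here, register the second half as a
    # continuation to be run after the dependency is satisfied.
    continuations.append(task_part2)

def task_part2():
    # Work done after the sync point.
    results.append("after sync point")

task_part1()
# ...later, once the dependency completes, some other code drains the queue:
while continuations:
    continuations.popleft()()

print(results)  # ['before sync point', 'after sync point']
```

This is the "quadrillion of callbacks" style he is arguing against: it avoids any stack-switching machinery, but every sync point forces a function split and moves local state out of the stack frame into explicit structures.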
|
From: <asy...@gm...> - 2009-04-15 19:37:51
|
>what are you offering that TBB doesn't?

A task suspension mechanism. Afaik TBB doesn't provide that.

Alexander.

2009/4/15 Gregory Junker <gj...@da...>
> Honestly, Alexander, at this point you largely are re-implementing TBB.
> Not that this by itself should give you pause, but (and maybe this is lost
> on me over the past 180 or so posts), what are you offering that TBB
> doesn't? Given that a library such as asynclib will be considered by those
> starting a new project, and not likely by those already using a particular
> solution (whether commercial or in-house), what is(are) asynclib's
> differentiating factor(s)?
>
> On a side note, if you want to know what would make asynclib more appealing
> in general, I'd recommend spending more time in the TBB forums, where plenty
> of feedback on this topic is available.
>
> Regards
> Greg
-- Regards, Alexander Karnakov |