Re: [Algorithms] General purpose task parallel threading approach
From: <asy...@gm...> - 2009-04-04 10:11:51
> You will find that benchmarks done in a single-core environment

I just wanted to measure the raw overhead of thread communication in a
system like the one Jon Watte described.

> Win32 already provides an implementation "fibers".

Fibers are just a small fraction of the functionality. For example, a
fiber doesn't automatically switch to another available fiber when it
enters a blocking critical section; you need a framework built on top of
fibers for that to happen.
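To illustrate what I mean, the raw fiber API only gives you explicit,
hand-placed switches, so deciding when to yield and which fiber to run
next is entirely up to the framework you build around it. A minimal
sketch (just an illustration, not code from my package):

// Cooperative switching with raw Win32 fibers. The "scheduler" here is
// hard-coded; a real framework would pick the next runnable task itself.
#include <windows.h>
#include <cstdio>

static LPVOID g_mainFiber;   // the scheduler's fiber
static LPVOID g_taskFiber;   // one task, running as a fiber

static VOID CALLBACK TaskProc(PVOID)
{
    printf("task: started\n");
    // Pretend we hit a blocking condition: yield back to the scheduler.
    SwitchToFiber(g_mainFiber);
    printf("task: resumed after the blocking condition cleared\n");
    // Switch away instead of returning; returning would exit the thread.
    SwitchToFiber(g_mainFiber);
}

int main()
{
    g_mainFiber = ConvertThreadToFiber(NULL);             // scheduler fiber
    g_taskFiber = CreateFiber(64 * 1024, TaskProc, NULL); // 64k stack

    SwitchToFiber(g_taskFiber);   // run the task until it yields
    printf("scheduler: task suspended, another task could run here\n");
    SwitchToFiber(g_taskFiber);   // resume the suspended task

    DeleteFiber(g_taskFiber);
    return 0;
}

Everything interesting - detecting the blocking condition, picking the
next runnable task, resuming the suspended one - happens outside these
calls, and that is the part the framework has to provide.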
Regards,
Alexander.

2009/4/4 Nicholas "Indy" Ray <ar...@gm...>
> I personally just don't see what this offers over standard threads; it
> seems that your "worker threads" provide no additional functionality
> over standard OS-level threads (with lower priority). And I figure that
> if you are seeing significant speedups from OS threading at this level
> of development in your code, it's much more likely that you are simply
> missing something important but not immediately apparent than that the
> OS libs are very poorly written.
>
> > I composed a quick sample - 2 threads which ping-pong each other via 2
> > events (1 core utilized in total)
>
> You will find that benchmarks done in a single-core environment quickly
> lose any sort of resemblance to results in a multi-core environment. And
> if single core is all you are going for, a comparison to threads is
> probably inappropriate and cooperative threads are what you are looking
> for; Win32 already provides an implementation, "fibers".
>
> Indy
>
> On Sat, Apr 4, 2009 at 1:44 AM, <asy...@gm...> wrote:
> >
> >> Shouldn't you be redesigning your system to eliminate this wait for
> >> data? Ie if you need some data it should be already computed, ie the
> >> frame before.
> > That's the only option you have, using the standard approach, to
> > achieve a benefit from a multicore architecture.
> >
> > What I am trying to say is that you really don't have to. Writing
> > single-threaded code is a lot easier, takes less time, the code is more
> > compact, and all dependencies in my case are handled automatically
> > (whereas in your case you have to code them explicitly). To achieve
> > multicore scaling all you need to do is provide enough work for the
> > worker threads to execute while other tasks wait for their dependencies
> > to complete.
> >
> > Do you have a concrete example which bothers you (maybe I could provide
> > a better explanation of how it'll work on a concrete case)?
> >
> > Regards,
> > Alexander.
> >
> > 2009/4/4 Oscar Forth <os...@tr...>
> >>
> >> Am I being thick? It looks to me like you are trying to write single
> >> threaded code with multiple threads. Shouldn't you be redesigning your
> >> system to eliminate this wait for data? Ie if you need some data it
> >> should be already computed, ie the frame before.
> >>
> >> Otherwise you are never going to get the best out of a multicore
> >> architecture ...
> >>
> >> Just my, probably wrong, 2p
> >>
> >> 2009/4/4 <asy...@gm...>
> >>>
> >>> Hi Jon,
> >>>
> >>> > you instead just suspend the current thread and
> >>> > create a new thread in the thread pool, what would the difference be?
> >>> The difference here is speed.
> >>>
> >>> I have an implementation which does the same but with threads (for
> >>> debugging purposes - you can find it in the package as well), and it
> >>> is significantly slower. While my implementation still touches a
> >>> semaphore on each task switch, it can push about a million task
> >>> switches per second now. I actually didn't measure performance with
> >>> threads, but will try doing that later.
> >>>
> >>> My target goal for a task switch cost is about one to two hundred
> >>> cycles, which is about 10-15 million task switches per second on a
> >>> 3 GHz CPU. You should also compare thread spawning cost (unless you
> >>> preallocate all of them) and memory overhead (for Windows threads it
> >>> is a minimum of 4k for stack size), but if you preallocate, you'd do
> >>> it for the maximum stack size. Creation cost in my system is just a
> >>> malloc (which you can also override with a faster allocator), and the
> >>> minimum allowed task is 32 bytes (on x86).
> >>>
> >>> The cost of synchronization primitives on Windows is high imho (a
> >>> critical section is 48 bytes in size plus an attached event in the
> >>> long run). My approach allows the minimal size of a crit-section-like
> >>> primitive to be only 2 bits (it can be injected into a pointer), a
> >>> simple Lock (non-recursive mutex) is 4 bytes, a Mutex (recursive
> >>> mutex - a complete analog of a critical section) is 12 bytes, with no
> >>> use of Windows events or any other objects.
> >>>
> >>> Besides the huge optimization potential in speed and size, there is
> >>> no difference with what you do, Jon.
> >>>
> >>> > Have you measured it?
> >>> I'll come back with numbers later.
> >>>
> >>> Regards,
> >>> Alexander.
> >>>
> >>> 2009/4/4 Jon Watte <jw...@gm...>
> >>>>
> >>>> asy...@gm... wrote:
> >>>> > To fight blocking conditions I've made a task suspension mechanism
> >>>> > which allows a worker thread to suspend a task when it meets
> >>>> > something it should wait for (whether it is a mutex or an ongoing
> >>>> > disk read). After a task is suspended, the worker thread picks up
> >>>> > another task from the task pool and keeps on executing it. When the
> >>>> > blocking condition ends, the task is resumed and put back into the
> >>>> > task pool.
> >>>>
> >>>> You are basically combining fibers (co-operative threads) with
> >>>> pre-emptive threads. There really isn't that much cost difference
> >>>> between a pre-emptive thread and a fiber, so you might as well just
> >>>> spawn another pre-emptive thread, and suspend the current thread,
> >>>> rather than switching fibers on the current thread. If you're using a
> >>>> dynamically growing thread pool, your worst-case thread count (and
> >>>> thus memory usage) will be no worse than your worst-case fiber count.
> >>>>
> >>>> >
> >>>> > - a unified way to wait on primitives (you can wait on all sorts of
> >>>> >   primitives, including tasks, together)
> >>>> > - task canceling
> >>>> > - timeout on wait operations
> >>>>
> >>>> We do that in our fibers, too. However, it turns out that on most
> >>>> OSes you can do that for OS primitives too. On Windows, pretty much
> >>>> anything can go into WaitForMultipleObjects() (although there is a
> >>>> 64-object limitation). On UNIX, as long as something is a file
> >>>> descriptor, you can wait on it using select().
> >>>>
> >>>> So, if instead of "suspending" the current "task" and picking another
> >>>> task within the worker thread -- which I presume to be exactly like
> >>>> fiber switching -- you instead just suspend the current thread and
> >>>> create a new thread in the thread pool, what would the difference be?
> >>>> Have you measured it?
> >>>>
> >>>> Sincerely,
> >>>>
> >>>> jw
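P.S. The "2 threads which ping-pong each other via 2 events" sample
mentioned above is essentially of the following shape (a minimal sketch
for illustration, not the actual code from the package):

// Two threads ping-pong over two auto-reset Win32 events. Only one thread
// runs at a time, so a single core is utilized; each round trip costs two
// thread switches. The iteration count is arbitrary.
#include <windows.h>
#include <cstdio>

static HANDLE g_ping, g_pong;
static const int kIters = 1000000;

static DWORD WINAPI PingThread(LPVOID)
{
    for (int i = 0; i < kIters; ++i) {
        SetEvent(g_pong);                      // wake the other thread
        WaitForSingleObject(g_ping, INFINITE); // sleep until it answers
    }
    return 0;
}

static DWORD WINAPI PongThread(LPVOID)
{
    for (int i = 0; i < kIters; ++i) {
        WaitForSingleObject(g_pong, INFINITE);
        SetEvent(g_ping);
    }
    return 0;
}

int main()
{
    g_ping = CreateEvent(NULL, FALSE, FALSE, NULL); // auto-reset, non-signaled
    g_pong = CreateEvent(NULL, FALSE, FALSE, NULL);

    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&t0);

    HANDLE threads[2] = {
        CreateThread(NULL, 0, PingThread, NULL, 0, NULL),
        CreateThread(NULL, 0, PongThread, NULL, 0, NULL)
    };
    WaitForMultipleObjects(2, threads, TRUE, INFINITE);

    QueryPerformanceCounter(&t1);
    double sec = double(t1.QuadPart - t0.QuadPart) / double(freq.QuadPart);
    printf("%d round trips in %.3f s (%.0f switches/s)\n",
           kIters, sec, 2.0 * kIters / sec);

    CloseHandle(threads[0]); CloseHandle(threads[1]);
    CloseHandle(g_ping);     CloseHandle(g_pong);
    return 0;
}

Each round trip goes through the kernel twice, which is exactly the
per-switch overhead I wanted to isolate.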
--
Regards,
Alexander Karnakov