Re: [Algorithms] General purpose task parallel threading approach
From: <asy...@gm...> - 2009-04-04 10:11:51
> You will find that benchmarks done in a single-core environment

I just wanted to measure the raw overhead of thread communication in a
system like the one Jon Watte described.

> Win32 already provides an implementation "fibers".

Fibers are just a small fraction of the functionality. For example, a
fiber doesn't automatically switch to another available fiber when it
enters a blocking critical section; you need a framework built on top of
fibers for that to happen.
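To illustrate what I mean, the raw fiber API only gives you explicit,
hand-placed switches, so deciding when to yield and which fiber to run
next is entirely up to the framework you build around it. A minimal
sketch (just an illustration, not code from my package):

// Cooperative switching with raw Win32 fibers. The "scheduler" here is
// hard-coded; a real framework would pick the next runnable task itself.
#include <windows.h>
#include <cstdio>

static LPVOID g_mainFiber;   // the scheduler's fiber
static LPVOID g_taskFiber;   // one task, running as a fiber

static VOID CALLBACK TaskProc(PVOID)
{
    printf("task: started\n");
    // Pretend we hit a blocking condition: yield back to the scheduler.
    SwitchToFiber(g_mainFiber);
    printf("task: resumed after the blocking condition cleared\n");
    // Switch away instead of returning; returning would exit the thread.
    SwitchToFiber(g_mainFiber);
}

int main()
{
    g_mainFiber = ConvertThreadToFiber(NULL);             // scheduler fiber
    g_taskFiber = CreateFiber(64 * 1024, TaskProc, NULL); // 64k stack

    SwitchToFiber(g_taskFiber);   // run the task until it yields
    printf("scheduler: task suspended, another task could run here\n");
    SwitchToFiber(g_taskFiber);   // resume the suspended task

    DeleteFiber(g_taskFiber);
    return 0;
}

Everything interesting - detecting the blocking condition, picking the
next runnable task, resuming the suspended one - happens outside these
calls, and that is the part the framework has to provide.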
Regards,
Alexander.

2009/4/4 Nicholas "Indy" Ray <ar...@gm...>
> I personally just don't see what this offers over standard threads; it
> seems that your "worker threads" provide no additional functionality
> over standard OS-level threads (with lower priority). And I figure that
> if you are seeing significant speedups from OS threading at this level
> of development in your code, it's much more likely that you are simply
> missing something important but not immediately apparent than that the
> OS libs are very poorly written.
>
> > I composed a quick sample - 2 threads which ping-pong each other via 2
> > events (1 core utilized in total)
>
> You will find that benchmarks done in a single-core environment quickly
> lose any sort of resemblance to results in a multi-core environment. And
> if single core is all you are going for, a comparison to threads is
> probably inappropriate and cooperative threads are what you are looking
> for; Win32 already provides an implementation, "fibers".
>
> Indy
>
> On Sat, Apr 4, 2009 at 1:44 AM, <asy...@gm...> wrote:
> >
> >> Shouldn't you be redesigning your system to eliminate this wait for
> >> data? Ie if you need some data it should be already computed, ie the
> >> frame before.
> > That's the only option you have, using the standard approach, to
> > achieve a benefit from a multicore architecture.
> >
> > What I am trying to say is that you really don't have to. Writing
> > single-threaded code is a lot easier, takes less time, the code is more
> > compact, and all dependencies in my case are handled automatically
> > (whereas in your case you have to code them explicitly). To achieve
> > multicore scaling all you need to do is provide enough work for the
> > worker threads to execute while other tasks wait for their dependencies
> > to complete.
> >
> > Do you have a concrete example which bothers you (maybe I could provide
> > a better explanation of how it'll work on a concrete case)?
> >
> > Regards,
> > Alexander.
> >
> > 2009/4/4 Oscar Forth <os...@tr...>
> >>
> >> Am I being thick? It looks to me like you are trying to write single
> >> threaded code with multiple threads. Shouldn't you be redesigning your
> >> system to eliminate this wait for data? Ie if you need some data it
> >> should be already computed, ie the frame before.
> >>
> >> Otherwise you are never going to get the best out of a multicore
> >> architecture ...
> >>
> >> Just my, probably wrong, 2p
> >>
> >> 2009/4/4 <asy...@gm...>
> >>>
> >>> Hi Jon,
> >>>
> >>> > you instead just suspend the current thread and
> >>> > create a new thread in the thread pool, what would the difference be?
> >>> The difference here is speed.
> >>>
> >>> I have an implementation which does the same but with threads (for
> >>> debugging purposes - you can find it in the package as well), and it
> >>> is significantly slower. While my implementation still touches a
> >>> semaphore on each task switch, it can push about a million task
> >>> switches per second now. I actually didn't measure performance with
> >>> threads, but will try doing that later.
> >>>
> >>> My target goal for a task switch cost is about one to two hundred
> >>> cycles, which is about 10-15 million task switches per second on a
> >>> 3 GHz CPU. You should also compare thread spawning cost (unless you
> >>> preallocate all of them) and memory overhead (for Windows threads it
> >>> is a minimum of 4k for stack size), but if you preallocate, you'd do
> >>> it for the maximum stack size. Creation cost in my system is just a
> >>> malloc (which you can also override with a faster allocator), and the
> >>> minimum allowed task is 32 bytes (on x86).
> >>>
> >>> The cost of synchronization primitives on Windows is high imho (a
> >>> critical section is 48 bytes in size plus an attached event in the
> >>> long run). My approach allows the minimal size of a crit-section-like
> >>> primitive to be only 2 bits (it can be injected into a pointer), a
> >>> simple Lock (non-recursive mutex) is 4 bytes, a Mutex (recursive
> >>> mutex - a complete analog of a critical section) is 12 bytes, with no
> >>> use of Windows events or any other objects.
> >>>
> >>> Besides the huge optimization potential in speed and size, there is
> >>> no difference with what you do, Jon.
> >>>
> >>> > Have you measured it?
> >>> I'll come back with numbers later.
> >>>
> >>> Regards,
> >>> Alexander.
> >>>
> >>> 2009/4/4 Jon Watte <jw...@gm...>
> >>>>
> >>>> asy...@gm... wrote:
> >>>> > To fight blocking conditions I've made a task suspension mechanism
> >>>> > which allows a worker thread to suspend a task when it meets
> >>>> > something it should wait for (whether it is a mutex or an ongoing
> >>>> > disk read). After a task is suspended, the worker thread picks up
> >>>> > another task from the task pool and keeps on executing it. When the
> >>>> > blocking condition ends, the task is resumed and put back into the
> >>>> > task pool.
> >>>>
> >>>> You are basically combining fibers (co-operative threads) with
> >>>> pre-emptive threads. There really isn't that much cost difference
> >>>> between a pre-emptive thread and a fiber, so you might as well just
> >>>> spawn another pre-emptive thread, and suspend the current thread,
> >>>> rather than switching fibers on the current thread. If you're using a
> >>>> dynamically growing thread pool, your worst-case thread count (and
> >>>> thus memory usage) will be no worse than your worst-case fiber count.
> >>>>
> >>>> >
> >>>> > - a unified way to wait on primitives (you can wait on all sorts of
> >>>> >   primitives, including tasks, together)
> >>>> > - task canceling
> >>>> > - timeout on wait operations
> >>>>
> >>>> We do that in our fibers, too. However, it turns out that on most
> >>>> OSes you can do that for OS primitives too. On Windows, pretty much
> >>>> anything can go into WaitForMultipleObjects() (although there is a
> >>>> 64-object limitation). On UNIX, as long as something is a file
> >>>> descriptor, you can wait on it using select().
> >>>>
> >>>> So, if instead of "suspending" the current "task" and picking another
> >>>> task within the worker thread -- which I presume to be exactly like
> >>>> fiber switching -- you instead just suspend the current thread and
> >>>> create a new thread in the thread pool, what would the difference be?
> >>>> Have you measured it?
> >>>>
> >>>> Sincerely,
> >>>>
> >>>> jw
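P.S. The "2 threads which ping-pong each other via 2 events" sample
mentioned above is essentially of the following shape (a minimal sketch
for illustration, not the actual code from the package):

// Two threads ping-pong over two auto-reset Win32 events. Only one thread
// runs at a time, so a single core is utilized; each round trip costs two
// thread switches. The iteration count is arbitrary.
#include <windows.h>
#include <cstdio>

static HANDLE g_ping, g_pong;
static const int kIters = 1000000;

static DWORD WINAPI PingThread(LPVOID)
{
    for (int i = 0; i < kIters; ++i) {
        SetEvent(g_pong);                      // wake the other thread
        WaitForSingleObject(g_ping, INFINITE); // sleep until it answers
    }
    return 0;
}

static DWORD WINAPI PongThread(LPVOID)
{
    for (int i = 0; i < kIters; ++i) {
        WaitForSingleObject(g_pong, INFINITE);
        SetEvent(g_ping);
    }
    return 0;
}

int main()
{
    g_ping = CreateEvent(NULL, FALSE, FALSE, NULL); // auto-reset, non-signaled
    g_pong = CreateEvent(NULL, FALSE, FALSE, NULL);

    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&t0);

    HANDLE threads[2] = {
        CreateThread(NULL, 0, PingThread, NULL, 0, NULL),
        CreateThread(NULL, 0, PongThread, NULL, 0, NULL)
    };
    WaitForMultipleObjects(2, threads, TRUE, INFINITE);

    QueryPerformanceCounter(&t1);
    double sec = double(t1.QuadPart - t0.QuadPart) / double(freq.QuadPart);
    printf("%d round trips in %.3f s (%.0f switches/s)\n",
           kIters, sec, 2.0 * kIters / sec);

    CloseHandle(threads[0]); CloseHandle(threads[1]);
    CloseHandle(g_ping);     CloseHandle(g_pong);
    return 0;
}

Each round trip goes through the kernel twice, which is exactly the
per-switch overhead I wanted to isolate.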
--
Regards,
Alexander Karnakov