Thread: Re: [Algorithms] General purpose task parallel threading approach (Page 2)
From: Nicholas "Indy" R. <ar...@gm...> - 2009-04-04 09:32:42
|
I personally just don't see what this offers over standard threads; it seems that your "worker threads" provide no additional functionality beyond standard OS-level threads (with lower priority). And if you are seeing significant speedups over OS threading at this level of development in your code, I figure it's much more likely that you are simply missing something important and not immediately apparent than that the OS libraries are very poorly written.

> I composed a quick sample - 2 threads which ping pong each other via 2 events (1 core utilized in total)

You will find that benchmarks done in a single-core environment quickly lose any resemblance to results in a multi-core environment. And if single core is all you are going for, a comparison to threads is probably inappropriate and cooperative threads are what you are looking for; Win32 already provides an implementation: "fibers".

Indy

On Sat, Apr 4, 2009 at 1:44 AM, <asy...@gm...> wrote:
>
> >> Shouldn't you be redesigning your system to eliminate this wait for data?
> >> Ie if you need some data it should be already computed, ie the frame before.
> That's the only option you have, using the standard approach, to achieve a benefit from a multicore architecture.
>
> What I am trying to say is that you really don't have to. Writing single-threaded code is a lot easier, takes less time, the code is more compact, and all dependencies in my case are handled automatically (whereas in your case you have to code them explicitly). To achieve multicore scaling, all you need to do is provide enough work for the worker threads to execute while other tasks wait for their dependencies to complete.
>
> Do you have a concrete example which bothers you (maybe I could provide a better explanation of how it'll work on a concrete case)?
>
> Regards,
> Alexander.
>
> 2009/4/4 Oscar Forth <os...@tr...>
>>
>> Am I being thick? It looks to me like you are trying to write single-threaded code with multiple threads. Shouldn't you be redesigning your system to eliminate this wait for data? Ie if you need some data it should be already computed, ie the frame before.
>>
>> Otherwise you are never going to get the best out of a multicore architecture ...
>>
>> Just my, probably wrong, 2p
>> 2009/4/4 <asy...@gm...>
>>>
>>> Hi Jon,
>>>
>>> > you instead just suspend the current thread and
>>> > create a new thread in the thread pool, what would the difference be?
>>> The difference here is speed.
>>>
>>> I have an implementation which does the same but with threads (for debugging purposes - you can find it in the package as well), and it is significantly slower. While my implementation still touches a semaphore on each task switch, it can push about a million task switches per second now. I actually didn't measure performance with threads, but will try doing that later.
>>> My target for task switch cost is about one to two hundred cycles, which is about 10-15 million task switches per second on a 3 GHz CPU.
>>> You should also compare thread spawning cost (unless you preallocate all of them) and memory overhead (for Windows threads the minimum stack size is 4 KB), but if you preallocate, you'd do it for the maximum stack size. Creation cost in my system is just a malloc (which you can also override with a faster allocator), and the minimum allowed task is 32 bytes (on x86).
>>>
>>> The cost of synchronization primitives on Windows is high, imho (a critical section is 48 bytes in size, plus an attached event in the long run). My approach allows the minimal size for a critical-section-like primitive to be 2 bits only (it can be injected into a pointer), a simple Lock (non-recursive mutex) is 4 bytes, a Mutex (recursive mutex - a complete analog of a critical section) is 12 bytes, with no use of Windows events or any other kernel objects.
>>>
>>> Besides the huge optimization potential in speed and size, there is no difference from what you do, Jon.
>>>
>>> > Have you measured it?
>>> I'll come back with numbers later.
>>>
>>> Regards,
>>> Alexander.
>>>
>>> 2009/4/4 Jon Watte <jw...@gm...>
>>>>
>>>> asy...@gm... wrote:
>>>> > To fight blocking conditions I've made a task suspension mechanism,
>>>> > which allows a worker thread to suspend a task when it meets something
>>>> > it should wait for (whether it is a mutex or an ongoing disk read).
>>>> > After a task is suspended, the worker thread picks up another task from the
>>>> > task pool and keeps on executing it. When the blocking condition ends, the
>>>> > task is resumed and put back into the task pool.
>>>>
>>>> You are basically combining fibers (co-operative threads) with pre-emptive threads. There really isn't that much cost difference between a pre-emptive thread and a fiber, so you might as well just spawn another pre-emptive thread, and suspend the current thread, rather than switching fibers on the current thread. If you're using a dynamically growing thread pool, your worst-case thread count (and thus memory usage) will be no worse than your worst-case fiber count.
>>>>
>>>> > - a unified way to wait on primitives (you can wait on all sorts of
>>>> >   primitives, including tasks, together)
>>>> > - task canceling
>>>> > - timeout on wait operations
>>>>
>>>> We do that in our fibers, too. However, it turns out that on most OSes you can do that for OS primitives too. On Windows, pretty much anything can go into WaitForMultipleObjects() (although there is a 64-object limitation). On UNIX, as long as something is a file descriptor, you can wait on it using select().
>>>>
>>>> So, if instead of "suspending" the current "task" and picking another task within the worker thread -- which I presume to be exactly like fiber switching -- you instead just suspend the current thread and create a new thread in the thread pool, what would the difference be? Have you measured it?
>>>>
>>>> Sincerely,
>>>>
>>>> jw
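For concreteness, the quoted "2 threads which ping pong each other via 2 events" measurement boils down to something like the sketch below. This is not the actual sample from the package - the iteration count, timing method and names are placeholders - but it shows the shape of the round-trip cost being benchmarked.

    #include <windows.h>
    #include <stdio.h>

    static HANDLE g_ping, g_pong;
    static const int kIters = 1000000;

    static DWORD WINAPI Ponger(void*)
    {
        for (int i = 0; i < kIters; ++i) {
            WaitForSingleObject(g_ping, INFINITE);  // wait for the other thread
            SetEvent(g_pong);                       // hand control straight back
        }
        return 0;
    }

    int main()
    {
        g_ping = CreateEvent(NULL, FALSE, FALSE, NULL);  // auto-reset events
        g_pong = CreateEvent(NULL, FALSE, FALSE, NULL);
        HANDLE t = CreateThread(NULL, 0, Ponger, NULL, 0, NULL);

        DWORD start = GetTickCount();
        for (int i = 0; i < kIters; ++i) {
            SetEvent(g_ping);                        // only one thread runs at a time,
            WaitForSingleObject(g_pong, INFINITE);   // so one core is utilized in total
        }
        DWORD elapsed = GetTickCount() - start;

        printf("%d round trips in %lu ms\n", kIters, (unsigned long)elapsed);
        WaitForSingleObject(t, INFINITE);
        return 0;
    }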
|
From: <asy...@gm...> - 2009-04-04 10:11:51
|
> You will find that benchmarks done in a single core environment

I just wanted to measure the raw overhead of thread communication in the kind of system Jon Watte described.

> Win32 already provides an implementation "fibers".

Fibers are only a small fraction of the functionality: a fiber doesn't automatically switch to another available fiber if it enters a blocking critical section, for example - you need a framework built on top of fibers for that to happen.

Regards,
Alexander.
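To illustrate the "framework on top of fibers" point, here is a sketch (not Alexander's library - all names are hypothetical, and it assumes a single worker thread that has already called ConvertThreadToFiber) of a lock that parks the current fiber and hands the thread back to a scheduler fiber instead of blocking:

    #include <windows.h>
    #include <deque>

    struct Scheduler {
        LPVOID scheduler_fiber;      // the fiber that picks the next runnable task
        std::deque<LPVOID> ready;    // fibers that have become runnable again
    };

    struct FiberLock {
        LPVOID owner;                // fiber currently holding the lock (NULL if free)
        std::deque<LPVOID> waiters;  // fibers parked on this lock

        void acquire(Scheduler& s) {
            LPVOID self = GetCurrentFiber();
            while (owner != NULL) {
                // Don't block the OS thread: park this fiber and let the
                // scheduler run some other ready fiber on the same thread.
                waiters.push_back(self);
                SwitchToFiber(s.scheduler_fiber);
            }
            owner = self;
        }

        void release(Scheduler& s) {
            owner = NULL;
            if (!waiters.empty()) {              // wake one waiter
                s.ready.push_back(waiters.front());
                waiters.pop_front();
            }
        }
    };

A real scheduler would of course need thread-safe queues and a policy for resuming parked fibers; raw Win32 fibers give you none of that, which is the point being made above.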
|
From: <asy...@gm...> - 2009-04-04 09:55:46
|
> Lock-free algorithms are, in some ways, less error-prone.

I'm not an expert in lock-free, for sure, but the most advanced lock-free structure I've seen was a doubly-linked list implementation. In many real-life scenarios you often need to keep much more complex structures consistent (i.e. a set of tables, like hashmaps), for which I think it would be close to impossible to write an efficient lock-free implementation. I also noticed that most programmers prefer using mutexes anyway, because they are easier to write, it is clearer to other programmers what happens, and they are guaranteed to work. This made me think that using mutexes is fine as long as you provide a cheap way for the CPU to execute something else while the mutex is being held.

> How does the library you wrote solve the specific problems you encountered?

Originally I was trying to solve the problem of how to write a server serving multiple clients simultaneously, utilizing all cores, without writing a state machine for each client (I found out that it is damn complex to extend logic when it is expressed via a state machine). The solution I have now fits perfectly, imho - I only need to write a task per client which does simple send/receive (as if you'd written a single-threaded application), and I get ideal multicore scaling.

Regards,
Alexander.

2009/4/4 Tim Ambrogi <tim...@gm...>
> On Fri, Apr 3, 2009 at 10:55 PM, <asy...@gm...> wrote:
>> As far as I know this is the most common approach as well, because it is less error prone. On the other side, in standard threads, or any job system I've heard about, this approach is not efficient because it blocks the execution thread.
>
> Lock-free algorithms are, in some ways, less error-prone. In particular, correctly-implemented lock-free algorithms cannot deadlock, and prevent priority inversion.
>
>> > I'm not quite sure what your library provides to the domain of game development
>> That's what I would like your feedback on.
>
> In that case, I again suggest you provide some specific use cases. What problems did you run into that led you to write the library? How does the library you wrote solve the specific problems you encountered?
>
> --Tim A.
>
> 2009/4/3 Tim Ambrogi <tim...@gm...>
>> Alexander-
>> Coroutines are an implementation of cooperative multi-tasking. There are implementations in various dynamic languages, such as Lua and Stackless Python. Also, there are less-portable implementations in C++ that involve inline assembly, setjmp/longjmp, fibers, etc., depending on the platform. See http://en.wikipedia.org/wiki/Coroutine for more info.
>>
>> It sounds, however, like you also want to write lower-level, performance-critical systems such as rendering, physics, serialization, etc. across many threads/cores. Your library is similar in silhouette to the popular and fairly general job system, where you break parallelizable work down into N jobs that are handed to idle threads. These jobs aim to either minimize contention on shared resources, or operate in a wait-free manner on said resources. If wait-free operation cannot be achieved within the domain, you usually try a lock-free approach that enables different threads to assist obstructing threads in completing their tasks while you wait. As a last resort, you fall back on traditional synchronization primitives if you can find no more efficient approach.
>>
>> I'm not quite sure what your library provides to the domain of game development, but I'm sure it was motivated by a specific problem you encountered. Perhaps you should describe your use cases more specifically, given that the stated domain of your library seems overly broad?
>>
>> --Tim A.
>>
>> On Fri, Apr 3, 2009 at 12:27 PM, <asy...@gm...> wrote:
>>> What do you mean?
>>> 2009/4/3 Gregory Junker <gj...@da...>
>>>> Isn't this where coroutines come into play?
>>>> Greg
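As a flavor of the lock-free style being discussed, here is the classic CAS-based stack push (a "Treiber stack"). Push is the easy half; pop needs ABA protection (version counters, or the Win32 SList API), which is exactly where the complexity Alexander mentions starts to bite. Sketch only, not anyone's production code:

    #include <windows.h>

    struct Node {
        Node* next;
        int   payload;
    };

    void LockFreePush(Node* volatile* head, Node* node)
    {
        for (;;) {
            Node* old_head = *head;
            node->next = old_head;
            // Publish the new node only if nobody changed the head meanwhile;
            // otherwise loop and retry against the new head.
            if (InterlockedCompareExchangePointer(
                    (PVOID volatile*)head, node, old_head) == old_head)
                return;
        }
    }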
|
From: Nicholas "Indy" R. <ar...@gm...> - 2009-04-04 10:22:15
|
On Sat, Apr 4, 2009 at 2:55 AM, <asy...@gm...> wrote:
> Originally I was trying to solve the problem of how to write a server serving multiple clients simultaneously, utilizing all cores, without writing a state machine for each client (I found out that it is damn complex to extend logic when it is expressed via a state machine).
> The solution I have now fits perfectly, imho - I only need to write a task per client which does simple send/receive (as if you'd written a single-threaded application), and I get ideal multicore scaling.

It's problematic to take an approach that works well for servers and extend it to video games. Most servers have very low data contention, don't worry too much about small variances in latency, and spend a lot of time waiting around on IO and the like. On the other hand, video games not designed for multithreading generally have very high data contention, which requires many locks, and at that point the cache misses generally cost just as much as the locks themselves. Latency requirements are very strict, and additionally most games spend very little to zero time in IO; when they do, it is almost always during loading, or designed to be async to begin with. Given that, there is not a lot of time to fit in extra cycles for worker threads on the main CPU.

Indy
|
From: <asy...@gm...> - 2009-04-04 10:55:52
|
How do you utilize multicore in your game, then? I'd like to cover as many areas of the programming world as I can, so I wonder what kind of functionality I am missing that you would like to have.

The automatic task switching on blocking conditions has no cost if you don't use it. In that case you have something close to OpenMP-style multithreading.

Alexander.
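For comparison, the "OpenMP-style multithreading" being alluded to is the purely data-parallel fan-out below: independent iterations, no blocking inside the loop body, one implicit join at the end, so a suspend/resume mechanism never gets exercised. (Function and parameter names are placeholders.)

    #include <omp.h>

    void UpdateParticles(float* positions, const float* velocities,
                         int count, float dt)
    {
        #pragma omp parallel for
        for (int i = 0; i < count; ++i) {
            // Each iteration is independent: no locks, no waits, so the
            // worker threads just chew through the range and join at the end.
            positions[i] += velocities[i] * dt;
        }
    }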
|
From: Oscar F. <os...@tr...> - 2009-04-04 13:25:58
|
> Do you have a concrete example which bothers you (maybe I could provide a better explanation of how it'll work on a concrete case)?

Well, I don't have a concrete example. But the fact that you are stalling at all is the bit that concerns me. I appreciate that using a double-buffered approach leads to latency, but it makes much more sense to me, especially as far as scaling to many cores goes.

Thanks to using a display-list-style system for DirectX, I'm also doing rendering from multiple threads. Essentially each "task" decides what it wants to do ... sends its forces to the physics engine and then renders the previous frame's physics results. This, of course, allows it to respond to things like walking into a wall straight away and make alternate "decisions".

This way I'm avoiding stalls except at the end of a frame, when I synchronise everything. I guess I could probably completely de-couple rendering from logic, but I don't see the point of rendering at 300 fps when you can only update the logic at 5 fps.

Anyway ... that's just my approach. I'm not saying you are wrong (my knowledge is limited enough without telling other people they are wrong ;)), but it seems to >me< that you are attacking the problem of multiple cores in totally the wrong way.
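A minimal sketch of the double-buffered hand-off described above: game code reads last frame's physics results while the physics job writes this frame's, and the only synchronisation point is the swap at end of frame. Types and names are placeholders, not the actual engine code.

    static const int MAX_BODIES = 1024;              // placeholder size

    struct BodyState { float pos[3]; float vel[3]; };

    struct PhysicsBuffers {
        BodyState states[2][MAX_BODIES];
        int       read_index;                        // completed results, safe to render from

        const BodyState* read()  const { return states[read_index]; }
        BodyState*       write()       { return states[read_index ^ 1]; }

        // Called once per frame, after all tasks have been joined,
        // so readers and writers never touch the same buffer mid-frame.
        void swap() { read_index ^= 1; }
    };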
|
From: <asy...@gm...> - 2009-04-04 15:00:34
|
> But the fact you are stalling at all, is the bit that concerns me.

What stalls are you talking about? On the contrary, I'm trying to eliminate all the cases where the CPU has to wait on something.

> but it seems to >me< that you are attacking the problem of multiple cores in totally the wrong way.

Explain. If I make things work at the same speed as the standard approach (when there is no or very little contention), or faster (in case of contention or blocking IO), what is wrong here?

Could you explain a bit more about your approach? And what do you do when a physics iteration finishes faster than a rendering iteration, or the other way around?

Alexander.
|
From: Jon W. <jw...@gm...> - 2009-04-04 16:12:52
|
asy...@gm... wrote:
> My target for task switch cost is about one to two hundred cycles, which is about 10-15 million task switches per second on a 3 GHz CPU.
> You should also compare thread spawning cost (unless you preallocate all of them) and memory overhead (for Windows threads the minimum stack size is 4 KB), but if you preallocate, you'd do it for the maximum stack size. Creation cost in my system is just a malloc (which you can also override with a faster allocator), and the minimum allowed task is 32 bytes (on x86).

If you're using exception handling, or stack buffer overflow checks, or even certain standard C library functions, a 32-byte stack will get you into trouble. Also, you want your thread/task stacks to be page aligned, so that a stack overflow will actually be caught by a guard page. And, yes, you'd either pre-allocate your maximum number of blocking threads, or you would allocate more threads on demand when you started blocking and ran out of runnable threads. Note: in a system with guard-page protection for each thread stack, you need 8 kB per thread minimum, plus the register state (which is about 512 bytes if you care about SSE). This is true whether you're using fibers or threads. Trying to find out whether an innocuous library call you're making is going to overwrite your malloc-ed stack or not is too unpredictable to rely on, IMO -- I prefer the safety of a guard page.

I don't think a full thread context switch (stack pointer, registers, pc) can be done in a hundred cycles on x86; at least not if they aren't already all in L1 on the local core. The amount of data you need to shuffle and the memory latencies involved seem to contra-indicate such a low switch time. Getting as close as you can is not a bad idea, though :-)

Sincerely,

jw
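The guard-page arrangement being advocated looks roughly like this on Win32 (a sketch: error handling, release, and stack-growth policy omitted). The lowest page of the allocation is marked PAGE_GUARD so an overflow faults instead of silently trampling adjacent memory:

    #include <windows.h>

    void* AllocTaskStack(SIZE_T stack_bytes)
    {
        SYSTEM_INFO si;
        GetSystemInfo(&si);
        SIZE_T page  = si.dwPageSize;
        SIZE_T total = stack_bytes + page;          // usable stack + one guard page

        char* base = (char*)VirtualAlloc(NULL, total, MEM_RESERVE | MEM_COMMIT,
                                         PAGE_READWRITE);
        if (!base)
            return NULL;

        // The stack grows downward, so the guard page sits at the low end.
        DWORD old_protect;
        VirtualProtect(base, page, PAGE_READWRITE | PAGE_GUARD, &old_protect);

        return base + total;                        // top-of-stack pointer for the task
    }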
|
From: <asy...@gm...> - 2009-04-04 16:35:39
|
Of course there is a tradeoff if you decide to use tasks with stacks as small as 32 bytes (though the sample I provided can do a printf from such small tasks). The user code I've worked with required 1-4 kilobytes on average. I also provide an optional automatic guard-page safety mechanism to catch stack overflows, and a way to handle library calls (that's why printf works from a 32-byte task).

You are right about the full thread context state, though - but you don't have to capture the full state to make cooperation work in user space, just a small fraction of it. Also keep in mind that according to the Windows x86 ABI, the SSE registers (and 75% of the other registers) are not callee-preserved; that's why a user-space task state requires about 20 bytes and SSE will still work (as well as MMX and FPU code).

Alexander.
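For the curious, the sort of user-space switch being described can be sketched as below for MSVC/x86 (this is not Alexander's implementation, just an illustration of why ~20 bytes suffices). Only the callee-saved registers and the stack pointer change hands, because the Win32 x86 ABI lets everything else be clobbered across a call; a freshly created task just needs its stack pre-seeded with four register slots and a return address pointing at its entry function.

    struct TaskContext { void* sp; };   // everything else lives on the task's own stack

    __declspec(naked) void __cdecl SwapContext(TaskContext* /*from*/, TaskContext* /*to*/)
    {
        __asm {
            mov  eax, [esp + 4]     // from
            mov  edx, [esp + 8]     // to
            push ebp                // save the callee-saved registers on the
            push ebx                //   current task's own stack (4 regs +
            push esi                //   return address = 20 bytes)
            push edi
            mov  [eax], esp         // remember where this task stopped
            mov  esp, [edx]         // jump onto the other task's stack
            pop  edi                // restore its callee-saved registers
            pop  esi
            pop  ebx
            pop  ebp
            ret                     // resume at its saved return address
        }
    }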
|
From: Oscar F. <os...@tr...> - 2009-04-04 16:58:47
|
> > but it seems to >me< that you are attacking the problem of multiple cores in totally the wrong way.
> Explain. If I make things work at the same speed as a standard approach (when there is no or very little contention) or faster (in case of contention or blocking io), what is wrong here?

Again, I'm not saying I'm correct ... but you stated that you are trying to handle the "ability to handle blocking conditions somehow". I'm suggesting that you don't need this to be a problem. Maybe we are talking about the same thing? Maybe I'm just misunderstanding you? BUT, it still seems to me that if blocking is a problem then you are attacking the problem wrongly.

> Could you explain a bit more about your approach? And what do you do when a physics iteration finishes faster than a rendering iteration, or the other way around?

If my physics system finishes early, I return its thread to the thread pool and give it to the object iterations. I only ever have twice the number of threads that there are cores in the machine (experimentation showed me that this is a good tradeoff between leaving a core stalled on things like memory access and general thread throughput. I would love to experiment on massively multi-core systems, and it looks likely I'll get to play with a 1024-core system soon to do some testing for a non-games-related problem).

I don't know what more I can tell you about my approach beyond my last post. Basically I am aiming to get tasks as parallel to each other as possible. A given task, when it finishes, has a "callback" that tells it when it next has to bother doing any AI decisions. Otherwise it only handles animation and rendering of itself. Basically I'm looking at where a stall can occur and trying to eliminate that stall. Currently my only stalling occurs when I add to global lists (message list and render list).

The system seems perfectly compatible with a networking system whereby you don't know exactly when the next update will occur. In fact things like the rendering and physics could easily be offloaded to whole other computers if such a need ever came about.

I'm pretty crap at explaining things. If you have any specific questions to ask about my approach then I'll happily attempt to answer them. I'm better at that :) That said, I will re-iterate, I'm not saying your solution is wrong. It just goes contrary to my experience and knowledge (which is inevitably flawed). My system will, however, port nicely to >any< pre-emptively multi-tasking operating system, which was my aim.
|
From: <asy...@gm...> - 2009-04-04 18:09:12
|
Blocking is only part of the problem (I used it for demonstration purposes, basically); on top of that you have all sorts of IO and task dependencies.

You can compare my approach to hyperthreading in the hardware world, which is becoming more and more popular these days (Sun's UltraSPARC, Intel's Larrabee, and GPUs are also using that scheme as far as I'm aware). Hyperthreading switches virtual threads when the currently executing thread needs to wait on something (like a memory access to finish). Is it needed? Not if you lay out your data in a way where you don't have memory access latencies, but you usually do have them, because in many cases eliminating the latencies is very difficult, time consuming, or completely impossible.

I'm not saying that one should not plan his code for an asynchronous execution scenario, but the delays from all sorts of blocking conditions can be hidden, like it happens with hyperthreading.

Alexander.
|
From: Ola O. <ola...@gm...> - 2009-04-05 08:14:29
|
> I can't speak on UltraSPARC, but neither Larrabee nor GPUs use hyperthreading as the main form of threading (or afaik at all). They are both data parallel architectures primarily. Additionally, hyperthreading is built to fill bubbles in the pipeline, and does so rather well because it doesn't have to worry about any sort of context switch, as the threading is implemented in hardware and multiple states can actually be active at the same time.

Well, NVIDIA's current GPUs are fundamentally built on using 'hyperthreading' to hide latency, as opposed to keeping large caches. Each core is capable of keeping 1024 threads going at a time, which are processed SIMD fashion 32 at a time. So I don't see what you mean by saying they do not use hyperthreading?

Larrabee is different, and less is known about it, but from their presentations they have 4 'hyperthreads' per core. However, also from their presentations, they intend to use cooperative threads/fibers to run their shaders, switching when hitting a texture fetch or suchlike to achieve the same latency hiding as other GPUs.

cheers
.ola
|
From: Nicholas "Indy" R. <ar...@gm...> - 2009-04-04 19:45:25
|
On Sat, Apr 4, 2009 at 11:09 AM, <asy...@gm...> wrote:
> Blocking is only part of the problem (I used it for demonstration purposes, basically); on top of that you have all sorts of IO and task dependencies.

These are all variants on blocking; it's best to eliminate these dependencies.

> You can compare my approach to hyperthreading in the hardware world, which is becoming more and more popular these days (Sun's UltraSPARC, Intel's Larrabee, and GPUs are also using that scheme as far as I'm aware).

I can't speak on UltraSPARC, but neither Larrabee nor GPUs use hyperthreading as the main form of threading (or afaik at all). They are both data parallel architectures primarily. Additionally, hyperthreading is built to fill bubbles in the pipeline, and does so rather well because it doesn't have to worry about any sort of context switch, as the threading is implemented in hardware and multiple states can actually be active at the same time.

> Hyperthreading switches virtual threads when the currently executing thread needs to wait on something (like a memory access to finish). Is it needed? Not if you lay out your data in a way where you don't have memory access latencies, but you usually do have them, because in many cases eliminating the latencies is very difficult, time consuming, or completely impossible.

Keep in mind that there is zero context switch overhead in hyperthreading. Additionally, eliminating cache stalls is far more difficult than eliminating blocking IO and the like.

> I'm not saying that one should not plan his code for an asynchronous execution scenario, but the delays from all sorts of blocking conditions can be hidden, like it happens with hyperthreading.

Additionally, hyperthreading still requires threads to be designed in a way that is low on data contention. And if you already have worker tasks that can do that, it's generally best to create OS threads and just let them run on separate cores or hyperthreaded - but still ensure that the main thread does no waiting. In a well designed system, thread switches are a non-issue because they just shouldn't happen very often (or at all, in the case of consoles), so I don't think it's worthwhile worrying too much about the amount of time it takes to do a thread switch.

Indy
|
From: Jon W. <jw...@gm...> - 2009-04-04 23:10:39
|
Nicholas "Indy" Ray wrote:
> I can't speak on UltraSPARC, but neither Larrabee nor GPUs use hyperthreading as the main form of threading (or afaik at all). They are both data parallel architectures primarily.

The latest UltraSPARC has, I think, eight cores, each with eight hyperthreads. When one core blocks on a cache miss, it will automatically schedule the next hyperthread. Larrabee has almost exactly the same set-up: many hyper-threads that it switches between to hide data access latencies. NVIDIA calls the same thing warps, I think (one warp is 8 threads -- get it? :-)

However, in software you don't get the same fine-grained benefit; not by a long shot. When you wait for something, you either wait on a different task using a user-mode synchronization primitive, or you wait on something that has to come from the kernel (I/O, interrupt, etc). Those are the only two options. Neither of them allows you to switch tasks quickly enough, or with little enough overhead, to be compared to hyper-threading IMO.

> Additionally, hyperthreading still requires threads to be designed in a way that is low on data contention. And if you already have worker tasks that can do that, it's generally best to create OS threads and just let them run on separate cores or hyperthreaded - but still ensure that the main thread does no waiting. In a well designed system, thread switches are a non-issue because they just shouldn't happen very often (or at all, in the case of consoles), so I don't think it's worthwhile worrying too much about the amount of time it takes to do a thread switch.

I think the idea of a thread per subsystem (particles, skinning, collision, simulation, audio, scene graph issue, etc) will only scale so far. Once you have 64-way CPUs (like the latest SPARCs), and even more with Larrabee (if you include the hyper-threads), making your game (or any software) go fast will mean a task-oriented workload, rather than a subsystem-oriented workload. As long as your tasks are significantly heavier than context switching, you're doing fine. That's what the fibers-within-threads approach tries to optimize. There probably exists a real-world workload where the efficiency of fibers leads to measurable throughput gains even though the overhead of fibers or threading is not dramatically large compared to the workload, but I think that slice is pretty thin. It all comes down to Amdahl's Law in the end.

Sincerely,

jw
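As a back-of-the-envelope illustration of that last point (the numbers here are illustrative, not from the post above): Amdahl's Law gives speedup(n) = 1 / ((1 - p) + p/n) for a parallel fraction p on n hardware threads. With p = 0.95 and n = 32 that is about 1 / (0.05 + 0.0297) ≈ 12.5x, and the cap as n grows without bound is only 1/0.05 = 20x - so the serial fraction, which includes any scheduling and switching overhead that doesn't parallelize, dominates the result long before raw core count does.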
|
From: Jarkko L. <al...@gm...> - 2009-04-05 00:10:42
|
I think you really need to target both task and data level parallelism to make your game engine truly scalable for many-core architectures. I don't see why you would need to consider the cost of context switching in relation to the cost of task execution, though, since if you've got a job queue, each worker thread just pulls tasks from the queue without switching context.

Cheers, Jarkko
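The shape of the shared-job-queue worker being described is roughly the following (a minimal sketch with hypothetical names; shutdown, dependencies and lock-free queues are left out). Each worker pops the next ready task and runs it to completion, so there is never a context switch inside a task:

    #include <windows.h>
    #include <deque>

    typedef void (*JobFn)(void* arg);
    struct Job { JobFn fn; void* arg; };

    struct JobQueue {
        CRITICAL_SECTION lock;
        HANDLE           available;   // semaphore counting queued jobs
        std::deque<Job>  jobs;

        JobQueue() {
            InitializeCriticalSection(&lock);
            available = CreateSemaphore(NULL, 0, MAXLONG, NULL);
        }
        void push(const Job& j) {
            EnterCriticalSection(&lock);
            jobs.push_back(j);
            LeaveCriticalSection(&lock);
            ReleaseSemaphore(available, 1, NULL);
        }
        Job pop() {
            WaitForSingleObject(available, INFINITE);
            EnterCriticalSection(&lock);
            Job j = jobs.front();
            jobs.pop_front();
            LeaveCriticalSection(&lock);
            return j;
        }
    };

    static DWORD WINAPI WorkerMain(void* param)
    {
        JobQueue* q = (JobQueue*)param;
        for (;;) {                    // run tasks back to back; never suspend one
            Job j = q->pop();
            j.fn(j.arg);
        }
    }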
|
From: Sebastian S. <seb...@gm...> - 2009-04-05 00:30:55
|
On Sun, Apr 5, 2009 at 1:10 AM, Jarkko Lempiainen <al...@gm...> wrote:
> I think you really need to target both task and data level parallelism to make your game engine truly scalable for many-core architectures. I don't see why you would need to consider the cost of context switching in relation to the cost of task execution, though, since if you've got a job queue, each worker thread just pulls tasks from the queue without switching context.

The problem is when you have a task spawn N other tasks and then wait for their results: the core running that worker needs to do something until the sub-tasks have completed. In practice, I do think that this kind of synchronisation (specifically the fork-and-join model) will be extremely common in the future, which means that context switching needs to be fast (Windows 7's user mode scheduling may be useful here - in fact I believe that's what Microsoft's native concurrency runtime uses, while the .NET one is backed by thread pools, I believe).

-- Sebastian Sylvan
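Reduced to its bookkeeping, the fork-and-join shape in question looks like the sketch below: the parent spawns N children and then has to occupy the core until an atomic counter says they are all done - either by running other queued tasks (the job-queue option) or by suspending itself and letting the scheduler run another task/fiber (the context-switching option). Names are hypothetical and the spawn call is elided.

    #include <windows.h>

    struct JoinCounter { volatile LONG pending; };

    // Called by each child task as its last act.
    void SignalChildDone(JoinCounter* jc)
    {
        InterlockedDecrement(&jc->pending);
    }

    void ForkAndJoin(JoinCounter* jc, int n /*, work description... */)
    {
        jc->pending = n;
        for (int i = 0; i < n; ++i) {
            // spawn_child_task(jc, i);   // enqueue child i (hypothetical)
        }
        while (jc->pending > 0) {
            // Option A: pull and run another queued task right here.
            // Option B: suspend this task and let the scheduler pick another.
            // Busy-waiting (shown) is only the degenerate fallback:
            SwitchToThread();
        }
    }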
|
From: Jon W. <jw...@gm...> - 2009-04-05 05:32:54
|
Jarkko Lempiainen wrote:
> I think you really need to target both task and data level parallelism to make your game engine truly scalable for many-core architectures. I don't see why you would need to consider the cost of context switching in relation to the cost of task execution, though, since if you've got a job queue, each worker thread just pulls tasks from the queue without switching context.

If you are fully FIFO in your data flow, then that is true. However, in real code you end up with data flow dependencies. For example, you typically want to do simulation something like:

- extract state from the previous iteration of the physics frame
- run collision/intersection tests
- push state over to the renderer
- read user input
- extract entity behavior based on collision and input state
- kick off the new physics frame
- present the frame (vsync)

There are, unfortunately, a number of intra-frame dependencies here, such as the entity behavior needing the output of the collision tests. (Can I jump? That depends on whether I have my feet on the ground.) You may even have intra-frame dependencies within the entity behavior job, where one entity can cause more behavior for other entities (explosions, triggers, etc).

You can set this up to be totally FIFO, and thus be able to start all the jobs in parallel, but you will be introducing latency; sometimes as much as two additional frames (which for certain game types can be noticed as sluggish controls, etc).

Now, most of these dependencies are of the form "this task can't start until that task has ended", which can be implemented in a multi-threaded pool without blocking -- if a single thread runs low on work, it simply busy-waits until some pre-requisite finishes. Or you have a queue of background work, such as pathfinding queries, etc -- but sometimes that queue will be empty anyway. Or you can use a blocking primitive of some sort (fibers or threads) to deal with the case where the queue is temporarily stalled waiting on the output of something else that's executing out of the queue. Depending on how much of this happens each frame, the overhead may or may not matter.

This brings up another interesting point, which you also remarked on: the workload for a game is generally quite different from the workload for a server. Typically, you pre-load everything you want, and just run in-core. The cases where that breaks down are various large-world games (streaming worlds, MMOs, etc) where there really are "unplanned" loads of data, where being able to have a task block until it's done would sometimes be convenient. However, typically this happens seldom enough (compared to the other stuff) that a state machine approach, or a simple threading approach, really isn't that punitive.

Sincerely,

jw
|
From: <asy...@gm...> - 2009-04-05 17:18:21
|
> When you wait for something, you either wait on a different task using a user-mode synchronization primitive, or you wait on something that has to come from the kernel (I/O, interrupt, etc). Those are the only two options.

That's exactly what I try to do efficiently - give the CPU some other work to do while these kinds of waits are in progress.

I'd also add a third one - waiting on synchronization primitives. It doesn't have to be a mutex; it can be any primitive which controls resource access with regard to multiple tasks. For example, with a semaphore you can limit the number of tasks which can simultaneously enter a protected region to 10; if an 11th task tries to enter, it will wait until one of those 10 tasks leaves the protected region. Just keep in mind: a synchronization primitive is a way to control task behaviour with regard to other tasks and resources. A task itself is a synchronization primitive, and you can wait from one task on another task to finish.

Alexander
|
From: <asy...@gm...> - 2009-04-05 17:33:43
|
Without the ability to save and restore tasks, your dependency system will have a whole bunch of limitations, because during a wait you can only execute other tasks recursively.

2009/4/5 Jarkko Lempiainen <al...@gm...>
> You could also not have a wait for dependent tasks within a task, but rather have the job queue dependency system take care of it by splitting the function into two tasks. Or you could implement the wait by pulling tasks from the job queue until the dependent tasks have been completed. There are no context switches needed in either case, though.
|
From: <asy...@gm...> - 2009-04-06 05:37:55
|
How is your dependency system supposed to work then? Do you also mean that inside a task you can't wait for other tasks to finish (because that becomes 'blocking within tasks')? That's a really huge limitation imho.

Alexander.

2009/4/5 Jarkko Lempiainen <al...@gm...>
> Why is it a problem to execute the tasks in a recursive way? Note that this recursive way of executing tasks is used only when blocking within tasks, not in the job queue dependency system.
>
> Cheers, Jarkko
>
> ------------------------------
> *From:* asy...@gm... [mailto:asy...@gm...]
> *Sent:* Sunday, April 05, 2009 8:34 PM
> *To:* Game Development Algorithms
> *Subject:* Re: [Algorithms] General purpose task parallel threading approach
>
> Without the ability to save/restore tasks, your dependency system will have quite a few limitations, because during a wait you can only execute other tasks recursively.
>
> 2009/4/5 Jarkko Lempiainen <al...@gm...>
> You could also not have a wait for dependent tasks within a task, but rather have the job queue dependency system take care of it by splitting the function into two tasks. Or you could implement the wait by pulling tasks from the job queue until the dependent tasks have been completed. There are no context switches needed in either case, though.
-- Regards, Alexander Karnakov |
|
From: Jarkko L. <al...@gm...> - 2009-04-06 07:40:10
|
A job queue holds dependencies between tasks, and when a task is completed it releases the depending task for execution. This is a different system from defining dependencies by waiting within a task.

Cheers, Jarkko

_____

From: asy...@gm... [mailto:asy...@gm...]
Sent: Monday, April 06, 2009 8:38 AM
To: Game Development Algorithms
Subject: Re: [Algorithms] General purpose task parallel threading approach

How is your dependency system supposed to work then? Do you also mean that inside a task you can't wait for other tasks to finish (because that becomes 'blocking within tasks')? That's a really huge limitation imho.

Alexander.

2009/4/5 Jarkko Lempiainen <al...@gm...>

Why is it a problem to execute the tasks in a recursive way? Note that this recursive way of executing tasks is used only when blocking within tasks, not in the job queue dependency system. |
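A minimal sketch of that kind of dependency release (Task, JobQueue and Push are assumed names, not Jarkko's actual code): each task carries a count of unfinished dependencies, and whichever dependency finishes last pushes the dependent task to the queue.

    #include <windows.h>
    #include <vector>

    struct Task;
    struct JobQueue;                      // assumed to exist elsewhere
    void Push(JobQueue& q, Task* t);      // assumed: makes 't' runnable by workers

    struct Task
    {
        void (*fn)(void*);
        void* arg;
        volatile LONG pendingDeps;        // unfinished tasks this one depends on
        std::vector<Task*> dependents;    // tasks that depend on this one
    };

    // Called by a worker when it finishes executing 't': any dependent whose
    // last dependency just completed is released for execution.
    void OnTaskCompleted(Task* t, JobQueue& q)
    {
        for (size_t i = 0; i < t->dependents.size(); ++i)
        {
            Task* d = t->dependents[i];
            if (InterlockedDecrement(&d->pendingDeps) == 0)
                Push(q, d);
        }
    }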
|
From: Jarkko L. <al...@gm...> - 2009-04-06 08:11:17
|
Actually, after a bit of extra thought on locality, I think I'll take back that "locality is a good point in favor of job stealing" part. When you have a shared cache between threads, locality is actually in favor of a shared job queue, because all the worker threads are executing the same type of task, which doesn't put as much pressure on the L2 cache as it would if each thread were executing completely unrelated tasks.

Cheers, Jarkko

_____

From: Jarkko Lempiainen [mailto:al...@gm...]
Sent: Monday, April 06, 2009 10:38 AM
To: 'Game Development Algorithms'
Subject: RE: [Algorithms] General purpose task parallel threading approach

I was under the impression that you should minimize those sync points and dependencies for optimal throughput. If you have the ideal situation where your entire app is split into small tasks, then I guess you wouldn't need to prioritize, but I believe the reality is that you have some large tasks whose execution needs to start as early in the frame as possible to avoid stalling other threads at the end of the frame. So you need to give high execution priority to those tasks, and to the tasks those large tasks depend on.

I agree that the ability to wait within a task is more convenient, particularly if you have to do it deep inside a nested loop. But I don't see spawning new tasks where you can as being as tedious as you describe, and those tasks don't add as much overhead to the queue as you imply. I.e. when you hit a sync point, you are waiting on average for maybe tens or hundreds of tasks to complete, so the overhead of a single task is negligible in proportion. I doubt that contention will be an issue with a shared queue any time soon (definitely not with the 16-32 threads you are talking about), but agreed that locality is a good point in favor of job stealing.

Cheers, Jarkko

_____

From: Sebastian Sylvan [mailto:seb...@gm...]
Sent: Monday, April 06, 2009 1:44 AM
To: Game Development Algorithms
Subject: Re: [Algorithms] General purpose task parallel threading approach

On Sun, Apr 5, 2009 at 9:40 PM, Jarkko Lempiainen <al...@gm...> wrote:

There isn't a "whole bunch of magic" outside the task that you would need to do with the first option. The task just adds another function to the job queue which does the work after the wait. For the second option it is the job queue's task to schedule the tasks for optimal throughput, so for a wait it could, for example, favor the tasks that are blocking the waiting task from proceeding.

That's a whole bunch of magic to me. If you want pervasive parallelism (and Mr Amdahl says we do), you're going to need sync points all over the place. Currently we're not doing that because we don't *need* to do it yet, but if you have 16-32 threads to keep busy, you bet you're going to need to spawn off tasks all over the place (and usually sync immediately afterwards). Having to manually split all of these things up into lots of tiny tasks is a big issue IMO - it's tedious, error prone, and adds more overhead (the cost of starting a task still isn't free - so doing it, say, 5-10x as often is a big deal).

With job stealing from other threads you don't get global prioritization, so one thread may be executing lower priority tasks and not working towards optimal throughput.

Why do you actually need global prioritization? After all, you need to finish everything by the end of the frame anyway, so why would you care whether task A gets executed before task B, as long as you never get any bubbles where a core is idle?
All you care about is maximal throughput, maximal locality (especially true for in-order CPUs) and minimal contention/overhead, and for that a giant shared queue is sub-optimal for two reasons:

* It's inherently non-scalable. If you have lots of threads reading from the same queue you'll get contention. Even with a relatively modest number of threads this starts becoming an issue. Work stealing has negligible contention, because someone has to be unlucky enough to try to steal from the bottom of your stack at the same time as you are running out of tasks to do (so you're fighting for the very last item in the stack). That's extremely unlikely to happen, and it's even more unlikely for two threads to try to steal from the same stack - whereas a shared work queue is constantly being hit by multiple threads.

* You lose locality. With a work-stealing approach you get good locality because you run on the same thread the task was spawned on, and tasks spawned close together in time get run close together in time, meaning good cache coherency. If tasks are spread out over multiple cores, and over time, you'll get less coherency.

-- Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862 |
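To illustrate the structure Sebastian is describing (this is not code from any poster - it's a deliberately simplified, lock-based sketch; a production version would typically use a lock-free Chase-Lev-style deque so the owner's path stays cheap):

    #include <deque>
    #include <windows.h>

    struct Task { void (*fn)(void*); void* arg; };

    // Per-worker deque: the owner pushes/pops at the back (LIFO, good locality),
    // thieves steal from the front (FIFO, so they grab the oldest work).
    class WorkStealingDeque
    {
    public:
        WorkStealingDeque()  { InitializeCriticalSection(&lock_); }
        ~WorkStealingDeque() { DeleteCriticalSection(&lock_); }

        void Push(Task* t)              // owner only
        {
            EnterCriticalSection(&lock_);
            tasks_.push_back(t);
            LeaveCriticalSection(&lock_);
        }
        Task* Pop()                     // owner only
        {
            EnterCriticalSection(&lock_);
            Task* t = NULL;
            if (!tasks_.empty()) { t = tasks_.back(); tasks_.pop_back(); }
            LeaveCriticalSection(&lock_);
            return t;
        }
        Task* Steal()                   // called by other workers
        {
            EnterCriticalSection(&lock_);
            Task* t = NULL;
            if (!tasks_.empty()) { t = tasks_.front(); tasks_.pop_front(); }
            LeaveCriticalSection(&lock_);
            return t;
        }
    private:
        CRITICAL_SECTION  lock_;        // owner and thieves only collide when the
        std::deque<Task*> tasks_;       // deque is nearly empty, as noted above
    };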
|
From: <asy...@gm...> - 2009-04-04 21:07:14
|
Well, maybe hyperthreading was not a good comparison, but I think you got my point on how I'd like to hide the stalls.

Generally speaking, of course, all sorts of dependencies should be eliminated; the question is whether it is reasonable to do so given a way to hide the latencies those dependencies generate. But eliminating them only makes the situation even better, that's for sure.

The importance of an efficient job system grows together with the number of cores you have in the system. With that many cores available, the problem of speed turns into "how do I deliver the work I need to do to all the execution units available (so that they are all busy)" rather than how to make it run faster on one execution unit. The only solution to that problem I know of is to fragment your work. And the more cores you have, the more fragmentation you need to keep all your 1024 cores busy, which in the end makes the efficiency of the job system even more important - don't you agree?

Alexander.

2009/4/4 Nicholas "Indy" Ray <ar...@gm...>
> On Sat, Apr 4, 2009 at 11:09 AM, <asy...@gm...> wrote:
> > Blocking is only part of the problem (I used it for demonstration purposes basically); on top of that you have all sorts of IO and task dependencies.
>
> These are all variants on blocking; it's best to eliminate these dependencies.
>
> > You can compare my approach to hyperthreading in the hardware world, which is becoming more and more popular these days (Sun's UltraSparc, Intel's Larrabee, and GPUs are also using that scheme as far as I'm aware).
>
> I can't speak on UltraSparc, but neither Larrabee nor GPUs use hyperthreading as the main form of threading (or afaik at all). They are both data parallel architectures primarily. Additionally, hyperthreading is built to fill bubbles in the pipeline and does so rather well, because the threads don't have to worry about any sort of context switch, as the threading is implemented in hardware and multiple states can actually be active at the same time.
>
> > Hyperthreading switches virtual threads when the currently executed thread needs to wait on something (like a memory access to finish). Is it needed? Not if you lay out your data in a way where you don't have memory access latencies, but you usually do have them, because in many cases eliminating latencies is very difficult/time consuming/completely impossible.
>
> Keep in mind that there is 0 context switch overhead in hyperthreading. Additionally, eliminating cache stalls is far more difficult than blocking IO and the like.
>
> > I'm not saying that one should not plan his code for an asynchronous execution scenario, but the delays due to all sorts of blocking conditions can be hidden, like it happens in hyperthreading.
>
> Additionally hyperthreading still requires threads to be designed in a way that is low on data contention. And if you already have worker tasks that can do such, it's generally best to create OS threads and just let them run on separate threads or hyperthreaded. But still ensure that the main thread does no waiting. In a well designed system, thread switches are a non-issue as they just shouldn't happen very often (or at all in the case of consoles), so I don't think it's worthwhile worrying too much about the amount of time it takes to do a thread switch.
>
> Indy
-- Regards, Alexander Karnakov |
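As a rough illustration of the "fragment your work" point (the helper names below are hypothetical, not from Alexander's package): a big loop can be cut into grain-sized tasks so that there are always enough of them to keep every worker busy.

    struct Task;
    struct JobQueue;                                      // assumed queue type
    Task* MakeRangeTask(void (*body)(int, int, void*),    // assumed helper: wraps a
                        int begin, int end, void* ctx);   //  call over [begin, end)
    void Push(JobQueue& q, Task* t);                      // assumed: hands task to workers

    // Split [begin, end) into chunks of 'grain' iterations and queue one task per
    // chunk. A smaller grain means more fragmentation, which makes it easier to
    // keep many cores busy - at the cost of more per-task overhead.
    void ParallelFor(JobQueue& q, int begin, int end, int grain,
                     void (*body)(int, int, void*), void* ctx)
    {
        for (int i = begin; i < end; i += grain)
        {
            int chunkEnd = (i + grain < end) ? (i + grain) : end;
            Push(q, MakeRangeTask(body, i, chunkEnd, ctx));
        }
    }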
|
From: Nicholas \Indy\ R. <ar...@gm...> - 2009-04-04 22:57:24
|
On Sat, Apr 4, 2009 at 2:07 PM, <asy...@gm...> wrote:
> Well, maybe hyperthreading was not a good comparison, but I think you got my point on how I'd like to hide the stalls.

My point wasn't that the comparison itself was bad, but that the approach is fundamentally flawed on its own.

> Generally speaking, of course, all sorts of dependencies should be eliminated; the question is whether it is reasonable to do so given a way to hide the latencies those dependencies generate. But eliminating them only makes the situation even better, that's for sure.

In game development in particular, all of those dependencies should already be eliminated; there should be no waits in an existing, well-coded game loop.

> The importance of an efficient job system grows together with the number of cores you have in the system.

Your system is hardly a proper job system; a better job system doesn't need to store a stack or other thread state, and should never context switch until the work is done. That is the job of a threading system, which should remain separate.

> The only solution to that problem I know of is to fragment your work. And the more cores you have, the more fragmentation you need to keep all your 1024 cores busy, which in the end makes the efficiency of the job system even more important - don't you agree?

I agree, but again I think an entire cooperative thread system layered atop the OS thread system is not the proper way to distribute jobs. Jobs should be designed in a way that context switching during any particular job would give no benefit, and the existing hyperthreading implementations should be used to fill in the bubbles in the pipeline.

> You are right that this benchmark is far from a real-life scenario, but there should be a way to estimate the speed. How about comparing the worst case scenario for the cooperative task approach with the best case one for winthreads?

The problem here is that any benchmark of context switch time between fibers and OS threads will be in favor of fibers, because they have less overhead - but they also provide less, such as the ability to run on multiple CPUs. In the ideal case for a game (ignoring the OS and other apps on the computer) there should be zero thread switches, and in that case it doesn't matter how long a switch takes, as it should just never happen. That's like spending your time optimizing the MP3 decoder when you are only using Ogg anyway.

Indy |
|
From: Jarkko L. <al...@gm...> - 2009-04-05 00:52:58
|
You could also not have a wait for dependent tasks within a task, but rather have the job queue dependency system take care of it by splitting the function into two tasks. Or you could implement the wait by pulling tasks from the job queue until the dependent tasks have been completed. There are no context switches needed in either case, though.

_____

From: Sebastian Sylvan [mailto:seb...@gm...]
Sent: Sunday, April 05, 2009 3:31 AM
To: Game Development Algorithms
Subject: Re: [Algorithms] General purpose task parallel threading approach

On Sun, Apr 5, 2009 at 1:10 AM, Jarkko Lempiainen <al...@gm...> wrote:

I think you really need to target both task and data level parallelism to make your game engine truly scalable for many-core architectures. I don't see why you would need to consider the cost of context switching in relation to the cost of a task execution though, since if you have a job queue, each worker thread just pulls tasks from the queue without switching context.

The problem is when you have a task spawn N other tasks and then wait for their results: the core running that worker needs to do something until those sub-tasks have completed. In practice, I do think that this kind of synchronisation (specifically the fork-and-join model) will be extremely common in the future, which means that the context switch needs to be fast (Windows 7's user mode scheduling may be useful here - in fact I believe that's what Microsoft's native concurrency runtime uses, while the .NET one is backed by thread pools, I believe).

-- Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862 |
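For the fork-and-join case Sebastian raises, one way to keep the worker busy during the join - essentially Jarkko's "pull tasks from the job queue until the dependent tasks have completed" option - looks roughly like this (JobQueue, TryPop, Execute and Spawn are assumed names, and the busy-wait on the counter is only a sketch):

    #include <windows.h>

    struct Task;
    struct JobQueue;
    Task* TryPop(JobQueue& q);             // assumed: returns NULL if nothing is ready
    void  Execute(Task* t);                // assumed: runs the task to completion
    void  Spawn(JobQueue& q, Task* t,      // assumed: the worker that finishes 't'
                volatile LONG* joinCount); //  decrements *joinCount with InterlockedDecrement

    // Fork N children, then "wait" by helping to drain the queue instead of
    // blocking the OS thread. Note that the helping runs nested on this worker's
    // stack, which is exactly the recursive execution discussed earlier.
    void ForkAndJoin(JobQueue& q, Task** children, int n)
    {
        volatile LONG remaining = n;
        for (int i = 0; i < n; ++i)
            Spawn(q, children[i], &remaining);

        while (remaining > 0)
        {
            if (Task* t = TryPop(q))
                Execute(t);
            else
                YieldProcessor();          // nothing runnable yet; spin politely
        }
    }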