Re: [Algorithms] General purpose task parallel threading approach
From: <asy...@gm...> - 2009-04-04 20:42:50
Thanks for your input, Jon. The sample I composed is indeed a cache-friendly one. This is the best situation for cooperative tasks, that's true, but it should be the best situation for Win32 threads as well (since the same data is touched all the time). You are right that this benchmark is far from a real-life scenario, but there should be a way to estimate the speed. How about comparing the worst-case scenario for the cooperative task approach with the best case for Win32 threads?

The next benchmark I tried should be the worst situation for cooperative tasks: no task is ever in the cache. I created 2 million tasks (512 bytes each) and chained them with unique events, i.e. each task waits on its own event, which the previous task signals. The total memory traversed is 1 gigabyte, which obviously doesn't fit into the CPU cache. 10 traversals give 20 million task switches in total and take 10 seconds to complete, which is still 8 times faster than the best case for Win32 threads. If you can think of a way to slow it down further, let me know. (A rough sketch of the setup follows below.)

I would also like to ask: what am I currently missing that you would want in a task's state? You've mentioned TLS for tasks; that's something I'll add a bit later, and it is not going to cost any extra cycles. In general I follow the approach of zero cost for any feature you don't use (which is achieved with assembly). For example, if you don't use exceptions, I can avoid saving/restoring the exception pointer, and so on.

I didn't quite get what problems you've met with task cancellation. I just return a special code from the blocking function (as you did as well); whether to throw an exception in that case is up to the user (also sketched below). Could you point me to any other potential problems you've experienced with such approaches?

> I think it's important to measure on actual, real-world
> workloads to make an informed decision.

Well, I can only provide benchmarks from my side, and I try to make them as fair as I can.
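In pseudo-C++, the worst-case chain looks roughly like this (a minimal sketch; task_event, task_wait, task_signal and "task_scheduler.h" are illustrative names standing in for the actual API):

    // Worst case for the cooperative scheduler: 2M tasks of 512 bytes each,
    // chained so that task i sleeps on its own event and wakes task i+1.
    // All names below are hypothetical stand-ins for the real library.
    #include "task_scheduler.h"   // hypothetical header
    #include <cstddef>

    enum { kTaskCount  = 2000000,   // 2M tasks * 512 bytes ~= 1 GB
           kTraversals = 10 };      // 10 passes => 20M task switches

    struct ChainLink {
        task_event event;                         // one event per task
        char       pad[512 - sizeof(task_event)]; // pad the task state to 512 bytes
    };

    static ChainLink chain[kTaskCount];  // ~1 GB; real code would heap-allocate

    // Body of task i: wait for my event, touch my (cache-cold) state,
    // then signal the next task in the chain.
    static void link_body(void* arg) {
        std::size_t i = (std::size_t)arg;
        for (int pass = 0; pass < kTraversals; ++pass) {
            task_wait(&chain[i].event);   // blocks; the scheduler switches away
            chain[i].pad[0]++;            // pulls this 512-byte block into cache
            task_signal(&chain[(i + 1) % kTaskCount].event);
        }
    }

Signaling task 0's event once kicks off the whole chain; since every switch lands on memory that hasn't been touched for 2 million switches, essentially every wake-up misses the cache.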
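And to be concrete about the cancellation convention, here is roughly what I mean (again a sketch with hypothetical names; task_wait, wait_result, WAIT_CANCELED, Request and so on are not the real identifiers):

    // Cancellation reported as a status code from the blocking call.
    // The scheduler only returns WAIT_CANCELED; converting that into an
    // exception for stack unwinding is entirely up to the user.
    #include "task_scheduler.h"   // hypothetical header, as above

    struct task_canceled {};      // optional user-defined exception type

    int process_request(Request* req) {
        wait_result r = task_wait(&req->io_done); // may block; another task runs
        if (r == WAIT_CANCELED) {
            // Either propagate an error code to the base of the request...
            return ERR_CANCELED;
            // ...or, if the user prefers unwinding via EH:
            // throw task_canceled();
        }
        consume(req);             // hypothetical: handle the completed I/O
        return ERR_OK;
    }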
Alexander.

2009/4/4 Jon Watte <jw...@gm...>

> asy...@gm... wrote:
> > I believe that from the CPU's point of view the kernel scheduler and
> > the user-space scheduler are in the same situation here with regard to
> > CPU cache usage. Yes, you need to keep your data close to the cache,
> > avoid task/thread migration and avoid cache conflicts. I don't see why
> > a user-space task scheduler can't be as good as a kernel-side one,
> > since both have the same amount of information available to make the
> > best decision, or am I wrong here? And as I mentioned, in user space
> > you need to save/restore much smaller state, so you have an initial
> > speed benefit there.
>
> What I'm saying is that, in the specific benchmark you posted, you
> heavily skewed the measurement in favor of user-space cooperative
> threads (fibers, coroutines, whatever you want to call them). The reason
> is that the cache footprint of that particular single-threaded,
> small-task, user-space switch is tiny.
>
> When using the kernel to switch threads, more code is executed, and more
> data is touched. For example, threads provide TLS, which fibers do not.
> Threads provide a whole host of other specific state which fibers don't.
> In addition, entering the kernel has some overhead for the system call
> and parameter validation. Clearly, the overhead when switching real
> threads in this case is higher.
>
> However, when you start writing a real application, where you have a
> large number of different tasks, where the tasks are not as small or as
> tightly coupled, and you start seeing tasks get moved between cores, the
> difference between a fiber scheduler and the kernel thread scheduler
> becomes smaller. That's why I said I think that, with higher load, the
> thread scheduler will not degrade at the same rate as the fiber
> scheduler.
>
> Of course the fiber scheduler (if implemented competently, at least)
> will be faster than the thread scheduler, because it does less (no
> kernel entry, no TLS or other thread context, no pre-emption, etc.). The
> question is: once you've added back all the things you end up needing
> for a real application to your fiber system (and a TLS context switch
> may be part of that), how much faster is it, and what are you giving up?
>
> For example, we've had a user-space fiber system for our server stuff
> for many years, not unlike what you describe -- we have fiber-specific
> locks, waiting for I/O schedules another fiber, etc. We've found that
> fiber switching brings its own problems, especially in the cancellation
> case. If you want to stop an operation in progress (say, the user
> canceled, or the connection was broken while you were waiting for I/O),
> the fiber "wait" operation has to return an error, and it has to wind
> its way out to the base of the request while backing out state. In later
> incarnations, we use exception handling for this; the "wait" operation
> for fibers (which can wait on I/O, synchronization, etc.) will throw an
> exception when the fiber/operation is canceled.
>
> I guess I'm just a little jaded, having seen a large number of
> user-space thread packages over the last 25 years (including Mac
> threads, Java 1 "green" threads, Win32 fibers, UNIX fibers on longjmp,
> etc.), and I think the benefit/cost trade-off is not as clear-cut as you
> seem to want it to be. Yes, there are cases where it is the right
> solution, but there are also cases where it isn't. If you want to use a
> particular measurement (such as context switches per second) as your
> criterion, then I think it's important to measure on actual, real-world
> workloads to make an informed decision.
>
> Sincerely,
>
> jw
>
> ------------------------------------------------------------------------------
> _______________________________________________
> GDAlgorithms-list mailing list
> GDA...@li...
> https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list
> Archives:
> http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list

--
Regards,
Alexander Karnakov