From: Andrew G. <ag...@em...> - 2003-03-20 16:55:34
On Thursday, Feb 13, 2003, at 05:57 America/Mexico_City, Kyle Wilt wrote:

> Hello,
> My name is Kyle Wilt. I've recently been introduced to zoolib thanks
> to Mike Crawford and I have a question about the multithreading used.
> I noticed that every standalone window (at least in WIN32) has it's
> own thread. I was wondering what the reasoning was behind this. I
> don't necessarily agree or disagree with this method, I'm just
> curious.

Sorry to take so long to respond. For some reason my zoolib-dev folder was not highlighted as having any unread messages until a week after you'd sent your message. And I wanted to write a decent answer, which accounts for the rest of the month and a bit.

Summary: It's not a stupid question at all. In order to have identical dynamic behavior on all platforms we need a single model of behavior. Because the Windows MSG queue can only reliably be serviced in one style, we have to create a separation between that queue and the generic model presented (indirectly) by ZWindow. That separation is achieved by a single non-blocking thread running ::GetMessage/::DispatchMessage, which communicates with the individual windows using a pair of locks on each window, and by having each window service a higher level queue that contains only 'cooked' information.

More detail and background
--------------------------

One thread per window is probably the most controversial design decision in ZooLib. In fact, from the perspective of ZSubPane and its derivatives, the window architecture is unchanged from the very earliest days. ZSubPane and all the UI stuff require only that the outermost pane be able to return a pointer to a ZFakeWindow instance. A few months ago I checked in to the repository ZNSPlugin_FakeWindow, which puts a ZFakeWindow interface over the Netscape plugin interface and allows ZooLib panes to live within a browser window. It's actually incomplete, as it's just a proof-of-concept thing, but is interesting nonetheless.

In fact it used to be that ZWindow was just one of several derivatives of ZFakeWindow. Others put ZFakeWindow interfaces on Mac 'cdev' control panel plugins and HyperCard 'XCMD' plugins, and turned a MacApp TView into a pseudo desktop with nested windows (that code lives on, restructured, in ZWindoid).

When it came time to port NetPhone to Windows I started, and I'm not sure why, by conditionalizing ZWindow for Mac/Windows. That got old pretty quickly, and convinced me that I needed not an inheritance relationship between platform-specific ZWindows but a generic lower-level windowing API used by ZWindow to fulfill its responsibilities as a ZFakeWindow. That's where ZOSWindow came from.

For a couple of years the ZWindow/ZOSWindow combo worked great. The main thread invoked a platform-specific event loop (GetMessage/DispatchMessage on Windows, GetNextEvent on Mac), which in turn invoked methods on each ZOSWindow. Those were either handled by the ZOSWindow or delegated to the window's owner, an instance of ZWindow.

The apps I was working on were very network oriented, and it became increasingly hard to keep UI code from indirectly invoking blocking I/O. When blocking I/O blocked, the entire UI would freeze, so much work was delegated to worker threads. But then the worker threads would need to manipulate or communicate with the UI, and it's just not possible to arbitrarily manipulate a window on Windows without messages being posted to the Windows MSG queue. The Windows MSG queue behaves more like a postable semaphore than a mutex in synchronization behavior, and there are a slew of undetectable deadly embraces that can occur. Say you want to call something as simple as SetWindowPos from arbitrary code that may do so whilst owning a lock. If the servicing of any of the cascade of MSGs that SetWindowPos triggers requires ownership of the same lock, then you're screwed.

The solution for Windows would be to _embrace_ the MSG queue, and have communication be mediated by the MSG queue. Any method invocation against the window would cause an MSG to be posted, and the window proc would unpack the MSG and actually do the work. This kind of architecture was harder to implement on MacOS, and would introduce abominable latencies if the MacOS event queue were required to be used, which is what we'd want if the runtime behavior was to be as similar as possible.

Supporting BeOS finally pushed me in the direction that ZooLib eventually took and continues to take. The model is that ZOSWindow::GetLock, and thus ZWindow::GetWindowLock, returns a ZMutexBase reference. If a thread acquires that lock then it is free to invoke _any_ method on the window or the window's contents. It can set its visibility, resize and reposition it, invalidate regions, get a ZDC and draw into it, etc.

For each window that's created a new thread is spawned which services a ZMessage queue. When a message is posted to the queue the thread wakes up and acquires the window's lock. If the lock's already held by another thread then the window thread blocks until it's released. So if you own the window lock you can do anything to the window. And if you're servicing a message you know that you own the lock and nothing else can do anything to the window in the interim.

Behind the scenes on Windows, MacOS and POSIX a single thread exists that services the real event queue, MSG queue or XEvent queue. That thread handles many low level events itself. Other events require changes to a ZOSWindow's data structures, and/or posting of a ZMessage to the ZOSWindow's queue. The event service thread never blocks for more time than it takes to acquire the 'structure' lock on a ZOSWindow and post a ZMessage or update the ZOSWindow's data structures.

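To make the window-lock and per-window-thread model described above concrete, here is a minimal sketch in standard C++. It is not ZooLib code: SketchWindow, its methods and RunSketch are hypothetical names, a std::mutex stands in for the 'dispatch' lock, and plain strings stand in for ZMessages. It only illustrates the two rules: own the lock and you may touch the window directly, and the window's own thread always takes that lock before dispatching a queued message.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a window with its own thread. None of these
// names are ZooLib's; std::mutex plays the role of the 'dispatch' lock.
class SketchWindow {
public:
    using Handler = std::function<void(const std::string&)>;

    explicit SketchWindow(Handler handler)
        : fHandler(std::move(handler)), fThread([this] { Run(); }) {}

    // Lets already-queued messages drain, then stops the window thread.
    ~SketchWindow() {
        {
            std::lock_guard<std::mutex> guard(fQueueMutex);
            fQuit = true;
        }
        fQueueCond.notify_one();
        fThread.join();
    }

    std::mutex& GetLock() { return fWindowLock; }

    // Call these only while owning GetLock().
    void SetTitle(const std::string& title) { fTitle = title; }
    const std::string& GetTitle() const { return fTitle; }

    // Safe from any thread; the window lock is not needed to post.
    void PostMessage(const std::string& message) {
        {
            std::lock_guard<std::mutex> guard(fQueueMutex);
            fQueue.push_back(message);
        }
        fQueueCond.notify_one();
    }

private:
    void Run() {
        for (;;) {
            std::string message;
            {
                std::unique_lock<std::mutex> lock(fQueueMutex);
                fQueueCond.wait(lock, [this] { return fQuit || !fQueue.empty(); });
                if (fQueue.empty())
                    return; // quit requested and nothing left to deliver
                message = fQueue.front();
                fQueue.pop_front();
            }
            // Blocks here for as long as another thread owns the window.
            std::lock_guard<std::mutex> guard(fWindowLock);
            fHandler(message); // the handler runs with the window lock held
        }
    }

    Handler fHandler;
    std::mutex fWindowLock;
    std::string fTitle;
    std::mutex fQueueMutex;
    std::condition_variable fQueueCond;
    std::deque<std::string> fQueue;
    bool fQuit = false;
    std::thread fThread; // declared last so it starts after the other members exist
};

// Posts three messages and, in between, manipulates the window directly
// from the calling thread while owning the lock.
std::vector<std::string> RunSketch() {
    std::vector<std::string> handled;
    {
        SketchWindow window([&handled](const std::string& m) { handled.push_back(m); });
        window.PostMessage("activate");
        window.PostMessage("invalidate");
        {
            std::lock_guard<std::mutex> guard(window.GetLock());
            window.SetTitle("Untitled");
            assert(window.GetTitle() == "Untitled");
        }
        window.PostMessage("close");
    } // the destructor joins the window thread, so 'handled' is complete here
    return handled;
}
```

Note that the queue has its own small mutex, separate from the window lock, so posting never has to wait for whoever currently owns the window. That separation is an assumption of this sketch, chosen to mirror the idea of a tighter lower-level lock sitting beneath the dispatch lock.
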
The ZOSWindow's structure lock is separate and considered 'tighter' or lower level than the lock returned by GetLock, which is known as the 'dispatch' lock. The code enforces the restriction that either lock can be acquired at any time by any code, but if both are to be acquired then the structure lock is always acquired _after_ the dispatch lock, and no blocking operation takes place whilst the structure lock is owned.

There still remains the issue of deadlock. At least it is now detectable. Imagine the following. Some UI code, invoked by a window's message dispatch, tries to lock a data structure in order to safely transcribe its contents to a pane for display. A worker thread has locked the data structure, made changes to it, and wants to notify any dependents (including the UI) that it has changed. If that notification requires an invalidation of a window region then the notification code will need to lock the window before calling invalidate. Bingo, deadlock. Which we can actually detect (if ZCONFIG_Thread_DeadlockDetect is true, which it is by default when ZCONFIG_Debug is >= 2), whereas a deadlock that arises because the MSG-servicing WindowProc will never return to service the MSG queue is not generally detectable. The solution here is to use the ZWindow's ZMessage queue to post notifications to UI elements.

I've spent a lot of time trying to find an architecture that is as simple as what we've all been used to in the single-threaded world, and have come to the reluctant conclusion that there just isn't one. Ultimately, multiple threads of control have either to be synchronized by a carefully designed locking regime, or decoupled by use of asynchronous communications like a message queue. So although ZooLib doesn't solve the problem, it does provide the building blocks to sidestep it.

And this is the controversy. If interaction between arbitrary threads of control and the UI must be mediated by a message queue anyway, why not simply have a single thread for the UI?

And it's a reasonable argument. However, my own experience is that we don't usually have arbitrary threads of control. Generally there are different subsystems which are largely independent of one another. In those specific subsystems different styles of coordination are used. In some cases worker threads only ever talk to windows, and windows never need to talk to worker threads. In other cases things are more tangled. And having an architecture that is as general as ZooLib's allows one to be quite cavalier on an individual window basis, whilst providing a consistent, reliable mechanism when windows become more coupled to one another and to global state and worker threads.

In closing, I finally completed the ZMessenger mechanism some while ago. Any object which is a ZMessageReceiver can have its GetMessenger method called. The ZMessageLooper that the ZMessageReceiver is attached to must be locked at the time GetMessenger is called. But from that point onwards the ZMessenger object, which has value semantics just like a ZDCRgn, ZDCPixmap or simple Plain Old Data, can have its PostMessage method called. ZMessenger::PostMessage takes a ZMessage reference and returns a bool. If the return is true then the message in question is known to have been posted to the appropriate looper's queue and will ultimately be delivered to the ZMessageReceiver. If the return is false then the ZMessageReceiver was disposed before the message could be posted, perhaps a long time before or perhaps whilst ZMessenger::PostMessage was doing its work. This is the missing piece to allow arbitrary code to reliably post messages to arbitrary destinations with no locking issues.

The sample application paintertest should use this, but doesn't -- it's quite old and relies on ZInterest. I deprecated ZInterest a long time ago because, although it's a nice facility, it uses regular callbacks. And as we've seen, trying to pretend we're in a single-threaded world when we're not is just not viable.

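The ZMessenger contract described above (a copyable handle whose PostMessage returns true if the message was queued and false if the receiver has already gone away) can be illustrated with a small standard C++ sketch. Everything here is hypothetical and is not how ZooLib implements it: SketchLink, SketchMessenger and SketchReceiver are made-up names, strings stand in for ZMessages, and the requirement that the looper be locked when GetMessenger is called is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Shared state that outlives the receiver for as long as any messenger exists.
struct SketchLink {
    std::mutex fMutex;
    std::deque<std::string>* fQueue = nullptr; // null once the receiver is gone
};

// Copyable handle with value semantics; safe to keep after the receiver dies.
class SketchMessenger {
public:
    SketchMessenger() = default;
    explicit SketchMessenger(std::shared_ptr<SketchLink> link) : fLink(std::move(link)) {}

    // Returns true if the message was queued, false if the receiver is gone.
    bool PostMessage(const std::string& message) const {
        if (!fLink)
            return false;
        std::lock_guard<std::mutex> guard(fLink->fMutex);
        if (!fLink->fQueue)
            return false;
        fLink->fQueue->push_back(message);
        return true;
    }

private:
    std::shared_ptr<SketchLink> fLink;
};

class SketchReceiver {
public:
    SketchReceiver() : fLink(std::make_shared<SketchLink>()) { fLink->fQueue = &fQueue; }

    // Detach under the link's mutex, so a concurrent PostMessage either
    // completes before this point or sees the null queue and returns false.
    ~SketchReceiver() {
        std::lock_guard<std::mutex> guard(fLink->fMutex);
        fLink->fQueue = nullptr;
    }

    SketchReceiver(const SketchReceiver&) = delete;
    SketchReceiver& operator=(const SketchReceiver&) = delete;

    SketchMessenger GetMessenger() const { return SketchMessenger(fLink); }

    std::size_t PendingCount() const {
        std::lock_guard<std::mutex> guard(fLink->fMutex);
        return fQueue.size();
    }

private:
    std::deque<std::string> fQueue;
    std::shared_ptr<SketchLink> fLink;
};

// Posts once while the receiver is alive and once after it has been disposed.
std::pair<bool, bool> PostBeforeAndAfterDispose() {
    SketchMessenger messenger;
    bool postedWhileAlive = false;
    {
        SketchReceiver receiver;
        messenger = receiver.GetMessenger();
        postedWhileAlive = messenger.PostMessage("hello");
        assert(receiver.PendingCount() == 1);
    }
    bool postedAfterDispose = messenger.PostMessage("too late");
    return std::make_pair(postedWhileAlive, postedAfterDispose);
}
```

The point of the design is that the poster never holds a pointer to the receiver itself, only to a small link object that the receiver severs when it is disposed. That is why a false return can be reported safely, even long after the receiver has gone.
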
I'm still hoping to get the time to rework paintertest, as it's the largest and most comprehensive publicly-visible app built on ZooLib. It's more likely that one of my other side-projects will take on the mantle of a best-practice sample.

Regards,

A+

--
Andrew Green         mailto:ag...@em...
Electric Magic Co.   Vox/Fax: +1 (408) 907 2101