Re: [ZooLib-dev] Stupid Question?


On Thursday, Feb 13, 2003, at 05:57 America/Mexico_City, Kyle Wilt 
wrote:

> Hello,
>  My name is Kyle Wilt. I've recently been introduced to zoolib thanks 
> to Mike Crawford and I have a question about the multithreading used. 
> I noticed that every standalone window (at least in WIN32) has its
> own thread. I was wondering what the reasoning was behind this. I
> don't necessarily agree or disagree with this method, I'm just
> curious.

Sorry to take so long to respond. For some reason my zoolib-dev folder 
was not highlighted as having any unread messages until a week after 
you'd sent your message. And I wanted to write a decent answer, which 
accounts for the rest of the month and a bit.

Summary: It's not a stupid question at all. In order to have identical
dynamic behavior on all platforms we need a single model of behavior.
Because the Windows MSG queue can only reliably be serviced in one style,
we have to create a separation between that queue and the generic model
presented (indirectly) by ZWindow. That separation is achieved by a
single non-blocking thread running ::GetMessage/::DispatchMessage, which
communicates with the individual windows using a pair of locks on each
window, and by having each window service a higher-level queue that
contains only 'cooked' information.

More detail and background
--------------------------
One thread per window is probably the most controversial design 
decision in ZooLib. In fact from the perspective of ZSubPane and its 
derivatives the window architecture is unchanged from the very earliest 
days. ZSubPane and all the UI stuff requires only that the outermost 
pane be able to return a pointer to a ZFakeWindow instance. A few
months ago I checked ZNSPlugin_FakeWindow into the repository; it
puts a ZFakeWindow interface over the Netscape plugin interface and
allows ZooLib panes to live within a browser window. It's incomplete,
being just a proof of concept, but it is interesting nonetheless.

In fact it used to be that ZWindow was just one of several derivatives 
of ZFakeWindow. Others put ZFakeWindow interfaces on Mac 'cdev' control 
panel plugins, HyperCard 'XCMD' plugins, and turned a MacApp TView into 
a pseudo desktop with nested windows (that code lives on, restructured, 
in ZWindoid). When it came time to port NetPhone to Windows I started, 
and I'm not sure why, by conditionalizing ZWindow for Mac/Windows. That 
got old pretty quickly, and convinced me that I needed not an 
inheritance relationship between platform-specific ZWindows but a 
generic lower-level windowing API used by ZWindow to fulfill its 
responsibilities as a ZFakeWindow. That's where ZOSWindow came from.
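A minimal C++17 sketch of that layering, assuming nothing about ZooLib's real declarations: one portable window class fulfils the pane-facing interface by delegating to an abstract platform backend, instead of having one subclass (or one set of #ifdefs) per platform. `FakeWindowLike`, `OSWindowBackend`, `PortableWindow` and `NullBackend` are hypothetical stand-ins for ZFakeWindow, ZOSWindow, ZWindow and a platform implementation.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the interface panes depend on (cf. ZFakeWindow).
class FakeWindowLike {
public:
    virtual ~FakeWindowLike() = default;
    virtual void SetTitle(const std::string& title) = 0;
    virtual std::string GetTitle() const = 0;
};

// Hypothetical lower-level, platform-specific windowing API (cf. ZOSWindow).
class OSWindowBackend {
public:
    virtual ~OSWindowBackend() = default;
    virtual void SetNativeTitle(const std::string& title) = 0;
    virtual std::string GetNativeTitle() const = 0;
};

// A single portable window class; platform differences live only in the backend.
class PortableWindow : public FakeWindowLike {
public:
    explicit PortableWindow(std::unique_ptr<OSWindowBackend> backend)
        : fBackend(std::move(backend)) {}
    void SetTitle(const std::string& title) override { fBackend->SetNativeTitle(title); }
    std::string GetTitle() const override { return fBackend->GetNativeTitle(); }

private:
    std::unique_ptr<OSWindowBackend> fBackend;
};

// Hypothetical trivial backend standing in for a Mac, Windows or BeOS one.
class NullBackend : public OSWindowBackend {
public:
    void SetNativeTitle(const std::string& title) override { fTitle = title; }
    std::string GetNativeTitle() const override { return fTitle; }

private:
    std::string fTitle;
};
```

Under this arrangement, porting to another platform means writing another backend, with no changes to the window class or to the panes above it.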

For a couple of years the ZWindow/ZOSWindow combo worked great. The 
main thread invoked a platform-specific event loop
(GetMessage/DispatchMessage on Windows, GetNextEvent on Mac), which in 
turn invoked methods on each ZOSWindow which were handled by the 
ZOSWindow or delegated to the window's owner, an instance of ZWindow.

The apps I was working on were very network oriented and it became
increasingly hard to keep UI code from indirectly invoking blocking
I/O. When blocking I/O blocked, the entire UI would freeze, so much work
was delegated to worker threads. But then the worker threads would need
to manipulate or communicate with the UI, and it's just not possible to
arbitrarily manipulate a window on Windows without messages being
posted to the Windows MSG queue. The Windows MSG queue behaves more
like a postable semaphore than a mutex in synchronization behavior, and
there are a slew of undetectable deadly embraces that can occur. Say
you want to call something as simple as SetWindowPos from arbitrary
code that may do so whilst owning a lock. If servicing any of the
cascade of MSGs that SetWindowPos triggers requires ownership of the
same lock, then you're screwed.

The solution for Windows would be to _embrace_ the MSG queue, and have 
communication be mediated by the MSG queue. Any method invocation 
against the window would cause an MSG to be posted, and the window proc 
would unpack the MSG and actually do the work. This kind of
architecture was harder to implement on MacOS, and would introduce
abominable latencies if the MacOS event queue had to be used, which
is what we'd want if the runtime behavior were to be as similar as
possible across platforms.
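A minimal sketch of that "embrace the queue" alternative, assuming C++17 and only the standard library: every call against the proxy posts a task to one queue, and only the servicing thread (playing the role of the window proc) touches window state. `QueuedWindowProxy`, `SetWidth` and `GetWidth` are hypothetical, not a Win32 or ZooLib API.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical proxy: each call becomes a posted task, and only the servicing
// thread (standing in for the window proc) ever touches window state.
class QueuedWindowProxy {
public:
    QueuedWindowProxy() { fThread = std::thread([this] { Run(); }); }
    ~QueuedWindowProxy() {
        Post(nullptr);  // an empty task asks the servicing thread to exit
        fThread.join();
    }
    // Asynchronous: returns as soon as the request is queued.
    void SetWidth(int width) { Post([this, width] { fWidth = width; }); }
    // Synchronous round trip: waits behind every task posted before it.
    int GetWidth() {
        auto answer = std::make_shared<std::promise<int>>();
        std::future<int> result = answer->get_future();
        Post([this, answer] { answer->set_value(fWidth); });
        return result.get();
    }

private:
    void Post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(fMutex);
            fTasks.push(std::move(task));
        }
        fCondition.notify_one();
    }
    void Run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(fMutex);
                fCondition.wait(lock, [this] { return !fTasks.empty(); });
                task = std::move(fTasks.front());
                fTasks.pop();
            }
            if (!task) return;  // shutdown request
            task();             // runs with no lock held
        }
    }

    int fWidth = 0;  // touched only by the servicing thread
    std::mutex fMutex;
    std::condition_variable fCondition;
    std::queue<std::function<void()>> fTasks;
    std::thread fThread;  // started in the constructor body, after all other members exist
};
```

The latency concern shows up directly here: even a simple read such as `GetWidth` has to wait for every task queued ahead of it.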

Supporting BeOS finally pushed me in the direction that ZooLib took
and continues to take. The model is that ZOSWindow::GetLock and
thus ZWindow::GetWindowLock return a ZMutexBase reference. If a thread 
acquires that lock then it is free to invoke _any_ method on the window 
or the window's contents. It can set its visibility, resize and 
reposition it, invalidate regions, get a ZDC and draw into it etc. For 
each window that's created a new thread is spawned which services a 
ZMessage queue. When a message is posted to the queue the thread wakes 
up and acquires the window's lock. If the lock's already held by 
another thread then the window thread blocks until it's released. So if 
you own the window lock you can do anything to the window. And if 
you're servicing a message you know that you own the lock and nothing 
else can do anything to the window in the interim.
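The per-window model above can be sketched with standard C++17 primitives: each window owns a mutex (the dispatch lock) and a thread that, for every posted message, acquires that mutex before handling it, while any other thread holding the same mutex may manipulate the window directly. `ThreadedWindow`, its string messages and the log vector are hypothetical simplifications; ZooLib's real GetLock returns a ZMutexBase reference, not a `std::mutex&`.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical window with its own message-servicing thread. The log vector
// stands in for window state and is only touched with the dispatch lock held.
class ThreadedWindow {
public:
    explicit ThreadedWindow(std::vector<std::string>& log) : fLog(log) {
        fThread = std::thread([this] { Run(); });
    }
    ~ThreadedWindow() {
        {
            std::lock_guard<std::mutex> lock(fQueueMutex);
            fQuit = true;
        }
        fQueueCondition.notify_one();
        fThread.join();  // the thread drains remaining messages before exiting
    }
    // Own this lock and you may do anything to the window.
    std::mutex& GetLock() { return fDispatchMutex; }

    // Safe to call from any thread; never touches the dispatch lock.
    void PostMessage(const std::string& message) {
        {
            std::lock_guard<std::mutex> lock(fQueueMutex);
            fQueue.push_back(message);
        }
        fQueueCondition.notify_one();
    }

    // Caller must hold GetLock().
    void Invalidate(const std::string& region) { fLog.push_back("invalidate " + region); }

private:
    void Run() {
        for (;;) {
            std::string message;
            {
                std::unique_lock<std::mutex> lock(fQueueMutex);
                fQueueCondition.wait(lock, [this] { return fQuit || !fQueue.empty(); });
                if (fQueue.empty()) return;  // quitting and nothing left to handle
                message = fQueue.front();
                fQueue.pop_front();
            }
            // Blocks until any other owner of the window lock releases it.
            std::lock_guard<std::mutex> dispatch(fDispatchMutex);
            fLog.push_back("handled " + message);  // the handler runs with the lock held
        }
    }

    std::vector<std::string>& fLog;
    std::mutex fDispatchMutex;
    std::mutex fQueueMutex;
    std::condition_variable fQueueCondition;
    std::deque<std::string> fQueue;
    bool fQuit = false;
    std::thread fThread;  // started in the constructor body, after all other members exist
};
```

Usage: a worker thread can either lock `GetLock()` and call `Invalidate` directly, or call `PostMessage` and let the window's thread do the work under the same lock.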

Behind the scenes on Windows, MacOS and POSIX a single thread exists
that services the native queue: the MSG queue, the Mac event queue or
the XEvent queue respectively. That thread handles many low-level
events itself. Other events require changes to a ZOSWindow's data
structures and/or posting of a ZMessage to the ZOSWindow's queue. The
event service thread never blocks for longer than it takes to acquire
the 'structure' lock on a ZOSWindow and then post a ZMessage or update
the ZOSWindow's data structures. The ZOSWindow's structure lock is
separate from, and considered 'tighter' or lower level than, the lock
returned by GetLock, which is known as the 'dispatch' lock. The code
enforces these rules: either lock can be acquired at any time by any
code, but if both are to be acquired then the structure lock is always
acquired _after_ the dispatch lock, and no blocking operation takes
place whilst the structure lock is owned.
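The two-lock discipline can be sketched as follows, assuming the structure lock guards only small bookkeeping (here just a pending-message count) and is never held across anything that might block. The event-service entry point takes only the structure lock; code that needs both locks takes the dispatch lock first. `TwoLockWindow`, `OnNativeEvent` and `DispatchPending` are hypothetical names, not ZooLib's.

```cpp
#include <cassert>
#include <mutex>

class TwoLockWindow {
public:
    // Called by the single event-service thread: takes only the structure lock,
    // and only long enough to update bookkeeping, so it never blocks for long.
    void OnNativeEvent() {
        std::lock_guard<std::mutex> structure(fStructureMutex);
        ++fPendingMessages;
    }
    // Called on the window's own thread: dispatch lock first, structure lock second.
    int DispatchPending() {
        std::lock_guard<std::mutex> dispatch(fDispatchMutex);
        int count;
        {
            std::lock_guard<std::mutex> structure(fStructureMutex);
            count = fPendingMessages;
            fPendingMessages = 0;
        }  // structure lock released before any handler (which may block) runs
        for (int i = 0; i < count; ++i) HandleOne();
        return count;
    }
    int HandledCount() {
        std::lock_guard<std::mutex> dispatch(fDispatchMutex);
        return fHandled;
    }
    std::mutex& GetLock() { return fDispatchMutex; }  // the 'dispatch' lock

private:
    void HandleOne() { ++fHandled; }  // stands in for a handler that may block
    std::mutex fDispatchMutex;   // outer lock: may be held across blocking work
    std::mutex fStructureMutex;  // inner lock: always acquired after fDispatchMutex
    int fPendingMessages = 0;    // guarded by fStructureMutex
    int fHandled = 0;            // guarded by fDispatchMutex
};
```

Because every path that holds both locks acquires them in the same order, and the event-service thread never waits on the dispatch lock, those two locks alone cannot form a cycle.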

There still remains the issue of deadlock, but at least it is now
detectable. Imagine the following. Some UI code, invoked by a window's
message dispatch, tries to lock a data structure in order to safely
transcribe its contents to a pane for display. A worker thread has
locked the data structure, made changes to it, and wants to notify any
dependents (including the UI) that it has changed. If that notification
requires an invalidation of a window region then the notification code
will need to lock the window before calling invalidate. Bingo,
deadlock. This one we can actually detect (if
ZCONFIG_Thread_DeadlockDetect is true, which it is by default when
ZCONFIG_Debug >= 2), whereas a deadlock caused by a WindowProc that
never returns to service the MSG queue is not generally detectable.
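One common way such detection can work is a wait-for check: before a thread blocks on a mutex, follow the chain "this mutex is owned by thread T, which is waiting for mutex M, which is owned by..." and report a deadlock if it leads back to the requesting thread. The message does not say how ZCONFIG_Thread_DeadlockDetect is implemented, so this is only a sketch of the general technique; it models the bookkeeping alone, single-threaded, with hypothetical integer IDs for threads and mutexes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

using ThreadID = int;
using MutexID = int;

// Hypothetical wait-for bookkeeping: which thread owns each mutex, and which
// mutex each blocked thread is waiting for.
class WaitForGraph {
public:
    void SetOwner(MutexID mutex, ThreadID thread) { fOwner[mutex] = thread; }
    void SetWaiting(ThreadID thread, MutexID mutex) { fWaitingFor[thread] = mutex; }

    // True if 'thread' blocking on 'mutex' would close a cycle of waiters.
    bool WouldDeadlock(ThreadID thread, MutexID mutex) const {
        MutexID current = mutex;
        // A chain without repeats visits each owned mutex at most once, so this
        // bound also stops us if we wander into a cycle that excludes 'thread'.
        for (std::size_t steps = 0; steps <= fOwner.size(); ++steps) {
            auto owner = fOwner.find(current);
            if (owner == fOwner.end()) return false;         // mutex is free
            if (owner->second == thread) return true;        // chain leads back to us
            auto waiting = fWaitingFor.find(owner->second);
            if (waiting == fWaitingFor.end()) return false;  // owner is still running
            current = waiting->second;
        }
        return false;  // some other cycle, not one this thread would complete
    }

private:
    std::map<MutexID, ThreadID> fOwner;
    std::map<ThreadID, MutexID> fWaitingFor;
};
```

Applied to the scenario above: the UI thread owns the window lock and waits for the data lock, so when the worker (which owns the data lock) asks for the window lock, the chain leads back to the worker and the check fires.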

The solution here is to use the ZWindow's ZMessage queue to post 
notifications to UI elements. I've spent a lot of time trying to find
an architecture that is as simple as what we've all been used to in the
single-threaded world, and have come to the reluctant conclusion that
there just isn't one. Ultimately, multiple threads of control have 
either to be synchronized by a carefully designed locking regime, or 
decoupled by use of asynchronous communications like a message queue.
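Applied to the deadlock scenario above, the queue-based fix means the worker, while holding the data lock, only enqueues a notification and never touches the window lock; the window's thread later takes its own lock and then the data lock. A sketch under those assumptions, with a hypothetical `NotificationQueue` standing in for the window's ZMessage queue and a `Model` struct standing in for the shared data:

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical shared data guarded by its own lock.
struct Model {
    std::mutex fMutex;
    std::string fText;
};

// Hypothetical notification queue standing in for a window's ZMessage queue.
class NotificationQueue {
public:
    void Post(const std::string& note) {
        {
            std::lock_guard<std::mutex> lock(fMutex);
            fNotes.push_back(note);
        }
        fCondition.notify_one();
    }
    std::string Wait() {
        std::unique_lock<std::mutex> lock(fMutex);
        fCondition.wait(lock, [this] { return !fNotes.empty(); });
        std::string note = fNotes.front();
        fNotes.pop_front();
        return note;
    }

private:
    std::mutex fMutex;
    std::condition_variable fCondition;
    std::deque<std::string> fNotes;
};

// Worker: changes the model under its lock, then posts a notification instead
// of locking the window to invalidate it directly.
void UpdateModel(Model& model, NotificationQueue& windowQueue, const std::string& text) {
    std::lock_guard<std::mutex> lock(model.fMutex);
    model.fText = text;
    windowQueue.Post("model-changed");  // takes only the queue's internal lock
}

// Window thread: takes the window lock, then the model lock, to transcribe data.
std::string HandleOneNotification(std::mutex& windowLock, Model& model,
                                  NotificationQueue& queue) {
    std::string note = queue.Wait();
    std::lock_guard<std::mutex> window(windowLock);
    std::lock_guard<std::mutex> data(model.fMutex);
    return note + ": " + model.fText;
}

// Runs the scenario with one UI thread and one worker thread.
std::string RunScenario() {
    Model model;
    NotificationQueue queue;
    std::mutex windowLock;
    std::string shown;
    std::thread ui([&] { shown = HandleOneNotification(windowLock, model, queue); });
    std::thread worker([&] { UpdateModel(model, queue, "hello"); });
    worker.join();
    ui.join();
    return shown;
}
```

The lock orders are now window-then-data on the UI side and data-then-queue on the worker side, so no thread ever waits for the window lock while holding the data lock.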

So although ZooLib doesn't solve the problem it does provide the 
building blocks to sidestep it. And this is the controversy. If 
interaction between arbitrary threads of control and the UI must be 
mediated by a message queue anyway why not simply have a single thread 
for the UI? And it's a reasonable argument. However, my own experience 
is that we don't usually have arbitrary threads of control. Generally 
there are different subsystems which are largely independent of one 
another. In those specific subsystems different styles of coordination 
are used. In some cases worker threads only ever talk to windows, and
windows never need to talk to worker threads. In other cases things are
more tangled. And having an architecture that is as general as ZooLib's 
allows one to be quite cavalier on an individual window basis, whilst 
providing a consistent reliable mechanism when windows become more 
coupled to one another and to global state and worker threads.

In closing, I finally completed the ZMessenger mechanism some while 
ago. Any object which is a ZMessageReceiver can have its GetMessenger 
method called. The ZMessageLooper that the ZMessageReceiver is attached 
to must be locked at the time GetMessenger is called. But from that 
point onwards the ZMessenger object, which has value semantics just 
like a ZDCRgn, ZDCPixmap or simple Plain Old Data, can have its 
PostMessage method called. ZMessenger::PostMessage takes a ZMessage 
reference and returns a bool. If the return is true then the message in 
question is known to have been posted to the appropriate looper's queue 
and will ultimately be delivered to the ZMessageReceiver. If the return 
is false then the ZMessageReceiver was disposed before the message 
could be posted, perhaps a long time before or perhaps whilst 
ZMessenger::PostMessage was doing its work.
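The messenger described above behaves much like a copyable handle holding a weak reference to its receiver. This sketch assumes `std::weak_ptr` for that weak reference and a mutex-guarded vector for the looper's queue; `Receiver`, `Messenger`, `GetMessenger` and `PostMessage` are hypothetical simplifications of ZMessageReceiver/ZMessenger, and the requirement that the looper be locked when GetMessenger is called is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

class Receiver;

// Hypothetical copyable handle with value semantics; holds only a weak reference.
class Messenger {
public:
    Messenger() = default;
    explicit Messenger(std::weak_ptr<Receiver> receiver) : fReceiver(std::move(receiver)) {}
    // True if the message was queued; false if the receiver is already gone.
    bool PostMessage(const std::string& message) const;

private:
    std::weak_ptr<Receiver> fReceiver;
};

// Hypothetical receiver; its inbox stands in for the looper's queue.
class Receiver : public std::enable_shared_from_this<Receiver> {
public:
    // Must be called on a Receiver owned by a std::shared_ptr.
    Messenger GetMessenger() { return Messenger(weak_from_this()); }
    void Enqueue(const std::string& message) {
        std::lock_guard<std::mutex> lock(fMutex);
        fInbox.push_back(message);
    }
    std::size_t PendingCount() {
        std::lock_guard<std::mutex> lock(fMutex);
        return fInbox.size();
    }

private:
    std::mutex fMutex;
    std::vector<std::string> fInbox;
};

bool Messenger::PostMessage(const std::string& message) const {
    // lock() yields either a strong reference that keeps the receiver alive for
    // the duration of the post, or an empty pointer if it was already destroyed.
    if (std::shared_ptr<Receiver> receiver = fReceiver.lock()) {
        receiver->Enqueue(message);
        return true;
    }
    return false;
}
```

The race the text mentions (disposal while PostMessage is running) is handled by `lock()`: either the post completes against a receiver that is kept alive until it returns, or it reports false.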

This is the missing piece to allow arbitrary code to reliably post 
messages to arbitrary destinations with no locking issues. The sample 
application paintertest should use this, but doesn't; it's quite old
and relies on ZInterest. I deprecated ZInterest a long time ago
because, although it's a nice facility, it uses regular callbacks. And
as we've seen, trying to pretend we're in a single-threaded world when
we're not is just not viable. I'm still hoping to get the time to
rework paintertest, as it's the largest and most comprehensive
publicly-visible app built on ZooLib, but it's more likely that one of
my other side-projects will take on the mantle of a best-practice sample.

Regards,
A+
-- 
Andrew Green                           mailto:ag...@em...
Electric Magic Co.           Vox/Fax: +1 (408) 907 2101