Bug#1391053: spurious events generated on Windows after completion of tk_getOpenFile, tk_chooseColor and similar modal dialogs. This patch, redone against CVS head, provides a real solution for this long-standing problem, instead of the current pseudo-solution that hurts but not helps (see win/tkWinDialog.c:EatSpuriousMessageBugFix).
Steps to reproduce: (1) bind . <1> "puts Click"; bind . <3> tk_getOpenFile; maximize main window; right-click, then select a file. When the file dialog is closed on click-that-selects-a-file, you'll see the Click reaching the main window. Similar thing for closed Tk non-main toplevel windows by double-clicking on system menu icon: if another tk window happens to be under the click, it receives an event.
Tk assumes that if a mouse button was released and is now pressed, then it must generate an event (Tk_UpdatePointer). This patch adds a special flag for Tk_UpdatePointer to take exception from this assumption. This modification proven to be useful for many other things during 4 years I maintain it for my projects.
There is another obscure manifestation of a similar problem, related to non-client clicks (like WM_NCLBUTTONDOWN): tk really shouldn't generate ButtonDown on these events. This patch adds WM_NC* handling to the toplevel window procedure, using special flag for Tk_UpdatePointer to prevent these spurious events too, and another flag to prevent cursor shape changes for non-client areas).
Unfortunately there are another things in this patch (separating them is not easy): support for wm attr -contexthelp (uses virtual <<Help>> event, no need to worry about event mask bits exhaustion); support for wm attr -shadow (recreates wrapper window with CS_DROPSHADOW).
To complicate things even more, this patch includes tkUnixRFont update for -underline and -overstrike support in Xft backed. This part can be separated :)
I would like to see something done in this direction: while we can live without improvements, though they're discussed and accepted long ago, the bugs addressed by this patch are really annoying. E.g. a user does Menu->File->Open, and his file-selecting click goes directly to toolbar->New file :(
Should I simply commit this whole thing myself? If it's unacceptable, can anyone provide the directions for refactoring it? Maybe I should wait 4 more years..
Multi-purpose patch:fixes bug#1391053, adds wm attrs., etc.
Megapatches cause indigestion. Raising prio so that it's more likely to be reviewed soon. (No time to look now...)
P.S. Maybe NOBUTTONEVENTS_MASK added by this patch should be made public (not tkInt.h). We need some way for extensions to implement modal dialogs on Windows correctly. E.g. TDBC datasource selection dialog should call Tcl_SetServiceMode before/after dialog invocation, and it will need a solution similar to EatSpuriousMessageBugFix(). At least, this patch makes correct implementation possible at all.
You should not mix functionality in a single patch. This patch needs to be broken into individual pieces that fix one thing at a time. That way, if/when later we try to identify when some new issue was created we can isolate the specific change. It is also much harder to review such multi-function patches.
I know from previous testing that the contexthelp patch had problems. Dropshadow is a different thing too.
You need to use something like git or quilt or learn to apply and revert patches. before I started to use git I used to maintain a number of patches and apply them (patch -p0 < feature.patch), work on them, regenerate the patch (cvs diff > feature.patch) and revert it (patch -p0 -R < feature.patch) to switch to some other task.
Another option is to use multiple Tk sandbox checkouts to isolate the work.
These days I use git checkout -b feature-patch; hack; git commit and various combinations of rebase/cherry-pick/merge to tidy up the patch if necessary.
I have added an attempt to break this into functional pieces using Git. I applied this patch to my working HEAD tree (called master in git) then used 'git gui' to stage just the hunks or lines for each area to create commits that just do one thing.
The unix Xft underline/strikeout issue has already been resolved in 8.6 so I discarded that along with some spurious edits (unused variables, white-space only).
The first patch is still doing two things - its changing the windows EatSpuriousMessages and making changes to the event mask stuff. This should be broken into two.
In the contexthelp part, I'm a bit uncertain about the use of a static to hold the help cursor id. I dont know if that will be ok with threads or using a safeTk interp.
In the dropshadow patch, there is no explanation why the window class style is being changed from CS_VREDRAW|CS_HREDRAW to no-flags. This needs explaining.
I'm not sure if the hunk in WmProc is supposed to be part of this patch or the first patch either.
first broken-out patch
second broken-out patch
third broken-out patch
I've done a bit of testing on this using the git patch series below.
The context help seems to work mostly ok but leaks a ButtonRelease event to the target widget.
The current 8.5.8 using EatSpuriousMessages does not leak any ButtonRelease events when double-clicking a file icon in the tk_getOpenFile dialog over a button widget. However when we use this patch series which replaces the EatSpuriousMessages we see ButtonRelease events leaking from this again.
I have attached my test script which shows this up. While the leaked buttonRelease events are not producing a click on the buttons, this is a regression from the current state of affairs.
Sample test program showing leaked events
First, many thanks to patthoyts for reviewing those patches.
Sorry, but I'm going to defend my variant of Eat Spurious Message Bug Fix against the one currently present in tk 8.5/ 8.6 heads. Point one: did you try to hold the button for more than 250 msecs while testing current tk heads? It leaks more than a ButtonRelease in this case: it leaks an entire click, i.e. ButtonPress/ButtonRelease pair.
In the context of my alternative solution, ButtonRelease is not "leaked" -- it's just _what happens_ when the button is pressed over a dialog window, then the dialog disappears, and then the button is released. I've yet to see any GUI misbehavior resulting from this event (that is not spurious, but realistic; it's the _click_ (press+release) that was spurious).
If you prefer the way of handling things provided by the current solution, it's a must to prevent at least WM_PAINTs from being eaten or delayer. This aspect of the old solution is just plain ugly. The application _hangs_ for 250 ms., it's very noticeable, especially on fast computers where Tk normally works fast.
I hate to say that, but the whole Tk's machinery of mouse event translations is broken by design. Tk assumes that the difference in button states should be translated into press/release events. I believe there was a reason to avoid the obvious alternative, where WM_*BUTTONDOWN would be translated into a press and WM_*BUTTONUP into a release (similar thing is possible in X11, with appropriate X events instead of WM_* messages). I don't know what were the reasons to prefer button state difference translation, instead of window system events translation.
But this kind state translation _does_ hurt: only the window system is able to route button state changes appropriately, 'cause it knows where the _change_ happened.
0001-ak-Track-pointer-button-state.patch is just an attempt to provide a workaround for this state-translation system: the way to notify Tk that it should not generate any events for the change between the last button state and the current one. No real solution is possible without rewriting Tk event layer so it will translate window system events instead of tracking button state changes. Both in the File dialog and context help cases, _each_ possible solution will either (1) leak a button click, (2) leak a button release, (3) hang a little and then leak a click if the button is still down, (4) hang a little and then leak a button release if the button is still down. Original bug is (1), currently used Tk solution is (3), my proposed solution is (2). It's easy to turn my proposed solution into (4); the ugliness of the result is the only thing discouraging me from doing it. I'm afraid we are trapped in (1,2,3,4) variants, unless we redesign Tk event handling as described above.
I wrote up a scratch win32 application to monitor clicks in its window. Clicking a button on a dialog over this app does not cause any clicks to go underneath. Only double-clicking a file icon in the open-file dialog causes a WM_LBUTTONUP to be sent to the window underneath. The original bugs that produced the EatSpuriousMessage hack (611615 and 219926) point to this as a known MS issue and raised the importance of this because if a Tk radio/checkbutton was under the place where the double-click occurred then they acted as if a click had been sent and toggled their state. Note: that tk_getOpenFile is not special in this - the open-file dialog from any windows application will cause this button-up event to be passed down - it is a deficiency in how the shell dialog deals with this event.
I have tested this on Windows XP now and I cannot reproduce the original fault using your patched version and I am in full agreement that the current EatSpuriousMessage hack is a horrible hack. I just intend to be very careful that we dont just reintroduce something that was fixed a few years ago or break X11 pointer handling which trying to fix windows handling.
Looking at button.tcl event handling I can see that an extra ButtonRelease should not cause the button to be invoked in Tk. In the button down event we remember which button got the press event and only perform the 'click' action if the release occurs in the same window. The ttk button has a similar check (it's state must be {pressed !disabled}).
In testing this on XP by doing the double click in an open-file dialog (from Notepad) where the pointer is carefully positioned over a radiobutton/ttk::radiobutton it mostly is ok but on a couple of occasions the radiobutton got checked. I do not see why as in my scratch application is NEVER see a down+up but only ever an up event.
Concerning the design of pointer handling - you know more about this than I do at the moment. I have no idea why it is done the way it is except it is probably just emulating how the X11 port works.
I'm still attempting to understand just what is being done in Tk_UpdatePointer as trying to decipher the flag states is confusing. But I think this patch is worthwhile.
I checked the context help click too and this is passing a Release event to the control that gets clicked. It seems harmless but slightly worrying.
It is not clear to me what the non-client button message handling in tkWinWm.c is for exactly. Why was this addition required? (I think its to handle the contexthelp cursor when it leaves the window but I am not sure).
Concerning non-client button messages:
The code in tkWinWm.c is unrelated to context help problems. Instead,
it tries to prevent another family of spurious press/release messages,
this time unrelated to any system dialog.
The most common manifestation of the problem I'm trying to solve can be
seen using the following test scenario:
1. Create a single Tk toplevel (or use the root one)
1.1 Ensure that it is not zoomed
2. bind . <ButtonPress-1> "puts Press"
3. bind . <ButtonRelease-1> "puts Release"
4. Maximize (zoom) the window by double-clicking its title bar,
so that the point clicked will be _inside the client area_ of
the window when it's maximized.
Unpatched Tk will see a click on the window. The version that handles
WM_NC* will see a single button release.
Tk/win32 generates Press and Release events when it notices the button
state changes. In the test above, unpatched Tk doesn't receive any
event during the double-click sequence; when the window is maximized,
it "suddenly" sees the button <1> is down, and generates a click.
Patched version tracks each press/release event of the same sequence
as soon as it happens (by WM_NC* message). Tk_UpdatePointer tracks
mouse state changes, and (righly) ignores all events happening outside
the toplevel client area. When the window is maximized, Tk again sees
the button is down, but this time it has already seen it (when the
second WM_NCLBUTTONDOWN was received).
(Note that the case of double-click zooming is only the most easily
reproduced and the most realistic of the problem manifestations - it
really causes some random button presses e.g. when an application
having a title bar is maximized).
NOCURSOR_MASK is a required addition, because Tk_UpdatePointer
normally tries to adjust the cursor shape on each mouse motion - and
in case of WM_NC* messages, the _system_ wants to manipulate the
cursor shape too. Without NOCURSOR_MASK, we would see a standard arrow
cursor at window borders and corners, instead of "resizing arrows"
provided by the system.
More I write about these hacks and hacks-over-hacks, more I want to
rewrite the button event handling completely in a straightforward way
(translating system WM_LBUTTON to ButtonPress-1, etc. - without state
change tracking). What stops me now is that I'm unable to implement
the same thing on MacOSX branch, so I'd have to keep both code paths
working ("click-translation" for Win32 and probably X11, and "state
tracking" for MacOSX). I'd be grateful for your advice.
Maybe you know some Tk developer who is an expert on MacOSX: then
we should probably discuss the issue together.
P.S. WM_NC* handling code is someway related to context-help: when a widget is clicked in the "question-mark mode", it's a non-client click event, too. Thus when the WM_NC* code is present, the context-help click is naturally absorbed (only a button release is passed on, as in all other cases). Without WM_NC* handling, context help click would cause a spurious click event in addition to <<Help>> event.
Ahh. That makes sense -- it must have been the non-client click handling that was missing when I tested out your context-help patch a year or two ago.
That zoomed test is interesting. I happen to be using Linux at the moment and doing that on X11 does not generate any button events. So it looks greatly like the pointer handling that Tk uses while it makes sense using X11's pointer event model is not a close fit to how Windows manages things.
The two people who know the Tk internals best are das (for all things MacOS) and jenglish (for X11).
New button handling + (contexthelp,shadow) as two patches
Hello,
I'm attaching tk-patches-lbs-chelp-shadow.zip, an archive containing two patches: 00-tk-logicalbuttons.diff and 10-tk-chelp-shadow.diff (functionally depending on the first one).
I took a new approach to mouse button status (for win32), and that enabled me to fix spurious clicks at once, _without_ need to handle non-client messages and to account for some obsure cases (like a foreing, e.g. notepad.exe, window closed by double-click while its LT corner is over Tk application). This new approach also leaves cross-platform code mostly alone, touching mostly mswindows-specific parts of Tk.
Instead of adding obscure flags to Tk_UpdatePointer api, I leave untouched its "event emulation model" (which assumes all button state transitions to correspond to some press or release event); but in tkWinX.c I replace mouse button states requested from GetKeyState() with _logical_ button states, updated on WM_?BUTTONDOWN and WM_?BUTTONUP events in the child window procedure. The rules (for mswin systems) now are as following:
- initially each Tk thread thinks that no mouse button is pressed;
- WM_LBUTTONDOWN message switches "logical state" for button1 to pressed, and the event is generated. WM_LBUTTONDOWN switches it back to released, again with event generation. Ditto for WM_RBUTTONUP/down etc.
- WM_MOUSEMOVE message is examined for modifier mask in wParam. All buttons that are released according to this mask are forced to be logically released (it's needed for a number of practically-rare cases when the application may receive WM_LBUTTONDOWN without corresponding WM_LBUTTONUP, such as the case when the window is destroyed in response to Press and then cursor goes outside Tk while the button is still pressed).
- WM_MOUSELEAVE (which the system may generate for system widgets that use TrackMouseEvents internally) forces all logical buttons to be released. The same WM_MOUSELEAVE is generated when the menu modal loops exits.
The main properties of this solution are the following:
- Tk_UpdatePointer's event model is left untouched,
- Each <Press> corresponds to WM_LBUTTONDOWN, and most (though not every) <Release> corresponds to WM_LBUTTONUP.
(if someone's going to implement his own solution, I'd advise him to follow the same principles).
All static variables in tkWinPointer.c are made thread-local (including newly-appeared logicalButtonState). The reason for it is that all windows GUI "reality" is thread-specific (even GetKeyState result is determined by the events the thread have received, not by the mouse physical state as it may seem). Anyway, with multiple threads, statics are not an option: they should either be mutex-protected or thread-localized. With Windows GUI, I recommend the second option, as it's usually appropriate.
Cursor updates are handled almost as they were, except that TK_WM_IGNORE_CURSOR flag is added: it denotes that the system will take care on the window's cursor (and its children's too), and no updates should be made by Tk. Though it is used by context help, I decided to include it in the first patch (as belonging to "general Tk fixes"): it may be used potentially e.g. for "busy cursor" on X11, for embedding system widgets like rich text box on mswin, and the like.
One of the changes in tkPointer:UpdateCursor() is the only one in this patchset possibly affecting other OSes (not with X11 windowing system, because tkPointer.c is an emulation layer used on non-X11 systems only). Inheriting cursors from parent windows are made work as it does on X11, where the top-level window never inherits cursor from its logical parent. I believe that it's both correct and least-surprising behavior, but I'd ask for testing of cursor inheritance on MacOSX (aqua), as it seems to be the only platform that's both affected by this patch and is not available to me for testing.
The second patch, 10-tk-chelp-shadow.diff, adds -shadow and -contexthelp attributes. Context help should work fine _only_ when the first patch is already applied (and now, it doesn't has to hack the pointer state after WM_HELP arrived - so no <Press> and no more <Release> is seen by the application). The major correction in this revision is that SC_CONTEXTHELP is now understood to be the system command entering the nested modal event loop. It was ignored in my previous attempts,
so while the "question-mark mode" was active, GUI events were allowed to accumulate in the queue, but handled all at once after the "question-mark click". This is no more the case.
Cursor updates are now ceased specifically for the "help-mode" toplevel - much more clean solution than checking for a special "help cursor" and refusing to override it. Toplevel wrapper window procedure does it by temporarily setting TK_WM_IGNORE_CURSOR flag on the window.
As of the old CS_VREDRAW|CS_HREDRAW modification, I now reverted it, just to avoid controversy. I still believe that all Tk widgets invalidate themselves when they're resized, without need to receive WM_PAINT from the system; and if there are some extensions that behave differently, they probably do so for a reason. But let's leave it as it is, as I can't even notice any slowdown that could be caused by CS_VREDRAW|CS_HREDRAW in theory.
As of the first patch, I believe that committing it is an imminent need (possibly without TK_WM_IGNORE_CURSOR parts, if they seem dangerous). Features added by the second patch may wait.
After a while, as a result of testing the patchset in the production use, I discovered the need to restore Win2K- compatibility broken by the -dropshadow addition. See 20-tk-cshadow-win2k-compat.patch (if we can't register a window class with CS_DROPSHADOW, we try to register it without this attribute, so it becomes an alias for the normal toplevel window class). No other problems were discovered by now (specially, nothing seems to be broken by the "bug-fixing" part of the patchset).
There are problems expected with additional mouse buttons above the 3rd, but the old code couldn't handle them correctly as well. Need some testing with a multi-button mouse to know for sure.
I'm still waiting for some recommendation on what part can (should) be committed (or for a commit, or for a rejection, etc). By now, I've maintained the "bug-fix" part of it for my applications for ~4 years. Is there any hope to see it finally in the upstream?