I have attached a testcase that demonstrates a GPF on
Win32 when a def-call-in that is declared with
(:language :stdc-stdcall) invokes a generic function.
A nearly identical def-call-in that invokes a normal
function does not crash.
Please see the comments at the top of the testcase for
more details. My clisp build is:
GNU CLISP 2.38 (2006-01-24) (built on
winsteingoldlap.bluelnk.net [10.41.52.143])
Software: GNU C 3.4.4 (cygming special) (gdc 0.12,
using dmd 0.125)
stdcall def-call-in testcase
Logged In: YES
user_id=377168
Can you please check whether using the timer may cause
re-entry from a different thread? CLISP does not support
threads, and emulating them via signals will crash
sooner or later, because the environment is not set up properly.
So please make sure that this timer callback code does not
create a thread behind your back in which it calls your handler.
Logged In: YES
user_id=119851
The testcase exercises one of the two primary variations of
the SetTimer Win32 API. But either way, the timer
notification is processed in the context of the thread that
owns the message queue.
The MSDN documentation states:
"When you specify a TimerProc callback function, the default
window procedure calls the callback function when it
processes WM_TIMER. Therefore, you need to dispatch messages
in the calling thread, even when you use TimerProc instead
of processing WM_TIMER."
The above requirement to "dispatch messages in the calling
thread" is accomplished in my testcase via the
run-message-loop function.
Also, to empirically verify that there is not an additional
thread involved, I used a utility available for Windows
called Process Explorer to inspect the lisp.exe process,
specifically to watch for threads being created and running,
prior to executing my testcase and while it was running. I
verified that no additional threads are created for the timer.
Logged In: YES
user_id=377168
It appears that the bug does not depend on GF/CLOS. You can
replace the callback with a call to BREAK. Calling EXT:GC
therein yields, on my machine (MS-VC build):
[4]> (run-timer)
*** - illegal foreign data type
#(NIL #<SYSTEM-FUNCTION CLOS::%ALLOCATE-INSTANCE>
#<SYSTEM-FUNCTION CLOS::%INITIALIZE-INSTANCE>
#<SYSTEM-FUNCTION CLOS::%SHARED-INITIALIZE>)
or
Break 1 [13]> (ext:gc)
*** - illegal foreign data type (524288)
> :bt1 [likely crashes]
which indicates that CLISP's heap is corrupt.
This can be caused either by a bad interface (e.g. incorrect
declarations) or by a bug in def-call-in or a related part
of CLISP.
Logged In: YES
user_id=119851
While the behavior on my machine is slightly different (I
get the Windows GPF dialog instead of a diagnostic message
from CLISP), I too see that BREAK triggers the crash. So I
agree it has nothing to do with CLOS.
I have double-checked my declarations against the C
declarations in the Windows header files, and I'm as sure as
I can be that they're correct. I've also checked that
FFI:SIZEOF reports the expected sizes for each of the
parameter types of the callback.
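A minimal sketch of that size check, not the code from the attached testcase: the helper name timer-proc-param-sizes is hypothetical, and the parameter list assumes the callback takes the four TimerProc arguments (HWND, UINT, UINT_PTR, DWORD) mapped to the CLISP FFI types shown. On a 32-bit Windows build each size would be expected to be 4 bytes.

```lisp
;; Hypothetical helper: collects the size in bytes that CLISP's FFI
;; reports for each assumed parameter type of the timer callback.
;; FFI:SIZEOF returns size and alignment; only the size is kept here.
(defun timer-proc-param-sizes ()
  (list (cons 'hwnd (ffi:sizeof 'ffi:c-pointer))  ; HWND
        (cons 'msg  (ffi:sizeof 'ffi:uint))       ; UINT
        (cons 'id   (ffi:sizeof 'ffi:uint))       ; UINT_PTR (assumed 32-bit)
        (cons 'time (ffi:sizeof 'ffi:ulong))))    ; DWORD

(print (timer-proc-param-sizes))
```

A mismatch between these sizes and the C declarations would point at the interface rather than at CLISP itself.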
Logged In: YES
user_id=119851
I should also mention that this testcase is a manual
translation (and abbreviated version) of equivalent code
that I've written via CFFI as part of a library. This
equivalent code that uses a timer callback works without
problems on LispWorks 4.4.6.
I've attached a slightly simplified version of the testcase
that gets rid of the defclass/defmethod noise and where the
timer-proc callback function simply calls BREAK.
Logged In: YES
user_id=377168
Uh-oh, this looks like a hairy stack/register (possibly GC)
consistency bug somewhere in CLISP.
I've enabled -DDEBUG_BACKTRACE and -DSAFETY=1 and the FFI
testsuite does not pass anymore (1 error in all tests):
Form: (LIST (FUNCALL FPCALLBACK 3.5d0) *X*)
CORRECT: (3.5 3.5d0)
CLISP : ERROR
NIL cannot be converted to the foreign type SINGLE-FLOAT
I think it's related (callback involved).
[3]> (run-timer)
eval.d:5779:w/s/b/t: before: circularity!
[0/0x32a264]> #<COMPILED-FUNCTION SYS::COERCE-TO-CONDITION> delta: STACK=0; SP=46
[1/0x32a31c]> #<SYSTEM-FUNCTION CL::SIGNAL> 1 args delta: STACK=18; SP=1073740644
[2/0x3290ac]> #<COMPILED-FUNCTION MOP::COMPUTE-EFFECTIVE-METHOD-AS-FUNCTION-FORM> delta: STACK=4; SP=251
[3/0x329498]> #<COMPILED-FUNCTION MOP::COMPUTE-EFFECTIVE-METHOD-AS-FUNCTION>
*** - handle_fault error2 ! address = 0x1 not in
[0x1a3d0000,0x1a500ee8) !
SIGSEGV cannot be cured. Fault address = 0x1.
Permanently allocated: 88000 bytes.
Currently in use: 1827664 bytes.
Free space: 326824 bytes
Logged In: YES
user_id=377168
I've written a few lines of code that I thought were mostly
equivalent to the original code. However, my code does not
trigger the bug.
It is similar in that it uses call-out + call-in
with :language :stdc-stdcall.
It differs in that it does not call any external
function. A possibly buggy ffcall might therefore cancel
itself out in my code, because the call-out invokes ffcall's
own code (the vacall trampoline) directly, whereas in the
original code the callback is invoked by the MS-Windows API.
(use-package "FFI")

;; Callback type with the same argument list as the timer callback.
(def-c-type tfun-sc
  (c-function (:arguments (hwnd ffi:c-pointer)
                          (msg ffi:uint)
                          (id ffi:uint)
                          (time ffi:ulong))
              (:language :stdc-stdcall)
              ;(:language :stdc)
              (:return-type uint)))

;; Callback that forces a garbage collection.
(defun cb-gc (hwnd msg id time)
  (declare (ignore hwnd msg id time))
  (ext:gc) 12356)

;; Callback that enters the debugger.
(defun cb-break (hwnd msg id time)
  (declare (ignore hwnd msg id time))
  (break "in signal loop"))

;; Convert each Lisp function to a foreign function (call-in), then
;; call it from Lisp (call-out) with no external code in between.
(setq callbackgc (with-c-var (x 'tfun-sc #'cb-gc) x))
(funcall callbackgc nil 1 2 3)
(setq callbackbreak (with-c-var (x 'tfun-sc #'cb-break) x))
(funcall callbackbreak nil 111111 2222222 333333333)
I'm wondering why the arguments 111111, 2222222 and 333333333
do not appear next to each other in the :bt1 backtrace:
#<FUNCTION CB-BREAK (HWND MSG ID TIME)
(DECLARE (SYSTEM::IN-DEFUN CB-BREAK) (IGNORE HWND MSG ID
TIME))
(BLOCK CB-BREAK (BREAK "in signal loop"))> 4
- #(C-POINTER 0 UINT 0 UINT 0 ULONG 0)
-
#<FUNCTION CB-BREAK (HWND MSG ID TIME)
(DECLARE (SYSTEM::IN-DEFUN CB-BREAK) (IGNORE HWND MSG ID
TIME))
(BLOCK CB-BREAK (BREAK "in signal loop"))>
- UINT
- UINT
- 333333333 ; huh? why are the args chopped up so strangely?
<14> #<SYSTEM-FUNCTION FFI::FOREIGN-CALL-OUT> 5
- 2222222 ; more args?
<15> #<SYSTEM-FUNCTION FUNCALL> 5
- 111111 ; more args
- NIL
- #<FOREIGN-FUNCTION #x00BD1B60>
Logged In: YES
user_id=377168
Jack,
please check whether the recent FFI 2006-10-11 CVS patch
fixes this issue. It would be great if so!
> fixed FFI callbacks, broken since the 2005-10-02 patch
Logged In: YES
user_id=119851
I get the following when running my original testcase with
version 2.41:
[3]> (load "stdcall-testcase.lisp")
;; Loading file stdcall-testcase.lisp ...
;; Loaded file stdcall-testcase.lisp
T
[4]> (run-timer)
*** - GO: no tag named #:G6871 is currently visible
The following restarts are available:
*** - handle_fault error2 ! address = 0x5 not in
[0x19d70000,0x19ebe47c) !
SIGSEGV cannot be cured. Fault address = 0x5.
Permanently allocated: 90016 bytes.
Currently in use: 2482596 bytes.
Free space: 36 bytes.