#330 Win32 def-call-in GPFs when calling a generic function

segfault
open
Jörg Höhle
ffi (23)
5
2006-04-04
2006-04-04
Jack D. Unrue
No

I have attached a testcase that demonstrates a GPF on
Win32 when a def-call-in that is declared with
(:language :stdc-stdcall) invokes a generic function.
A nearly identical def-call-in that invokes a normal
function does not crash.

Please see the comments at the top of the testcase for
more details. My clisp build is:

GNU CLISP 2.38 (2006-01-24) (built on winsteingoldlap.bluelnk.net [10.41.52.143])
Software: GNU C 3.4.4 (cygming special) (gdc 0.12, using dmd 0.125)

Discussion

  • Jack D. Unrue
    2006-04-04

    stdcall def-call-in testcase

     
  • Jörg Höhle
    2006-04-24

    user_id=377168

    Can you please check whether using the timer may cause
    re-entry from a different thread? CLISP does not support
    threads, and trying to get them via signals will crash
    sooner or later, as the environment is not set up properly.
    So please make sure that this timer callback code does not
    create a thread behind your back from which it calls your
    handler.

     
  • Jack D. Unrue
    2006-04-24

    user_id=119851

    The testcase exercises one of the two primary variations of
    the SetTimer Win32 API. But either way, the timer
    notification is processed in the context of the thread that
    owns the message queue.

    The MSDN documentation states:

    "When you specify a TimerProc callback function, the default
    window procedure calls the callback function when it
    processes WM_TIMER. Therefore, you need to dispatch messages
    in the calling thread, even when you use TimerProc instead
    of processing WM_TIMER."

    The above requirement to "dispatch messages in the calling
    thread" is accomplished in my testcase via the
    run-message-loop function.

    Also, to verify empirically that no additional thread is
    involved, I used Process Explorer, a Windows utility, to
    inspect the lisp.exe process, watching specifically for
    threads being created and running both before and while my
    testcase ran. I verified that no additional threads are
    created for the timer.

     
  • Jörg Höhle
    2006-04-28

    user_id=377168

    It appears that the bug does not depend on GF/CLOS: you can
    replace the callback body with a call to BREAK. Calling
    EXT:GC from that break loop yields, on my machine (MS-VC
    build):
    [4]> (run-timer)
    *** - illegal foreign data type
    #(NIL #<SYSTEM-FUNCTION CLOS::%ALLOCATE-INSTANCE>
    #<SYSTEM-FUNCTION CLOS::%INITIALIZE-INSTANCE>
    #<SYSTEM-FUNCTION CLOS::%SHARED-INITIALIZE>)
    or
    Break 1 [13]> (ext:gc)
    *** - illegal foreign data type (524288)
    > :bt1 [likely crashes]
    which indicates that CLISP's heap is corrupt.
    This can be caused either by a bad interface (e.g. incorrect
    declarations) or by some bug in def-call-in or related code.

     
  • Jack D. Unrue
    2006-04-28

    user_id=119851

    While the behavior on my machine is slightly different (I
    get the Windows GPF dialog instead of a diagnostic message
    from CLISP), I too see that BREAK triggers the crash. So I
    agree it has nothing to do with CLOS.

    I have double-checked my declarations against the C
    declarations in the Windows header files, and I'm as sure as
    I can be that they're correct. I've also checked that
    FFI:SIZEOF reports the expected sizes for each of the
    parameter types of the callback.

     
  • Jack D. Unrue
    2006-04-28

    user_id=119851

    I should also mention that this testcase is a manual
    translation (and abbreviated version) of equivalent code
    that I've written with CFFI as part of a library. That CFFI
    code, which uses the same timer callback, works without
    problems on LispWorks 4.4.6.

    I've attached a slightly simplified version of the testcase
    that gets rid of the defclass/defmethod noise and where the
    timer-proc callback function simply calls BREAK.

     
  • Jörg Höhle
    2006-05-03

    user_id=377168

    Uh-oh, this looks like a hairy stack/register (and possibly
    GC) consistency bug somewhere in CLISP.

    With -DDEBUG_BACKTRACE and -DSAFETY=1 enabled, the FFI
    test suite no longer passes (1 error across all tests):
    Form: (LIST (FUNCALL FPCALLBACK 3.5d0) *X*)
    CORRECT: (3.5 3.5d0)
    CLISP : ERROR
    NIL cannot be converted to the foreign type SINGLE-FLOAT
    I think it's related (callback involved).

    [3]> (run-timer)
    eval.d:5779:w/s/b/t: before: circularity!
    [0/0x32a264]> #<COMPILED-FUNCTION SYS::COERCE-TO-CONDITION>
    delta: STACK=0; SP=4
    6
    [1/0x32a31c]> #<SYSTEM-FUNCTION CL::SIGNAL> 1 args delta:
    STACK=18; SP=107374064
    4
    [2/0x3290ac]> #<COMPILED-FUNCTION
    MOP::COMPUTE-EFFECTIVE-METHOD-AS-FUNCTION-FORM
    > delta: STACK=4; SP=251
    [3/0x329498]> #<COMPILED-FUNCTION
    MOP::COMPUTE-EFFECTIVE-METHOD-AS-FUNCTION>
    *** - handle_fault error2 ! address = 0x1 not in
    [0x1a3d0000,0x1a500ee8) !
    SIGSEGV cannot be cured. Fault address = 0x1.
    Permanently allocated: 88000 bytes.
    Currently in use: 1827664 bytes.
    Free space: 326824 bytes

     
  • Jörg Höhle
    2006-05-24

    user_id=377168

    I've written a few lines of code that I thought were
    essentially equivalent to the original code. However, my
    code does not trigger the bug.
    It is similar in that it uses a call-out plus a call-in,
    with :language :stdc-stdcall.
    It differs in that it does not call any external function.
    So a possibly bogus ffcall could be masked in my code, since
    it calls back into ffcall itself (the vacall trampoline),
    whereas the original code calls the MS-Windows API.

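    (use-package "FFI")

    ;; Callback type with the argument list of a Win32 TIMERPROC
    ;; (the UINT return type is only for this test).
    (def-c-type tfun-sc
      (c-function (:arguments (hwnd ffi:c-pointer)
                              (msg ffi:uint)
                              (id ffi:uint)
                              (time ffi:ulong))
                  (:language :stdc-stdcall)
                  ;(:language :stdc)      ; cdecl variant, for comparison
                  (:return-type uint)))

    ;; Callback that forces a garbage collection, then returns a value.
    (defun cb-gc (hwnd msg id time)
      (declare (ignore hwnd msg id time))
      (ext:gc)
      12356)

    ;; Callback that enters the debugger.
    (defun cb-break (hwnd msg id time)
      (declare (ignore hwnd msg id time))
      (break "in signal loop"))

    ;; Wrap each Lisp function as a stdcall foreign function (a call-in
    ;; trampoline); FUNCALLing it does a call-out that re-enters Lisp.
    (defparameter callbackgc (with-c-var (x 'tfun-sc #'cb-gc) x))
    (funcall callbackgc nil 1 2 3)
    (defparameter callbackbreak (with-c-var (x 'tfun-sc #'cb-break) x))
    (funcall callbackbreak nil 111111 2222222 333333333)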

    I wonder why the 111111, 2222222 and 333333333 arguments do
    not appear next to each other in the :bt1 backtrace:
    #<FUNCTION CB-BREAK (HWND MSG ID TIME)
    (DECLARE (SYSTEM::IN-DEFUN CB-BREAK) (IGNORE HWND MSG ID
    TIME))
    (BLOCK CB-BREAK (BREAK "in signal loop"))> 4
    - #(C-POINTER 0 UINT 0 UINT 0 ULONG 0)
    -
    #<FUNCTION CB-BREAK (HWND MSG ID TIME)
    (DECLARE (SYSTEM::IN-DEFUN CB-BREAK) (IGNORE HWND MSG ID
    TIME))
    (BLOCK CB-BREAK (BREAK "in signal loop"))>
    - UINT
    - UINT
    - 333333333 ; huh? why are the args chopped up so strangely?
    <14> #<SYSTEM-FUNCTION FFI::FOREIGN-CALL-OUT> 5
    - 2222222 ; more args?
    <15> #<SYSTEM-FUNCTION FUNCALL> 5
    - 111111 ; more args
    - NIL
    - #<FOREIGN-FUNCTION #x00BD1B60>

     
  • Jörg Höhle
    2006-10-12

    user_id=377168

    Jack,
    please check whether the recent 2006-10-11 FFI CVS patch
    fixes this issue. It would be great if it does!
    > fixed FFI callbacks, broken since the 2005-10-02 patch

     
  • Jack D. Unrue
    2006-10-13

    user_id=119851

    I get the following when running my original testcase with
    version 2.41:

    [3]> (load "stdcall-testcase.lisp")
    ;; Loading file stdcall-testcase.lisp ...
    ;; Loaded file stdcall-testcase.lisp
    T
    [4]> (run-timer)

    *** - GO: no tag named #:G6871 is currently visible
    The following restarts are available:
    *** - handle_fault error2 ! address = 0x5 not in
    [0x19d70000,0x19ebe47c) !
    SIGSEGV cannot be cured. Fault address = 0x5.
    Permanently allocated: 90016 bytes.
    Currently in use: 2482596 bytes.
    Free space: 36 bytes.