Menu

Trying to receive SIGINT signal in Python Remote Console

Help
2023-10-20
2023-11-23
1 2 > >> (Page 1 of 2)
  • Nick Sullivan-Molina

    Hi all,

    I've been writing a 'controller' for gxsm, so that I can control it for automation purposes. For the most part, everything seems to work as expected (I'm sure I'll have more questions in the future!). However, there is one aspect I struggle with: stopping the automation script.

    In the Python Remote UI, there is a button to 'start' your script, and one to 'stop it'; for the latter, it attempts to send a SIGINT to the script. The problem I am having is that the SIGINT is not sent!

    Looking at the source code for pyremote (and trying to grok this whole Python C-API), it appears that:
    1. We initialize/set up our environment by importing the gxsm and stdout redirection modules, and initializing the Python GIL with signal handlers not set up (via PyInitialize_Ex(0)).
    2. We set up a default script, automatically importing gxsm and getting stuff up-and-running.
    3. When a script is loaded, we run run_file(), which loads the file into a string and calls run_command(). This method releases the GIL via PyEval_SaveThread() and starts a new GTK thread.
    4. The thread method, PyRun_GThreadFunc() calls the script via PyRun_String(), waiting until its finished. Since it is in a non-Python thread, we need to register/unregister ourselves with the GIL via PyGILState_Ensure() and PyGILState_Release(). At the end, we call PyEval_RestoreThread() to grab the GIL again.
    5. When the script is killed, we try to send a SIGINT in kill() via PyErr_SetInterrupt(). From the documentation and stackoverflow, this should send the SIGINT signal to the main python thread.

    For some reason, the SIGINT signal is not going through. I know this because I already have a KeyboardInterrupt try/catch (what Python maps SIGINT to), and do not hit it. To be extra sure, I also modified pyremote.c to explicitly tie SIGINT to the default handler :

    run_string("import signal\nsignal.signal(signal.SIGINT, signal.default_int_handler)\n")
    

    Additionally, I tried the "other" method in the code: using Py_AddPendingCall() to provide a functor to a function that will do the same. This also does not work. I suspect this does not work, because it apparently requires the main thread to be executing Python code (see here: https://stackoverflow.com/a/12448791).

    I also tried some of the other methods (which are commented in pyremote.c as well):
    * PyErr_SetInterrupt().
    * PyErr_SetString().

    None appear to work.

    I wondered if the reason it is not working is due to the main thread (where we are running) not running Python and/or having released the GIL. I tried hacking around and 'grabbing' the GIL via PyEval_RestoreThread() / PyEval_SaveThread(), but was unable to get it to work.

    A last hail-mary option I read about was sending the SIGINT to a specific thread by specifying the thread id (i.e. what is described here: https://stackoverflow.com/a/56140288). However, since the thread we are running Python in is a GTK thread, I don't seem to see how to get the thread id.

    Are you seeing the same behavior?

    Thanks,
    Nick

     
    • Percy Zahl

      Percy Zahl - 2023-10-20

      Script forced exit:

      you tell me, no clue...

      As you see there are many attempts (if you look up the history) getting this sig int passed to python. Yes some thing is related to the thread -- possibly.

      What you all list as of possible options is about all I recall been testing myself long ago...

      See also:
      https://docs.python.org/3/c-api/exceptions.html

      I assume you located "py_gxsm_consoile::kill", line 3717 in pyremote.cpp (OK that Gxsm4, but still the same issue, see also some testing code fractions and the stackoverflow note above.... disabled via "if 0"...)
      https://github.com/pyzahl/Gxsm4/blob/master/plug-ins/common/pyremote.cpp

      And honestly at some point I gave up on it....

      Alternative solution:
      ... and started writing non blocking scripts what is smarter anyways and keep monitoring any of the script control entry values to exit correctly if requested!

      In general like this, set and check for user exit request via setting script control entry to 0 for example:

              gxsm.set('script-control','1')
              while int(gxsm.get('script-control')) > 0:
                      ...
      

      PS:
      It kind of worked in the past before been fully threaded -- but even then it had advert side effects/not clean and made the whole gxsm task in-stable.

      But if you find a solution, I am happy to listen and implement it ;)

      -P

      You got lucky:

      UPDATE on github now:

      Do not ask my why, but I just played with some "Ctrl-C" injection methods again, and all sudden it works just fine!!! All I did was using those two lines below (this was used long ago, but failed some time ago) now in the kill function vs. the SetInterrupt() - - what causes terminating the whole gxsm process:

                      //PyErr_SetInterrupt ();  <- terminates whole gxsm :(
      
                      // New "old" working now:
                      PyErr_SetString(PyExc_KeyboardInterrupt, "Abort");
                      PyErr_CheckSignals();
      

      Here is a cute test script:

      gxsm.set('script-control','1')
      i=0
      while int(gxsm.get('script-control')) > 0:
          print (i)
          i=i+1
          gxsm.sleep(10)
      
       

      Last edit: Percy Zahl 2023-10-20
  • Nick Sullivan-Molina

    Hi Percy,

    Thank you for the quick update. Unfortunately, I don't seem to see the same behavior. I've updated to the latest gxsm3 on github, and this is the behavior I see:
    1. If I call with "PyErr_SetString(PyExc_KeyboardInterrupt, "Abort");", I get the following stack trace:

    Thread 1 "gxsm3" received signal SIGSEGV, Segmentation fault.
    0x00007fffc23d4f94 in _PyErr_GetTopmostException () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0
    (gdb) bt
    #0  0x00007fffc23d4f94 in _PyErr_GetTopmostException ()
        at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
    #1  0x00007fffc23d62e6 in _PyErr_SetObject () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
    #2  0x00007fffc23d6598 in _PyErr_SetString () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
    #3  0x00007fffc292753a in py_gxsm_console::kill(_GtkToggleButton*, void*)
         (btn=0x5555580f9340, user_data=0x5555558c3f80) at pyremote.C:4704
    #4  0x00007ffff726a700 in g_signal_emit_valist () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
    #5  0x00007ffff726a863 in g_signal_emit () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
    #6  0x00007ffff773daa0 in  () at /lib/x86_64-linux-gnu/libgtk-3.so.0
    #7  0x00007ffff724cd2f in g_closure_invoke () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
    #8  0x00007ffff7268895 in  () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
    #9  0x00007ffff726a614 in g_signal_emit_valist () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
    #10 0x00007ffff726a863 in g_signal_emit () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
    #11 0x00007ffff773d874 in  () at /lib/x86_64-linux-gnu/libgtk-3.so.0
    #12 0x00007ffff79f0be5 in  () at /lib/x86_64-linux-gnu/libgtk-3.so.0
    #13 0x00007ffff726a700 in g_signal_emit_valist () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
    #14 0x00007ffff726a863 in g_signal_emit () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
    #15 0x00007ffff7807ffc in  () at /lib/x86_64-linux-gnu/libgtk-3.so.0
    #16 0x00007ffff7251866 in g_cclosure_marshal_VOID__BOX
        at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
    #17 0x00007ffff726a700 in g_signal_emit_valist () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
    #18 0x00007ffff726a863 in g_signal_emit () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
    #19 0x00007ffff77ffacb in  () at /lib/x86_64-linux-gnu/libgtk-3.so.0
    #20 0x00007ffff780783b in  () at /lib/x86_64-linux-gnu/libgtk-3.so.0
    #21 0x00007ffff7808443 in  () at /lib/x86_64-linux-gnu/libgtk-3.so.0
    #22 0x00007ffff77cef90 in gtk_event_controller_handle_event ()
        at /lib/x86_64-linux-gnu/libgtk-3.so.0
    #23 0x00007ffff79a0045 in  () at /lib/x86_64-linux-gnu/libgtk-3.so.0
    #24 0x00007ffff79e6eb8 in  () at /lib/x86_64-linux-gnu/libgtk-3.so.0
    #25 0x00007ffff726a700 in g_signal_emit_valist () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
    #26 0x00007ffff726a863 in g_signal_emit () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
    #27 0x00007ffff79ae724 in  () at /lib/x86_64-linux-gnu/libgtk-3.so.0
    #28 0x00007ffff7851680 in  () at /lib/x86_64-linux-gnu/libgtk-3.so.0
    #29 0x00007ffff785252a in gtk_main_do_event () at /lib/x86_64-linux-gnu/libgtk-3.so.0
    #30 0x00007ffff7ed4743 in  () at /lib/x86_64-linux-gnu/libgdk-3.so.0
    #31 0x00007ffff7f0bf56 in  () at /lib/x86_64-linux-gnu/libgdk-3.so.0
    #32 0x00007ffff7153d3b in g_main_context_dispatch () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
    #33 0x00007ffff71a9258 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
    #34 0x00007ffff71513e3 in g_main_context_iteration () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
    #35 0x00007ffff7378fb5 in g_application_run () at /lib/x86_64-linux-gnu/libgio-2.0.so.0
    #36 0x00005555555d5aee in main(int, char**) (argc=1, argv=0x7fffffffdfa8) at gxsm_main.C:559
    
    1. If I call with:
    PyErr_SetInterrupt(); 
    PyErr_CheckSignals();
    

    I get the following backtrace. Note that you must call PyErr_CheckSignals() following SetInterrupt() for anything to happen, see here, effect is on next PyErr_CheckSignals().

    Thread 1 "gxsm3" received signal SIGSEGV, Segmentation fault.
    0x00007fffba363fb1 in _PyUnicode_FromId () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0
    (gdb) bt
    #0  0x00007fffba363fb1 in _PyUnicode_FromId () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
    #1  0x00007fffba3e893d in _PyImport_GetModuleId () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
    #2  0x00007fffba403945 in  () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
    #3  0x00007fffba403a48 in Py_FinalizeEx () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
    #4  0x00007fffba975bac in py_gxsm_console::~py_gxsm_console()
         (this=0x555555a33f90, __in_chrg=<optimized out>) at pyremote.C:4443
    #5  0x00007fffba975c04 in py_gxsm_console::~py_gxsm_console()
        (this=0x555555a33f90, __in_chrg=<optimized out>) at pyremote.C:4445
    #6  0x00007fffba979f46 in pyremote_cleanup() () at pyremote.C:5617
    #7  0x00005555555d7bb1 in plugin_ctrl::cleanup_pi(void*)
        (this=0x55555771b7e0, pi=0x7fffba996ca0 <pyremote_pi>) at plugin_ctrl.C:261
    #8  0x00005555555d6d8e in plugin_ctrl::~plugin_ctrl()
        (this=0x55555771b7e0, __in_chrg=<optimized out>) at plugin_ctrl.C:119
    #9  0x00005555555da184 in gxsm_plugins::~gxsm_plugins()
        (this=0x55555771b7e0, __in_chrg=<optimized out>) at plugin_ctrl.C:588
    #10 0x00005555555e67f7 in App::reload_gxsm_plugins(int) (this=0x55555596c000, killflag=1)
        at gxsm_app.C:887
    #11 0x00005555555e2a2a in App::~App() (this=0x55555596c000, __in_chrg=<optimized out>)
        at gxsm_app.C:179
    #12 0x00005555555e2f70 in App::~App() (this=0x55555596c000, __in_chrg=<optimized out>)
        at gxsm_app.C:204
    #13 0x00005555555f33c1 in App::file_quit_callback(_GSimpleAction*, _GVariant*, void*)
        (simple=0x555555bee770, parameter=0x0, user_data=0x5555559b40f0) at gxsm_menucb.C:260
    #14 0x00007ffff724cd2f in g_closure_invoke () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
    #15 0x00007ffff7268c36 in  () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
    #16 0x00007ffff726a614 in g_signal_emit_valist () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
    #17 0x00007ffff726a863 in g_signal_emit () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
    #18 0x00007ffff737d035 in  () at /lib/x86_64-linux-gnu/libgio-2.0.so.0
    #19 0x00007ffff773c3b7 in  () at /lib/x86_64-linux-gnu/libgtk-3.so.0
    #20 0x00007ffff724cd2f in g_closure_invoke () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
    #21 0x00007ffff7268e11 in  () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
    #22 0x00007ffff726a614 in g_signal_emit_valist () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
    #23 0x00007ffff726a863 in g_signal_emit () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
    #24 0x00007ffff773daa0 in  () at /lib/x86_64-linux-gnu/libgtk-3.so.0
    #25 0x00007ffff724cd2f in g_closure_invoke () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
    #26 0x00007ffff7268895 in  () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
    #27 0x00007ffff726a614 in g_signal_emit_valist () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
    #28 0x00007ffff726a863 in g_signal_emit () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
    #29 0x00007ffff773d874 in  () at /lib/x86_64-linux-gnu/libgtk-3.so.0
    #30 0x00007ffff79f0be5 in  () at /lib/x86_64-linux-gnu/libgtk-3.so.0
    #31 0x00007ffff726a700 in g_signal_emit_valist () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
    #32 0x00007ffff726a863 in g_signal_emit () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
    #33 0x00007ffff7807ffc in  () at /lib/x86_64-linux-gnu/libgtk-3.so.0
    #34 0x00007ffff7251866 in g_cclosure_marshal_VOID__BOXEDv ()
        at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
    #35 0x00007ffff726a700 in g_signal_emit_valist () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
    #36 0x00007ffff726a863 in g_signal_emit () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
    #37 0x00007ffff77ffacb in  () at /lib/x86_64-linux-gnu/libgtk-3.so.0
    #38 0x00007ffff780783b in  () at /lib/x86_64-linux-gnu/libgtk-3.so.0
    #39 0x00007ffff7808443 in  () at /lib/x86_64-linux-gnu/libgtk-3.so.0
    #40 0x00007ffff77cef90 in gtk_event_controller_handle_event ()
        at /lib/x86_64-linux-gnu/libgtk-3.so.0
    --Type <RET> for more, q to quit, c to continue without paging--
    #41 0x00007ffff79a0045 in  () at /lib/x86_64-linux-gnu/libgtk-3.so.0
    #42 0x00007ffff79e6eb8 in  () at /lib/x86_64-linux-gnu/libgtk-3.so.0
    #43 0x00007ffff726a700 in g_signal_emit_valist () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
    #44 0x00007ffff726a863 in g_signal_emit () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
    #45 0x00007ffff79ae724 in  () at /lib/x86_64-linux-gnu/libgtk-3.so.0
    #46 0x00007ffff7851680 in  () at /lib/x86_64-linux-gnu/libgtk-3.so.0
    #47 0x00007ffff785252a in gtk_main_do_event () at /lib/x86_64-linux-gnu/libgtk-3.so.0
    #48 0x00007ffff7ed4743 in  () at /lib/x86_64-linux-gnu/libgdk-3.so.0
    #49 0x00007ffff7f0bf56 in  () at /lib/x86_64-linux-gnu/libgdk-3.so.0
    #50 0x00007ffff7153d3b in g_main_context_dispatch () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
    #51 0x00007ffff71a9258 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
    #52 0x00007ffff71513e3 in g_main_context_iteration () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
    #53 0x00007ffff7378fb5 in g_application_run () at /lib/x86_64-linux-gnu/libgio-2.0.so.0
    #54 0x00005555555d5aee in main(int, char**) (argc=1, argv=0x7fffffffdfa8) at gxsm_main.C:559
    

    For reference, I've used the following sample to make sure it's not my python script that is the issue:

    import time
    
    print("Starting loop")
    keep_going = True
    while keep_going:
        try:
            time.sleep(0.1)
        except KeyboardInterrupt:
            print("KBInterrupt received, ending")
            keep_going = False
    
    print("End of loop")
    

    Could you clarify your python version? I'm using python 3.10, perhaps I can replicate your results with your version.

    Thank you,
    Nick

     
    • Percy Zahl

      Percy Zahl - 2023-10-23

      Interesting... so this actually may be a python lib issue!
      I am developing on Debian 12 and Testing. (Python 3.11)

      However, did you tried my script above without the try...except...?

      As your script for odd reason also crashes when I click the abort button while mine does not!

      gxsm.set('script-control','1')
      i=0
      while int(gxsm.get('script-control')) > 0:
          print (i)
          i=i+1
          gxsm.sleep(10)
      

      Output:

      >>> Executing parsed script >>>
      0
      1
      2
      3  << here I pressed the abort button
      
      *** SCRIPT INTERRUPT REQUESTED: Setting PyErr SIGINT to abort script.
      KeyboardInterrupt: Abort
      
      The above exception was the direct cause of the following exception:
      
      Traceback (most recent call last):
        File "<string>", line 8, in <module>
      SystemError: <built-in function sleep> returned a result with an exception set
      
      
      <<< PyRun user script (as thread) run raised an exeption. <<<
      --END IDLE--
      

      Gxsm keeps working fine and I can do this as often I want!

      While with your script it is also crashing -- oddly!

      -P

       
  • Nick Sullivan-Molina

    Hello Percy,

    Excuse the delay in responding: I have spent quite a bit of time trying to understand the issue.

    TL;DR: time.sleep() cannot be used under the current design, and must be replaced with gxsm.sleep() for it to work. We may want to discuss a slight re-design, depending on the purpose of the current design.

    This post will go somewhat in-depth into what is going on; I'm trying to be as explicit as possible, because I found it confusing. I suspect some or all of what I am saying may be already known to you, so excuse me if I seem to say obvious things.

    Overview/Background

    pyremote.c/.h does two things at once: it contains a set of Python extensions (gxsm and redirection), and it embeds python. I.e.: we are extending embedded python. On top of this, in the current version there is multi-threaded python, where: (a) we have our embedded Python initialized (with extensions appended and imported); and (b) we spawn a new GTK thread running python whenever we run a python command.

    What we have been discussing is how to send a SIGINT signal to a running python script, to tell it to stop running.

    Global Interpreter Lock
    - The GIL is a mutex that must be held by a given thread before it can safely access Python objects. Only one thread can hold the GIL at a time, meaning that operations on Python objects are bottlenecked by this one-thread-at-a-time mechanism. See here.
    - The GIL is released via two mechanisms: (a) a Python thread explicitly releases it (usually when performing blocking I/O or operations on non-Python objects); or (b) the switch interval (defined by sys.setswitchinterval) has passed, causing it to be automatically locked, allowing the internal Python scheduler to choose what thread gets the GIL next. See here.
    - Note that the GIL includes the global exception state, stored in the global variable errno. So, when a thread releases the GIL, they store their current state (including exception), and when they retake the GIL they restore to that prior state (and exception). See here.

    C-API specifics about the GIL
    - In extension code, the GIL should be released when performing blocking I/O, or released when passing onto a new Python thread (which will use it). Releasing and restoring the GIL is accomplished via two methods: 'state = PyEval_SaveThread()' to release, and 'PyEval_RestoreThread(state)' to retake. See here.
    - When embedding Python, the initial thread that brought up the Python interpreter automatically registers to the GIL (the same as Python-instantiated threads, i.e. via threading in Python); going forward, this will be called the main interpreter thread (following Python conventions). However, for non-Python threads we need to explicitly register to the GIL before running any Python code, and unregister when we are done (a bit more is done, see documentation if curious). This is accomplished via two methods: 'gstate = PyGILState_Ensure()' to register and 'PyGILState_Release(gstate)' to unregister. See here.

    Sending signals
    1. PyErr_SetString() sets the current thread's error indicator. See here.
    2. PyErr_SetInterruptEx() simulates a signal arriving, which is handled the next time PyErr_CheckSignals() is called. It is explicitly highlighted as being for C code that handles signals on its own and wants to invoke Python signal handlers for specific scenarios. Note that PyErr_CheckSignals() must be called within the main interpreter thread, otherwise nothing will happen. I should specify that in my testing, this seems to work in pyremote.c's 'kill()', which presumably is a task thread (as it is linked to the 'kill' button signal); so, perhaps they are differentiating between it and other Python threads. See here and here.
    3. Py_AddPendingCall() schedules a function to be called from the main interpreter thread. The next time the
    main thread* hits a bytecode boundary while holding the GIL, it will try to call this method. Note that nothing will happen if the main interpreter thread is not running Python. See here.

    Current Situation

    Now, let's explain what was happening with these options under the current design.

    1. Firstly, we should not grab the GIL inside kill(). If we do so, we are waiting until our Python thread releases before doing anything. Since errno is stored and restored on release, anything we do will be forgotten by that thread!
    2. PyErr_SetString() and PyErr_SetInterruptEx()+PyErr_CheckSignals() do work when called within kill() (without grabbing the GIL). There are two potential situations that can be hit:
      a. If we call one of these while we are in our Python extension code (e.g., gxsm.sleep()), we will raise an exception and stop the interpreter. Any exception handling we put into our script will not be hit.
      b. If we call one of these while we are outside of the extension code (i.e., within normal Python), we will raise an exception and stop the interpreter. Any proper exception handling we put into our script will be hit (e.g., logging errors).
      Thus, we have somewhat of a 'race condition', in terms of where the exception will be logged (which we have no control over). However, the script will shut down in either way.
    3. time.sleep() does not work under the current design. The reason is fairly straightforward: when time.sleep() is called, it releases the GIL (since we have no reason to keep it)! There are two potential situations that can be hit when calling PyErr_SetString() or PyErr_SetInterruptEx()+PyErr_CheckSignals():
      a. If we call one of these while we are not in time.sleep(), we get an exception and all is well.
      b. If we call one of these will we are in time.sleep(), we get a segmentation fault! The reason is simple: we are changing the GIL without having taken access! I should make a clarification: PyErr_SetInterruptEx() is async-signal safe and can be called without the GIL. However, PyErr_CheckSignals() cannot. Since we need the GIL to call it, we run into the same problem we mentioned earlier: our remote script will not see this exception because it will have restored to the prior errno when it retakes the GIL (without any exception).
    4. Py_AddPendingCall() does nothing currently, because we are not running any Python code within the main interpreter thread (technically, we may be, but this is only when we run a specific command in the input line of the UI; this is not continuous and thus is not a realistic option).

    So, just to reiterate about time.sleep(): a more 'proper' design of gxsm.sleep() might release the GIL while sleeping (because, the GIL should in principle schedule a different Python thread). But doing so would break our current design further.

    What can we do?

    So, given the above, I think there are 2 options I can see:
    1. If we do nothing, we should be very explicit to users that they must call gxsm.sleep() instead of time.sleep() in any scripts. This is implied in the documentation, under Known Bugs. However, the reason it gives is (in my mind) not clear enough: time.sleep will not "freeze gxsm totally during sleep", it breaks our ability to stop the script, resulting in a segfault. I should also note that there are likely other Python commands which release the GIL (e.g., while doing blocking I/O). Therefore, there are probably multiple other common commands which could cause a SIGSEGV on attemting to stop the script. However, time.sleep() is likely the biggest issue, since it will likely release the GIL for the longest "duty cycle" of the script, making it the most likely to trigger a SIGSEGV.
    2. Alternatively, we could consider running all Python calls within the main interpreter thread (i.e. not spawning a GTK thread for a script file). In this case, we could call Py_AddPendingCall() to call PyErr_SetString(), which would trigger a SIGINT when the main interpreter thread has the GIL and is running code.

    Do you know why it was decided to run any script in its own thread, rather than within the main interpreter thread? Maybe there are design considerations which make proposal (2) unfeasible.

    Thank you,
    Nick

     

    Last edit: Nick Sullivan-Molina 2023-10-27
  • Percy Zahl

    Percy Zahl - 2023-10-27

    That's an excellent analysis! Thanks.

    So in short, when using python level time.sleep() the GIL is released and the kbd interrupt goes no where and causing an exception with the PendingCall().... :(

    While when gxsm.sleep() is halting the the python thread via a "usleep()" (while still releasing the CPU load!!) at the gxsm python thread level while holding on the GIL and we can in issue the kbd interrupt to it!

    Q: Thus is there any issue not using the gxsm.sleep() at all times we may like to interrupt it then?

    AW your last Question: YES:

    No no and -- the python interpreter got moved to a thread for many good reasons to NOT block Gxsm or the GTK GUI and any critical gxsm core operations. This took a lot of work and is all clean now (except this GIL issue) as all python triggered Gxsm/GUI related actions are outsourced and triggering a proper GTK idle call to complete. This is very tedious on code level but is now strictly implemented for all calls from python.

    And FYI -- if you would have python non threaded again (in the gxsm/gtk main thread) and would use time.sleep() then -- you could not even press the "Kill" button, or nothing would happen as GTK is frozen as long python is running 100% -- as the python time.sleep() wont give back control to GTK or gxsm! Exactly why I introduced original version of gxsm.sleep(). Historic details:

    The gxsm.sleep() was originally introduced when python was not yet threaded to give back CPU time to Gxsm and GTK.... but this was a very poor and choppy workaround for quasi tasking and WAS historically calling gtk-event-pending() until sleep time was expired.

    Now, this is simply a C level usleep() call for the python thread. What is just fine -- I do not see a point why the GIL shall be released? Do you?

    So, is there a way to "wake up" or "force externally" python to regrab his GIL when in time.sleep() ??

    Just wondering, may be we can find solution there:

    How does this ^C abort work correctly when python is run from a console for example?
    Or -- not sure -- within a notebook?

    -P

     

    Last edit: Percy Zahl 2023-10-27
  • Nick Sullivan-Molina

    Hi Percy,

    First, a very minor correction:

    So in short, when using python level time.sleep() the GIL is released and the kbd interrupt goes no where and causing an exception with the PendingCall().... :(

    When we send the signal during a time.sleep(), the GIL is released and no one has taken control of it. So, when we send a signal/interrupt to the Python interpreter (while no one has control), it just throws a SIGSEGV.


    Regarding just telling people to use gxsm.sleep() instead of time.sleep(): I think this is a reasonable option (though I would propose making the documentation more explicit). However, keep in mind that we still don't know what other common Python methods release the GIL.

    A potential alternative I would propose would be to pull the Python initialization (i.e. what is done in py_gxsm_console::initialize()) into a g_thread_new(), and adding a variable in py_gxsm_console for passing the script filename and/or command we want to run. We would need to add a separate bool to indicate whether a script is running, so we don't try to run a command while a script is mid-running.

    If we did this, then we would have just one main interpreter thread, which would be the same thread where we can run Python scripts. In that case, I believe we could use Py_AddPendingCall() into kill() and it may work.

    This way, we would still have Python in its own thread, so there would be no hanging issues with the UI. However, it may avoid having to force gxsm.sleep(). What do you think? I could try it early next week, if you thought it made sense.

    Thanks,
    Nick

     
    • Percy Zahl

      Percy Zahl - 2023-10-27

      Agree, it would be good to know what other python "idle" and GIL release scenarios could come up!

      About "killing" it, I certainly could kill blindly the whole thread python runs within which I do have control over.... even that would be anything but nice to it.
      But normally I do not kill the python session -- you may noticed, if you rerun a script, active variables stay alive!

      I wonder if there is any way externally to check if the GIL is present? Or make it been regained to manipulate. That would be best.

       
  • Nick Sullivan-Molina

    Hi Percy,

    Just a short follow-up, as you asked a good question that I need to look into:

    How does this ^C abort work correctly when python is run from a console for example?
    Or -- not sure -- within a notebook?

    Let me try to find out. Maybe there is a trick I am missing!

    Thanks,
    Nick

     

    Last edit: Nick Sullivan-Molina 2023-10-27
    • Percy Zahl

      Percy Zahl - 2023-10-27

      May be those calls give some hints -- try that around it:

      PyGILState_STATE py_state = PyGILState_Ensure();
      
      PyErr_SetString(PyExc_KeyboardInterrupt, "Abort");
      PyErr_CheckSignals();
      
      PyGILState_Release (py_state);
      
       

      Last edit: Percy Zahl 2023-10-27
      • Nick Sullivan-Molina

        May be those calls give some hints -- try that around it:

        PyGILState_STATE py_state = PyGILState_Ensure();
        
        PyErr_SetString(PyExc_KeyboardInterrupt, "Abort");
        PyErr_CheckSignals();
        
        PyGILState_Release (py_state);
        

        I am 90% certain I already tried this, before I understood what PyGILState_Ensure()/PyGILState_Release() did, and it did not fix the problem.

        If you recall (from above), these methods are used to tell the Python interpreter that a given thread will be running Python. It does not yet call request the GIL, so it should crash in time.sleep().

        In any case, I will test on Monday to confirm, when I am back in the lab :).

         
        • Percy Zahl

          Percy Zahl - 2023-10-27

          From
          https://docs.python.org/3/c-api/init.html?highlight=pygilstate#c.PyGILState_Check

          sounds to me like this should solve the issue, but yes Idid played with tha tin past myself also... will check again also with my python version. I still have a feeling there is some thing fishy some where... or I still do not get the whole picture right.
          At least I'd say the calls to PyErr_SetString(PyExc_KeyboardInterrupt, "Abort") and
          PyErr_CheckSignals() shall NOT crash/segfault the program in any situation but return an error!

          -- citing --

          PyGILState_STATE PyGILState_Ensure()
          Part of the Stable ABI.
          Ensure that the current thread is ready to call the Python C API regardless of the current state of Python, or of the global interpreter lock. This may be called as many times as desired by a thread as long as each call is matched with a call to PyGILState_Release(). In general, other thread-related APIs may be used between PyGILState_Ensure() and PyGILState_Release() calls as long as the thread state is restored to its previous state before the Release(). For example, normal usage of the Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS macros is acceptable.

          The return value is an opaque “handle” to the thread state when PyGILState_Ensure() was called, and must be passed to PyGILState_Release() to ensure Python is left in the same state. Even though recursive calls are allowed, these handles cannot be shared - each unique call to PyGILState_Ensure() must save the handle for its call to PyGILState_Release().

          When the function returns, the current thread will hold the GIL and be able to call arbitrary Python code. Failure is a fatal error.

          Note Calling this function from a thread when the runtime is finalizing will terminate the thread, even if the thread was not created by Python. You can use _Py_IsFinalizing() or sys.is_finalizing() to check if the interpreter is in process of being finalized before calling this function to avoid unwanted termination.
          void PyGILState_Release(PyGILState_STATE)
          Part of the Stable ABI.
          Release any resources previously acquired. After this call, Python’s state will be the same as it was prior to the corresponding PyGILState_Ensure() call (but generally this state will be unknown to the caller, hence the use of the GILState API).

          Every call to PyGILState_Ensure() must be matched by a call to PyGILState_Release() on the same thread.

           
          • Nick Sullivan-Molina

            Hi Percy,

            Regarding the documentation on PyGILState_Ensure()/PyGILState_Release(): I find it confusing; I may very well be wrong. I am taking my guidance from the section here, where it implies the purpose is to register with the GIL. I think this is reiterated in your quoted text, where they start with: "Ensure that the current thread is ready to call the Python C API regardless of the current state of Python, or of the global interpreter lock". I take this to mean that it is just notifying the Interpreter that C API calls will be run.

            To your point, it could be that I simply missed that, and the SIGSEGVs I was seeing were due to me not registering the C API...

            Again, I can validate Monday morning.

            Thanks,
            Nick

             
            • Percy Zahl

              Percy Zahl - 2023-10-27

              We may have to set this PyConfig.inspect to non zero! To avoid python crashing gxsm on certain execeptions!

              See here:

              int PyRun_SimpleString(const char command)
              This is a simplified interface to PyRun_SimpleStringFlags() below, leaving the PyCompilerFlags
              argument set to NULL.

              int PyRun_SimpleStringFlags(const char command, PyCompilerFlags flags)
              Executes the Python source code from command in the main module according to the flags argument. If main does not already exist, it is created. Returns 0 on success or -1 if an exception was raised. If there was an error, there is no way to get the exception information. For the meaning of flags, see below.

              Note that if an otherwise unhandled SystemExit is raised, this function will not return -1, but exit the process, as long as PyConfig.inspect is zero.

               
              • Percy Zahl

                Percy Zahl - 2023-11-01

                Did you had a look at this above?

                 
                • Nick Sullivan-Molina

                  Hi Percy,

                  Excuse the delay. I had in fact looked at this. When you query PyConfig.inspect before trying to set it, it tells you it is at 0 already.

                  I got a segfault when trying to set it.

                   
    • Nick Sullivan-Molina

      Hi Percy,

      I have a theoretical answer, after pondering a bit.

      How does this ^C abort work correctly when python is run from a console for example?
      Or -- not sure -- within a notebook?

      In standard Python operation, we send an actual signal (e.g., SIGINT), which is handled by the Python interpreter's standard signal handler.

      In our setup, we are 'faking' a signal when we tell Python to do so (via PyErr_SetString() or PyErr_SetInterruption()). The SIGSEGV we receive is because Python's interpreter doesn't allow you to do this while no one has the GIL.

      Your next question might be: "why don't we just send a SIGINT?". Well, by default signals are sent to all threads in a process under POSIX. However, you can explicitly block signal handling for threads, and leave it open for one thread (see here).

      You may recall that I originally mentioned sending an exception to a specific thread (see this Stack Overflow post). When I looked at how we are spawning our thread (i.e., via GTK+), it seemed there was no way to get the thread id to do so.

       
  • Nick Sullivan-Molina

    Hi Percy,

    I forgot to update you on Monday. I tried your two suggestions above, but did not see any change in the issue. I plan to try to wrap the 'initialize()' and 'run()' parts of py_gxsm_console() within the g_thread_new() at some point, but have not gotten around to it yet.

    I will keep you posted. I'm sorry about the delay.

    Thanks,
    Nick

     
  • Nick Sullivan-Molina

    Hi Percy,

    Excuse the delay, I was working on something else and thus took a bit of a break on the gxsm front.

    I spent a bit this week looking into fixing the 'issue', and have a solution I could submit as a pull request later today / early next week.

    What I've done is relatively straightforward:
    * I pulled all the Python interpreter logic into its own thread, so all of py_gxsm_console::initialize() and some of py_gxsm_console::run() are in there.
    * I modified py_gxsm_console::PyRun_GThreadFunc() to be our main Python thread. It calls initialize(), runs the portions of run() that are Python-specific, and then enters a while loop, waiting for commands to be sent. When one is sent, it runs it as it used to. The while loop checks that a new variable, closing, is not true. If true, it ends the interpreter and the thread.
    * In ~py_gxsm_console(), we now set closing to true. We do not call PyFinalize(), this is now called in PyRun_GThreadFunc().
    * In run_command(), I only modified run_data; no thread is summoned (since it's been summoned previously).
    * In pyremote_init(), I spawn my Python thread and start the UI via py_gxsm_console::run().
    * With this done, py_gxsm_console::kill() just calls PyErr_SetInterruptEx(SIGINT). This now works with gxsm.sleep(), time.sleep(), or neither.

    Before I submit the request, I have some questions/would like to confirm that you agree with my plan:
    * There is a lot of unused code in this file; for now, I plan to only make the minimal changes and submit them. I hope that makes sense.
    * Indentation of the file uses both spaces and tabs (4 spaces for tab equivalent), and some sections are over-indented. I suspect this is simply due to submitting on different devices and/or merge weirdnesses. My current plan is to stick to spaces instead of tabs (i.e. 4 spaces per tab), and match the indentation around my changes.
    * I'm a bit unclear on what pyremote_run() does, or when it is run. My assumption is that it might be linked to returning from sleep (meaning we call pyremote_init() on startup and pyremote_cleanup() on close of gxsm, but pyremote_run() if we are returning from a sleep). Is that correct? I want to make sure I do not change it incorrectly and introduce memory leaks or spawning tons of conflicting GThreads.

    Thank you,
    Nick

     
    • Percy Zahl

      Percy Zahl - 2023-11-17

      Hi Nick,
      sounds good! Great job!

      So moving all python init,... code into the same thread helped?
      Do you re-initialize python from scratch now every time for every script run now? I.e. variables won't "stay"? Possibly a good idea to prevent confusion.

      If have to look for the details, but there are two ways to fire up a script:

      a) via the code you see in the console and the "Exec" button

      or

      b) call of the Action-Scripts via Shift-F1...12 (while focus is on any scan window) (and some more dedication task scripts, there will be more in future!) -- NOTE: Those may need to run independent in future -- i.e. should run in parallel to any other script. Currenty prohibited as using the same python engine.

      If I recall right, the pyremote_run() function does that, takes a script file name right?

      AW formatting style, thanks for asking -- a few notes:

      I am exclusively using Emacs in this C++ Mode, kind of adopted classic K&R + GNU (space before brace) style, tabs, = 8 indent -- that is why every file has this in the header on top first line:

      / -- Mode: C++; indent-tabs-mode: nil; c-basic-offset: 8 c-style: "K&R" -- /

      What is a 8 space equivalent indent but using TABs.

      However, very old code or code potentially edited by others using non Emacsen tools may have altered formatting or filled tab w spaces.... :(
      I try to catch that and auto reformat using Emacs...

      And for readability I like to stricktly put a space before braces in all expressions.

      if (x == 13)
               xyzfunc ();
      

      For space saving I am only placing function arguments into following lines when there are many of them, else one line.

      See screenshot attached.

      -P

       

      Last edit: Percy Zahl 2023-11-17
      • Nick Sullivan-Molina

        Hi Percy,

        So moving all python init,... code into the same thread helped?

        Yes, by moving everything into its one thread, we can easily send a SIGINT to either case (time.sleep or gxsm.sleep) using PyErr_SetInterruptEx().

        Do you re-initialize python from scratch now every time for every script run now? I.e. variables won't "stay"? Possibly a good idea to prevent confusion.

        It does not do that. But the existing version does not do that either, right? In fact, I just validated this with my changes reverted.

        If we wanted to reinitialize from scratch, my suggestion would be to spawn each 'script' in a sub-interpreter, as described here:
        https://docs.python.org/3/c-api/init.html#sub-interpreter-support

        (Perhaps there is a simpler approach of simply resetting the 'dictionary' variable you initialize and set up the gxsm extension with?)

        In theory, all that would change would be that we would spawn a new sub-interpreter and run our string within it every time run_data is fed.

        This should be straightforward. However, given all the weirdness of the C-API, my suggestion would be for me to push the initial version first. I don't know how long it will take to validate/fix the sub-interpreter support.

        Do you agree with that?

        If I recall right, the pyremote_run() function does that, takes a script file name right?

        No, pyremote_run() is a static method that is hooked in when the GxsmPlugin is defined (like pyremote_init, pyremote_about, and pyremote_configure. The comment there says "it is called on menupath->'plugin'".

        Any clue how often it would be called, and/or in what scenarios?

        Thanks,
        Nick

         
        • Percy Zahl

          Percy Zahl - 2023-11-17

          Sure, let's move on with all those reworks, not saying at all to use a sub or new interpreter now. I just have that in mind for the future -- may be.
          As those "action scripts" currently are blocked from running when a longer working script is in progress!

          The menupath->... calls are exactly what I mean with "Action Scripts". Those are never called automatically yet. Only if you press one of the "hot keys" Shift-F1",.... from a scan window. Plus You do need to define/create them first, but there are read to go templates.

          So far only will get invoked on user actions, may be on special "buttons" for future auto/user support/AI.... functions!

          Just go (see attached screenshot) to the Remote Console Menu... and open a Action script template, edit to you needs... and save. Then it's ready to be used!

          -P

           
          • Nick Sullivan-Molina

            Hi Percy,

            I just successfully got an action script running, by saving my test script to ~/.gxsm3/pyaction/sf2.py and running from the scan window. A couple of notes:
            1. It does not seem like pyremote_run() is called here. What I see happening is that run_action_script_callback() is hooked into actions in pyremote_init(), and when the button is pressed this is triggered. If that's the case, I think it might be safe to remove pyremote_run and set it to NULL in the GxsmPlugin structure.
            2. Right now, action scripts are not killable! This is because we have two different counters (user_script_running and action_script_running), and kill() only sends the kill signal if script_running > 0. Is this intended? Perhaps there is a separate way to kill an action script.
            3. I should note that both the action script and user script are routed through run_command(). This highlights a big difference in my suggested changes:
            1. in the old code, each time run_command() is run, a new thread is spawned, where Python is supposed to run. Thus, you could theoretically run multiple Python scripts at once.
            2. In my changes, all Python scripts are running in 1 thread. Thus, I think my suggested changes break the ability to run an action script and a user script simultaneously.

            Is the expectation to be able to run the two scripts simultaneously? I may need to return to think of this more...

            Thanks,
            Nick

             
            • Percy Zahl

              Percy Zahl - 2023-11-17

              Good.

              To put it this way:
              I never had any intention for "action scripts" doing anything long or infinite or blocking...
              They are intended so far for automating brief tasks or a do a set of multiple adjustments.

              The simplest example is putting bias and current to some defaults as you see.
              I have more advanced actions taking a few sec may be as they do some measurements/averaging and then do some thing related...

              Or transition between AFM in const height mode and STM in a safe and automatic way.

              The "kill" button should be only acting on the the script started from the console window. I think this makes sense.

              About that unused function, let me have a look what it is about... can well be a historic relict or a version prior threadding, etc. I left behind...

              Just enclose it with

              #if 0
              ...
              #endif
              

              and see if it is been missed some where...

              -P

               
            • Percy Zahl

              Percy Zahl - 2023-11-19
              1. In my changes, all Python scripts are running in 1 thread. Thus, I think my suggested changes break the ability to run an action script and a user script simultaneously.

              I suggest to keep the thread spawning for the "action scripts" and only use one dedicated thread for the console controlled script where the "kill" may be issued.

              BTW, you may create a copy of the github gxsm3 repo, include you changes and create a merge request.
              Or alternatively if you like, I could add you (you have a github user name?) to the project.

              -P

               
1 2 > >> (Page 1 of 2)

Log in to post a comment.