
QP application hits assertions while initializing an Active Object

2024-06-26
2024-07-11
  • M Suleman Khalid

    Hi. I'm facing a problem running my QP application on an ARM microcontroller (the code works fine on Windows). The execution trace shows that while the only global Active Object is being initialized, the initialization hits QF_crit_entry_() followed by Q_onError(), and the whole thing halts.

    Execution trace is attached.

    Any idea why this could be happening?

     

    Last edit: M Suleman Khalid 2024-06-26
  • Quantum Leaps

    Quantum Leaps - 2024-06-26

    Hi M,
    Your code is causing an assertion failure inside the QP Framework. The most important information is which assertion failed, which is provided in the module and id parameters to the Q_onError() function. So, please just print this information.
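
When no console is available yet, one option is to save the two parameters in RAM and read them with a debugger. The following is only a minimal host-runnable sketch of that idea: ErrorRecord, g_lastError and recordError() are hypothetical names that are not part of QP, and recordError() merely stands in for the first lines of an application's Q_onError(module, id).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record of the last failed assertion (not part of QP).
struct ErrorRecord {
    char const *module;   // module name passed to the error handler
    int id;               // assertion id within that module
    std::uint32_t count;  // how many times the handler was entered
};

// volatile so that the stores are not optimized away before a halt loop.
static ErrorRecord volatile g_lastError = { nullptr, 0, 0U };

// Stand-in for the start of Q_onError(module, id): save the arguments
// first, so they can be inspected even before any console exists.
void recordError(char const * const module, int const id) {
    g_lastError.module = module;
    g_lastError.id = id;
    g_lastError.count = g_lastError.count + 1U;
    // a real handler would now put the system in a safe state and halt
}

// Usage example with made-up arguments in the style of a QP assertion.
void example() {
    static char const name[] = "qv_port";
    recordError(name, 110);
    assert(g_lastError.module == name);
    assert(g_lastError.id == 110);
    assert(g_lastError.count == 1U);
}
```

With something like this in place, a breakpoint or a memory view on g_lastError is enough to identify the failing assertion.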

    On a side note, it seems that you've implemented your own software tracing system based on 'printf'. I'm sure that it was quite a bit of work. But a much more powerful and efficient software tracing system, called QP/Spy, is already provided in the QP Framework. The system is not that hard to use. Please just watch a couple of minutes in the "Getting Started with QP/QM" video.

    --MMS

     
    • M Suleman Khalid

      Hi, MMS.

      Thanks for the reply. However, as this failure occurs even before the debug console has been initialized (i.e., it occurs at the very start, when libc is just initializing the global/static objects), I'm not sure how to print it at that time! Let me see if I can inspect those parameters by attaching a debugger.

      BTW, I did not implement my own software tracing system. In fact, I'm using the tracing provided by the ARM emulator that I'm using. And I also regularly use QP/Spy on Windows. It's just that I didn't look into using QSpy on the emulator through an emulated serial port.

      Regards.

       

      Last edit: M Suleman Khalid 2024-06-27
      • Quantum Leaps

        Quantum Leaps - 2024-06-27

        Yes, please use the debugger to find out which assertion is failing. For this, set a breakpoint at Q_onError. There is one assertion in the QTimeEvt::QTimeEvt() constructor (file qf_time.cpp), which might be firing. Most likely you have either an incorrect signal for the time event, or an incorrect tick rate. Please check.
        --MMS

         
        • M Suleman Khalid

          Hi,

          I attached the debugger and set it up to break at the entry of Q_onError(). This is what I got!

          Breakpoint 1, Q_onError (module=0x6b70 <QF_port_module_> "qv_port", id=110)
          

          After looking at the code for the critical section entry QF_crit_entry_() in qv_port.cpp, it seems that the assertion assert(PRIMASK/BASEPRI == 0) is failing.

              "  CMP     r0,#0            \n" // assert(PRIMASK/BASEPRI == 0)
              "  BNE     QF_crit_entry_error\n"
              "  BX      lr               \n"
              "QF_crit_entry_error:       \n"
              "  LDR     r0,=QF_port_module_ \n"
              "  MOVS    r1,#110          \n"
              "  LDR     r2,=Q_onError    \n"
              "  BX      r2               \n"
          

          The comment on top of the function says that:

          // NOTE:
          // The assertion means that this critical section CANNOT nest.
          

          And I'm not sure where in code I'm "nesting" any critical section!

          The debug backtrace shows the following sequence of calls:

          #0  Q_onError (module=0x6b70 <QF_port_module_> "qv_port", id=110)
              at F:/workspaces/myapp/src/bsp/ke02z/qv/bsp.cpp:99
          #1  0x000024f2 in QP::QTimeEvt::QTimeEvt (this=0x20000308 <AOs::MyApp::instance+40>,
              act=0x200002e0 <AOs::MyApp::instance>, sig=4, tickRate=0) at ../qpcpp/src/qf/qf_time.cpp:85
          #2  0x00001026 in AOs::MyApp::myapp (this=0x200002e0 <AOs::MyApp::instance>)
              at F:/workspaces/myapp/src/myapp.cpp:56
          #3  0x000019ac in __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535)
              at F:/workspaces/myapp/src/myapp.cpp:51
          #4  0x000019ca in _GLOBAL__sub_I__ZN3AOs6MyApp8instanceE () at F:/workspaces/myapp/src/myapp.cpp:728
          #5  0x000046a8 in __libc_init_array ()
          #6  0x00000102 in ResetISR () at ../startup/startup_mke02z4.cpp:340
          

          It turns out that sig=4 corresponds to Q_USER_SIG, which is the signal of the first time event being initialized (in the initializer list of the active object constructor) as m_motorOnTimeoutEvt(this, MOTOR_ON_TO_SIG, 0U).

          The event object is defined as follows in the Active Object class:

          private:
          QP::QTimeEvt m_motorOnTimeoutEvt;
          

          Of course, the signal is defined in the application as an enum value: MOTOR_ON_TO_SIG = QP::Q_USER_SIG.

          What could I be doing wrong?

           

          Last edit: M Suleman Khalid 2024-06-28
  • Quantum Leaps

    Quantum Leaps - 2024-06-28

    As you found out, you're hitting assertion qv_port:110. The reason for this assertion is explained in the source code: "The assertion means that this critical section CANNOT nest." These assertions (qv_port:110/111) were introduced after QP/C++ was scrutinized for strict balancing of critical sections. Since all critical sections now must be balanced, the nesting of critical sections indicates an error. I hope you see the point.

    So, apparently, your code enters a critical section or just disables interrupts prior to calling the QTimeEvt::QTimeEvt() constructor. The remedy is easy: you need to find the place in your startup sequence where the critical section is entered but not exited. I would just set up your debugger to start from the reset handler (as opposed to going all the way to main()) and keep stepping through the startup code watching the BASEPRI register. BASEPRI is 0 out of reset, so you should clearly see where it gets set to a non-zero value.
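
To make the failure mode concrete, here is a small host-side model of a non-nesting critical section. It is only a sketch: simMask stands in for the CPU's interrupt-mask register, onError() stands in for Q_onError() (which never returns on a real target), and all names are hypothetical. The ids 110 and 111 follow the qv_port:110/111 assertions mentioned above, assuming 110 is the entry-side check and 111 the exit-side check.

```cpp
#include <cassert>
#include <cstdint>

// Simulated interrupt-mask register of the CPU (0 means not masked).
static std::uint32_t simMask = 0U;

// Last assertion id reported; 0 means no failure (hypothetical bookkeeping).
static int lastErrorId = 0;

// Stand-in for Q_onError(); unlike the real handler, this one returns.
void onError(int const id) { lastErrorId = id; }

// Model of a non-nesting critical-section entry: the mask must be clear.
void critEntry() {
    if (simMask != 0U) {  // already masked: treated as nesting
        onError(110);
        return;
    }
    simMask = 1U;         // disable interrupts
}

// Model of the matching exit: the mask must be set.
void critExit() {
    if (simMask == 0U) {  // exit without a matching entry
        onError(111);
        return;
    }
    simMask = 0U;         // re-enable interrupts
}

// Stand-in for a constructor that uses a critical section internally.
void constructTimeEvt() {
    critEntry();
    critExit();
}

// Usage example: the same constructor passes or fails depending only on
// the state in which the startup code left the mask.
void example() {
    simMask = 0U;
    lastErrorId = 0;
    constructTimeEvt();
    assert(lastErrorId == 0);    // mask clear on entry: no assertion

    simMask = 1U;                // startup code left interrupts disabled
    constructTimeEvt();
    assert(lastErrorId == 110);  // entry check fires
}
```

The point of the model is that the constructor itself does nothing wrong. Whatever set the mask earlier is what the entry check ends up reporting.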

    Please report to this forum what you find out.

    --MMS

     

    Last edit: Quantum Leaps 2024-06-28
    • M Suleman Khalid

      Hi MMS,

      As you suggested, I've run the GDB debug session again while monitoring the value of the BASEPRI register all the way (using the info registers basepri GDB command). It appears that the BASEPRI register always stays zero, but the program still hits the assertion failure and Q_onError is thus called. Here is my debug session:

      GNU gdb (Arm GNU Toolchain 12.3.Rel1 (Build arm-12.35)) 13.2.90.20230627-git
      Copyright (C) 2023 Free Software Foundation, Inc.
      License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
      Type "show copying" and "show warranty" for details.
      This GDB was configured as "--host=i686-w64-mingw32 --target=arm-none-eabi".
      Type "show configuration" for configuration details.
      For bug reporting instructions, please see:
      <https://bugs.linaro.org/>.
      Find the GDB manual and other documentation resources online at:
          <http://www.gnu.org/software/gdb/documentation/>.
      
      For help, type "help".
      Type "apropos word" to search for commands related to "word"...
      Reading symbols from .\myapp.axf...
      (gdb) target remote :3333
      Remote debugging using :3333
      ResetISR () at ../startup/startup_mke02z4.cpp:275
      275         __asm volatile ("cpsid i");
      (gdb) break MyApp::MyApp
      warning: could not convert 'MyApp' from the host encoding (CP1252) to UTF-32.
      This normally should not happen, please file a bug report.
      Breakpoint 1 at 0x1004: file F:/workspaces/myapp/src/myapp.cpp, line 71.
      (gdb) break QF_crit_entry_
      Breakpoint 2 at 0x2654: file ../qpcpp/ports/arm-cm/qv/gnu/qv_port.cpp, line 141.
      (gdb) break Q_onError
      Breakpoint 3 at 0x5630: file F:/workspaces/myapp/src/bsp/ke02z/qv/bsp.cpp, line 99.
      (gdb) continue
      Continuing.
      
      Breakpoint 1, AOs::MyApp::MyApp (this=0x200002e0 <AOs::MyApp::instance>)
          at F:/workspaces/myapp/src/myapp.cpp:71
      71          m_motorOffEvt(MOTOR_OFF_SIG)
      (gdb) info registers basepri
      basepri        0x0                 0
      (gdb) s
      QP::QActive::QActive (this=0x200002e0 <AOs::MyApp::instance>,
          initial=0x5863 <AOs::MyApp::initial(void*, QP::QEvt const*)>)
          at F:\workspaces\myapp\proj\mcux\myapp\qpcpp\include/qp.hpp:777
      777             m_pthre(0U)
      (gdb) s
      QP::QAsm::QAsm (this=0x200002e0 <AOs::MyApp::instance>)
          at F:\workspaces\myapp\proj\mcux\myapp\qpcpp\include/qp.hpp:276
      276             m_temp ()
      (gdb) s
      275           : m_state(),
      (gdb) s
      QP::QAsmAttr::QAsmAttr (this=0x200002e4 <AOs::MyApp::instance+4>)
          at F:\workspaces\myapp\proj\mcux\myapp\qpcpp\include/qp.hpp:221
      221         constexpr QAsmAttr() : fun(nullptr) {}
      (gdb) s
      QP::QAsm::QAsm (this=0x200002e0 <AOs::MyApp::instance>)
          at F:\workspaces\myapp\proj\mcux\myapp\qpcpp\include/qp.hpp:276
      276             m_temp ()
      (gdb) s
      QP::QAsmAttr::QAsmAttr (this=0x200002e8 <AOs::MyApp::instance+8>)
          at F:\workspaces\myapp\proj\mcux\myapp\qpcpp\include/qp.hpp:221
      221         constexpr QAsmAttr() : fun(nullptr) {}
      (gdb) s
      QP::QAsm::QAsm (this=0x200002e0 <AOs::MyApp::instance>)
          at F:\workspaces\myapp\proj\mcux\myapp\qpcpp\include/qp.hpp:277
      277         {}
      (gdb) s
      QP::QActive::QActive (this=0x200002e0 <AOs::MyApp::instance>,
          initial=0x5863 <AOs::MyApp::initial(void*, QP::QEvt const*)>)
          at F:\workspaces\myapp\proj\mcux\myapp\qpcpp\include/qp.hpp:776
      776             m_prio(0U),
      (gdb) s
      777             m_pthre(0U)
      (gdb) s
      QP::QEQueue::QEQueue (this=0x200002f4 <AOs::MyApp::instance+20>)
          at F:\workspaces\myapp\proj\mcux\myapp\qpcpp\include/qequeue.hpp:88
      88            : m_frontEvt(nullptr),
      (gdb) info registers basepri
      basepri        0x0                 0
      (gdb) s
      89              m_ring(nullptr),
      (gdb) s
      90              m_end(0U),
      (gdb) s
      91              m_head(0U),
      (gdb) s
      92              m_tail(0U),
      (gdb) s
      93              m_nFree(0U),
      (gdb) s
      94              m_nMin(0U)
      (gdb) s
      95          {}
      (gdb) s
      QP::QActive::QActive (this=0x200002e0 <AOs::MyApp::instance>,
          initial=0x5863 <AOs::MyApp::initial(void*, QP::QEvt const*)>)
          at F:\workspaces\myapp\proj\mcux\myapp\qpcpp\include/qp.hpp:779
      779             m_state.fun = Q_STATE_CAST(&top);
      (gdb) info registers basepri
      basepri        0x0                 0
      (gdb) s
      780             m_temp.fun  = initial;
      (gdb) s
      783             m_prio_dis  = static_cast<std::uint8_t>(~m_prio);
      (gdb) s
      784             m_pthre_dis = static_cast<std::uint8_t>(~m_pthre);
      (gdb) s
      786         }
      (gdb) s
      AOs::MyApp::MyApp (this=0x200002e0 <AOs::MyApp::instance>)
          at F:/workspaces/myapp/src/myapp.cpp:56
      56          m_motorOnTimeoutEvt(this, MOTOR_ON_TO_SIG, 0U),
      (gdb) info registers basepri
      basepri        0x0                 0
      (gdb) s
      QP::QTimeEvt::QTimeEvt (this=0x20000308 <AOs::MyApp::instance+40>,
          act=0x200002e0 <AOs::MyApp::instance>, sig=4, tickRate=0) at ../qpcpp/src/qf/qf_time.cpp:82
      82          m_interval(0U)
      (gdb) info registers basepri
      basepri        0x0                 0
      (gdb) s
      QP::QEvt::QEvt (this=0x20000308 <AOs::MyApp::instance+40>, s=4)
          at F:\workspaces\myapp\proj\mcux\myapp\qpcpp\include/qp.hpp:160
      160           : sig(s),
      (gdb) info registers basepri
      basepri        0x0                 0
      (gdb) s
      161             refCtr_(0U),
      (gdb) s
      162             evtTag_(MARKER)
      (gdb) s
      163         {}
      (gdb) s
      QP::QTimeEvt::QTimeEvt (this=0x20000308 <AOs::MyApp::instance+40>,
          act=0x200002e0 <AOs::MyApp::instance>, sig=4, tickRate=0) at ../qpcpp/src/qf/qf_time.cpp:79
      79          m_next(nullptr),
      (gdb) info registers basepri
      basepri        0x0                 0
      (gdb) s
      80          m_act(act),
      (gdb) s
      81          m_ctr(0U),
      (gdb) s
      82          m_interval(0U)
      (gdb) s
      85          QF_CRIT_ENTRY();
      (gdb) info registers basepri
      basepri        0x0                 0
      (gdb) s
      
      Breakpoint 2, QF_crit_entry_ () at ../qpcpp/ports/arm-cm/qv/gnu/qv_port.cpp:141
      141     __asm volatile (
      (gdb) info registers basepri
      basepri        0x0                 0
      (gdb) s
      Q_onError (module=0x20000308 <AOs::MyApp::instance+40> "\004", id=265128)
          at F:/workspaces/myapp/src/bsp/ke02z/qv/bsp.cpp:90
      90      Q_NORETURN Q_onError(char const * const module, int_t const id) {
      (gdb) info registers basepri
      basepri        0x0                 0
      (gdb) s
      
      Breakpoint 3, Q_onError (module=0x6b70 <QF_port_module_> "qv_port", id=110)
          at F:/workspaces/myapp/src/bsp/ke02z/qv/bsp.cpp:99
      99          for (;;) {
      (gdb)
      

      Any suggestions?

      Regards.

       
  • Quantum Leaps

    Quantum Leaps - 2024-07-01

    Hi M,
    From a quick scan of your GDB session, it seems to me that your CPU is using the PRIMASK register, not BASEPRI. I think this is the case because I see the line (which sets PRIMASK):

    275         __asm volatile ("cpsid i");
    

    You don't bother mentioning which Cortex-M CPU you're using, but it would be helpful to know that you're running Cortex-M0 or M0+ (ARMv6-M architectures). Your GDB output might be misleading because ARMv6-M does not even have the BASEPRI register.

    So, please repeat the steps, but monitor PRIMASK.
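
If the debugger view is in doubt, PRIMASK can also be read from code. The sketch below assumes a GCC-compatible compiler and uses the ACLE macro __ARM_ARCH_PROFILE to detect an M-profile target; on any other build it returns a simulated value so that the logic can be tried on a desktop host. The names readPrimask(), interruptsMasked() and simulatedPrimask are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Value returned on non-Cortex-M builds so the sketch runs on a host.
static std::uint32_t simulatedPrimask = 0U;

// Read PRIMASK with the MRS instruction on an M-profile core,
// or return the simulated value elsewhere.
std::uint32_t readPrimask() {
#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
    std::uint32_t value;
    __asm volatile ("mrs %0, primask" : "=r" (value));
    return value;
#else
    return simulatedPrimask;
#endif
}

// Only bit 0 of PRIMASK is meaningful: 1 means interrupts are masked.
bool interruptsMasked(std::uint32_t const primask) {
    return (primask & 1U) != 0U;
}

// Usage example for the host build only.
void example() {
    simulatedPrimask = 0U;
    assert(!interruptsMasked(readPrimask()));
    simulatedPrimask = 1U;  // the state that "cpsid i" leaves behind
    assert(interruptsMasked(readPrimask()));
}
```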

    Finally, I would really highly recommend that you do yourself a favor and get a real debugger. It's no longer the 1990's to work with GDB in command-line. With a real debugger, the whole session should take 30 seconds, not a week that it's taking already. I hope my comments make sense to you.

    --MMS

     

    Last edit: Quantum Leaps 2024-07-02
  • M Suleman Khalid

    Yes I'm using Cortex M0+, sorry forgot to mention it earlier. Also I'm using QP version 7.3.2, if it matters.

    And yes, I do use a graphical debugger, however it's easier to share text-based session of GDB.

    Thanks for the support. Will get back to you after inspecting PRIMASK.

     

    Last edit: M Suleman Khalid 2024-07-02
    • Quantum Leaps

      Quantum Leaps - 2024-07-02

      Any particular reason why you are not using the latest QP (currently 7.3.4)?
      --MMS

       
  • M Suleman Khalid

    Hi again! So here is the situation: PRIMASK is set right when the system boots up (cpsid i) and stays set until the active object constructor is entered. That is why, when the constructor initializes a time event and the critical section is entered, the assertion fails. Not sure where to go from here!

    Secondly, about not using the latest QP version: I started work on the project a little while back with the then-latest version, i.e., 7.3.2. Additionally, I'm using QS/QSpy as well as QView to monitor the state of the target on the GUI. At that time, I was facing some problems getting QView to run properly. The problem was that, as QView internally calls the Python interpreter, some of my Python files were not being loaded/found by the Python interpreter (being called internally) unless I explicitly included their import statements in the qview.py file, which meant modifying the qview.py file provided with the QP framework. Some other changes to qview.py were also required in order to use my included files. So, I just made a copy of qview.py and made the required changes so that it worked. Upgrading the QP version would mean redoing all the changes I made in the previous version of QView, so I thought I could defer it for later.

    I know, not an elegant solution, nor what I would've liked to do. However, the hack works for now and for that reason I did not yet upgrade the QP version. If required I can explain the whole situation in a new issue on this forum for your further perusal.

    Regards.

     

    Last edit: M Suleman Khalid 2024-07-03
  • Quantum Leaps

    Quantum Leaps - 2024-07-03

    OK, so you've found the root cause of the problem. The simplest fix would be to just delete the cpsid i instruction from the Reset_Handler in the startup code. Most likely, you should modify the standard startup code anyway because it typically has endless loops (denial of service) hard-coded in the exception handlers. So, removing the cpsid i instruction shouldn't be such a big deal.

    Regarding QView, I see your point. The current QView design is similar to QUTest in that the customization script is launched in the separate instance of the Python interpreter. This is needed in QUTest for test scripts because QUTest executes many test scripts in one run and it needs continuity. However, QView has only one customization and it could be designed to be imported into the single customization. This would invert the control, but it would allow the customizations to import any other Python modules as well (as you were trying to do). I'll look into re-designing QView along these lines.

    --MMS

     

    Last edit: Quantum Leaps 2024-07-03
    • M Suleman Khalid

      Yeah, I think I'll enable the interrupts before the C library enters the initialization code. However, the interrupt must've been disabled in the startup code for a reason. I'm not sure what can go wrong by doing so.

      Regarding QView, it would be great if you review the QView design. I think it can be a very useful and powerful tool to be able to develop a comprehensive GUI using QView to monitor the target state remotely.

      Regards.

       
  • Quantum Leaps

    Quantum Leaps - 2024-07-05

    Absolutely, interrupts must NOT fire during the whole startup sequence and the QP Framework is NOT ready to receive them until QF::run(). But interrupts won't fire unless they are configured and explicitly enabled in the NVIC. You don't need to disable them with PRIMASK. I hope you see my point.
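
The argument can be written down as a simplified model. This is only a sketch under stated assumptions: CpuModel and irqWouldFire() are hypothetical names, priorities are ignored, and only the external interrupt lines routed through the NVIC are covered. System exceptions such as SysTick are enabled in their own peripheral registers rather than in the NVIC, so they are outside this model.

```cpp
#include <cassert>
#include <cstdint>

// Simplified view of the state that decides whether an IRQ is taken.
struct CpuModel {
    std::uint32_t primask;      // bit 0 set: interrupts masked
    std::uint32_t nvicEnabled;  // one bit per IRQ line enabled in the NVIC
    std::uint32_t nvicPending;  // one bit per IRQ line currently pending
};

// True if IRQ line number irq (0..31) would be taken right now.
bool irqWouldFire(CpuModel const &cpu, unsigned const irq) {
    assert(irq < 32U);
    std::uint32_t const bit = std::uint32_t{1} << irq;
    bool const masked  = (cpu.primask & 1U) != 0U;
    bool const enabled = (cpu.nvicEnabled & bit) != 0U;
    bool const pending = (cpu.nvicPending & bit) != 0U;
    return pending && enabled && !masked;
}

// Usage example: PRIMASK clear during startup, nothing enabled yet.
void example() {
    CpuModel cpu = { 0U, 0U, 1U << 5 };  // IRQ 5 pending, not enabled
    assert(!irqWouldFire(cpu, 5U));      // cannot fire without NVIC enable

    cpu.nvicEnabled = 1U << 5;           // application enables it later
    assert(irqWouldFire(cpu, 5U));

    cpu.primask = 1U;                    // inside a critical section
    assert(!irqWouldFire(cpu, 5U));
}
```

In this model a pending line with PRIMASK clear still cannot fire until the application sets its enable bit.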

    Regarding the re-design of QView, it's done already. Please take a look at the latest release:
    https://www.state-machine.com/qtools/history.html#qtools_7_4_1.

    I'll make a separate post about the new QView in this forum.

    --MMS

     
  • Harry Rostovtsev

    So I think I'm running into the same problem as mentioned here. I'm attempting to upgrade from QP 6.9.2 to 7.3.4 and I think there may have been some changes that are breaking me. Specifically, I was encountering an assertion in qf_time, loc=100 but it was actually causing a hardfault. The hardfault was being caused because I was nesting critical sections. The old QS_onFlush code would disable interrupts before grabbing bytes (as did all the examples).

    void QS_onFlush(void)
    {
        uint16_t b;
    
    //    QF_INT_DISABLE(); <-- THIS WAS CAUSING ISSUES UNTIL I COMMENTED IT OUT
        while ((b = QS_getByte()) != QS_EOD) { /* while not End-Of-Data... */
    //        QF_INT_ENABLE();
            /* while TXE not empty */
            while(!(UART_SPY_BASE_PORT->STAT & LPUART_STAT_TDRE_MASK)) {}
            UART_SPY_BASE_PORT->DATA = (b & 0xFFU);
    //        QF_INT_DISABLE();
        }
    //    QF_INT_ENABLE();
    }
    

    The assertion would happen correctly afterwards without a hardfault when I noticed that the new examples don't seem to have that crit entry/exit section. I commented mine out and bam, hardfault went away.

    In general, this seems like a nice new addition. That said, I guess my question is where can I see the changes that I would need to make to the rest of the code to avoid these issues. I attempted to search the Revision History but nothing immediately jumped out.

    The second (and admittedly unrelated) part of my question is why the assertion happens to begin with (it didn't in the old version of QP). I can cause it to happen by sending in "u" via qspy, which in turn causes this line in qf_time.c to die:

        Q_REQUIRE_INCRIT(100, tickRate < Q_DIM(QTimeEvt_timeEvtHead_));
    

    since the passed in tickRate is 1 and this assertion fails. This clearly has something to do with multiple tickrates (which I've never used) but it didn't use to cause an assertion. I mainly used "t" and "u" via qspy to see if things were still alive in some debugging situations.
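
A guess at what is going on, expressed as a sketch: if the application is built with a single tick rate, the array of time-event list heads has only one element, so only tick rate 0 passes the bounds check. The code below is not QP code. MAX_TICK_RATE mirrors the QF_MAX_TICK_RATE configuration macro and its value of 1 is an assumption about this build; timeEvtHead, dim() and tickRateValid() are hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed configuration: a single tick rate (rate 0 only).
constexpr std::size_t MAX_TICK_RATE = 1U;

// Stand-in for the per-rate array of time-event list heads.
static int timeEvtHead[MAX_TICK_RATE];

// Number of elements in a built-in array, in the spirit of Q_DIM().
template <typename T, std::size_t N>
constexpr std::size_t dim(T const (&)[N]) { return N; }

// The precondition from the assertion above, as a plain predicate.
bool tickRateValid(std::uint_fast8_t const tickRate) {
    return tickRate < dim(timeEvtHead);
}

// Usage example: with one tick rate configured, only rate 0 is in range.
void example() {
    assert(tickRateValid(0U));
    assert(!tickRateValid(1U));
}
```

If that guess is right, a build configured with two tick rates would accept rate 1 as well, but whether that is the right fix depends on how the application is configured.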

    Edit:
    I should note, the example I'm looking at is arm-cm/game_efm32 and in bsp.c, the function QK_onIdle seems to protect QS_getByte() with crit entry/exit, but the QS_onFlush function does not do this. This seems inconsistent...

    void QK_onIdle(void) {
        ...
        QS_rxParse();  // parse all the received bytes
    
        if ((l_USART0->STATUS & USART_STATUS_TXBL) != 0) {  // is TXE empty?
            uint16_t b;
    
            QF_INT_DISABLE();
            b = QS_getByte();
            QF_INT_ENABLE();
    
            if (b != QS_EOD) {  // not End-Of-Data?
                l_USART0->TXDATA = (b & 0xFFU);  // put into the DR register
            }
        }
    ...
    }
    
    //............................................................................
    // NOTE:
    // No critical section in QS_onFlush() to avoid nesting of critical sections
    // in case QS_onFlush() is called from Q_onError().
    void QS_onFlush(void) {
        for (;;) {
            uint16_t b = QS_getByte();
            if (b != QS_EOD) {
                while ((l_USART0->STATUS & USART_STATUS_TXBL) == 0U) {
                }
                l_USART0->TXDATA  = b;  // put into the DR register
            }
            else {
                break;
            }
        }
    }
    

    Though, there is a nice note above the QS_onFlush() function to explain it, don't we need to protect the QS buffer when not called from Q_onError?

     

    Last edit: Harry Rostovtsev 2024-07-11
  • Quantum Leaps

    Quantum Leaps - 2024-07-11

    Hi Harry,
    All the recent changes to the QP framework are related to the functional safety certification. One of the obvious concerns of every safety standard is the error handling policy, which in QP is centered around the "assertion programming" and "failure assertion programming" (terms from IEC 61508). In case of an assertion failure, the system is already considered unsafe, and so it is NOT allowed to service interrupts, do context switches, etc. The only actions allowed to perform in the error handler (Q_onError() in the newer QP frameworks) are to put the system in a "safe mode" (whatever that means for a given system), and most often do the reset. I hope all this makes sense to you.

    Consequently, if you think about it, the assertion check must also happen in a critical section. And so, the assertion macros (Q_ASSERT.., Q_REQUIRE... etc.) now run within critical section.

    So, this is one part of your question/problem. I sincerely apologize, if this new error handling policy is not completely backward-compatible, but I don't see a clear way around it.

    Now, regarding the new assertions against nesting of critical sections -- this is also related to safety certification. Specifically, the two identified hazards within the QP framework are: exiting a critical section prematurely and leaving the critical section in force too long. Both hazards are mitigated by strictly enforcing non-nesting of critical sections.
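
The first hazard is easy to reproduce in a host-side model where critical sections are implemented as an unconditional disable/enable pair. This is only an illustrative sketch: interruptsDisabled, intDisable(), intEnable() and the two functions built on them are hypothetical names, not QP APIs.

```cpp
#include <cassert>

// Simulated interrupt-mask state of the CPU.
static bool interruptsDisabled = false;

void intDisable() { interruptsDisabled = true; }
void intEnable()  { interruptsDisabled = false; }

// Inner helper with its own unconditional disable/enable pair.
void innerOperation() {
    intDisable();
    // ... touch shared data ...
    intEnable();  // also ends the protection the caller relied on
}

// Outer code that nests the helper inside its own critical section.
// Returns whether interrupts were still disabled after the nested call.
bool outerStillProtectedAfterInner() {
    intDisable();
    innerOperation();
    bool const stillProtected = interruptsDisabled;
    // ... the caller continues here, believing it is still protected ...
    intEnable();
    return stillProtected;
}

// Usage example: the nested exit re-enabled interrupts prematurely.
void example() {
    assert(!outerStillProtectedAfterInner());
    assert(!interruptsDisabled);  // state is balanced again at the end
}
```

A check on entry that the mask is still clear turns this silent loss of protection into an immediate, visible assertion.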

    Finally, regarding QS software tracing, the QS_onFlush() callback should run only during the initial transient, where interrupts are not configured and not started yet. (This happens later, in the QF_onStartup() callback, please see the documentation for QP Framework startup sequence). In other words, the QS_onFlush() callback should not need any critical section.

    I really hope that my explanations make sense to you and that now, that you know why this is done that way, you'll adjust your code to comply. I'm sure this will improve the overall quality.

    --MMS

     

    Last edit: Quantum Leaps 2024-07-11
    • Harry Rostovtsev

      Yeah, the explanation makes sense. I actually welcome the change, especially if I get safety certs as part of the bargain.

      "So, this is one part of your question/problem. I sincerely apologize, if this new error handling policy is not completely backward-compatible, but I don't see a clear way around it."

      Yeah, totally understandable. I was just hoping for a quick way to scan my code to fix any potential mines left behind from the previous version. The startup sequence diagram helps. Any other resources I should check?

       
