Bug in QK_activate_???

2016-12-03
2017-01-19
  • Craig Broadbooks

    I am using QP/C++ version 5.8, running on an STM32F4 processor on a custom board. The board stores files on an SD card and has a Bluetooth Classic module, so we can connect to the board via Bluetooth and transfer files back to a client device (either a Mac or an Android tablet). While transferring files, I almost always hit one of two issues during the download session, each of which causes an assertion failure within QK_activate_ in the qk.cpp module.

    I've included the complete QK_activate function for reference at the bottom of this post with some comments where I have concerns. The most notable issue shows up in the following section of code:

            QP::QEvt const *e = a->get_();
            a->dispatch(e);
            QP::QF::gc(e);
    

    There are times when this code is executed while the queue is empty. In fact, if I put a test if statement before this code that calls a->m_eQueue.isEmpty(), it catches this condition at some point during my file transfer session.

    The other issue is that the following line occasionally causes an assertion failure. Based on the comments I see in the code, QK_activate_ should not be called if p (which comes from QK_attr_.nextPrio) equals 0.

        Q_REQUIRE_ID(800, p != static_cast<uint_fast8_t>(0));
    

    We only started having this issue after migrating our project to QP/C++ from QP/C recently. QP/C worked very well.

    Please let me know if you have any ideas or suggestions. If you need any more information or have any questions, please let me know.

    Thanks,
    Craig

    void QK_activate_(void) {
        uint_fast8_t pin = QK_attr_.actPrio; // save the active priority
        uint_fast8_t p   = QK_attr_.nextPrio; /* the next prio to run */
        QP::QActive *a;
    
        // QS tracing or thread-local storage?
    #ifdef Q_SPY
        uint_fast8_t pprev = pin;
    #endif // Q_SPY
    
        //This line occasionally asserts.
        // QK_attr_.nextPrio must be non-zero upon entry to QK_activate_()
        Q_REQUIRE_ID(800, p != static_cast<uint_fast8_t>(0));
    
        QK_attr_.nextPrio = static_cast<uint_fast8_t>(0); // clear for next time
    
        // loop until no more ready-to-run AOs of higher prio than the initial
        do {
            a = QP::QF::active_[p]; // obtain the pointer to the AO
            QK_attr_.actPrio = p; // this becomes the active priority
    
            QS_BEGIN_NOCRIT_(QP::QS_SCHED_NEXT, QP::QS::priv_.aoObjFilter, a)
                QS_TIME_();   // timestamp
                QS_2U8_(static_cast<uint8_t>(p), // prio of the scheduled AO
                        static_cast<uint8_t>(pprev)); // previous priority
            QS_END_NOCRIT_()
    
    #ifdef Q_SPY
            if (p != pprev) { // changing priorities?
                pprev = p;    // update previous priority
            }
    #endif // Q_SPY
    
            QF_INT_ENABLE();  // unconditionally enable interrupts
    
            // perform the run-to-completion (RTC) step...
            // 1. retrieve the event from the AO's event queue, which by this
            //    time must be non-empty and QActive_get_() asserts it.
            // 2. dispatch the event to the AO's state machine.
            // 3. determine if event is garbage and collect it if so
            //
    
            // There are times when the queue is empty, yet this code
            // tries to retrieve an event from the queue and dispatch it.
            QP::QEvt const *e = a->get_();
            a->dispatch(e);
            QP::QF::gc(e);
    
            // determine the next highest-priority AO ready to run...
            QF_INT_DISABLE();
    
            if (a->m_eQueue.isEmpty()) { // empty queue?
                QK_attr_.readySet.remove(p);
            }
    
            // find new highest-prio AO ready to run...
            p = QK_attr_.readySet.findMax();
    
            // is the new priority below the initial preemption threshold?
            if (p <= pin) {
                p = static_cast<uint_fast8_t>(0); // active object not eligible
            }
            else if (p <= QK_attr_.lockPrio) { // is it below the lock prio?
                p = static_cast<uint_fast8_t>(0); // active object not eligible
            }
            else {
                Q_ASSERT_ID(710, p <= static_cast<uint_fast8_t>(QF_MAX_ACTIVE));
            }
        } while (p != static_cast<uint_fast8_t>(0));
    
        QK_attr_.actPrio = pin; // restore the active priority
    
    #ifdef Q_SPY
        if (pin != static_cast<uint_fast8_t>(0)) { // resuming an active object?
            a = QP::QF::active_[pin]; // the pointer to the preempted AO
    
            QS_BEGIN_NOCRIT_(QP::QS_SCHED_RESUME, QP::QS::priv_.aoObjFilter, a)
                QS_TIME_();  // timestamp
                QS_2U8_(static_cast<uint8_t>(pin), // prio of the resumed AO
                        static_cast<uint8_t>(pprev)); // previous priority
            QS_END_NOCRIT_()
        }
        else {  // resuming priority==0 --> idle
            QS_BEGIN_NOCRIT_(QP::QS_SCHED_IDLE,
                             static_cast<void *>(0), static_cast<void *>(0))
                QS_TIME_();  // timestamp
                QS_U8_(static_cast<uint8_t>(pprev)); // previous priority
            QS_END_NOCRIT_()
        }
    #endif // Q_SPY
    }
    
     
  • Quantum Leaps

    Quantum Leaps - 2016-12-03

    First, please check that you have adequate stack space and are not simply experiencing a stack overflow. Please remember that QK is a preemptive kernel, so it needs stack for nesting QK_activate_() calls. Also, you are most likely using a Cortex-M4F with the FPU enabled, so the stack frame is significantly bigger.

    Second, please check that all your interrupt priorities are explicitly set to numerical values higher than the QF_AWARE_ISR_CMSIS_PRI level, as described in the AppNote "Setting ARM Cortex-M Interrupt Priorities in QP 5.x", by using the CMSIS function NVIC_SetPriority().

    Frankly, my gut feeling is that you have some interrupts running at priority zero (the highest urgency), which is the default out of reset. Such interrupts are never disabled by the QP critical sections, and therefore they are not allowed to make any calls to QP (they are called "kernel-unaware" interrupts). The assertions you are breaking in QP are consistent with corruption of the internal QK variables, which would happen if critical sections are violated.

    Please note that any 3rd-party code (such as Bluetooth libraries) might also be messing with your interrupt priorities.

    Therefore, please make sure that all interrupt priorities are ultimately set in the NVIC to a non-zero value. You can do this by inspecting the NVIC registers in your debugger. I attach a screen shot from the IAR debugger (other debuggers should offer a similar capability). The screen shot corresponds to the DPP example for EK-TM4C123GXL, right after the call to NVIC_SetPriority(GPIOA_IRQn, GPIOA_PRIO).
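
    The same check can be sketched on the host against a snapshot of the NVIC priority bytes copied out of the debugger. This is only a minimal sketch: findKernelUnawareIrqs() is a hypothetical helper, not part of QP or CMSIS, and the threshold minAwarePrio is a parameter you would take from your own port (the value of QF_AWARE_ISR_CMSIS_PRI), not something this example knows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of QP or CMSIS): given one priority byte
// per IRQ, as displayed by a debugger, return the IRQ numbers whose
// CMSIS-style priority is numerically below `minAwarePrio`.
// `prioBits` is the number of implemented priority bits on the device.
std::vector<std::size_t> findKernelUnawareIrqs(
    std::vector<std::uint8_t> const &ipBytes,
    unsigned prioBits,
    unsigned minAwarePrio)
{
    assert((prioBits >= 1U) && (prioBits <= 8U));
    std::vector<std::size_t> suspects;
    for (std::size_t irq = 0U; irq < ipBytes.size(); ++irq) {
        // the priority occupies the upper `prioBits` bits of each byte
        unsigned const prio = ipBytes[irq] >> (8U - prioBits);
        if (prio < minAwarePrio) {
            suspects.push_back(irq); // numerically too low: kernel-unaware
        }
    }
    return suspects;
}
```

    Entries for IRQs that are unused or disabled will typically also read as zero, so only the flagged IRQs that are actually enabled and call QP services need attention.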

    Third, please verify that the priorities of system exceptions, such as SysTick and PendSV, are also set correctly (again, Bluetooth libraries might interfere with these). These priorities are set in the SCB (System Control Block). Again, I attach a screen shot from the debugger, which shows that SysTick (PRI_15) is set to 0x40, while PendSV (PRI_14) is set to 0xE0. These values are for an NVIC with 3 priority bits. The STM32 has an NVIC with 4 priority bits, so PRI_14 should be set to 0xF0.
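
    The register values quoted above follow from how Cortex-M stores a priority in the upper bits of each 8-bit field. A small host-side sketch of that arithmetic, with hypothetical helper names that are not part of CMSIS or QP:

```cpp
#include <cassert>
#include <cstdint>

// Byte stored in a priority field for a CMSIS-style priority number,
// assuming `prioBits` implemented bits (left-aligned in the byte).
std::uint8_t prioToRegByte(unsigned prio, unsigned prioBits) {
    assert((prioBits >= 1U) && (prioBits <= 8U));
    return static_cast<std::uint8_t>((prio << (8U - prioBits)) & 0xFFU);
}

// Register byte of the lowest-urgency (numerically highest) priority.
std::uint8_t lowestUrgencyRegByte(unsigned prioBits) {
    assert((prioBits >= 1U) && (prioBits <= 8U));
    return prioToRegByte((1U << prioBits) - 1U, prioBits);
}

// Value read back after writing `written`: the unimplemented low bits
// of a priority field read as zero.
std::uint8_t readBackByte(std::uint8_t written, unsigned prioBits) {
    assert((prioBits >= 1U) && (prioBits <= 8U));
    unsigned const mask = (0xFFU << (8U - prioBits)) & 0xFFU;
    return static_cast<std::uint8_t>(written & mask);
}
```

    With 3 priority bits the lowest urgency comes out as 0xE0, and with 4 bits as 0xF0, which is why writing 0xFF to the PendSV field reads back differently on the two parts.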

    Fourth, make sure that the interrupt priority grouping is not set (which some STM32 libraries like to do). You can do this by calling NVIC_SetPriorityGrouping(0U) after all library initialization is done.

    Finally, please make sure you are not using the old CMSIS 3.20, because that version had a bug in NVIC_SetPriority().

    Please post what you find out to this forum.

    --MMS

     

    Last edit: Quantum Leaps 2016-12-03
  • Craig Broadbooks

    Sorry for the delay. First off, let me apologize for the title of my post: "Bug in QK_activate???". It unfairly suggests that there may be a bug in the framework. In my case there was no bug in the framework. Please accept my apologies.

    My issue was caused by an incorrect priority being set for PendSV. I use a tool called STM32CubeMX to generate my hardware configuration (for an STM32F4 chip) and since I don't directly use PendSV I did not notice that STM32CubeMX was setting it to 0. Additionally, I was not aware that PendSV priority should be set to the lowest available to prevent it from preempting any other ISRs.

    I would recommend that anyone struggling with interrupt priorities, or interested in a better understanding of how they should be configured for QP, read: http://www.state-machine.com/doc/AN_ARM-Cortex-M_Interrupt-Priorities.pdf.

    Thanks,
    Craig

     

    Last edit: Craig Broadbooks 2017-01-24
  • Quantum Leaps

    Quantum Leaps - 2017-01-19

    You bring up a very important point, which is the potential interference between the QP framework and any third-party software, such as STM32Cube. The problem in this case was that QP initialized the PendSV priority to the lowest urgency (0xFF), but it was subsequently overwritten by STM32Cube.

    To avoid such interference, it is critical to understand the timeline of initializations performed in the QP framework. This initialization proceeds in the following stages:

    1. QF_init() - initialize the framework and the underlying kernel (such as QK). In the Cortex-M port, the PendSV priority is set to 0xFF at this point.
    2. QF_psInit() - initialize publish-subscribe (if used)
    3. QF_poolInit() - initialize an event pool (each pool requires a call to QF_poolInit())
    4. QACTIVE_START() - start an active object (each active object requires a call to QACTIVE_START())
    5. QF_run() - run the framework. Calls QF_onStartup() to configure and start interrupts.

    Apparently, you must have called STM32Cube initialization after QF_init(), so perhaps you need to reverse the order and call QF_init() after STM32Cube (but before any other QF initialization calls).
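
    Why the order matters can be shown with a toy host-side model in which the last writer of the PendSV priority field wins. Everything here is a stand-in for illustration: FakeScb, vendorInit() and frameworkInit() are hypothetical and do not touch real hardware or call QP, and the value 0xF0 assumes a part with 4 priority bits.

```cpp
#include <cstdint>

// Toy stand-in for the PendSV priority field in the SCB (assumption:
// reset value 0, i.e. the highest urgency).
struct FakeScb {
    std::uint8_t pendsvPrio = 0x00U;
};

// Stands in for generated vendor init code that leaves PendSV at 0.
void vendorInit(FakeScb &scb) { scb.pendsvPrio = 0x00U; }

// Stands in for the framework init that sets PendSV to the lowest
// urgency (0xFF written, 0xF0 read back with 4 priority bits).
void frameworkInit(FakeScb &scb) { scb.pendsvPrio = 0xF0U; }

// Problematic order: the vendor code overwrites the framework setting.
std::uint8_t bootVendorLast() {
    FakeScb scb;
    frameworkInit(scb);
    vendorInit(scb);
    return scb.pendsvPrio;
}

// Suggested order: the framework init runs after the vendor code.
std::uint8_t bootFrameworkLast() {
    FakeScb scb;
    vendorInit(scb);
    frameworkInit(scb);
    return scb.pendsvPrio;
}
```

    In the model, only the second order leaves PendSV at the lowest urgency, which matches the reordering suggested above.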

    --MMS

     

    Last edit: Quantum Leaps 2017-01-19
  • Craig Broadbooks

    In my case, I was addressing the issue while still on 5.8.0, before the new code in 5.8.1 that sets all these things. STM32CubeMX likes to initialize all the interrupt priorities before any 3rd-party RTOS or framework. I had simply never set the priority for PendSV explicitly, so it stayed at the value STM32CubeMX gave it, which was 0. I now have QF_startup called after all the STM32CubeMX initialization code executes, and I explicitly set all the priorities. That said, I am in the process of updating to 5.8.1 and, as you've pointed out, the QF_init call should ensure priorities are not left at a problematic value. I'll make sure it is still called after the STM32CubeMX init code. I'll let everyone know how it goes.

     

    Last edit: Craig Broadbooks 2017-01-24
  • Quantum Leaps

    Quantum Leaps - 2017-01-25

    It is highly recommended to independently verify the interrupt/exception priority settings in your Cortex-M by inspecting the NVIC and SCB in the debugger.

    The test can (should) be performed as follows:

    1. start the debugger and load the code to your target board
    2. run the code freely for a while
    3. break into the running code (hit break in the debugger)
    4. inspect the NVIC and SCB registers in the debugger. Please see the screen shots (NVIC.jpg and SBC.jpg) from my earlier post.

    The test will help you to ensure that the settings were not inadvertently changed by some software on your target.
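
    If the debugger shows the SCB priorities only as a raw 32-bit word, the two fields of interest can be decoded by hand. A minimal sketch, assuming the value was read from SHPR3 (address 0xE000ED20), where PRI_14 (PendSV) occupies bits 23:16 and PRI_15 (SysTick) bits 31:24. The type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Priority bytes of the two system exceptions held in SHPR3.
struct SystemPrios {
    std::uint8_t pendsv;  // PRI_14
    std::uint8_t systick; // PRI_15
};

// Split a raw SHPR3 word into its PendSV and SysTick priority bytes.
SystemPrios decodeShpr3(std::uint32_t shpr3) {
    SystemPrios p;
    p.pendsv  = static_cast<std::uint8_t>((shpr3 >> 16) & 0xFFU);
    p.systick = static_cast<std::uint8_t>((shpr3 >> 24) & 0xFFU);
    return p;
}

// True if PendSV is at the lowest urgency for a device with `prioBits`
// implemented priority bits (0xE0 for 3 bits, 0xF0 for 4 bits).
bool pendsvIsLowestUrgency(std::uint32_t shpr3, unsigned prioBits) {
    assert((prioBits >= 1U) && (prioBits <= 8U));
    unsigned const lowest = (0xFFU << (8U - prioBits)) & 0xFFU;
    return decodeShpr3(shpr3).pendsv == lowest;
}
```

    For example, a word of 0x40E00000 decodes to SysTick 0x40 and PendSV 0xE0, the values quoted earlier in this thread for a part with 3 priority bits.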

    --MMS

     
