QP Real-Time Embedded Frameworks & Tools / Bugs / #128 QK Q_ASSERT_ID(0,...) from QACTIVE_EQUEUE_WAIT_ in QActive_get_ on CortexM

Quantum Leaps - 2016-04-14

Thank you for taking the time to report an issue. This is always highly appreciated.

The first thing I would highly recommend is to look at your interrupt priority settings. The more recent QP ports to Cortex-M3/M4 use the "selective interrupt disabling" policy, which never disables the highest-priority interrupts. Such interrupts are called "kernel unaware" and must never call QP services, as this would corrupt internal data. (Your assertion is indicative of exactly such a situation.)
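
To make the distinction concrete, here is a small host-side sketch in C (not QP code) that models how BASEPRI masking splits interrupts into kernel-aware and kernel-unaware. The helper name `isr_is_kernel_aware()`, its parameters, and the example BASEPRI value 0x3F are assumptions for illustration only; check `QF_BASEPRI` in your port and the number of priority bits your MCU implements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of QP): models BASEPRI masking.
 * prio_bits - number of priority bits implemented by the MCU (1..8)
 * basepri   - value written to BASEPRI inside the critical section
 * irq_prio  - logical priority, as it would be passed to NVIC_SetPriority()
 * Returns true if the critical section masks the interrupt, i.e. the
 * interrupt is "kernel aware" and may call QP services.
 */
bool isr_is_kernel_aware(uint8_t prio_bits, uint8_t basepri, uint8_t irq_prio) {
    assert((prio_bits >= 1U) && (prio_bits <= 8U));
    assert(irq_prio < (1U << prio_bits));

    uint8_t shift = (uint8_t)(8U - prio_bits);
    uint8_t mask  = (uint8_t)(0xFFU << shift);   /* implemented bits only */
    uint8_t raw   = (uint8_t)(irq_prio << shift); /* NVIC register value */
    uint8_t eff   = (uint8_t)(basepri & mask);    /* BASEPRI as the CPU sees it */

    /* BASEPRI == 0 masks nothing; otherwise every exception whose
     * priority value is numerically >= BASEPRI is blocked.
     */
    return (eff != 0U) && (raw >= eff);
}
```

In this model, with 3 implemented priority bits and a BASEPRI of 0x3F, only logical priority 0 is kernel-unaware; with 4 bits, priorities 0 through 2 are.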

There is a special App Note "Setting ARM Cortex-M Priorities for QP 5.1 and Higher", which I highly recommend reading.

Please make a post to this bug report if setting the interrupt priorities helps you.

--MMS

Kirk Wolff - 2016-04-14

Thank you for the pointer. The associated changes seem to have improved the issue on the M3 processor; however, the problem seems to be worse on the M0. I just corrected a bug where the PendSV and SYSTICK interrupts were both set to 192 (lowest). I have now changed the SYSTICK to 128, and the problem seems to have been exacerbated. Do you have any suggestions about this same assertion on the M0, given that the M0 doesn't have a BASEPRI register?

Last edit: Kirk Wolff 2016-04-14

Anonymous - 2016-04-14

Please do not change the priority of the PendSV exception in the NVIC. This priority is set to 0xFF (the lowest urgency) in QK_init() and must remain the lowest urgency. All this is described in the aforementioned AppNote "Setting ARM Cortex-M Interrupt Priorities...".

To debug the problem on the M0, please make sure that the QF critical section actually works by inspecting the PRIMASK register while the code is inside a critical section. You don't mention which compiler you are using, so I'm not sure how your critical section is implemented.

--MMS

Kirk Wolff - 2016-04-14

I didn't say I modified the PendSV priority; I modified the SYSTICK priority so that it is at a higher priority (lower number) than PendSV. I am using the ARM-MDK from Keil.

Before QF_CRIT_ENTRY_, primask is 0
After QF_CRIT_ENTRY_, primask is 1
After QF_CRIT_EXIT_, primask is back to 0

Last edit: Kirk Wolff 2016-04-14

Quantum Leaps - 2016-04-14

Good. Do you have enough stack? (No stack overflow?)

Anonymous - 2016-04-15

Yes, I've been keeping a close eye on it. There is at least 40% margin by the time the assertion happens. Whatever is happening seems to be time sensitive. The system works fine for a short while but eventually asserts. One data point I've seen from qspy is the scheduler running twice in a row, which triggers the error.

Quantum Leaps - 2016-04-15

Would it be possible for you to distill the problem to an example that could be run at Quantum Leaps? Something like that would be very helpful. Being able to reproduce a problem like this is typically more than half of the battle...

If you come up with sample code, please send it to info at state-machine.com along with any instructions on how to reproduce the failure (e.g., by manually triggering interrupts or any other method). An example of such a procedure is provided in the AppNotes about the QP ports to the ARM Cortex-M, where in the section about QK you can find the preemption testing procedures.

--MMS

Last edit: Quantum Leaps 2016-04-15

Anonymous - 2016-04-15

Hi, I'm a developer that works with Kirk.

It seems that QK_sched_(p) is being called with (p == QK_currPrio_) => true. So an AO is preempting itself. I just wrapped the do{}while() in QK_sched_ with a conditional to make sure it is skipped when (p == pin). The assertion no longer happens, and the application has been running smoothly for around 20 minutes now (previously it wouldn't make it past a minute).

I'm guessing there is a guard protecting the scheduler from being called with a priority that should not preempt the current one. Somehow we're getting past that and shouldn't be. I'll keep digging.
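
For readers following along, the workaround can be pictured with a tiny host-side model in C. This is only a sketch: `SchedModel` and `sched_model()` are hypothetical names, and the body is a stand-in for the scheduler loop, not the actual QK_sched_ code. It shows a scheduler entry that refuses to start a new run-to-completion step when the requested priority equals the priority it was entered with.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model state; not taken from the QK source. */
typedef struct {
    uint8_t  curr_prio;  /* stands in for QK_currPrio_ */
    unsigned rtc_steps;  /* run-to-completion steps started so far */
} SchedModel;

/* Models a scheduler entry with the workaround applied. */
void sched_model(SchedModel *s, uint8_t p) {
    uint8_t pin = s->curr_prio;  /* priority on entry to the scheduler */
    if (p == pin) {
        return;                  /* workaround: an AO must not preempt itself */
    }
    s->curr_prio = p;            /* raise the current priority */
    ++s->rtc_steps;              /* dispatch one event to the AO at priority p */
    s->curr_prio = pin;          /* restore the priority seen on entry */
}
```

The guard hides the symptom; it does not explain how the scheduler came to be entered with p equal to the current priority in the first place.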

-Chris

Quantum Leaps - 2016-04-15

Thank you Chris. Your findings are very interesting and disturbing at the same time. We will also keep digging...

But this comes at a particularly busy time for us here, so please allow about a week for us to catch up. I sincerely apologize for the delay.

--MMS

Anonymous - 2016-04-19

I think I found it.

In the PendSV handler, the PendSV pending bit is cleared by hardware on entry, but it could be set again before interrupts are disabled to invoke the scheduler. The result is that PendSV and the scheduler will be called again with the same nextPrio once the scheduler enables interrupts.
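
The race can be illustrated with a deliberately simplified host-side model in C. This is a sketch under stated assumptions, not the QK port: the `Model` struct and all function names are hypothetical, and the handler is reduced to the three steps that matter here (hardware clears the pending state, an ISR may sneak in, the critical section starts). On the real hardware the pending state is controlled through the ICSR at 0xE000ED04 (bit 28 PENDSVSET, bit 27 PENDSVCLR).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     pendsv_pending; /* models the PendSV pending state */
    uint8_t  next_prio;      /* models QK_nextPrio_ */
    uint8_t  curr_prio;      /* models QK_currPrio_ */
    unsigned sched_calls;    /* scheduler invocations */
    unsigned self_preempts;  /* invocations with p == curr_prio */
} Model;

/* An ISR that makes priority p ready and pends PendSV. */
void isr_post(Model *m, uint8_t p) {
    m->next_prio = p;
    m->pendsv_pending = true;
}

/* One pass through the modelled PendSV handler. */
void pendsv_entry(Model *m, bool isr_in_window, bool clear_in_crit) {
    m->pendsv_pending = false;     /* hardware clears the pending state on entry */
    if (isr_in_window) {
        isr_post(m, m->next_prio); /* ISR runs before interrupts are disabled */
    }
    /* ---- critical section starts here ---- */
    if (clear_in_crit) {
        m->pendsv_pending = false; /* proposed fix: write PENDSVCLR */
    }
    if (m->next_prio != 0U) {
        uint8_t p = m->next_prio;
        ++m->sched_calls;
        if (p == m->curr_prio) {
            ++m->self_preempts;    /* the condition behind the assertion */
        }
        m->curr_prio = p;          /* scheduler raises the priority, then
                                    * enables interrupts for the RTC step */
    }
}

Model run_scenario(bool clear_in_crit) {
    Model m = { false, 0U, 0U, 0U, 0U };
    isr_post(&m, 3U);
    pendsv_entry(&m, true, clear_in_crit);
    /* interrupts are enabled again: a still-pending PendSV is taken now */
    if (m.pendsv_pending) {
        pendsv_entry(&m, false, clear_in_crit);
    }
    return m;
}
```

In this model, `run_scenario(false)` invokes the scheduler twice, the second time with p equal to the current priority, while `run_scenario(true)` invokes it once.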

I wanted to get this to you quickly for feedback, so below is my diff. I think this ensures that only one scheduler invocation happens per priority preemption, but please correct me if I'm wrong. We're still wrestling with some other problems here, but I think they are unrelated.

Thanks,
Chris

~~~
Index: ports/arm-cm/qk/arm/qk_port.s
===================================================================
--- ports/arm-cm/qk/arm/qk_port.s (revision 1241)
+++ ports/arm-cm/qk/arm/qk_port.s (working copy)
@@ -98,20 +98,30 @@
 PendSV_Handler
     IF {TARGET_ARCH_THUMB} == 3 ; Cortex-M0/M0+/M1 (v6-M, v6S-M)?
     CPSID   i                 ; disable interrupts (set PRIMASK)
     ELSE                      ; M3/M4/M7
     MOVS    r0,#QF_BASEPRI
     MSR     BASEPRI,r0        ; selectively disable interrupts
     ENDIF                     ; M3/M4/M7
     ISB                       ; reset the instruction pipeline

+    ; The PENDSV bit will be cleared by hardware on entry, but could
+    ; have been set again before disabling interrupts above. Make sure
+    ; it is clear inside the critical section so the scheduler doesn't
+    ; get invoked again at the same priority once it enables interrupts
+    ; to execute the run to completion step.
+    LDR     r0,=0xE000ED04    ; Interrupt Control and State Register
+    MOVS    r1,#1
+    LSLS    r1,r1,#27         ; r0 := (1 << 27) (PENDSVCLR bit)
+    STR     r1,[r0]           ; ICSR[27] := 1 (clear PendSV)
+
     LDR     r0,=QK_nextPrio_
     LDR     r0,[r0]
     CMP     r0,#0
     BNE.N   PendSV_sched      ; if QK_nextPrio_ != 0, branch to scheduler
~~~

Quantum Leaps - 2016-04-20

Yes, I also came to the same conclusion. There is a time window at the beginning of PendSV, before interrupts get disabled. This bug was introduced when we changed the PendSV implementation to free up the SVCall exception used previously.

Thank you for reporting the bug and for performing this thorough analysis. Apparently, you know Cortex-M very well.

The bug fix is on the way. Stay tuned...

--MMS

Anonymous - 2016-04-20

There may be another bug along the same lines. I'm not 100% sure of this one because I haven't tried to step through the scenario completely. It would explain the symptoms of our remaining problem (though I can't yet rule out that the problem is on our side).

You are using the PendSV when nextPrio==0 as a special case to return to the previously preempted task, whose exception frame sits above the "fake" frame used to "return" to the scheduler, right?

If:
1. PendSV bit is set for this purpose with nextPrio=0 at the end of the sched_ret.
2. The PendSV_Handler is entered
3. An interrupt occurs which updates the nextPrio to something >0 (before interrupts are disabled)

Then I think the request to return to the preempted task is lost and no priorities lower than the current one will ever resume.

We're running into cases now where low priority tasks are getting queue overflow asserts or pool depletion, and qspy indicates that some low priority tasks never get resumed, despite nothing else happening on the processor. I think the debugger usually shows the processor stuck in the "branch to self" at the end of the sched_ret with a currPrio >0.

Again, I'm not 100% on this one. I just noticed the symptoms seem to match.

-Chris

Anonymous - 2016-04-20

I think the problem described above is happening. I fixed it by turning the end of QK_sched_ret into a loop. I think this should work because when an attempt to trigger PendSV with nextPrio==0 gets preempted, the preemption will eventually return back to the loop with nextPrio==0 (from the preempting task's QK_sched_ret).
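
The lost-return scenario and the retry loop can be sketched with a small host-side model in C. Everything here is an assumption made for illustration: `RetModel`, `sched_ret_pend()`, `pendsv_handler()` and `parked_after()` are hypothetical names, and "parked" simply counts scheduler-return contexts that are still waiting on the stack for a PendSV with nextPrio==0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t  next_prio; /* models QK_nextPrio_ */
    unsigned parked;    /* sched_ret contexts waiting for a "return" PendSV */
} RetModel;

/* End of sched_ret: request a return to the preempted task. */
void sched_ret_pend(RetModel *m) {
    m->next_prio = 0U;  /* the PendSV exception would be pended here */
}

/* Modelled PendSV handler. A non-zero isr_prio models an ISR that fires
 * before interrupts are disabled and readies an AO of that priority.
 */
void pendsv_handler(RetModel *m, uint8_t isr_prio) {
    if (isr_prio != 0U) {
        m->next_prio = isr_prio;
    }
    if (m->next_prio != 0U) {
        /* Schedule request: the AO runs to completion on top of the parked
         * contexts, then its own sched_ret parks and asks to return.
         */
        ++m->parked;
        sched_ret_pend(m);
        pendsv_handler(m, 0U); /* nothing interferes this time */
    } else {
        /* Return request: discard one parked context and resume
         * whatever that context had preempted.
         */
        --m->parked;
    }
}

/* Number of contexts still parked after the collision;
 * 0 means the originally preempted task has resumed.
 */
unsigned parked_after(bool retry_loop) {
    RetModel m = { 0U, 1U };    /* one context parked at the end of sched_ret */
    sched_ret_pend(&m);
    pendsv_handler(&m, 3U);     /* the return request collides with an ISR */
    /* execution is now back in the first context, at the end of sched_ret */
    if (retry_loop) {
        sched_ret_pend(&m);     /* the loop pends PendSV again */
        pendsv_handler(&m, 0U);
    }
    return m.parked;
}
```

Without the retry, one context stays parked in this model, which corresponds to the processor sitting in the "branch to self"; with the retry, nothing stays parked.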

For reference, here's my diff again.

~~~
Index: ports/arm-cm/qk/arm/qk_port.s
===================================================================
--- ports/arm-cm/qk/arm/qk_port.s (revision 1241)
+++ ports/arm-cm/qk/arm/qk_port.s (working copy)
@@ -105,6 +105,16 @@
     ENDIF                     ; M3/M4/M7
     ISB                       ; reset the instruction pipeline

+    ; The PENDSV bit will be cleared by hardware on entry, but could
+    ; have been set again before disabling interrupts above. Make sure
+    ; it is clear inside the critical section so the scheduler doesn't
+    ; get invoked again at the same priority once it enables interrupts
+    ; to execute the run to completion step.
+    LDR     r0,=0xE000ED04    ; Interrupt Control and State Register
+    MOVS    r1,#1
+    LSLS    r1,r1,#27         ; r0 := (1 << 27) (PENDSVCLR bit)
+    STR     r1,[r0]           ; ICSR[27] := 1 (clear PendSV)
+
     LDR     r0,=QK_nextPrio_
     LDR     r0,[r0]
     CMP     r0,#0
@@ -166,12 +176,15 @@
     MSR     BASEPRI,r0        ; enable interrupts (clear BASEPRI)
     ENDIF                     ; M3/M4/M7

-    ; trigger PendSV to return to preempted task...
+PendSV_sched_ret_pend
+    ; trigger PendSV to return to preempted task. If this gets lost
+    ; by a preemption that sets QK_nextPrio >0, then it will return
+    ; back to here with QK_nextPrio==0 from its PendSV_sched_ret.
     LDR     r0,=0xE000ED04    ; Interrupt Control and State Register
     MOVS    r1,#1
     LSLS    r1,r1,#28         ; r0 := (1 << 28) (PENDSVSET bit)
     STR     r1,[r0]           ; ICSR[28] := 1 (pend PendSV)
-    B       .                 ; wait for preemption by PendSV
+    B       PendSV_sched_ret_pend

     ALIGN                     ; make sure the END is properly aligned
~~~

Last edit: Anonymous 2016-04-20

- Anonymous - 2019-12-21
  
  I have run into the same behaviour in this version:
  ; Last Updated for Version: 5.6.0
  ; Date of the Last Update: 2015-12-11
  
  It seems I found the explanation in the late-arrival and tail-chaining mechanisms of the Cortex-M.
  
  See the Technical Reference Manual for late arrival and tail chaining.
  
  0) This instruction raises the PendSV interrupt:
  STR r1,[r0] ; ICSR[28] := 1 (pend PendSV)
  1) Assume some exception "ISR X" happens before the first instruction of the PendSV handler. Late arrival happens, ISR X executes, and PendSV stays pending.
  2) Assume ISR X changes QK_nextPrio_ from 0 and pends PendSV, which is already in the pending state.
  
  So PendSV executes only once and removes the frame from the stack only once, and it returns to the loop instead of to the preempted thread. Your fix will not help for the M4F variant, because in that case "R0, LR" is pushed to the stack at the entry of PendSV, and the missed "tail" of PendSV (which properly pops "R0, LR") causes stack damage.
  
  - Quantum Leaps - 2019-12-21
    
    You report that you are using version 5.6.0, while the comment to this bug filed on 2016-05-01 states that the problem was fixed in version 5.6.4. Is there any chance that you could upgrade to that version, or better yet, to the latest QP/C/C++ version?
    
    --MMS
    
    Last edit: Quantum Leaps 2019-12-21
    

Quantum Leaps - 2016-04-22

Hi Kirk and Chris,
I wonder if you would be interested in beta-testing the new QK-port that fixes the bug(s) you've found.

If so, please contact Quantum Leaps directly at: info@state-machine.com

--MMS

Quantum Leaps - 2016-05-01

This bug has been fixed in QP 5.6.4 (in all QP types: QP/C, QP/C++, and QP-nano).

The new Application Note "QP and ARM Cortex-M" explains the updated QK implementation on ARM Cortex-M. The Application Note also explains the QV and QXK kernels.

--MMS
