Hi. I'm facing a problem running my QP application on an ARM microcontroller (the code works fine on Windows). The execution trace shows that while the only global Active Object is being initialized, the initialization hits QF_crit_entry_() followed by Q_onError(), and the whole thing halts.
Hi M,
Your code is causing an assertion failure inside the QP Framework. The most important information is which assertion failed, which is provided in the module and id parameters of the Q_onError() function. So, please just print this information.
On a side note, it seems that you've implemented your own software tracing system based on 'printf'. I'm sure that it was quite a bit of work. But a much more powerful and efficient software tracing system, called QP/Spy, is already provided in the QP Framework. The system is not that hard to use. Please just watch a couple of minutes in the "Getting Started with QP/QM" video.
--MMS
Thanks for the reply. However, as this failure occurs even before the debug console has been initialized (ie it occurs at the very start when libc is just initializing the global/static objects), I'm not sure how to print it at that time! Let me see if I can inspect those parameters by attaching a debugger.
BTW, I did not implement my own software tracing system. In fact, I'm using the tracing provided by the ARM emulator that I'm using. And I also regularly use QP/Spy on windows. It's just that I didn't look into using QSpy on the emulator through an emulated serial port.
Regards.
Last edit: M Suleman Khalid 2024-06-27
Yes, please use the debugger to find out which assertion is failing. For this, set a breakpoint at Q_onError. There is one assertion in the QTimeEvt::QTimeEvt() constructor (file qf_time.cpp), which might be firing. Most likely you have either incorrect signal for the time event, or an incorrect tick rate. Please check.
--MMS
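When no console is available this early in the startup, one option is to have the error handler stash its arguments where a debugger can read them later. The following is a minimal host-side sketch of that idea. The names AssertionRecord, g_lastAssertion, and recordAssertion() are hypothetical and not part of QP; a real Q_onError() could call something like this before halting or resetting.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical record of the last failed assertion (not part of QP).
struct AssertionRecord {
    char module[16]; // copy of the module name, e.g. "qv_port"
    int  id;         // assertion id within that module, e.g. 110
    bool valid;      // set once an assertion has been recorded
};

// Zero-initialized at startup; inspect it from the debugger after a halt.
static AssertionRecord g_lastAssertion = {};

// Hypothetical helper that an error handler could call as its first step.
void recordAssertion(char const *module, int id) {
    assert(module != nullptr);
    // Copy at most 15 characters and always terminate the string.
    std::strncpy(g_lastAssertion.module, module,
                 sizeof(g_lastAssertion.module) - 1U);
    g_lastAssertion.module[sizeof(g_lastAssertion.module) - 1U] = '\0';
    g_lastAssertion.id = id;
    g_lastAssertion.valid = true;
}
```

With a breakpoint set on the error handler, printing g_lastAssertion then shows the module and id without any console output.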
After looking at the code for critical section entry QF_crit_entry_() in qv_port.cpp, it seems that the assertion assert(PRIMASK/BASEPRI == 0) is failing. The comment on top of the function says:
// NOTE:
// The assertion means that this critical section CANNOT nest.
And I'm not sure where in code I'm "nesting" any critical section!
The debug backtrace shows the following sequence of calls:
#0 Q_onError (module=0x6b70 <QF_port_module_> "qv_port", id=110)
at F:/workspaces/myapp/src/bsp/ke02z/qv/bsp.cpp:99
#1 0x000024f2 in QP::QTimeEvt::QTimeEvt (this=0x20000308 <AOs::MyApp::instance+40>,
act=0x200002e0 <AOs::MyApp::instance>, sig=4, tickRate=0) at ../qpcpp/src/qf/qf_time.cpp:85
#2 0x00001026 in AOs::MyApp::myapp (this=0x200002e0 <AOs::MyApp::instance>)
at F:/workspaces/myapp/src/myapp.cpp:56
#3 0x000019ac in __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535)
at F:/workspaces/myapp/src/myapp.cpp:51
#4 0x000019ca in _GLOBAL__sub_I__ZN3AOs6MyApp8instanceE () at F:/workspaces/myapp/src/myapp.cpp:728
#5 0x000046a8 in __libc_init_array ()
#6 0x00000102 in ResetISR () at ../startup/startup_mke02z4.cpp:340
It turns out that sig=4 corresponds to Q_USER_SIG, which is the first timeout signal being initialized (in the initializer list of the active object constructor) as m_motorOnTimeoutEvt(this, MOTOR_ON_TO_SIG, 0U).
The event object is defined as follows in the Active Object class:
private:
    QP::QTimeEvt m_motorOnTimeoutEvt;
Of course, the signal is defined in the application as the enum value MOTOR_ON_TO_SIG = QP::Q_USER_SIG.
What could I be doing wrong?
Last edit: M Suleman Khalid 2024-06-28
As you found out, you're hitting assertion qv_port:110. The reason for this assertion is explained in the source code: "The assertion means that this critical section CANNOT nest." These assertions (qv_port:110/111) were introduced after QP/C++ had been scrutinized for strict balancing of critical sections. Since all critical sections must now be balanced, the nesting of critical sections indicates an error. I hope you see the point.
So, apparently, your code enters a critical section or just disables interrupts prior to calling the QTimeEvt::QTimeEvt() constructor. The remedy is easy: you need to find the place in your startup sequence where the critical section is entered but not exited. I would just set up your debugger to start from the reset handler (as opposed to going all the way to main()) and keep stepping through the startup code while watching the BASEPRI register. BASEPRI is 0 out of reset, so you should clearly see where it gets set to a non-zero value.
Please report to this forum what you find out.
--MMS
Last edit: Quantum Leaps 2024-06-28
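To make the non-nesting rule concrete, here is a small host-side model. It is only a sketch under stated assumptions: SimCpu, critEntry(), critExit(), and ctorAfterStartup() are hypothetical names, a single boolean stands in for the interrupt-mask register, and the real port code in qv_port is not reproduced here.

```cpp
#include <cassert>

// Simulated interrupt-mask state (one flag stands in for the mask register).
struct SimCpu {
    bool masked = false; // false: interrupts enabled, true: interrupts masked
};

// Outcome of trying to enter a critical section that must not nest.
enum class CritResult { Entered, NestingError };

// Model of a non-nesting entry: interrupts must be enabled on entry,
// otherwise the entry is reported as an error instead of being nested.
CritResult critEntry(SimCpu &cpu) {
    if (cpu.masked) {
        return CritResult::NestingError; // the framework would assert here
    }
    cpu.masked = true; // mask interrupts for the duration of the section
    return CritResult::Entered;
}

// Matching exit: unmask interrupts again.
void critExit(SimCpu &cpu) {
    assert(cpu.masked); // an unbalanced exit is also an error in this model
    cpu.masked = false;
}

// Models a constructor that runs during startup and enters a critical
// section, optionally after earlier code has already masked interrupts.
CritResult ctorAfterStartup(bool interruptsMaskedEarlier) {
    SimCpu cpu;
    cpu.masked = interruptsMaskedEarlier; // e.g. left masked by startup code
    CritResult const result = critEntry(cpu);
    if (result == CritResult::Entered) {
        critExit(cpu); // balanced entry/exit
    }
    return result;
}
```

In this model, ctorAfterStartup(false) enters and exits cleanly, while ctorAfterStartup(true) reports the nesting error, which is the situation the assertion is designed to catch.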
As you suggested, I ran the GDB debug session again while monitoring the value of the BASEPRI register all the way (using the info registers basepri GDB command). It appears that the BASEPRI register always stays zero, but the program still hits the assertion failure and Q_onError is thus called. Here is my debug session:
Hi M,
From a quick scan of your GDB session, it seems to me that your CPU is using the PRIMASK register, not BASEPRI. I think this is the case because I see the line (which sets PRIMASK):
__asm volatile("cpsid i");
You don't mention which Cortex-M CPU you're using, but it would be helpful to know whether you're running Cortex-M0 or M0+ (ARMv6-M architecture). Your GDB output might be misleading because ARMv6-M does not even have the BASEPRI register.
So, please repeat the steps, but monitor PRIMASK.
Finally, I would really highly recommend that you do yourself a favor and get a real debugger. It's no longer the 1990's to work with GDB in command-line. With a real debugger, the whole session should take 30 seconds, not a week that it's taking already. I hope my comments make sense to you.
--MMS
Last edit: Quantum Leaps 2024-07-02
Hi again! So here is the situation: PRIMASK is set right at the time the system boots up (cpsid i) and stays set until the active object constructor is entered. That is why, when a timeout object is initialized by the constructor and the critical section is entered, the assertion fails. Not sure where to go from here!
Secondly, about not using the latest QP version. Actually, I started work on the project a little while back with the then latest version, i.e. 7.3.2. Additionally, I'm using QS/QSpy as well as QView to monitor the state of the target on the GUI. At that time, I was facing some problems getting QView to run properly. The problem was that, as QView internally calls the Python interpreter, some of my Python files were not being loaded/found by the Python interpreter (being called internally) unless I explicitly included their import statements in the qview.py file, which meant modifying the qview.py file provided with the QP framework. Some other changes to qview.py were also required in order to use my included files. So, I just made a copy of qview.py and made the required changes so that it worked. Upgrading the QP version would mean redoing all the changes I made in the previous version of QView, so I thought I could defer it for later.
I know, not an elegant solution, nor what I would've liked to do. However, the hack works for now and for that reason I did not yet upgrade the QP version. If required I can explain the whole situation in a new issue on this forum for your further perusal.
Regards.
Last edit: M Suleman Khalid 2024-07-03
OK, so you've found the root cause of the problem. The simplest fix would be to just delete the cpsid i instruction from the Reset_Handler in the startup code. Most likely, you should modify the standard startup code anyway because it typically has endless loops (denial of service) hard-coded in the exception handlers. So, removing the cpsid i instruction shouldn't be such a big deal.
Regarding QView, I see your point. The current QView design is similar to QUTest in that the customization script is launched in the separate instance of the Python interpreter. This is needed in QUTest for test scripts because QUTest executes many test scripts in one run and it needs continuity. However, QView has only one customization and it could be designed to be imported into the single customization. This would invert the control, but it would allow the customizations to import any other Python modules as well (as you were trying to do). I'll look into re-designing QView along these lines.
--MMS
Last edit: Quantum Leaps 2024-07-03
Yeah, I think I'll enable the interrupts before the C library enters the initialization code. However, the interrupt must've been disabled in the startup code for a reason. I'm not sure what can go wrong by doing so.
Regarding QView, it would be great if you reviewed the QView design. I think it can be a very useful and powerful tool for developing a comprehensive GUI to monitor the target state remotely.
Regards.
Absolutely, interrupts must NOT fire during the whole startup sequence and the QP Framework is NOT ready to receive them until QF::run(). But interrupts won't fire unless they are configured and explicitly enabled in the NVIC. You don't need to disable them with PRIMASK. I hope you see my point.
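The point about the NVIC can be illustrated with a simplified host-side model of interrupt delivery. This is a sketch only: SimIrqState and irqWouldFire() are hypothetical names, priorities are ignored, and system exceptions such as SysTick (which have their own enable bits outside the NVIC) and faults are left out.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of the state that decides whether an IRQ is taken.
struct SimIrqState {
    bool          primask = false;  // global mask, as set by "cpsid i"
    std::uint32_t nvicEnabled = 0U; // one enable bit per IRQ line
    std::uint32_t pending = 0U;     // one pending bit per IRQ line
};

// An IRQ is taken only if it is pending, enabled in the simulated NVIC,
// and not blocked by the global mask.
bool irqWouldFire(SimIrqState const &s, unsigned irq) {
    assert(irq < 32U); // this model covers 32 IRQ lines
    std::uint32_t const bit = std::uint32_t{1} << irq;
    return !s.primask
           && ((s.nvicEnabled & bit) != 0U)
           && ((s.pending & bit) != 0U);
}
```

With all enable bits clear, as during the startup sequence before any interrupt is configured, irqWouldFire() returns false even when the global mask is not set, so the startup code does not need "cpsid i" to stay undisturbed in this model.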
So I think I'm running into the same problem as mentioned here. I'm attempting to upgrade from QP 6.9.2 to 7.3.4 and I think there may have been some changes that are breaking me. Specifically, I was encountering an assertion in qf_time, loc=100 but it was actually causing a hardfault. The hardfault was being caused because I was nesting critical sections. The old QS_onFlush code would disable interrupts before grabbing bytes (as did all the examples).
void QS_onFlush(void)
{
    uint16_t b;
    // QF_INT_DISABLE(); <-- THIS WAS CAUSING ISSUES UNTIL I COMMENTED IT OUT
    while ((b = QS_getByte()) != QS_EOD) { /* while not End-Of-Data... */
        // QF_INT_ENABLE();
        /* while TXE not empty */
        while (!(UART_SPY_BASE_PORT->STAT & LPUART_STAT_TDRE_MASK)) {}
        UART_SPY_BASE_PORT->DATA = (b & 0xFFU);
        // QF_INT_DISABLE();
    }
    // QF_INT_ENABLE();
}
The assertion would happen correctly afterwards without a hardfault when I noticed that the new examples don't seem to have that crit entry/exit section. I commented mine out and bam, hardfault went away.
In general, this seems like a nice new addition. That said, I guess my question is where can I see the changes that I would need to make to the rest of the code to avoid these issues. I attempted to search the Revision History but nothing immediately jumped out.
The second (and admittedly unrelated) part of my question is why the assertion happens to begin with (it didn't in the old version of QP). I can cause it to happen by sending in "u" via qspy, which in turn causes this line in qf_time.c to die:
since the passed in tickRate is 1 and this assertion fails. This clearly has something to do with multiple tickrates (which I've never used) but it didn't use to cause an assertion. I mainly used "t" and "u" via qspy to see if things were still alive in some debugging situations.
Edit:
I should note, the example I'm looking at is arm-cm/game_efm32 and in bsp.c, the function QK_onIdle seems to protect QS_getByte() with crit entry/exit, but the QS_onFlush function does not do this. This seems inconsistent...
void QK_onIdle(void) {
    ...
    QS_rxParse(); // parse all the received bytes

    if ((l_USART0->STATUS & USART_STATUS_TXBL) != 0) { // is TXE empty?
        uint16_t b;

        QF_INT_DISABLE();
        b = QS_getByte();
        QF_INT_ENABLE();

        if (b != QS_EOD) { // not End-Of-Data?
            l_USART0->TXDATA = (b & 0xFFU); // put into the DR register
        }
    }
    ...
}
//............................................................................
// NOTE:
// No critical section in QS_onFlush() to avoid nesting of critical sections
// in case QS_onFlush() is called from Q_onError().
void QS_onFlush(void) {
    for (;;) {
        uint16_t b = QS_getByte();
        if (b != QS_EOD) {
            while ((l_USART0->STATUS & USART_STATUS_TXBL) == 0U) {
            }
            l_USART0->TXDATA = b; // put into the DR register
        }
        else {
            break;
        }
    }
}
Though, there is a nice note above the QS_onFlush() function to explain it, don't we need to protect the QS buffer when not called from Q_onError?
Last edit: Harry Rostovtsev 2024-07-11
Hi Harry,
All the recent changes to the QP framework are related to functional safety certification. One of the obvious concerns of every safety standard is the error handling policy, which in QP is centered around "assertion programming" and "failure assertion programming" (terms from IEC 61508). In case of an assertion failure, the system is already considered unsafe, and so it is NOT allowed to service interrupts, do context switches, etc. The only actions allowed in the error handler (Q_onError() in the newer QP frameworks) are to put the system in a "safe mode" (whatever that means for a given system) and, most often, to reset. I hope all this makes sense to you.
Consequently, if you think about it, the assertion check must also happen in a critical section. And so, the assertion macros (Q_ASSERT.., Q_REQUIRE... etc.) now run within a critical section.
So, this is one part of your question/problem. I sincerely apologize, if this new error handling policy is not completely backward-compatible, but I don't see a clear way around it.
Now, regarding the new assertions against nesting of critical sections: this is also related to safety certification. Specifically, the two identified hazards within the QP framework are exiting a critical section prematurely and leaving the critical section in force too long. Both hazards are mitigated by strictly enforcing non-nesting of critical sections.
Finally, regarding QS software tracing, the QS_onFlush() callback should run only during the initial transient, where interrupts are not configured and not started yet. (This happens later, in the QF_onStartup() callback, please see the documentation for QP Framework startup sequence). In other words, the QS_onFlush() callback should not need any critical section.
I really hope that my explanations make sense to you and that now, that you know why this is done that way, you'll adjust your code to comply. I'm sure this will improve the overall quality.
--MMS
Last edit: Quantum Leaps 2024-07-11
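As a rough illustration of a flush routine that polls the transmitter without any critical section, here is a host-side sketch. Everything in it is a stand-in: SimTraceBuffer, SimUart, flushTrace(), and the kEndOfData sentinel are hypothetical names chosen for this example (the sentinel is assumed to play the role that QS_EOD plays in the code quoted earlier), and no QP or vendor API is used.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Sentinel returned when the buffer is empty (assumed value for this sketch).
constexpr std::uint16_t kEndOfData = 0xFFFFU;

// Hypothetical stand-in for the trace buffer.
struct SimTraceBuffer {
    std::deque<std::uint8_t> bytes;

    // Returns the next byte, or kEndOfData when nothing is left.
    std::uint16_t getByte() {
        if (bytes.empty()) {
            return kEndOfData;
        }
        std::uint8_t const b = bytes.front();
        bytes.pop_front();
        return b;
    }
};

// Hypothetical stand-in for a UART transmitter.
struct SimUart {
    std::vector<std::uint8_t> sent;
    bool txReady() const { return true; } // real hardware: poll a status flag
    void write(std::uint8_t b) { sent.push_back(b); }
};

// Drains the buffer by busy-polling the transmitter. No interrupt masking
// is done here, so the routine can also be called from an error handler
// that already runs with interrupts disabled.
void flushTrace(SimTraceBuffer &buf, SimUart &uart) {
    for (;;) {
        std::uint16_t const b = buf.getByte();
        if (b == kEndOfData) {
            break; // nothing left to send
        }
        while (!uart.txReady()) {
            // wait for the transmitter to accept another byte
        }
        uart.write(static_cast<std::uint8_t>(b & 0xFFU));
    }
    assert(buf.bytes.empty()); // postcondition: buffer fully drained
}
```

The sketch leaves open the question of concurrent producers; it only models the case described above, where the flush runs while no interrupts are active.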
Yeah, the explanation makes sense. I actually welcome the change, especially if I get safety certs as part of the bargain.
> So, this is one part of your question/problem. I sincerely apologize, if this new error handling policy is not completely backward-compatible, but I don't see a clear way around it.
Yeah, totally understandable. I was just hoping for a quick way to scan my code to fix any potential mines left behind from the previous version. The startup sequence diagram helps. Any other resources I should check?
Execution trace is attached.
Any idea why this could be happening?
Last edit: M Suleman Khalid 2024-06-26
Yes, I'm using a Cortex-M0+; sorry, I forgot to mention it earlier. Also, I'm using QP version 7.3.2, if it matters.
And yes, I do use a graphical debugger; however, it's easier to share a text-based GDB session.
Thanks for the support. Will get back to you after inspecting PRIMASK.
Last edit: M Suleman Khalid 2024-07-02
Any particular reason why you are not using the latest QP (currently 7.3.4)?
--MMS
Regarding the re-design of QView, it's done already. Please take a look at the latest release:
https://www.state-machine.com/qtools/history.html#qtools_7_4_1.
I'll make a separate post about the new QView in this forum.
--MMS