PUBLISH atomic?

2016-02-25
2016-03-03
  • Marko Panger

    Marko Panger - 2016-02-25

    Hi!

    I'm using QP with a custom RTOS and I'm not sure about the PUBLISH call.

    According to the notes in qf_ps.cpp, it should be an atomic call, in the sense that preemption is locked while the for() loop posts the event to each subscriber.

    However, when running a custom RTOS, there are no hooks that lock/unlock the kernel inside the PUBLISH function. That being said, each POST in the for() loop wakes up the posted AO before all subscribed AOs have been posted. This can lead to race conditions if a certain ordering of signals is expected.

    I'm asking for some direction here, but first of all I would like to check whether PUBLISH is really meant to be atomic with regard to the individual POSTs.

    Thanks,
    Marko

     
  • Quantum Leaps

    Quantum Leaps - 2016-02-25

    The QF::publish_() implementation is not atomic and the documentation is out of date. This is really a documentation bug and should be filed in the Bug Tracker, so that it can be tracked.

    The [online documentation for QF::publish_()](http://www.state-machine.com/qpcpp/class_q_p_1_1_q_f.html#af5c3a36b34ed2255f868132f7b05435e) has already been corrected. Here is the updated note:

    Note:
    To avoid non-determinism, the event multicasting performed in this function is not atomic, meaning that while the event is being posted to the subscribers the interrupts remain enabled and the scheduler remains unlocked. Instead, to avoid priority inversions, the multicasting always starts from the highest-priority subscriber and proceeds in the descending order of subscriber priority. In most cases, this ordering is equivalent to truly atomic multicasting.
    However, under a preemptive kernel, the fact that multicasting is not really atomic might have unexpected consequences for the ordering of events. For example, if the original multicast event triggers further posting or publishing of event(s) in the high-priority subscriber(s), those secondary events might be queued before the original event in the low-priority subscriber(s). While the ordering of events remains fully predictable based on the relative priorities of the active objects, such re-ordering might appear counterintuitive or unexpected.
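    The re-ordering described in the note can be reproduced with a toy model. The sketch below is not QP code: `Ao`, `Kernel`, `post()` and `publish()` are hypothetical stand-ins for a priority-preemptive kernel that reschedules after every post. The high-priority subscriber reacts to the published event `A` by posting `B` to the low-priority subscriber, and `B` ends up ahead of `A` there.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <vector>

// Toy model of a priority-preemptive kernel, for illustration only.
// Ao, Kernel, post() and publish() are hypothetical names, not the QP API.
struct Ao {
    int prio;                                         // higher number = higher priority
    std::deque<std::string> queue;                    // pending events
    std::vector<std::string> handled;                 // events in processing order
    std::function<void(const std::string&)> onEvent;  // optional reaction
};

struct Kernel {
    std::vector<Ao*> aos;
    int current = 0;  // priority of the running context (0 = the publisher)

    void post(Ao& ao, const std::string& e) {
        ao.queue.push_back(e);
        schedule();  // preemptive: reschedule after every single post
    }

    // Run ready AOs whose priority exceeds the running context, highest first.
    void schedule() {
        for (;;) {
            Ao* next = nullptr;
            for (Ao* a : aos)
                if (!a->queue.empty() && a->prio > current &&
                    (next == nullptr || a->prio > next->prio))
                    next = a;
            if (next == nullptr) return;
            const int saved = current;
            current = next->prio;
            const std::string e = next->queue.front();
            next->queue.pop_front();
            next->handled.push_back(e);
            if (next->onEvent) next->onEvent(e);  // run to completion
            current = saved;
        }
    }

    // Non-atomic multicast; subscribers are given in descending priority order.
    void publish(const std::vector<Ao*>& subscribers, const std::string& e) {
        for (Ao* s : subscribers) post(*s, e);
    }
};

// The high-priority subscriber reacts to "A" by posting "B" to the low one.
std::vector<std::string> lowPriorityOrder() {
    Kernel k;
    Ao low{1, {}, {}, {}};
    Ao high{2, {}, {}, {}};
    high.onEvent = [&](const std::string& e) { if (e == "A") k.post(low, "B"); };
    k.aos = {&low, &high};
    k.publish({&high, &low}, "A");
    return low.handled;  // {"B", "A"}: the secondary event overtook "A"
}
```

    The ordering is still fully predictable from the priorities, exactly as the note says; it is just not the order a reader of the publishing code might expect.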

    I hope that this clarifies the issue.

    The fix of this documentation bug will be included in the next QP/C and QP/C++ releases.

    --MMS

     
    Last edit: Quantum Leaps 2016-02-25
  • Marko Panger

    Marko Panger - 2016-02-25

    Thanks for the reply.

    Regardless of the priority, which should not be used as a synchronization primitive but only as a scheduling policy, some non-determinism might occur.

    Let's say AO1 publishes a message that says "start your business", and several other AOs send their results to AO2, which stores them in non-volatile memory, but storage is allowed only after "start your business" has been received. In this scheme AO2 might receive something from other AOs before "start your business", due to PUBLISH not being atomic.

    This is a problem for our logic, which requires a chronological order of published events. Would you consider adding hooks for disabling preemption in a future release?

    Thanks, Marko

     
    • Panopticon

      Panopticon - 2016-02-25

      Marko --

      I've found that usually in a situation like this, the easiest thing to do is to implement explicit synchronization at the application level, rather than relying on the underlying scheduler. This allows you to move between QK's preemptive kernel and the non-preemptive kernel (or other kernels, with other scheduling policies).

      So for example, in this case, I would have AO1 tell AO2, "Get ready to start storing results", then AO1 waits for an ACKnowledgement event back from AO2 before AO1 publishes "start your business" to the whole world.
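      This handshake can be sketched as two small state machines. The names `Sig`, `Storage` (AO2) and `Coordinator` (AO1) are made up for illustration, and events are dispatched by direct function calls instead of a real framework, so that the worst-case delivery order (a worker's result overtaking "start your business" at AO2) can be forced explicitly.

```cpp
#include <cassert>
#include <vector>

// Hypothetical signals for the handshake; not part of any QP API.
enum class Sig { GET_READY, START, RESULT };

// AO2: stores results only once it has been armed.
struct Storage {
    bool armed = false;
    std::vector<int> stored;
    int dropped = 0;

    // Returns true when AO2 answers with an ACK to AO1.
    bool dispatch(Sig sig, int value = 0) {
        switch (sig) {
        case Sig::GET_READY: armed = true; return true;   // arm, then ACK
        case Sig::START:     armed = true; return false;  // no-op if already armed
        case Sig::RESULT:
            if (armed) stored.push_back(value); else ++dropped;
            return false;
        }
        return false;
    }
};

// AO1: publishes START only after AO2 has acknowledged GET_READY.
struct Coordinator {
    bool waitingForAck = false;
    bool startPublished = false;
    void sendGetReady() { waitingForAck = true; }
    void onAck() {
        if (waitingForAck) { waitingForAck = false; startPublished = true; }
    }
};

// Worst case of a non-atomic publish: a worker's RESULT reaches AO2 before START.
int droppedWithoutHandshake() {
    Storage ao2;
    ao2.dispatch(Sig::RESULT, 42);
    ao2.dispatch(Sig::START);
    return ao2.dropped;
}

// Same worst-case interleaving, but START is published only after the ACK.
int droppedWithHandshake() {
    Storage ao2;
    Coordinator ao1;
    ao1.sendGetReady();
    if (ao2.dispatch(Sig::GET_READY)) ao1.onAck();
    if (!ao1.startPublished) return -1;  // workers never see START without the ACK
    ao2.dispatch(Sig::RESULT, 42);
    ao2.dispatch(Sig::START);
    return ao2.dropped;
}
```

      Because AO2 is armed before anyone can receive START, the order in which START and the results arrive at AO2 no longer matters, whatever kernel is used.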

      Does that make sense?

       
    • Quantum Leaps

      Quantum Leaps - 2016-02-25

      Marko,

      QF::publish_() has obviously conflicting requirements.

      On one hand, the system must be deterministic, meaning that every QP API call should disable interrupts or lock the scheduler only for a limited and a priori known time. Violating this requirement would make the system problematic for hard real-time applications.

      On the other hand, the system must be intuitive and must not behave in "unexpected" ways, such as changing the order of events in event queues. However, this requirement is much fuzzier, and in fact it is even difficult to define "unexpected behavior".

      So, at this point, the requirement for determinism seems to prevail, especially since the second requirement can always be met at the application level (as proposed by Panopticon, for example).

      I hope this makes sense to you.

      --MMS

       
  • Marko Panger

    Marko Panger - 2016-02-25

    It makes sense, as it solves the problem, but it introduces another complexity: inter-AO handshaking, in the sense that signals are logically no longer uni-directional. Besides this, testing becomes problematic if you want to disable some AO for a while, as you won't get ACKs back.

    OK, I really wanted to get confirmation that PUBLISH is not atomic. I will think about how to solve this at the application level.

    Thanks, Marko

     
  • Marko Panger

    Marko Panger - 2016-02-26

    Miro (MMS, I assume),

    The more I think about it, the more problematic I find it, as it breaks the chronological order of events that end up in an AO queue. Some business logic requires such an order. True, it can be solved at the application level, but the application becomes clumsy and it is hard to predict all of the effects down the road, in particular as the system grows.

    Your determinism argument is highly debatable. Disabling preemption doesn't mean disabling IRQs globally. IRQs are still running and buffering data. When PUBLISH releases the scheduler, the AOs can pipe this data out and process it. I assume PUBLISH is not expensive anyway, as it just stores event pointers in the subscribed queues.

    Would it be possible to add hooks for disabling/enabling preemption (and include them in a future formal release)? I envision PUBLISH calling a macro (implemented in each individual OS port) that disables preemption before entering the for() loop that POSTs to the subscribed queues, and, once that loop is done, calling another macro to release the scheduler.

    Then it is up to the port whether it wants to disable preemption or not.
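    The proposal amounts to a pair of port-supplied hooks around the posting loop. The sketch below models it with a toy kernel; `PortHooks`, `schedLock`/`schedUnlock` and `Kernel::lock()`/`unlock()` are hypothetical names, not actual QP macros. With the hooks left as no-ops the behavior is unchanged (backward compatible); with a locking port, the low-priority subscriber sees the published `A` before the secondary event `B` that the high-priority subscriber posts in reaction to it.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <vector>

// Toy preemptive kernel with a lockable scheduler (hypothetical, not the QP API).
struct Ao {
    int prio;                                         // higher number = higher priority
    std::deque<std::string> queue;                    // pending events
    std::vector<std::string> handled;                 // events in processing order
    std::function<void(const std::string&)> onEvent;  // optional reaction
};

struct Kernel {
    std::vector<Ao*> aos;
    int current = 0;   // priority of the running context (0 = the publisher)
    int lockNest = 0;  // > 0 while the scheduler is locked

    void lock() { ++lockNest; }
    void unlock() { if (--lockNest == 0) schedule(); }

    void post(Ao& ao, const std::string& e) {
        ao.queue.push_back(e);
        schedule();
    }

    void schedule() {
        if (lockNest > 0) return;  // locked: events are queued, nothing preempts
        for (;;) {
            Ao* next = nullptr;
            for (Ao* a : aos)
                if (!a->queue.empty() && a->prio > current &&
                    (next == nullptr || a->prio > next->prio))
                    next = a;
            if (next == nullptr) return;
            const int saved = current;
            current = next->prio;
            const std::string e = next->queue.front();
            next->queue.pop_front();
            next->handled.push_back(e);
            if (next->onEvent) next->onEvent(e);
            current = saved;
        }
    }
};

// Hypothetical per-port hooks around the posting loop; no-ops by default.
struct PortHooks {
    std::function<void()> schedLock = [] {};
    std::function<void()> schedUnlock = [] {};
};

void publish(Kernel& k, const std::vector<Ao*>& subscribers, const std::string& e,
             const PortHooks& port) {
    port.schedLock();
    for (Ao* s : subscribers) k.post(*s, e);
    port.schedUnlock();
}

std::vector<std::string> lowPriorityOrder(bool portLocksScheduler) {
    Kernel k;
    Ao low{1, {}, {}, {}};
    Ao high{2, {}, {}, {}};
    high.onEvent = [&](const std::string& e) { if (e == "A") k.post(low, "B"); };
    k.aos = {&low, &high};
    PortHooks port;  // default: a port that does not lock
    if (portLocksScheduler) {
        port.schedLock = [&k] { k.lock(); };
        port.schedUnlock = [&k] { k.unlock(); };
    }
    publish(k, {&high, &low}, "A", port);
    return low.handled;
}
```

    `lowPriorityOrder(false)` yields `B, A`, while `lowPriorityOrder(true)` yields `A, B`.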

    Marko

     
  • Quantum Leaps

    Quantum Leaps - 2016-02-26

    Marko,

    I believe that the suggested scheduler locking during the event posting loop in QF::publish_() was already implemented in the past, but has been taken out. So, we are clearly back-and-forth on this issue. (Hence the leftover documentation, which was not updated after the change.)

    It's hard to win here, because if scheduler locking were added to QF::publish_(), somebody would post a (legitimate) request to remove it.

    Of course, scheduler locking does not affect interrupts, but the task-level response would become dependent on the number of subscribers to various events. This non-deterministic task-level response might adversely affect hard real-time code at the task level, which could miss deadlines very intermittently (depending on the number of subscribers). This would mean that such hard real-time code would necessarily have to move to the ISR level, which would defeat the whole purpose of using a preemptive kernel in the first place. (The ISRs would become bloated and entangled with the task level.) I hope you can see why it would be a bad idea...

    As a compromise, I can see one solution, which would be selective scheduler locking, very much like a priority-ceiling mutex. Event publishing is actually ideal for this, because the priority ceiling is self-evident here: the scheduler needs to be locked only up to the highest-priority subscriber to a given event. The active-object tasks above such a "priority ceiling" don't need to be affected. So this would be a way to have your cake and eat it too.
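    Selective scheduler locking can be illustrated with a toy kernel. In the sketch below (hypothetical names, not QP code), `publish()` raises a scheduler ceiling to the highest subscriber priority for the duration of the posting loop. A simulated interrupt injected mid-loop makes the non-subscribed `urgent` AO ready; it still runs immediately because it sits above the ceiling, while the subscribers see `A` before the secondary event `B`.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <vector>

// Toy preemptive kernel with a priority-ceiling scheduler lock (hypothetical).
struct Ao {
    const char* name;
    int prio;                                         // higher number = higher priority
    std::deque<std::string> queue;                    // pending events
    std::function<void(const std::string&)> onEvent;  // optional reaction
};

struct Kernel {
    std::vector<Ao*> aos;
    std::vector<std::string> trace;  // "name:event" in processing order
    int current = 0;                 // priority of the running context
    int ceiling = 0;                 // AOs at or below this may not preempt (0 = unlocked)

    void post(Ao& ao, const std::string& e) { ao.queue.push_back(e); schedule(); }

    void schedule() {
        for (;;) {
            const int threshold = current > ceiling ? current : ceiling;
            Ao* next = nullptr;
            for (Ao* a : aos)
                if (!a->queue.empty() && a->prio > threshold &&
                    (next == nullptr || a->prio > next->prio))
                    next = a;
            if (next == nullptr) return;
            const int saved = current;
            current = next->prio;
            const std::string e = next->queue.front();
            next->queue.pop_front();
            trace.push_back(std::string(next->name) + ":" + e);
            if (next->onEvent) next->onEvent(e);
            current = saved;
        }
    }

    // Lock the scheduler only up to the highest-priority subscriber.
    void publish(const std::vector<Ao*>& subscribers, const std::string& e,
                 const std::function<void()>& betweenPosts = {}) {
        int top = 0;
        for (Ao* s : subscribers) if (s->prio > top) top = s->prio;
        const int saved = ceiling;
        if (top > ceiling) ceiling = top;
        for (Ao* s : subscribers) {
            post(*s, e);
            if (betweenPosts) betweenPosts();  // e.g. an interrupt arriving mid-loop
        }
        ceiling = saved;
        schedule();
    }
};

std::vector<std::string> ceilingTrace() {
    Kernel k;
    Ao low{"low", 1, {}, {}};
    Ao high{"high", 2, {}, {}};
    Ao urgent{"urgent", 5, {}, {}};  // not a subscriber, above the ceiling
    high.onEvent = [&](const std::string& e) { if (e == "A") k.post(low, "B"); };
    k.aos = {&low, &high, &urgent};
    bool fired = false;
    k.publish({&high, &low}, "A", [&] {
        if (!fired) { fired = true; k.post(urgent, "X"); }  // one simulated interrupt
    });
    return k.trace;
}
```

    The resulting trace is `urgent:X, high:A, low:A, low:B`: the AO above the ceiling is not delayed, and the subscribers keep the publication order.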

    The suggested feature would actually be quite workable for the built-in kernels. The QV kernel is obviously not preemptive, so it doesn't need any changes. The preemptive QK and QXK kernels already support priority-ceiling mutexes, so it should be quite straightforward to add such a feature there.

    Rather, the problem would be with 3rd-party RTOS kernels (as I recall, you've mentioned that you are using one of those), because one can't assume universal support for priority ceilings.

    I will think about how to bring such a "selective scheduler locking" feature to QF::publish_() in a portable way. In the meantime, perhaps you could post whether and how your RTOS would support such a feature. This would be an interesting data point for me.

    As usual, I hope that my comments make sense to you.

    --MMS

     
  • Marko Panger

    Marko Panger - 2016-03-02

    Hi,

    Sorry for my late reply. I've only just noticed that my last post wasn't accepted.

    So, the OS we are using doesn't support a preemption ceiling, but it can easily be added. However, managing the ceiling might become problematic in bigger designs.

    If I may, I would suggest adding hooks in the PUBLISH method around the for() loop that posts to the subscribed AOs. Then it is up to the port how it wants to disable preemption and to what extent. I think this is quite flexible and backward compatible.

    Thanks, Marko

     
    • Quantum Leaps

      Quantum Leaps - 2016-03-02

      The suggestion to re-introduce scheduler locking "hooks" to QF::publish_() sounds reasonable.

      There is also another alternative, albeit a bit more complex. The completely portable solution would be for QF::publish_() to post the event directly to the highest-priority subscriber first. This highest-priority subscriber would then multicast the event to all the other subscribers, without worrying about locking the scheduler. Scheduler locking would be unnecessary, because multicasting to lower-priority AOs would never lead to preemption. The other advantage of this solution would be avoiding the potentially large overhead of multicasting in the caller's context (e.g., when an ISR calls QF::publish_()). Also, this solution would be portable to any 3rd-party RTOS. But the downside would be complexity and additional overhead in every event processing (to check whether the event needs to be multicast).

      Still, this solution suggests a workaround to your problem that you can quite easily implement manually. To avoid any potential changes in the ordering of events, you can post directly to the highest-priority recipient (which should NOT subscribe to the event). This highest-priority AO can then publish the event from its own state machine to the rest of the application. I hope you get the idea.
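      The manual workaround can be sketched with a toy kernel model (hypothetical names, not QP code). The producer posts `A` to a `relay` AO whose priority is above every subscriber and which does not subscribe itself; the relay then publishes `A`. Because every post made by the relay targets a lower-priority AO, no subscriber can run until the whole multicast has completed, so the secondary event `B` cannot overtake `A`.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <vector>

// Toy model of a priority-preemptive kernel (hypothetical, not the QP API).
struct Ao {
    int prio;                                         // higher number = higher priority
    std::deque<std::string> queue;                    // pending events
    std::vector<std::string> handled;                 // events in processing order
    std::function<void(const std::string&)> onEvent;  // optional reaction
};

struct Kernel {
    std::vector<Ao*> aos;
    int current = 0;  // priority of the running context (0 = the producer)

    void post(Ao& ao, const std::string& e) { ao.queue.push_back(e); schedule(); }

    void schedule() {
        for (;;) {
            Ao* next = nullptr;
            for (Ao* a : aos)
                if (!a->queue.empty() && a->prio > current &&
                    (next == nullptr || a->prio > next->prio))
                    next = a;
            if (next == nullptr) return;
            const int saved = current;
            current = next->prio;
            const std::string e = next->queue.front();
            next->queue.pop_front();
            next->handled.push_back(e);
            if (next->onEvent) next->onEvent(e);
            current = saved;
        }
    }

    // Plain non-atomic multicast, descending priority order.
    void publish(const std::vector<Ao*>& subscribers, const std::string& e) {
        for (Ao* s : subscribers) post(*s, e);
    }
};

std::vector<std::string> lowPriorityOrderViaRelay() {
    Kernel k;
    Ao low{1, {}, {}, {}};
    Ao high{2, {}, {}, {}};
    Ao relay{3, {}, {}, {}};  // above all subscribers, not subscribed itself
    high.onEvent = [&](const std::string& e) { if (e == "A") k.post(low, "B"); };
    relay.onEvent = [&](const std::string& e) { k.publish({&high, &low}, e); };
    k.aos = {&low, &high, &relay};
    k.post(relay, "A");  // the producer posts to the relay instead of publishing
    return low.handled;  // {"A", "B"}
}
```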

      --MMS

       
      Last edit: Quantum Leaps 2016-03-02
  • Marko Panger

    Marko Panger - 2016-03-03

    Interesting idea, but it doesn't cover all of the possible scenarios. Let's consider one:

    Assume your highest-priority AO receives a published signal and needs to multicast it further. Before it manages to publish it further (IRQs are not disabled and preemption is not locked), a timer expires that wakes up an even higher-priority AO that was not subscribed to the original signal. That AO publishes a signal of its own. At this point, this new signal will end up in the lower-priority queues before the original signal, reversing the order of the two signals.
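    This scenario can be replayed in a toy kernel model (hypothetical names, not QP code). The relay at priority 3 is publishing `A` when a simulated timer interrupt, firing after the first post, readies `top` (priority 4, not subscribed to `A`), which publishes `T`. The two subscribers end up seeing `A` and `T` in opposite orders.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy model of a priority-preemptive kernel (hypothetical, not the QP API).
struct Ao {
    int prio;                                         // higher number = higher priority
    std::deque<std::string> queue;                    // pending events
    std::vector<std::string> handled;                 // events in processing order
    std::function<void(const std::string&)> onEvent;  // optional reaction
};

struct Kernel {
    std::vector<Ao*> aos;
    int current = 0;  // priority of the running context

    void post(Ao& ao, const std::string& e) { ao.queue.push_back(e); schedule(); }

    void schedule() {
        for (;;) {
            Ao* next = nullptr;
            for (Ao* a : aos)
                if (!a->queue.empty() && a->prio > current &&
                    (next == nullptr || a->prio > next->prio))
                    next = a;
            if (next == nullptr) return;
            const int saved = current;
            current = next->prio;
            const std::string e = next->queue.front();
            next->queue.pop_front();
            next->handled.push_back(e);
            if (next->onEvent) next->onEvent(e);
            current = saved;
        }
    }

    // Non-atomic multicast with an optional hook to simulate an interrupt mid-loop.
    void publish(const std::vector<Ao*>& subscribers, const std::string& e,
                 const std::function<void()>& betweenPosts = {}) {
        for (Ao* s : subscribers) {
            post(*s, e);
            if (betweenPosts) betweenPosts();
        }
    }
};

// Returns {events seen by the high subscriber, events seen by the low subscriber}.
std::pair<std::vector<std::string>, std::vector<std::string>> ordersWithTimer() {
    Kernel k;
    Ao low{1, {}, {}, {}};
    Ao high{2, {}, {}, {}};
    Ao relay{3, {}, {}, {}};
    Ao top{4, {}, {}, {}};  // not subscribed to "A", woken by a timer
    bool fired = false;
    auto timerIsr = [&] { if (!fired) { fired = true; k.post(top, "TICK"); } };
    relay.onEvent = [&](const std::string& e) { k.publish({&high, &low}, e, timerIsr); };
    top.onEvent = [&](const std::string&) { k.publish({&high, &low}, "T"); };
    k.aos = {&low, &high, &relay, &top};
    k.post(relay, "A");
    return {high.handled, low.handled};  // high: {"A", "T"}, low: {"T", "A"}
}
```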

    Solving this issue at the application level introduces complexity at the application level. Instead of introducing complexity in the framework, where it is manageable, we introduce complexity in the application, which is far more prone to errors. Not to mention the dependencies we introduce. PUBLISH is really meant to send without caring who is subscribed.

    Marko

     
    Last edit: Marko Panger 2016-03-03
  • Quantum Leaps

    Quantum Leaps - 2016-03-03

    Yes, absolutely, we don't need to debate that the main goal of the framework is to hide the complexity from the application and do the "heavy lifting" behind the scenes. So, the future releases of QP/C/C++ will try to offer some solution for the issues you raise in this discussion.

    But, just to be clear, your example is a classic concurrency issue, which will NOT be solved by scheduler locking, or any other technique that I know of. Events produced in an interrupt can always "sneak into" a queue, be it during "atomic" event multicasting or any other situation where interrupts are not explicitly disabled. Therefore, I'm not aware of any mechanism (be it in QP or any conventional RTOS) that would prevent the re-ordering of events in such a case.

    I hope that my comments make sense to you.

    --MMS

     
    Last edit: Quantum Leaps 2016-03-03
