Task priorities misunderstanding

Help
2010-07-16
2012-12-09
  • Dragan Grujičić

    Hello all,

    Again I have a problem with my application. I can make it work, but again not the way I want. This time it is priorities. Maybe I do not understand how they are organised, or I am doing something stupid. Again.

    On http://pastebin.com/31gP2Ts8 is the code for the 4 tasks.
    On http://pastebin.com/wQua700R is the config file.

    The code in the tasks executes as intended, but it is not scheduled as I planned.

    Let me explain:
        appLoop_UART_RxTask(void) should be a high priority task that moves characters from the input register to the input queue for further processing. It waits for an event fired by the ISR and should be the first task executed after the ISR, as it is the only task on priority 3 (the highest priority).
        appLoop_Debug(void) is a normal priority (2) task with application code. It is executed periodically with fixed timing (10 ms in this example). When it finishes execution it blocks on a wait statement.
        appLoop_UART(void) is another normal priority (2) task, competing for execution with the Debug task. It blocks waiting for another character in the input buffer.
        appLoop_Display(void) is a low priority communication task that polls the communication channel and should be executed on the lowest priority level (1). The idea is to let this task run without blocking in the free CPU time when no other tasks are running. There is a large hardware buffer for communication that permits this (in my opinion, of course).

    So this is the setup: four tasks, one high priority, one low priority and two normal priority. Three are blocking and the lowest priority task is not.

    And here is the problem the way I see it: the kernel should be preemptive, with round-robin scheduling on every priority level. Tasks are executed according to priority, high priority first. Within one priority level tasks are scheduled in round-robin manner: lowest task number first and then around (you know what I mean). A task can give the CPU away in two ways: when the timer ticks, or before the tick if it willingly releases the CPU (by waiting for an event or a lock, or by saying "I am finished, wake me up when my time comes again"). If there is no task to run, the idle task or the OS runs in the background. But if there is a task on a low priority level that does not block, it is always ready and will use all the free time: it waits for all other tasks to finish and then runs all the time.
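
    The scheduling rule described above can be sketched in plain C. This is only a toy model under stated assumptions: the Task struct and pick_next() are hypothetical names made up for illustration, not Femto OS code, and a higher number is assumed to mean a higher priority, as in the setup above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task record for illustration; not a Femto OS structure. */
typedef struct {
    const char *name;
    int priority;   /* higher number = higher priority (assumption) */
    bool ready;     /* false while the task is blocked or delayed */
} Task;

/*
 * Pick the next task to run: the highest priority level that has a ready
 * task wins; within that level the search goes round-robin, starting with
 * the task after `last` (the index of the task that ran most recently,
 * or -1 if none has run yet).
 * Returns the task index, or -1 when nothing is ready (idle).
 */
int pick_next(const Task *tasks, int count, int last)
{
    int best_prio = -1;

    /* Find the highest priority among the ready tasks. */
    for (int i = 0; i < count; i++) {
        if (tasks[i].ready && tasks[i].priority > best_prio)
            best_prio = tasks[i].priority;
    }
    if (best_prio < 0)
        return -1;

    /* Round-robin inside that priority level. */
    for (int step = 1; step <= count; step++) {
        int i = (last + step) % count;
        if (tasks[i].ready && tasks[i].priority == best_prio)
            return i;
    }
    return -1; /* not reached: a ready task exists on best_prio */
}
```

    In this model a never-blocking Display task on priority 1 is chosen only while the three other tasks are blocked, and UART_RxTask is chosen as soon as it becomes ready, whichever task ran last. That is the behaviour expected in the post, not a statement about what the kernel actually does.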

    This is how I see my setup: UART_RxTask is on priority 3 and should be executed ASAP after the UART Rx ISR is triggered. The two tasks on priority level 2 execute the main application concurrently. They either block on a lock or wait for their time. In all the remaining free time, the Display task on priority level 1 should be polling the communication interface without blocking.

    Tasks UART_RxTask and UART are coupled by the input buffer: the first writes to it and the second reads. So if there is nothing to read, the UART task will block indefinitely.

    This is how I viewed the system. But the problem is the following:

    If the Display task does not block (the line //taskDelayFromNow(10); is commented out), the UART task is executed exactly 2 times: once it writes messages, waits for a character, receives a character, prints the messages a second time and stops forever. The Display task runs as normal (looping back characters to the telnet PC application). In my opinion this is not correct.

    If line 8 ( taskDelayFromNow(10); ) in the Display task is not commented out, the system functions "normally", but this is not what I want to achieve.

    I tried setting #define cfgUseHierarchicalRoundRobin cfgTrue in order to avoid this situation, but with no success.

    My question: Is this how the system is supposed to work? Or am I doing something stupid again? Can this be configured to work as "expected"?

    Please answer if you have any solution.

    Best regards
    Dragan

    PS: I moved the blog on project Balansero to a new location. Check it out if it is worth the effort.

     
  • De Vlaam

    De Vlaam - 2010-07-16

    Your perception of how it is supposed to work is flawless. As far as I can see now, your problem should not appear.

    I tried setting #define cfgUseHierarchicalRoundRobin cfgTrue
    

    That this provides no solution is expected, since with this option you make sure the OS remembers the position of the round robin on each priority level before switching to the next. Normally, it starts again at the beginning when a higher priority has been serviced. However, this may be undesirable sometimes as it may starve some tasks. For example, if you have two tasks in priority 2 (2a, 2b) and one in 3 you might get an execution sequence like 2a, 3, 2a, 3, 2a etc. With cfgUseHierarchicalRoundRobin set you would get 2a, 3, 2b, 3, 2a, 3, 2b etc. However this does not seem to be your problem. I would switch that option off for the moment.
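
    The two execution sequences mentioned above can be reproduced with a toy model. This is a sketch, not the Femto OS scheduler: run_sequence() and its remember_position flag are hypothetical names, and the model simply assumes that the priority 3 task preempts after every time slice of priority 2.

```c
#include <stdbool.h>

/*
 * Toy model: two always-ready tasks 'a' and 'b' share priority 2, and a
 * priority 3 task (written '3') runs after every priority 2 time slice.
 * Fills out[0..slices-1] with the task that ran in each slice and
 * terminates the string, so `out` must hold slices + 1 chars.
 */
void run_sequence(bool remember_position, char *out, int slices)
{
    const char level2[2] = { 'a', 'b' };
    int cursor = 0;                 /* round-robin position on priority 2 */

    for (int s = 0; s < slices; s++) {
        if (s % 2 == 1) {
            out[s] = '3';           /* the higher priority is serviced */
            if (!remember_position)
                cursor = 0;         /* priority 2 restarts at its first task */
        } else {
            out[s] = level2[cursor];
            cursor = (cursor + 1) % 2;
        }
    }
    out[slices] = '\0';
}
```

    With remember_position set to false the model yields a, 3, a, 3, ... so task b starves; with true it yields a, 3, b, 3, a, 3, ... which matches the sequence described for cfgUseHierarchicalRoundRobin.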

    So what is your problem? I saw two points of attention.

    In the config file it states:

    #define  CN_00                                   Display
    #define  CN_01                                   UART
    #define  CN_02                                   Debug
    #define  CN_15                                   UART_RxTask
    

    Now I don't think a leap in numbers is allowed here. I think the preprocessor will not recognize CN_15 correctly, but I never tried it. I would change that to

    #define  CN_03                                   UART_RxTask
    

    The numbering itself (which task gets which number) is of no importance, but (I think) it must be consecutive.

    Another point you might look at is the following. A lower priority task may hold a higher priority task hostage when it has taken a semaphore (lock) and a preemptive context switch then runs the higher priority task. If the higher priority task spin-locks on the same semaphore, the whole system locks up; if it is a regular block, at least two tasks will stop running. Such constructions should be avoided. I cannot really judge whether that is the case here, since one routine, loopback_tcp_server(), is missing from the code. You say it does not block, but where is the other use of the UARTRx slot?

    Sometimes the situation described above can be solved by setting cfgUsePriorityLifting to cfgTrue, see
    http://www.femtoos.org/code_config.html#1406
    However, this option is only a last resort; it is better to make sure two such tasks run in the same priority from the beginning. And I did not do a lot of testing on this option, so you may have some (unpleasant) surprises there too.

    The difference between including and leaving out taskDelayFromNow(10) in appLoop_Display() is that, when it is included, the task will almost always (!!) be switched out at that spot, not somewhere in the middle. Without the instruction, however, it will be preempted inside loopback_tcp_server(). So I need that code to see what is going on there. Can you post it? Note that eventually there will be a moment when it is preempted at any given spot in your code, so if the code does not work without the taskDelayFromNow(10) call, it will eventually stop with it too. It may take weeks to surface, but it will break. That's why writing concurrent code for embedded systems is so difficult.

    If this all does not help, you have the option to activate tracing, and trace what your program is doing. In the distribution there is a trace program, but you need a parallel port on your PC. See the (very brief) instructions on the website.
    http://www.femtoos.org/examples.html
    For some reason tracing works only the second time (this seems to be a bug in the parallel port driver). So tracing does work, do not give up too easily. If you have any questions on its use, please post them.

     
  • De Vlaam

    De Vlaam - 2010-07-16

    Now i don't think a leap in numbers is allowed here.

    After looking at the code, I think this should be allowed after all; I never tested it, however. I put this on my to-do list. For the moment, better safe than sorry.

     
  • Dragan Grujičić

    Hello again,

    Thank you for the fast reply, Ruud.

    Here is the code for the Display task.

    Changing priority for display task did not help. For me it always worked fine until this problem.
    There are no locks in Display, only non OS problem.

    I checked something else. I changed appLoop_UART(void) like this:

    #if (preTaskDefined(UART))
    void appLoop_UART(void)
    {
      Tchar number[10];
      Tuint08 cnt = 0;  /* was uninitialized; start the counter at a known value */
      while (true)
      {
        UART_send_P(Msg1);
        UART_send(RAMdata);
        printf_P(Msg2);
        /* print a fixed "177.14.0." prefix followed by the running counter */
        itoa(177, number, 10);
        printf("%s.",number);
        itoa(14, number, 10);
        printf("%s.",number);
        itoa(0, number, 10);
        printf("%s.",number);
        itoa(cnt++, number, 10);
        printf("%s",number);
        UART_putc('\n');
        taskDelayFromNow(1);  /* wait one tick instead of blocking on a character */
        //UART_putc(UART_getc());
      }
    }
    #endif
    

    Now it only waits for 1 clock tick. I also tried different priorities, and tried commenting out taskDelayFromNow(1);.

    Bottom line: All works as expected. If there is no taskDelayFromNow, If Display and UART are same priority they work concurrently. No problem. If Display is priority 1 it blocks forever.

    If there is a taskDelayFromNow, they both work fine no matter the priority.

    The problem seems to be UART_getc() or UART_RxTask or both. The only thing is that if there are no nonblocking tasks on a low priority, this works fine. The other thing is that if I comment out //UART_putc(UART_getc()); I do not read and do not block on the queue.

    I will investigate this more.

     
  • De Vlaam

    De Vlaam - 2010-07-16

    I do not fully understand your post, for example

    Changing priority for display task did not help

    and

    Bottom line: All works as expected. If there is no taskDelayFromNow, If Display and UART are same priority they work concurrently. No problem. If Display is priority 1 it blocks forever. 

    seem contradictory. And I thought the problems started when you leave out taskDelayFromNow. So this is confusing me.

    Further, I am missing essential parts of the code for a full evaluation. For example the UART_getc() call, where is the source?
    (It is not in my avr_libc.)

    It still seems to me that two tasks are holding each other up in some way. Really, tracing would help you out here, I guess, since you see exactly what is going on and which calls were performed last.

     
  • Dragan Grujičić

    Hello Ruud,

    This is officially crazy. For me.

    In this file are two projects:
    Balansero001 and Balansero (this is a new version of the code, stripped down to nothing but the UART driver). I simplified the task code as much as I could. Both projects are for the ATmega328p on Arduino. Both compile with no errors and no warnings.

    Balansero001: if taskDelayFromNow(1); is not commented out in task Debug, all is OK. The LEDs flash, characters are looped back, and Debug is just running. But if I comment that line out, the LEDs flash until I send the first character (from the keyboard) and then everything stops: the LEDs die and there is no loopback of chars. You will see that Debug is on the lowest priority level. The same thing happens when I raise its priority to 2.

    Balansero: in the present state of the code, if I press a key I get an error:

    ERROR: B
    Function: 27
    Task: 2
    I do not know why. You will see a similar task configuration as in Balansero001.

    I cannot find the bug. I worked all night and do not know where to look.

    Best regards
    Dragan

     
  • Dragan Grujičić

    I forgot one thing: in Balansero, in USART_RXC_VECT, there is UDR_PORT = 'a';. It was put there to see if the ISR is triggered. After the error the port stops flashing and I get a never-ending stream of "aaaaaaaaaaaaaaaaaaaa…." on my terminal. This is actually OK, because if the char is not read the interrupt routine will be retriggered.

     
  • Dragan Grujičić

    Hello
    I fixed the problem with the error in project Balansero; the problem with the nonblocking task remains. There is a new file on project Balansero.
    There are 3 tasks. UART_Rx is on priority 3, for the UART RX interrupt.
    The other two are on priority 2. One is the RX loopback, the other is just for test. If this other one is not blocking, the loopback of characters is blocked.

     
  • Dragan Grujičić

    Hello,

    I tried everything to find what is causing the trouble and I finally found it. But this puts me back at the beginning.

    In this file there is a UART driver for FemtoOS.
    The problem was isrEndYield() in the RX interrupt. As long as all tasks are blocking (on a wait, an event, a queue…) all is OK. As soon as some task is not blocking, regardless of its priority, things go wrong. Changing this to isrEndReturn() made all the difference. But now performance is lower, at least as I see it. As I will be using keyboard RX I am satisfied, but still…

    As you may recall, in this post we discussed different strategies for the RX ISR. You suggested using an event and a high priority task, and I accepted this. You proposed this kind of code:

    Of course, it also possible to use the interrupt (with isr) solely to wake a high priority task, which fully handles the incoming bytes. This is polling, but guarantees maximum throughput. In that case, the ISR looks (note the absent isrEnter/isrExit!)

    void USART_RXC_vect(void)
    { IsrFireEventOnName(IncomingByte); }

    And handle everything else in the task.

    This did not work: as soon as I press the first key, the system freezes.

    So I picked the following pair: isrBegin()/isrEndYield(), because:

    isrEndYield(): use this if you want a context switch directly after the handling of your interrupt. You are still responsible for restoring any registers, so it only makes sense to use isrBegin() / isrYield() over isrEnter() / isrExit() if your response to the interrupt need to be very fast and does not require that many operations. Since this method calls taskYield() that method must be included too.

    This seemed a logical solution and worked just fine until the problem with nonblocking tasks.

    I have seen this kind of approach in this post, in the first C file on the pastebin. I wanted to ask how phobos2 managed to do it, but soon after he changed his code to something like mine, and I thought he had similar problems, so I dropped it.

    Question: Is it really possible to make this code without isrBegin() / isrYield() pair and how? What is difference between isrEndReturn() and isrEndYield() to make such difference in processing and difference in performance ( which is better for isrEndYield() in my opinion). This will be important for me because I need to handle even faster interrupts (from an incremental encoder, but more on this some other time).

     
  • Dragan Grujičić

    Hello,

    Another thing:

    As you can see, none of the UART sending functions are thread safe: they use a common resource (the UART rx and tx port) without a semaphore or any other synchronization method. This is done on purpose. Adding a semaphore to a function for logging or error reporting (something like perror) would be easy. But the general reading and writing functions are, for the sake of performance, left without one.

    And then I have a problem:

    cfgUseSynchronization: Activate if you make use of the synchronization primitives.
    #define cfgUseSynchronization: (cfgSyncNon | cfgSyncSingleSlot | cfgSyncSingleBlock | cfgSyncDoubleBlock)
    Specify here how you want the synchronization to be used. All synchronization Information is kept in a slotstack, one for each task. The size of that stack can be defined further below. The way this is stack is 'formatted', is chosen here. The more complex the formatting, the more possibilities and the more code generated by the preprocessor. You can choose from cfgSyncSingleSlot, cfgSyncSingleBlock, cfgSyncDoubleBlock and cfgSyncNon.
         If you do not want to make any use of synchronization set this to cfgSyncNon. Please note that, although no code is included to handle slots, the slot stacks may still claim ram. Set them explicitly to zero if you do not need them. If you choose for cfgSyncSingleSlot, one slot may be occupied per task at every moment and no nesting is allowed. This takes one byte for every task using slots. It makes no sense to define the SlotSizes larger than one (and it is forbidden, to make sure we can optimize the code).
         If you choose for cfgSyncSingleBlock, every task may hold one blocking slot at a time, and may use the rest of the bytes in the slot stack for other free locks or nesting. Every byte can hold two slots (one per nibble). If you choose for cfgSyncDoubleBlock, every task may hold two blocking slots, on which it blocks simultaneously. The other bytes in the slot stack are used for free locks.
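
    The remark that every byte can hold two slots (one per nibble) can be illustrated with a short sketch. The helper names below are hypothetical, and which nibble holds which entry is an assumption made only for this example; the quoted documentation does not say how Femto OS lays out its slot stack.

```c
#include <stdint.h>

/* Pack two slot numbers (each 0..15) into one byte, one slot per nibble.
 * Putting the first entry in the high nibble is an arbitrary choice here. */
uint8_t pack_slots(uint8_t first, uint8_t second)
{
    return (uint8_t)(((first & 0x0F) << 4) | (second & 0x0F));
}

/* Extract the entry stored in the high nibble. */
uint8_t first_slot(uint8_t packed)
{
    return (uint8_t)(packed >> 4);
}

/* Extract the entry stored in the low nibble. */
uint8_t second_slot(uint8_t packed)
{
    return (uint8_t)(packed & 0x0F);
}

/* Bytes needed to store `entries` nibble-sized entries, rounded up. */
uint8_t bytes_for_entries(uint8_t entries)
{
    return (uint8_t)((entries + 1) / 2);
}
```

    This only shows why slot numbers fit two to a byte. It is not a formula for the SlotSize setting, which also depends on the synchronization mode chosen.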

    In the simplest application (one task is reading and writing the UART, or a task has some other synchronization and wants to write to the UART, where writing is semaphore protected) we will need two synchronization elements, but never at the same time (this means at the same clock tick, not simultaneously). Which option to choose?

    Now:

    -cfgSyncNon this I understand

    -cfgSyncSingleSlot I use this now, but what is the difference between this and cfgSyncSingleBlock? In my opinion one of these is the option I need, because I do not block at the same time, although in this post you suggested using cfgSyncDoubleBlock.

    -cfgSyncDoubleBlock

    If you choose for cfgSyncDoubleBlock, every task may hold two blocking slots, on which it blocks simultaneously

    But this example does not use simultaneous blocking. Or is it?

    Any help on choosing the right option would be appreciated, and also on choosing the right slot size for tasks.

    Best regards.
    Dragan

     
  • De Vlaam

    De Vlaam - 2010-07-20

    Hi Dragan,

    Thanks for your active work on my OS. There are so many questions
    and issues all at once that I cannot answer them all. Let me just start
    with the direct questions.

    Question: Is it really possible to make this code without isrBegin() / isrYield() pair and how?

    I think it is, but since I have not yet written and tested code which
    does so, I cannot be certain. In any case, you must be extremely careful
    not to include anything other than firing an event (by a constant).
    If the interrupt handler is naked, this interrupt only sets one bit in
    the GPIO and this is an atomic operation which does not use any register.

    The point is, that you still have to wait until the current task is
    preempted and the high priority communication task is started.
    So this may take a full tick. Communication on the other side must
    take this dead time into account somehow. Maybe you can do this with
    CTS/RTS. Advantage is that it must be possible to get a baudrate
    of 115K2.

    Research in this matter is on my wish list.

    The other option is to interrupt at every char that comes in.
    The downside of that approach is that there are a lot of context
    switches, slowing the communication down. Besides that, when the
    OS is not interruptable, the communication is stalled also. It
    would not surprise me if 9K6 or 19K2 are the maximum speeds
    achievable with this approach.

    What is difference between isrEndReturn() and isrEndYield() to
    make such difference in processing and difference in
    performance ( which is better for isrEndYield() in my opinion).

    http://www.femtoos.org/code_api.html#0402
    http://www.femtoos.org/code_api.html#0403
    So the isrEndYield() is generally faster since it does not go back
    to the task first but forces a context switch, at the cost of more
    overhead in total. Of course isrEnter()/isrExit() causes even less
    overhead, at the expense of being a little slower in the first
    response, and is only available when the OS is not interruptible.

    So, you see, whatever you choose it is a tradeoff.

    Which option  to choose?

    Use cfgSyncSingleSlot if within the use of a slot, you are not
    using an other slot. This would cause nesting, and requires more
    bookkeeping of the OS, thus generating larger code. If you use
    cfgSyncSingleBlock you may nest several locks. Of course, your
    code can only reach the deepest level if all other locks are
    obtained. Therefore it is at most one block per task at the time.
    You must make room for the nesting by defining the appropriate
    SlotSize. http://www.femtoos.org/code_config.html#1902
    Thus, if you are not calling a lock within an other lock,
    in a single task, you use cfgSyncSingleSlot with SlotSize_XYZ
    equal to one. I think you did this correctly.

    But this example does not use simultaneous blocking. Or is it?
    No, this is used for example if you want to pump data from one
    queue into another, so you must lock both at the same time
    (atomically). This cannot be done sequentially, since this may
    cause severe deadlocks. I am proud of this solution, since in
    another OS this usually means spin locking and a lot of
    performance loss. But you do not need it here.
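
    The idea of taking two locks at the same time, rather than one after the other, can be sketched as an all-or-nothing acquisition. This is a plain C illustration with hypothetical names (held_locks, try_lock_all, unlock_all); it is not how Femto OS implements cfgSyncDoubleBlock, and on a real system the check and the update would have to run where no context switch can separate them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit i set = hypothetical lock i is currently held by some task. */
uint16_t held_locks = 0;

/*
 * Acquire every lock in `mask` in one step, or none of them.
 * Returns true when all requested locks were free and are now taken.
 * A task that gets false holds nothing, so it cannot end up keeping
 * one lock while waiting for the other.
 */
bool try_lock_all(uint16_t mask)
{
    if (held_locks & mask)
        return false;       /* at least one is taken: take nothing */
    held_locks |= mask;
    return true;
}

/* Release every lock in `mask`. */
void unlock_all(uint16_t mask)
{
    held_locks &= (uint16_t)~mask;
}
```

    Taking the locks sequentially is where the deadlock comes from: one task holds the first queue and waits for the second while another task holds the second and waits for the first. In the sketch a failed request leaves no lock taken, so that circular wait cannot form.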

    I know i have not answered all your questions yet, but for the
    moment, i hope this will be sufficient to get you rolling again.

     
  • Dragan Grujičić

    Hello Ruud,

    Again thank you for your time and kindness to answer my questions.

    Your explanation of locks is quite enough for now. Just one question: is it safe to use cfgSyncSingleBlock everywhere I would use cfgSyncSingleSlot (I do not know in advance whether I will be using blocks inside blocks)? This requires some more memory but should be safe?

    Now on UART and interrupts:

    Question: Is it really possible to make this code without isrBegin() / isrYield() pair and how?

    I tried to use only firing an event without isrBegin() / isrYield() and checked uart.s file in Standard directory:

        .section    .text.__vector_18,"ax",@progbits
    .global __vector_18
        .type   __vector_18, @function
    __vector_18:
    .LFB13:
    .LSM22:
    /* prologue: naked */
    /* frame size = 0 */
    .LSM23:
        sbi 62-32,0
    /* epilogue start */
    .LSM24:
    .LFE13:
        .size   __vector_18, .-__vector_18
    

    So I must be using only event as there is only one sbi instruction. Or not? I am not so familiar with asm files.
    But this code is breaking the OS on the first character received over the line.

    Since I ran into the problem with nonblocking tasks and isrEndYield() I have found more than one problem with my code. I believe it is now correct, but I would like to have this problem resolved or explained for the following reason:

    So the isrEndYield() is generally faster since it does not go back
    to the task first but forces a context switch, at the cost of more
    overhead in total. Of course isrEnter()/isrExit() causes even less
    overhead, at the expense of being a little slower in the first
    response, and is only available when the OS is not interruptible.

    And this is true:

    My first test is always looping the FemtoOS Readme file back through the UART channel. I focused on the 19200 baud rate, as lower baud rates do not lose characters in any configuration. 38400 baud is good for keyboard input. At 19200 there are three combinations:
         * isrBegin() / isrYield() gets all characters correctly and is the fastest, but breaks the code if there is a nonblocking task in the application.

         * isrEnter() / isrExit() is next. I lose about 10 chars over the whole file. And it has, as you said, little overhead.

         * isrBegin() / isrReturn() does poorly.

    I realized that the main problem is not the size of the ISR, although it is something to account for, but the place we return to in the OS. If we use isrYield() we go straight to UART_Rx_task, and this was our intention from the start. If we use isrExit() or isrReturn() we return to the interrupted task, as you clearly stated in your last post (and I was not aware of this until recently), and it can take up to one OS tick before switching to the UART task. This is where I lose characters. I also checked: the UART task as it is written now uses almost all registers, so it makes for expensive context switches. I will not use hardware with hardware flow control, so for now I am focused on one interrupt per character.
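
    The effect of that delay can be put in rough numbers. The sketch below is a back-of-the-envelope calculation, not a measurement: it assumes 8N1 framing (10 bits on the wire per character), the function names are made up for illustration, and the 1 ms latency used in the example is only a placeholder, since the tick length is not given in this thread.

```c
#include <stdint.h>

/* Time on the wire for one character, in whole microseconds (rounded down).
 * bits_per_char = 10 assumes 8N1: 1 start + 8 data + 1 stop bit. */
uint32_t char_time_us(uint32_t baud, uint32_t bits_per_char)
{
    return (uint32_t)((bits_per_char * 1000000UL) / baud);
}

/* Whole characters that can arrive while the receiving task is delayed
 * by `latency_us` microseconds. */
uint32_t chars_during(uint32_t latency_us, uint32_t baud, uint32_t bits_per_char)
{
    uint64_t bits_sent = (uint64_t)latency_us * baud;      /* scaled by 1e6 */
    uint64_t bits_per_char_scaled = (uint64_t)bits_per_char * 1000000UL;
    return (uint32_t)(bits_sent / bits_per_char_scaled);
}
```

    Under these assumptions a character at 19200 baud takes about 520 µs, so a full new character can arrive while the receiving task is delayed by a hypothetical 1 ms, whereas at 9600 baud it cannot. That is at least consistent with the observation that the lower baud rates do not lose characters.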

    I can live with this, as I said, because I mainly use the keyboard for sending characters over the UART channel, but I would like to make this work for other people.

    So bottom line: It would be great to know what is wrong (if there is something wrong) with isrYield() in combination with nonblocking tasks.

    After this I could move from UART completely. I started new things that are more interesting to me, but this UART problem is eating me (if you know what I mean).

    Best regards

     
  • De Vlaam

    De Vlaam - 2010-07-21

    This requires some more memory but should be safe?

    Yes it is, you must set the slotsize to a minimum of 2 for each task that uses cfgSyncSingleBlock. If you expect more levels of nesting you must increase the slotsize accordingly.

    So I must be using only event as there is only one sbi instruction. Or not?

    Yes this looks good.

    But this code is breaking the OS on the first character received over the line.

    Probably this is because it takes too much time to start up the reading task. Note that all reading in this setup must be done inside a special high priority task without interrupts (UART interrupts must be switched off after the first character). You cannot just use your old code; it is a completely different setup.

    So bottom line: It would be great to know what is wrong (if there is something wrong) with isrYield() in combination with  nonblocking tasks.

    Yes, I see, but I cannot understand this phenomenon without experimenting myself. This is on my to-do list, but it competes with some other issues, I am afraid (porting to the xmegas, for example). After I have finished my current and paid(!) project (me and my family need to eat too) I will look into this matter. Probably that will be in the second half of August.

    I think the use (and popularity) of my OS would benefit from some ready to use drivers for uart, twi etc. So we do not all have to go through the painful cycle you are going through .. :-)

     
  • Dragan Grujičić

    This makes perfect sense.

    I understand your point about the OS breaking on the first received character and about repeated interrupts. I noticed this and made all the changes yesterday. But it takes more than one instruction and uses one register. I tried what you call the "dangerous hack" in your Interrupt example (with the _signal_ attribute for an arbitrary function) and I am getting it to work correctly. It does not improve performance, as again there is the problem with the return point of the interrupt.

    I think the use (and popularity) of my OS would benefit from some ready to use drivers for uart, twi etc

    I picked a project that involves most of the MCU hardware, so I need many drivers (UART, TWI, PWM, SPI, high speed interrupts, A/D conversion, Ethernet…) and I will tackle all of them with the little knowledge I have. So far UART and Ethernet are in focus. I have working Ethernet that is not thread safe, but works just fine in one task. So far I have a data acquisition and visualization application running on my PC (proprietary protocol and PC application, but it could be used with any PC program that stores and plots trend data).

    I understand you are doing some other things (we all do "strange things" for money) and wish you all the best. For me this is a hobby project, just to keep my mind occupied and in fit condition, so I have time to wait.

    Like you, I just wish there were more people working on the OS,
    so I would not have to bother you all the time. Again, thank you for your time and help.

    Best regards
    Dragan

     
