Menu

#93 Stack overflow check reliability

v1.0 (example)
closed
nobody
5
2014-10-11
2014-09-02
ilgitano
No

I am using FreeRTOS 7.4.0, with IAR compilers for MSP430/ARM devices.

I've been having some trouble catching and diagnosing stack overflows in our codebase. After a close look at the code I realise the issue is tied to the memory allocation we're using which is the simplest model in heap_1.c; what I've found I think applies across the board, and can be fixed.

At the tail end of vTaskSwitchContext are two stack check macro calls:

    taskFIRST_CHECK_FOR_STACK_OVERFLOW();
    taskSECOND_CHECK_FOR_STACK_OVERFLOW();

I have configCHECK_FOR_STACK_OVERFLOW == 1 so the second check is disabled, as this is quite an expensive check in terms of time and memory. I would assume the first should be sufficiently reliable for our needs.

Two checks is in itself suspicious and shouldn't be needed but points the way. The problem is that the tskTCB and associated stack space allocation in prvAllocateTCBAndStack results in the tskTCB being immediately below the stack. In our ARM and MSP430 code the stack grows downward. The FIRST casualty of a stack overflow is then the current task control block. Since the task control block keeps pxStack/pxEndOfStack, these values are no longer trustworthy and the first stack overflow check mechanism is unreliable.

I've supplied a modified implementation for prvAllocateTCBAndStack which where possible will ensure that the TCB is allocated on the safe side of the stack (opposite the direction of growth); it additionally makes one less malloc call and so probably saves a few bytes of memory and a few clock cycles.

Any static code checks are not going to like the casting I suspect, but it's necessary to control the placement of the TCB relative to the stack.

/*-----------------------------------------------------------*/

/*
 * The first thing to die in a stack overflow using the heap_1 allocator is the current task control block, becuase the original
 * code places the tasTCB below the the stack; this hack ensures the tasks TCB is above the stack, out of harms way, to assist
 * in tracking stack overflow bugs. This puts the TCB on the safe side of stack growth if possible, Above the stack if
 * growing down or below if growing upwards. Additionally it saves one malloc.
 */
static tskTCB *prvAllocateTCBAndStack( unsigned short usStackDepth, portSTACK_TYPE *puxStackBuffer )
{
tskTCB *pxNewTCB = NULL;

        if ( puxStackBuffer == NULL )
        {
                puxStackBuffer = ( portSTACK_TYPE * ) pvPortMalloc( ( ( ( size_t ) usStackDepth ) * sizeof( portSTACK_TYPE ) ) + sizeof( tskTCB ) );
                if ( puxStackBuffer != NULL )
                {
                        #if ( portSTACK_GROWTH < 0 )
                        {
                                pxNewTCB = ( tskTCB * ) ( puxStackBuffer + usStackDepth );
                        }
                        #else
                        {
                                pxNewTCB = ( tskTCB * ) puxStackBuffer;
                                puxStackBuffer = ( portSTACK_TYPE * )( pxNewTCB + 1 );
                        }
                        #endif
                }
        }
        else
        {
                pxNewTCB = ( tskTCB * ) pvPortMalloc( sizeof( tskTCB ) );
        }

        if( pxNewTCB != NULL )
        {
                pxNewTCB->pxStack = puxStackBuffer;
                /* Just to help debugging. */
                ( void ) memset( pxNewTCB->pxStack, ( int ) tskSTACK_FILL_BYTE, ( size_t ) usStackDepth * sizeof( portSTACK_TYPE ) );
        }

        return pxNewTCB;
}

/*-----------------------------------------------------------*/

Discussion

  • Richard Barry

    Richard Barry - 2014-09-02

    Are you actually reporting a bug? If not please post your comments to the support forum where they can be discussed, and I can write some stuff about why it is done the way it is. There have been a few discussions on it before so you may be able to find the posts. If you are reporting a bug then feel free to re-open this tracker, but please be more explicit so I can understand.

     
  • Richard Barry

    Richard Barry - 2014-09-02
    • status: open --> closed-rejected
     
  • ilgitano

    ilgitano - 2014-09-03

    When using configCHECK_FOR_STACK_OVERFLOW == 1 and heap_1 malloc code, stack overflow checks don't catch stack overflows.

    The reason is that the ordering of mallocs in prvAllocateTCBAndStack results in the tasks control block being immediately below the stack. On MSP430/ARM the stack grows down, so the first casualty of a stack overflow is the tasks control block, destroying pxStack/pxEndOfStack.

    taskFIRST_CHECK_FOR_STACK_OVERFLOW cannot work.

     
  • Richard Barry

    Richard Barry - 2014-09-03

    Please post your question/comment to the support forum.

     
  • Richard Barry

    Richard Barry - 2014-09-05
    • status: closed-rejected --> open
     
  • Richard Barry

    Richard Barry - 2014-10-05
    • status: open --> closed
     
  • Richard Barry

    Richard Barry - 2014-10-05

    Which end of the stack the TCB is placed is now dependent on the value of portSTACK_GROWTH.

     

Log in to post a comment.

MongoDB Logo MongoDB