Undefined behaviour

From cpwiki

Introduction

Computers (and many other things) work best if there are strict rules defining exactly what the consequence of some particular operation is. This means defined behaviour is a good thing.

Undefined behaviour is where the code no longer has a defined behaviour. The C and C++ standards define the behaviour of constructs within the C and C++ languages; where the standards impose no requirements on a construct, its behaviour is undefined.

Why is there undefined behaviour

There are situations where the language standard allows the behaviour to be undefined. The common reason for this is that different systems behave in conflicting ways. If, for example, we REQUIRE that an invalid memory access causes an access violation, then some systems would not be able to provide a C compiler that follows the standard on that point.

There are many other reasons that the standards leave some things to be defined by the implementation, such as behaviour that depends on the OS or that differs between processor architectures. Behaviour can also be left undefined to allow compiler vendors to solve a problem in different ways on different processor architectures, or because defining the behaviour precisely would restrict the performance of certain operations.

What happens when you "use" undefined behaviour

Many things can happen when code that provokes undefined behaviour is executed. That's exactly the problem - it's undefined - anything (including exactly what was expected, whatever that may be) can happen. The most common scenarios are that the program crashes or shows an unexpected/incorrect result.

Typical things that cause undefined behaviour

Uninitialized variable value

A very typical undefined behaviour is the value of a variable local to a function which isn't initialized. Say we have a function:

#include <iostream>

void foo()
{
	int x;	// deliberately left uninitialized
	std::cout << "Value of x: " << x << std::endl;
}

There is no way to determine what value x would have in this piece of code when it's executed. It may be zero, one, twelve, or it may be 4121829 - or any other value that an integer can have on the particular machine that this code was compiled for. It may even change depending on the code that calls foo(), or what input data was given in some other function, etc, etc. It is easy to avoid this particular undefined behaviour by always initializing all variables to a sane value (such as zero). Some compilers will warn if uninitialized variables are used.
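As a minimal sketch of that advice, the function below is a hypothetical counterpart to foo() in which x is given a value before it is read. The name foo_initialized and the return value are assumptions added only so the result can be checked.

```cpp
#include <iostream>

// Hypothetical counterpart to foo(): x is initialized before it is read.
int foo_initialized()
{
	int x = 0;	// same starting value on every call, whoever the caller is
	std::cout << "Value of x: " << x << std::endl;
	return x;	// returned only so that a caller can check the value
}
```

Calling foo_initialized() prints the same value every time, regardless of what ran before it.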

Pointers to freed memory

Another common undefined behaviour is to use pointers after the memory they point to has been freed. Very often nothing visibly goes wrong when this happens a short while after the delete call, but things go wrong later - which in turn means that the problem is hard to debug. Always setting pointers to zero (NULL) after freeing them reduces the risk of accidentally continuing to use a pointer.
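One way to make the "set it to zero after freeing" habit hard to forget is to wrap both steps in a helper. The sketch below assumes a C++11 compiler (it uses nullptr rather than NULL); the names release and use_after_release are hypothetical.

```cpp
#include <iostream>

// Hypothetical helper: frees the int and resets the caller's pointer.
void release(int*& p)
{
	delete p;	// deleting a null pointer does nothing, so a second call is harmless
	p = nullptr;	// later code can test the pointer instead of touching freed memory
}

// Returns true if the pointer was reset and therefore never dereferenced.
bool use_after_release()
{
	int* p = new int(42);
	release(p);
	if (p != nullptr)
		std::cout << *p << std::endl;	// not reached: p was reset by release()
	return p == nullptr;
}
```

This only protects the pointer that is passed in; any other copies of the same address still dangle after the memory is freed.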

Uninitialized pointers

Using a pointer variable that has not been set to point to a valid memory section is also undefined behaviour. On a modern OS this will often lead to a crash, but on older OSes where memory protection isn't quite as good (if it exists at all), it may just lead to strange behaviour.
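A common defence, sketched here under the assumption of a C++11 compiler, is to initialize every pointer to nullptr so that "not pointing anywhere yet" becomes a state that can be tested. The function names are hypothetical.

```cpp
#include <iostream>

// Hypothetical function: refuses to dereference a null pointer.
bool print_value(const int* p)
{
	if (p == nullptr)
		return false;	// nothing to print, and nothing dereferenced
	std::cout << "Value: " << *p << std::endl;
	return true;
}

// Returns true if the null pointer was rejected and the valid one was accepted.
bool null_checked_pointer_demo()
{
	int* p = nullptr;	// initialized, so it can be tested before use
	bool printed_before = print_value(p);
	int value = 7;
	p = &value;		// now points to a valid object
	bool printed_after = print_value(p);
	return !printed_before && printed_after;
}
```

The check only catches null pointers; a pointer holding a stale or random address passes it, which is why the initialization matters.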

Buffer overruns / out of bounds access

Writing data beyond the end of an array or allocated block of memory (buffer overrun) is another frequent variant of undefined behaviour. It can cause just about any effect, but a common variant is that the system crashes. However, writing beyond the end of a piece of memory allocated by, for example, new or malloc is by no means guaranteed to cause a crash. Because of the way virtual memory works, an access violation is only guaranteed when the memory is not valid to access - which may not be the case until the end of the memory page (commonly 4KB). For more information, see buffer overruns.
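Since the hardware cannot be relied on to catch an overrun, the check has to be made in the code. The following is a minimal sketch of an explicit bounds check; checked_write is a hypothetical helper, not a standard function.

```cpp
#include <cstddef>

// Hypothetical helper: stores value in buf[index] only if index is inside the buffer.
// size is the number of elements in buf.
bool checked_write(int* buf, std::size_t size, std::size_t index, int value)
{
	if (index >= size)
		return false;	// this would be a buffer overrun, so refuse
	buf[index] = value;
	return true;
}
```

With a four-element array, checked_write(buf, 4, 3, 9) stores the value and returns true, while checked_write(buf, 4, 4, 9) returns false and leaves memory untouched.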

Note that in C++ vectors, an out of bounds access (that is, an access to an element whose index is greater than or equal to the vector's size) with operator[] will lead to "undefined behaviour", whilst the at() member function will throw a std::out_of_range exception on an out of bounds access.
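The difference can be seen in a small sketch. at_throws is a hypothetical helper that reports whether at() rejected the index; the equivalent access with operator[] is left out on purpose, because it would itself be undefined.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: returns true if v.at(index) throws std::out_of_range.
bool at_throws(const std::vector<int>& v, std::size_t index)
{
	try {
		(void)v.at(index);	// bounds-checked access
	} catch (const std::out_of_range&) {
		return true;		// index was >= v.size()
	}
	return false;
}
```

For a vector of three elements, at_throws(v, 2) returns false and at_throws(v, 3) returns true.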

Compiler dependent behaviour

There are some constructions that are less obvious:

#include <iostream>

void bar()
{
	int x = 3;
	// x is modified twice within one expression
	std::cout << "Value of x: " << x++ + x++ << std::endl;
}

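The value printed here is undefined, because x is modified twice with no defined order between the two modifications, so we can't determine how many of the x++ operations have been performed when each operand is read - the result is likely to be at least 6, but that is not certain.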
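The usual fix is to split the expression so that each modification happens in its own statement. The sketch below assumes the intent was "add the value of x before each of two increments"; that intent, the name bar_defined and the return value are assumptions, since the original expression does not say which order was wanted.

```cpp
#include <iostream>

// Hypothetical rewrite of bar() with every modification of x in its own statement.
int bar_defined()
{
	int x = 3;
	int first = x++;	// first == 3, x == 4
	int second = x++;	// second == 4, x == 5
	int sum = first + second;
	std::cout << "Value of x: " << sum << std::endl;
	return sum;		// 7 on every compiler
}
```

Each statement finishes before the next one starts, so the result no longer depends on the compiler.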

Benign undefined behaviour

What I would call benign undefined behaviour is a situation where nothing actually goes wrong, other than "cosmetics", but the behaviour is still not, technically, defined. A typical example is:

#include <stdio.h>

void bar()
{
    int x;
    printf("Enter an integer: ");
    scanf("%d", &x);
}

In this case, the output from printf() may not actually appear before scanf() starts waiting for input, so nothing tells the user that some data is needed - the application looks stuck. But the data has not gone missing, it's just not printed YET. Of course, we will NEVER get a crash from this. A fix for this particular problem is to add fflush(stdout) after the printf() call to force out any data that has not yet been printed.
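A sketch of that fix, written with the C++ headers for consistency with the other examples on this page. The helper prompt() and the name bar_flushed are hypothetical; the only change in substance is the fflush(stdout) call.

```cpp
#include <cstdio>

// Hypothetical helper: prints a prompt and flushes it before any input is read.
// Returns true if the flush succeeded.
bool prompt(const char* text)
{
	std::printf("%s", text);
	return std::fflush(stdout) == 0;	// fflush returns 0 on success
}

void bar_flushed()
{
	int x = 0;
	prompt("Enter an integer: ");	// the prompt is written out before scanf waits
	if (std::scanf("%d", &x) != 1)
		x = 0;			// no valid integer was read
}
```

Ending the prompt with a newline often has the same visible effect on a terminal, but that depends on how stdout is buffered, so the explicit flush is the more dependable choice.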

Do not rely on undefined behaviour

If you find that your application is using some undefined behaviour, then you should fix it. Relying on undefined behaviour, even if it happens to work right now, is like relying on no cars coming the other way at a road-junction - it may work most of the time if it's not a busy road, but sooner or later you will be hit by someone coming the other way.

Some forms of undefined behaviour work fine until you change the compiler, because different compilers will treat a particular form of undefined behaviour in different ways.
