CIL (C Intermediate Language) / Bugs / #130 Handling of array initialization is incorrect

#130 Handling of array initialization is incorrect

Milestone: Bug

Status: closed-fixed

Owner: nobody

Labels: CIL core (parser, visitor, etc) (96)

Priority: 5

Updated: 2012-06-12

Created: 2012-06-07

Creator: Ed Schwartz

Private: No

Consider an array, int x[50000] = {0, };

In 6.7.8, paragraph 21 of the C99 standard:

If there are fewer initializers in a brace-enclosed list than there are elements or members
of an aggregate, or fewer characters in a string literal used to initialize an array of known
size than there are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage duration.

In other words, when in doubt, initialize to zero.

CIL does this, but only for up to 16 elements. If it has to put an implied zero for more than 16 elements, it does not put any implied zeros. This is wrong, and has lead to incorrect behavior on real code. Specifically, in card_of_complement from tr.c, in coreutils:

bool in_set[N_CHARS] = { 0, };

where N_CHARS = 256.

There seem to be a few ways of solving this problem:
* Remove the 16 element limit. This could add a lot of statements.
* Allow local variables to use initializers. Global variables are allowed initializers, but not local ones... is there any reason for this?
* Add a loop to initialize the array

I don't mind implementing one of these, but I want to make sure it's something worth including in the CIL codebase. I prefer the third option.

Discussion

Ed Schwartz - 2012-06-07

By the way, here is an example of how CIL hanhdles _Bool x[4] = { 0 }:

int f(void)
{
_Bool x[4] ;

{
#line 2
x[0] = (_Bool)0;
#line 2
x[1] = (_Bool)0;
#line 2
x[2] = (_Bool)0;
#line 2
x[3] = (_Bool)0;
#line 4
return ((int )x[1]);
}
}

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Ed Schwartz - 2012-06-07

And here is how it handles _Bool x[4096] = { 0 };

int f(void)
{
_Bool x[4096] ;

{
#line 2
x[0] = (_Bool)0;
#line 4
return ((int )x[1]);
}
}

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Gabriel Kerneis - 2012-06-08

I would gladly accept a patch. I prefer the second option, but it is not realistic to implement it given the current state of CIL (the reason why it was done that way is to be able to move every local variable to function scope).

So if you feel like coding the third option, please go ahead. A fourth and shorter solution would be to use __builtin_memset, but it wouldn't work in MSVC mode (unless there is some intrisic memset available there too).

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Ed Schwartz - 2012-06-08

Calling memset is a good idea.

VC++ does have intrinsics: http://msdn.microsoft.com/en-us/library/26td21ds\(v=VS.80).aspx

But, there are a few problems that I can see:

* They can only be enabled at the file level; so if someone was using memset and expected it to be a function call, that could be a problem if we needed to use the intrinsic.
* To use the intrinsics, one needs to include a header file which seems to cause problems: https://www.google.com/webhp?sourceid=chrome-instant&ie=UTF-8&ion=1#hl=en&sclient=psy-ab&q=intrin.h%20windows.h%20conflict&oq=&aq=&aqi=&aql=&gs_l=&pbx=1&fp=48739054e748cc81&ion=1&bav=on.2,or.r_gc.r_pw.r_cp.r_qf.,cf.osb&biw=1198&bih=1495
* In my tests, the memset intrinsic just calls memset in CRT anyway! It doesn't do any kind of inlining.

So, I guess I'll work on the third option.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Ed Schwartz - 2012-06-08

Proposed patch

patch

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Ed Schwartz - 2012-06-11

I attached a patch for this. Let me know if you have any comments..

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Gabriel Kerneis - 2012-06-11

Looks good to me. However, is there any reason to keep the first 16 initializers in collectInitializers since we add the loop in every case? I think we should get rid of it but I might be missing something. Or use a loop only for longer arrays, but I prefer avoiding "magic numbers". If you agree, I'll amend the patch, test it and apply it.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Ed Schwartz - 2012-06-11

Hi Gabriel,

The loop is currently only added when there are more than 16 implied initializers. I also don't like magic numbers, but it could simplify program analysis to have the unrolled initializers outside of the loop when there are only a few of them, so I see a case for both sides.

I do not feel strongly either way. Feel free to amend the patch in whatever way you think is best.

Thanks,
Ed

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Gabriel Kerneis - 2012-06-12

In fact, your patch is incorrect is several places:

> The loop is currently only added when there are more than 16 implied initializers

no it's not (and that's one of the reasons why it breaks the test suite), But anyway I'll remove these 16-initializers trick.

> BinOp(Ge, Lval ctrlval, Const(CInt64(ni, IUInt, None)), uintType)

Binop(Ge, …) requires intType, not uintType.

> castTo iet (typeOfLval lv) (Const(CInt64(0L, IUInt, None)))

This one is the real problem. IIUC, you copy/pasted it from the SingleInit case, but it does not work. You could have an array of any type (bt), the cast is non-sensical and you have no guarantee that a direct init is possibe (it could be an array of struct containing an array, etc.). I think the solution is to call assignInit recursively with makeZeroInit bt, but I have to test it to be sure.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Gabriel Kerneis - 2012-06-12

Committed an amended version of your patch with recursive calls to assignInit.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Gabriel Kerneis - 2012-06-12

status: open --> closed-fixed
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Log in to post a comment.