Menu

DynamicArray portability problem

Get Help
2015-04-01
2015-04-22
  • John Fisher

    John Fisher - 2015-04-01

    I have noticed that common/src/DynamicArray.c has a portability problem as coded. The address arithmetic for base and count does not take into account padding that can occur when defining structs.

    I have the following solution which is still not 100% portable, but should be close.

    Local to DynamicArray.c, define a struct:

    typedef struct
    {
        DynArray dynArray;
        void* base;
        Index count;
    } OuterDynamicArray;
    

    Then replace code such as:

    void** base = (void **)(dynArray + 1);
    Index* count = (Index*)(base + 1);
    

    with:

    void** base = &((OuterDynamicArray *)dynArray)->base;
    Index* count = &((OuterDynamicArray *)dynArray)->count;
    

    As you can see OuterDynamicArray has been written to match structs such as VxTable, except with a void * to replace for example the VxEntry *.

    The reason I say that this still isn't 100% portable is that C doesn't guarantee that void * pointer and other pointers have the same bit pattern, or even the same size. There must be an explicit assignment from one to the other to guarantee portability. I think an API change is necessary, eg

    void *createDynArray(DynArray* dynArray, size_t entrySize,
        uint16_t chunkEntries, errorCode* errorCode);
    

    and eg replace the call in encodeLn() bodyEncode.c

    strm->schema->uriTable.uri[qnameID->uriId].lnTable.dynArray.ln =
        createDynArray(
            &strm->schema->uriTable.uri[qnameID->uriId].lnTable.dynArray,
            sizeof(LnEntry), DEFAULT_LN_ENTRIES_NUMBER, &tmp_error_code);
    

    You would leave base out of OuterDynamicArray.

    I can't justify an API change for my own project. I suspect the number of architectures for which my first solution doesn't work is small and diminishing.

     

    Last edit: John Fisher 2015-04-01
  • John Fisher

    John Fisher - 2015-04-01

    I've fixed the formatting since I first posted

     

    Last edit: John Fisher 2015-04-01
  • Rumen Kyusakov

    Rumen Kyusakov - 2015-04-03

    Hi John,

    Thanks for the feedback! The code related to the dynamic arrays is indeed a muddy area. Here is my reasoning:
    All the portability problems are somewhat related to the following two lines as you also indicated:

    void** base = (void **)(dynArray + 1);
    Index* count = (Index*)(base + 1);
    

    You suggest that taking the base address as it is given in the first line creates issues due to (possible) padding after the DynArray dynArray; field in the structs such as VxTable. Can you please share a bit more details of your use case? At least according to my understanding "dynArray + 1" should take care of any padding as it returns the address that is offset by sizeof(DynArray) bytes and sizeof() includes any padding associated with the structure. I maybe missing something as this is a tricky C stuff - for ex. if the padding of DynArray itself is not enough for the required alignment of the base pointer field in VxTable and similar. Is there such a case?

    The second line from above '(Index)(base + 1)' could create problems if the sizeof(void) differs from the sizeof(ArrayEntryType*). As you also said this should be the case for extremely exotic hardware/compiler combination.

    Happy to hear your thoughts on that.

    Best,
    Rumen

     
    • Anonymous

      Anonymous - 2015-04-07

      Hi Rumen,

      In understanding this problem it is helpful to think about alignment requirements. Every object has an alignment requirement, and typically smaller objects have less restrictive alignment requirements than larger ones. A struct typically has the alignment requirement of it's most restrictive member. They aren't required to have the alignment of the most restrictive of the basic types. I think structs have to get padded out to match their own alignment requirement. They don't have to be padded out to all alignment requirements.

      To take the example of the architecture I am using, CHAR_BIT is 16. That means the machine cannot address individual octets, and the minimum addressable unit is 16 bits.

      The C standard uses the term "byte" for minimum addressable unit. In C standard usage of the word "byte", a byte on this architecture has 16 bits. Since in most people's minds byte means exactly 8 bits, I prefer to use the term that some of the architecture documentation uses, which is MADU. Absolutely key to understanding all of this is that in all C architectures, not just mine, sizeof returns the number of MADUs, which is not necessarily the number of octets.

      In this architecture, int and short are both 16 bits (1 MADU). long is 32 bits (2 MADUs). Individual objects are limited in size to 65536 MADUs, so Index and size_t are both 16 bits (1 MADU). The address space is much larger than required for any single object, and pointers require 2 MADUs.

      Therefore sizeof(Index) and sizeof(size_t) are 1. sizeof(DynArray) is 3. sizeof(void *) and sizeof(VxEntry*) are both 2. Pointers and longs must be aligned to a 2 MADU boundary. Therefore in the VxTable struct the compiler must pad with a MADU after the DynArray struct and before VxEntry* vx. offsetof(VxTable,vx) is 4, but sizeof(DynArray) is 3.

      Not that it would be useful to do so, but if I was to create an array of DynArray, each element would be 3 MADUs. The size of the array in MADUs would be 3 times the number of elements. No memory would be wasted. If the C standard required sizeof(DynArray) to be 4 MADUs on this architecture, then the element size would be 4 MADUs and memory would be wasted unnecessarily because nothing in DynArray is required to be aligned to a 2 MADU boundary.

      Casting from a pointer to one struct to another with the same common initial sequence is pretty safe I think. There is a special guarantee for unions.

      I hope that helps.

       
      • Rumen Kyusakov

        Rumen Kyusakov - 2015-04-15

        Hi John,

        Thank you for the time you spend to explain the matter in such a detail! I got much better understanding of the padding in structures and the possible portability issues that this can create. I was also unaware of the offsetof() utility - much handy indeed.
        I must say I am impressed that you managed to trace this issue. It must have been a real pain to have to find such an intricate bug.
        Thank you for the elegant solution you suggested as well! I will make sure to integrate it in the trunk soon.

        Regards,
        Rumen

         
  • John Fisher

    John Fisher - 2015-04-07

    Hi Rumen

    A struct isn't required to be padded out to the maximum alignment requirement of any of the basic types. I think it is only required to be padded out to its own alignment requirement. This is typically the maximum alignment of any of its members.

    In the case of the C implementation I am using we have:
    CHAR_BIT = 16
    sizeof(char) = 1
    sizeof(short) = 1
    sizeof(int) = 1
    sizeof(size_t) = 1
    sizeof(long) = 2
    sizeof(void *) = 2
    sizeof(VxEntry *) = 2
    offsetof(VxTable,vx) = 4
    sizeof(DynArray) = 3

    I use the term "byte" not as 8 bits, but as the C standard does.

    As can be deduced from the above, in this implementation a byte is 16 bits, the maximum size of any individual object is 64K bytes, but the address space is larger and a pointer requires 2 bytes, with a 2 byte alignment requirement.

    Casting pointers between structs with a common initial sequence is safe in my experience (so along as you only refer to the common members). In fact a guarantee is made for structs with a common initial sequence inside a union.

    As far as exotic architectures where bit patterns are different for different pointer types are concerned, I personally don't know of any left. However if the compiler writers had chosen differently, on this architecture a byte could have been 8 bits, at an increased cost in accessing arrays of char, but with halved memory requirements for those arrays. In this case void * and char * pointers would be different from int * pointers.

    Hope that helps

     
  • Rumen Kyusakov

    Rumen Kyusakov - 2015-04-15

    As I wrote above - many thanks for the detailed explanation!

    I think the solution you suggested should work for any system where the size of the pointers is the same (i.e. sizeof(void *) == sizeof(<any other="" pointer="" type="">) ) regardless of the fields in the structure that they point to.

    Best regards,
    Rumen

     
  • John Fisher

    John Fisher - 2015-04-22

    Hi Rumen

    As you can probably tell, I'm still getting used to this forum software. I forgot to login before posting anonymously above, and I got the impression my anonymous post was lost. So I wrote it again a bit more simply.

    As you probably realise the term MADU (according to proprietary documentation) is similar to the term byte (as defined by the ISO C standard). On reflection I realise that it's not the same because if a processor could directly address individual bits, then its MADU would be 1 bit, but a C implementation for that processor would define a byte to be 8 bits, because a byte must be big enough to hold a character.

    So far as debugging was concerned, I found it pretty quickly. I'm used to this sort of problem as EXIP isn't the first major library I have ported to the DSP I am using.

     

Anonymous
Anonymous

Add attachments
Cancel