Element-wise Alignment Requirements for Data Accesses to be ABI-Compliant on the Intel® MIC Architecture

Compiler Methodology for Intel® MIC Architecture

Unlike the IA-32 and Intel® 64 architectures, the Intel® MIC Architecture requires all data accesses to be properly aligned according to their size; otherwise the program may behave unpredictably. For example, an integer variable, which requires four bytes of storage, has to be allocated at an address that is a multiple of four. Likewise, a double-precision floating-point variable or a pointer variable, which requires eight bytes of storage, has to be allocated at an address that is a multiple of eight. Structures and unions assume the alignment of their most strictly aligned component. Each member is assigned to the lowest available offset with the appropriate alignment. The size of any object is always a multiple of the object's alignment. See the discussion of alignment in the section "Data Access Operations" in the article Intel® Xeon Phi™ Coprocessor Vector Microarchitecture for more information.
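The layout rules above can be checked directly in C. The following is a minimal sketch, assuming a C11 compiler and an LP64 target such as Linux on Intel® 64; the struct sample type and the function name are hypothetical and serve only as an illustration.

```c
#include <stddef.h>   /* offsetof */

/* Hypothetical record, used only to illustrate the layout rules. */
struct sample {
    char   tag;    /* 1 byte, offset 0 */
    int    count;  /* 4 bytes, placed at the next multiple of 4 */
    double value;  /* 8 bytes, placed at the next multiple of 8 */
    char   flag;   /* 1 byte, followed by tail padding */
};

/* Returns 1 if the compiler laid out struct sample according to the
   natural-alignment rules described in the text, 0 otherwise. */
int sample_layout_is_natural(void)
{
    return offsetof(struct sample, count) % _Alignof(int) == 0
        && offsetof(struct sample, value) % _Alignof(double) == 0
        /* the struct takes the alignment of its strictest member */
        && _Alignof(struct sample) == _Alignof(double)
        /* the size is a multiple of the alignment */
        && sizeof(struct sample) % _Alignof(struct sample) == 0;
}
```

Printing the offsetof and sizeof values for a structure of your own is a quick way to see where the compiler inserts padding.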

Note that removing the misaligned accesses on IA-32 and Intel® 64 architectures (through appropriate source changes) will likely lead to improved performance there too.

Refer to the following ABI document for more details: System V Application Binary Interface K1OM Architecture Processor Supplement.

1. Here is a Fortran example that is not ABI-compliant on the Intel® MIC Architecture. Note the use of the sequence keyword inside the derived type.

Consider the following structure:

type, public :: GridEdge_t
    sequence
    integer :: head_face ! needed if head vertex has shape (i.e. square)
    integer :: tail_face ! needed if tail vertex has shape (i.e. square)
    integer :: head_ind !
    integer :: tail_ind !
    type (GridVertex_t),pointer :: head ! edge head vertex
    type (GridVertex_t),pointer :: tail ! edge tail vertex
    logical :: reverse
end type GridEdge_t

Adding up the sizes of the individual fields (four 4-byte integers, two 8-byte pointers, and one 4-byte logical), the size of this object is 36 bytes. Since the sequence keyword is used, the fields are contiguous in memory. If we create an array of these objects, the array elements are packed without padding bytes, so consecutive elements start 36 bytes apart. Even if the first element is 8-byte aligned, the second element then starts at an address that is only 4-byte aligned, and its head and tail fields are misaligned; the same holds for every second element thereafter. According to the ABI requirements, the fields head and tail must be 8-byte aligned, so the alignment of a GridEdge_t should be 8 bytes and its size should be a multiple of 8, namely 40. If the sequence keyword is removed, the compiler automatically creates GridEdge_t with the correct size of 40 bytes.
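The offset arithmetic behind this can be sketched in a few lines of C. The sketch assumes the sizes used in the text (4-byte default integers and logicals, 8-byte pointers) and an array base that is itself 8-byte aligned; the constant and function names are hypothetical.

```c
#include <stddef.h>   /* size_t */

enum {
    EDGE_PACKED_SIZE = 36, /* 4 integers + 2 pointers + 1 logical, no padding */
    EDGE_HEAD_OFFSET = 16, /* head follows the four 4-byte integers */
    POINTER_ALIGN    = 8   /* alignment the ABI requires for a pointer */
};

/* Byte offset of the head field of element i (0-based) from the array base. */
size_t head_offset(size_t i)
{
    return i * EDGE_PACKED_SIZE + EDGE_HEAD_OFFSET;
}

/* Returns 1 if head of element i is 8-byte aligned, assuming an aligned base. */
int head_is_aligned(size_t i)
{
    return head_offset(i) % POINTER_ALIGN == 0;
}
```

With a 36-byte stride, head_offset(1) is 52, which is not a multiple of 8. With the padded size of 40 bytes, every element would keep head and tail on 8-byte boundaries.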

2. Here is a simple synthetic example in C that violates the ABI:

#include <stdlib.h> // malloc, free
int main(int argc, char **argv)
{
    char *blob = (char *)malloc(100); // malloc returns a pointer aligned to at least 8 bytes
    if (blob == NULL)
        return 1;
    float *ptr = (float *)(blob + argc); // Assume the program is invoked with no arguments, so argc == 1

    for (int i = 0; i < argc; i++) {
        ptr[i] = 0; // GP fault here: blob + 1 is not 4-byte aligned, as a float access requires
    }
    free(blob);
    return 0;
}

This kind of access violation can originate in a user-written memory allocation routine. It is not uncommon for users to write their own allocators, which can inadvertently return memory that is not aligned for the type stored in it. On the Intel® MIC Architecture this leads to runtime errors because of the ABI requirements, and the user has to fix it by changing the source code so that every returned address is suitably aligned.
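One common way to keep a custom allocator compliant is to round every returned address up to the requested alignment. The following is a minimal sketch of that idea, not a production allocator; struct arena, arena_alloc and align_up are hypothetical names, and the alignment argument is assumed to be a power of two.

```c
#include <stddef.h>   /* size_t, NULL */
#include <stdint.h>   /* uintptr_t */

/* Round offset up to the next multiple of align (align: power of two). */
size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

/* Hypothetical bump allocator over a caller-supplied buffer. */
struct arena {
    unsigned char *base;  /* start of the buffer */
    size_t size;          /* total bytes in the buffer */
    size_t used;          /* bytes handed out so far, including padding */
};

/* Returns nbytes of storage aligned to align, or NULL if the arena is full. */
void *arena_alloc(struct arena *a, size_t nbytes, size_t align)
{
    uintptr_t start   = (uintptr_t)a->base + a->used;
    uintptr_t aligned = (start + align - 1) & ~((uintptr_t)align - 1);
    size_t pad        = (size_t)(aligned - start);
    size_t remaining  = a->size - a->used;

    if (pad > remaining || nbytes > remaining - pad)
        return NULL;
    a->used += pad + nbytes;
    return (void *)aligned;
}
```

A caller would request storage for a float with arena_alloc(&a, sizeof(float), _Alignof(float)), so the result can be dereferenced as a float without a misaligned access.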

Use of "#pragma pack" in C (or C++) may also lead to ABI violations.
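The effect of packing can be seen by comparing member offsets. This is a minimal sketch, assuming a compiler that supports the push/pop form of #pragma pack (GCC, Clang and the Intel compilers do) and an LP64 target; both struct types and the helper function are hypothetical.

```c
#include <stddef.h>   /* offsetof, size_t */

/* Default layout: the compiler pads after tag so value is 8-byte aligned. */
struct natural_rec {
    char   tag;
    double value;
};

/* Packed layout: padding is removed, so value starts at offset 1. */
#pragma pack(push, 1)
struct packed_rec {
    char   tag;
    double value;
};
#pragma pack(pop)

/* Returns 1 if a double stored at this offset from an 8-byte-aligned
   base address would be naturally aligned, 0 otherwise. */
int value_is_aligned(size_t offset)
{
    return offset % sizeof(double) == 0;
}
```

Here value_is_aligned(offsetof(struct packed_rec, value)) returns 0, which is exactly the kind of access the ABI disallows; the unpacked structure does not have this problem.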
