Short Data Segment Overflow Error on Linux* 64 on Itanium® Architecture

by Seung-Woo Kim

Abstract

It is possible to produce a short data segment overflow link error on the Intel® Itanium® architecture on Linux* 64 platforms when building very large static images. The problem also exists on Windows* 64, with both the Microsoft and Intel compilers (C/C++ and FORTRAN). The Itanium processor runtime architecture conventions require that there be only one short data segment in every load module. Further, the current implementation limits the size of the short data segment to 4MB, because memory references are calculated using a 22-bit add-immediate instruction sequence. The linker error is caused by excessive data in the short data segment, which overflows the 4MB limit. The problem gets worse when the debug option (-g) is used, because it creates additional temporary data and labels needed for debugging. This report documents a class of such problems that are addressed by the Intel compilers (l_cc_pu_6[1].0.159 for C/C++, l_fc_pu_6[1].0.163 for FORTRAN), and outlines a longer-term solution scheduled for mid-2003.

Table 1 lists the types of program segments defined by the runtime architecture on the Itanium processor family.

Segment Type    Sharable  Quantity            Addressed by         Contents
Text            Yes       1 per load module   IP or linkage table  Text, unwind information, constants and literals
Short Data      No        1 per load module   gp                   Static data, bss, linkage tables
Long Data       No        Any                 Linkage table        Long data, bss
Heap            No        Any                 Pointer              Heap data
Stack           No        1 per thread        sp                   Memory stacks
Backing Store   No        1 per thread        bsp                  Backing store for register stacks
Thread Data     No        1 per thread        tp                   Thread-local storage
Shared Data     Yes       Any                 Pointer              Shared memory

Table 1 Program Segments

To make the most effective use of the addressing modes available on the Itanium architecture, each load module's data is partitioned into one short data segment and a number of long data segments. The short data segment, addressed by the gp register in each load module, contains the following areas:

  • A linkage table, containing pointers to imported data symbols and functions, as well as to data in the text and long data segments.
  • A short data area, containing small initialized "own" data items.
  • A short bss area, containing small, uninitialized "own" data items.

 

"Own" data items are those that are either local to a load module, or are such that all references to these items from the same load module will always refer to these items, i.e., they are not subject to being overridden by an exported symbol of the same name in another load module. All data items in the main program satisfy this definition, since the main program is always the first load module in the binding sequence. Since non-"own" variables cannot be referenced directly, there is no benefit to placing them in the short data or bss area. The long data segments contain either or both of the following areas:

  • A long data area, containing large initialized data items, and initialized non-"own" data items of any size.
  • A long bss area, containing large uninitialized data items, and uninitialized non-"own" data items of any size.

 

Linkage table entries are typically allocated by the linker in response to a relocation request generated by the compiler. An entry in the table is either an 8-byte pointer to a data item or a 16-byte function descriptor. Because everything in the short data segment must be addressable from the gp register with a single 22-bit add-immediate instruction, the short data segment is limited to 4MB: a signed 22-bit offset reaches 2MB on either side of gp, a window of 2^22 bytes.


Limitation of the Short Data Segment and a Reproducer

There are basically two ways to overflow the 4MB limit of the short data segment:

  • Case 1: Declare more than 4MB of global variables in the load module, overflowing the short data area or the short bss area.
  • Case 2: Overflow the linkage table. Because the linkage table resides in the short data segment, a linkage table larger than 4MB overflows the short data segment.

 

It is easy to write a reproducer for each case that creates 4MB of data (or of linkage table entries) and fills up the short data segment. Figure 1 is an excerpt of the reproducer that causes the linker error for the first case. Each function puts 32 bytes into the short data segment. Because 128 * 1024 functions are declared, there is a total of 4MB (32 * 128 * 1024 bytes) in the short data segment.


Figure 1 Excerpt of the Reproducer - case 1

The results of building the reproducer with different options are as follows:

  • ecc -O0 -static *.c -Xlinker -noinhibit-exec => links successfully
  • ecc -O0 -g -static *.c -Xlinker -noinhibit-exec => short data segment overflow
  • ecc -O2 -static *.c -Xlinker -noinhibit-exec => links successfully
  • ecc -O2 -g -static *.c -Xlinker -noinhibit-exec => links successfully

 

The first build succeeds because the short data segment is barely within the 4MB limit. The second build fails because the -g option creates additional data that pushes the segment over the limit. The third and fourth builds link successfully because -O2 discards the unused variables, leaving enough room in the short data segment even when -g is used.

Figure 2 is an excerpt of the reproducer for case 2. "typedef.h" declares a struct_type that has 1024 fields. The file func_i.c defines a function func_i() that declares a variable of type struct_type and references each field of the variable. Because the variable is of aggregate type, it goes to the long data segment. However, the Intel compiler treated each field as a separate variable and added a pointer to each of the 1024 fields to the linkage table. Because each pointer takes 8 bytes in the linkage table and there are 512 such functions, the linkage table overflows (8 * 1024 * 512 = 4MB). In FORTRAN, each COMMON block is treated the same way as a struct, causing the same problem.


Figure 2 Excerpt of the Reproducer - case 2

A similar reproducer for FORTRAN is included in the appendix.


Short Term Solution

The most commonly reported problem is case 2 in FORTRAN code (this observation is based on anecdotal problem reports, not on a statistically large sample). To address the issue, code generation was modified so that only the base pointer of each struct_type variable is added to the linkage table. The address of a field is then calculated by adding the field's offset to the base address, which reduces the linkage table for the reproducer in Figure 2 to 4KB (8 * 1 * 512). Admittedly, this is a temporary measure: the hard limit on the size of the short data segment is still present. The reproducer in Figure 2 was recompiled with a newer version of the C/C++ compiler (l_cc_pu_6[1].0.159), which compiled and linked it successfully. The FORTRAN compiler l_fc_pu_6[1].0.163 contains the corresponding fix, which was verified with the reproducer in the appendix.


Long Term Solution (Huge memory model)

In this section, we discuss the huge memory model, which ensures correct addressing in situations where the short data segment would otherwise overflow. The huge memory model is described in the Itanium runtime conventions guide, but it is not yet implemented in the compilers for the Itanium architecture. This model forms the address to reference with a movl instruction, which loads a 64-bit immediate, followed by a full 64-bit add, which widens the offset from 22 bits to 64 bits. To implement this model, the compiler must generate three-instruction sequences, and the linker must support a different relocation type. The performance implications are as follows:

  • Traditional short data: a two-instruction sequence (fast), accessing data directly by base + displacement.
  • Huge memory model: a three-instruction sequence (medium speed), accessing data directly by base + displacement.
  • Traditional long data: a three-instruction sequence (slow), accessing data indirectly through a double-load sequence.

 

A small reproducer was created to test whether both the Microsoft and GNU linkers implement this relocation type; the assembly file was edited by hand to simulate code generated for the huge memory model. Figure 3 shows the original source code. It was compiled with "ecc -O0 -S test.c" and then modified to simulate the huge memory model. The relocation records of test.o show that the GPREL64I relocation type is used for sdataitem, and the compiled code ran correctly. This confirms that both linkers can handle this type of relocation.

Figure 3 Huge Memory Model Reproducer & Modification



Summary

The short data segment overflow linker error is caused by a hard limit on the size of the short data segment in the current compiler implementation of the Itanium runtime architecture. This report described the cause of the problem, as well as a class of problems that are fixed by the Intel C/C++ compiler l_cc_pu_6[1].0.159. The FORTRAN compiler l_fc_pu_6[1].0.163 and later resolve the reported problems at hand. As a longer-term solution, the huge memory model was described, together with a feasibility study of its implementation.


Appendix with code samples

C/C++ Reproducer for Short-Term Solution

The following reproducer generates a set of functions that cause a short data segment overflow error due to linkage table overflow. The program takes three arguments: the number of fields in the struct, the number of struct variables per function, and the number of functions. It automatically generates, compiles, and links all the code needed to trigger the linker error with the older compilers. Allocating 4MB or more in the linkage table overflows the short data segment. An example is "genprog 1024 1 512": each field pointer takes 8 bytes, and with 1024 fields per variable, one variable per function, and 512 functions, the linkage table needs 8 * 1024 * 512 = 4MB. With compiler l_cc_pu_6[1].0.159 or later, the program links successfully.

#include <stdio.h>
#include <stdlib.h>

/* Usage: genprog nField nVar nFunc, e.g. "genprog 1024 1 512" */
int main(int argc, char *argv[])
{
    int i, j, k;
    FILE *file;
    char buf[64];            /* large enough for file names and commands */
    int nField, nVar, nFunc;

    if (argc != 4) {
        fprintf(stderr, "usage: %s nField nVar nFunc\n", argv[0]);
        return 1;
    }
    nField = atoi(argv[1]);
    nVar = atoi(argv[2]);
    nFunc = atoi(argv[3]);

    /* typedef.h: a struct with nField double fields a0 .. a(nField-1) */
    file = fopen("typedef.h", "w");
    fprintf(file, "typedef struct\n{\n");
    for (i = 0; i < nField; i++)
        fprintf(file, "    double a%d;\n", i);
    fprintf(file, "} struct_type;\n");
    fclose(file);

    /* func_i.c: nVar static struct_type variables; every field is written and read */
    for (i = 0; i < nFunc; i++) {
        snprintf(buf, sizeof buf, "func_%d.c", i);
        file = fopen(buf, "w");
        fprintf(file, "#include \"typedef.h\"\n");
        fprintf(file, "double func_%d()\n{\n", i);
        for (j = 0; j < nVar; j++)
            fprintf(file, "    static struct_type var_%d_%d;\n", i, j);
        fprintf(file, "    double result = 0;\n");
        for (j = 0; j < nVar; j++)
            for (k = 0; k < nField; k++) {
                fprintf(file, "    var_%d_%d.a%d = %d;\n", i, j, k, k);
                fprintf(file, "    result += var_%d_%d.a%d;\n", i, j, k);
            }
        fprintf(file, "    return result;\n}\n");
        fclose(file);
    }

    /* main.c: calls every func_i() and prints the sum */
    file = fopen("main.c", "w");
    fprintf(file, "#include <stdio.h>\n#include \"typedef.h\"\n");
    for (i = 0; i < nFunc; i++)
        fprintf(file, "extern double func_%d();\n", i);
    fprintf(file, "int main()\n{\n");
    fprintf(file, "    double sum = 0;\n");
    for (i = 0; i < nFunc; i++)
        fprintf(file, "    sum += func_%d();\n", i);
    fprintf(file, "    printf(\"result = %%f\\n\", sum);\n");
    fprintf(file, "    return 0;\n}\n");
    fclose(file);

    /* Compile each file separately, then link; the link fails on the older compilers */
    for (i = 0; i < nFunc; i++) {
        snprintf(buf, sizeof buf, "ecc -O0 -c func_%d.c", i);
        system(buf);
    }
    system("ecc -O0 -c main.c");
    system("ecc main.o func_*.o");
    return 0;
}


 

Fortran Reproducer for Short-Term Solution

The following reproducer for FORTRAN generates a set of functions that cause a short data segment overflow error due to linkage table overflow. The generator (itself written in C) takes three arguments: the number of fields per COMMON block, the number of COMMON blocks per file, and the number of files. It automatically generates, compiles, and links all the code needed to trigger the linker error with the older compilers. Allocating 4MB or more in the linkage table overflows the short data segment. An example is "genprog 128 128 256". With compiler l_fc_pu_6[1].0.163 or later, the program links successfully.

#include <stdio.h>
#include <stdlib.h>

/* The generated files are fixed-form FORTRAN: statements start in
   column 7, and '&' in column 6 marks a continuation line. Field names
   use one letter per source line ('a' + line) followed by a digit 0-7,
   so nField should be a multiple of 8 and at most 26 * 8 = 208. */

/* Writes nLine continuation lines of 8 names each: a0,...,a7, b0,... */
void writeFieldList(FILE *file, int nLine)
{
    int i, j;

    for (i = 0; i < nLine; i++) {
        fprintf(file, "\n     &");
        for (j = 0; j < 7; j++)
            fprintf(file, "%c%d,", 'a' + i, j);
        if (i == nLine - 1)
            fprintf(file, "%c%d", 'a' + i, 7);
        else
            fprintf(file, "%c%d,", 'a' + i, 7);
    }
    fprintf(file, "\n");
}

/* Declares the fields as double precision locals. */
void createField(FILE *file, int nLine)
{
    fprintf(file, "      double precision");
    writeFieldList(file, nLine);
}

/* Places the same fields in COMMON block /var_iFile_iFunc/. */
void createCB(FILE *file, int iFile, int iFunc, int nLine)
{
    fprintf(file, "      common /var_%d_%d/", iFile, iFunc);
    writeFieldList(file, nLine);
}

/* Assigns a value to every field. */
void createAssign(FILE *file, int nLine)
{
    int i, j;

    fprintf(file, "      double precision result\n");
    fprintf(file, "      result = 0.0\n");
    for (i = 0; i < nLine; i++)
        for (j = 0; j < 8; j++)
            fprintf(file, "      %c%d = %d.0\n", 'a' + i, j, j);
}

/* Reads every field back into result. */
void createAdd(FILE *file, int nLine)
{
    int i, j;

    for (i = 0; i < nLine; i++)
        for (j = 0; j < 8; j++)
            fprintf(file, "      result = result + %c%d\n", 'a' + i, j);
}

void createFunc(FILE *file, int iFile, int nLine, int iFunc)
{
    fprintf(file, "      double precision function sub_%d_%d()\n", iFile, iFunc);
    createField(file, nLine);
    createCB(file, iFile, iFunc, nLine);
    createAssign(file, nLine);
    createAdd(file, nLine);
    fprintf(file, "      sub_%d_%d = result\n", iFile, iFunc);
    fprintf(file, "      return\n      end\n");
}

void createSum(FILE *file, int iFile, int nFunc)
{
    int i;

    fprintf(file, "      double precision function sum_%d()\n", iFile);
    fprintf(file, "      double precision result\n");
    /* Declare the callees; otherwise implicit typing makes them REAL. */
    for (i = 0; i < nFunc; i++)
        fprintf(file, "      double precision sub_%d_%d\n", iFile, i);
    fprintf(file, "      result = 0.0\n");
    for (i = 0; i < nFunc; i++)
        fprintf(file, "      result = result + sub_%d_%d()\n", iFile, i);
    fprintf(file, "      sum_%d = result\n", iFile);
    fprintf(file, "      return\n      end\n");
}

void createFile(FILE *file, int iFile, int nLine, int nFunc)
{
    int i;

    for (i = 0; i < nFunc; i++)
        createFunc(file, iFile, nLine, i);
    createSum(file, iFile, nFunc);
}

void createMain(int nFile)
{
    FILE *file;
    int i;

    file = fopen("main.f", "w");
    fprintf(file, "      program main\n");
    fprintf(file, "      double precision sum\n");
    for (i = 0; i < nFile; i++)
        fprintf(file, "      double precision sum_%d\n", i);
    fprintf(file, "      sum = 0.0\n");
    for (i = 0; i < nFile; i++)
        fprintf(file, "      sum = sum + sum_%d()\n", i);
    fprintf(file, "      write(*,'(\"result = \",f12.1)') sum\n");
    fprintf(file, "      end\n");
    fclose(file);
}

void compile(int nFile)
{
    int i;
    char buf[100];

    for (i = 0; i < nFile; i++) {
        snprintf(buf, sizeof buf, "efc -c comp_%d.f", i);
        system(buf);
    }
    system("efc -c main.f");
    system("efc main.o comp_*.o");
}

/* Usage: genprog nField nFunc nFile, e.g. "genprog 128 128 256" */
int main(int argc, char *argv[])
{
    int i;
    FILE *file;
    char buf[64];            /* large enough for "comp_<n>.f" */
    int nLine, nFunc, nFile;

    if (argc != 4) {
        fprintf(stderr, "usage: %s nField nFunc nFile\n", argv[0]);
        return 1;
    }
    nLine = atoi(argv[1]) / 8;   /* eight fields per source line */
    nFunc = atoi(argv[2]);
    nFile = atoi(argv[3]);

    for (i = 0; i < nFile; i++) {
        snprintf(buf, sizeof buf, "comp_%d.f", i);
        file = fopen(buf, "w");
        createFile(file, i, nLine, nFunc);
        fclose(file);
    }

    createMain(nFile);
    compile(nFile);
    return 0;
}


 



Comments

The Intel compilers version 11 for IA-64 contain a provisional implementation of the huge memory model discussed above. If you encounter a short data section overflow, you may try building your entire application with the options -mcmodel large -shared-intel. It is important to use a consistent memory model for the whole application.

Please report any problems to Intel Premier Support.