Porting Linux* Applications to 64-Bit Intel® Architecture

by Allan McNaughton


Introduction

As Linux* has gained mainstream respectability, it is being used to deploy more and more mission-critical applications. In the early days, only savvy developers recognized its suitability for corporate computing tasks. Now CIOs and other high-level executives recognize Linux’s outstanding value and are aggressively pushing it ever deeper into the data center.

The relationship between Linux and Intel® processors has long been symbiotic. Developers have found that, when paired, Linux and IA-32 processors achieve outstanding price/performance. This successful combination now provides the foundation for Linux’s further push into the enterprise: existing IA-32 Linux applications can be ported quickly to take advantage of the performance and scalability offered by the 64-bit Intel® Itanium® processor.

Building on that momentum, software that has been code-cleaned for the Itanium® Processor Family can now run under 64-bit operating systems on processors that support Intel® Extended Memory 64 Technology (Intel® EM64T), generally with just a recompile. The main obstacles to running such applications under Intel® EM64T involve assembly code, intrinsics and macros. Details are available from the Industry Developer's Guides.


Getting Started

The task of porting an application to the Itanium Processor Family and/or Intel® EM64T is straightforward. After obtaining an Itanium-based system as a development environment and deciding upon suitable development tools, you must make your code 64-bit ready, regression test it, and then optimize your code for the platform.

Those who are interested in obtaining next-generation hardware for development should look into the Intel® Software Partner Program. A major benefit of the program is secure access to development servers over the Internet. For members who wish to have a system in-house, Intel and its partners can provide hardware for short-term lease or purchase.

The level of effort involved in porting to 64-bit Intel® architecture varies widely, depending on the tools used to construct your application. If you are working in languages that are not compiled to native machine code, your code will most likely work as currently written. These languages include:

  • Perl*
  • Python*
  • PHP*
  • M4*
  • AWK*
  • Java*
  • SQL
  • Shell scripts
  • C shell scripts

You will, however, need to obtain a version of the interpreter or runtime engine that is compatible with the target processor(s) from your operating-system or toolset vendor.

Compiled languages such as C/C++ and FORTRAN require additional porting effort, because the source may contain 64-bit portability problems. These problems can be found quickly using one of the Intel® Compilers. The Intel® C++ Compiler for Linux* offers excellent compatibility with GNU C/C++, strong C++ ABI conformance, broad support for gcc extensions, and the ability to build the Linux kernel with fewer modifications. It also supports Itanium microarchitecture features such as predication, speculation, branch prediction, and software pipelining. The Intel® C++ Compiler will provide Intel® EM64T support beginning with version 8.1.

FORTRAN programmers should look at the Intel® FORTRAN Compiler 7.0 for Linux*. It offers all the code optimization benefits of the C++ compiler described above, and it is compatible with common Linux development tools such as make, Emacs and gdb. Migrating applications from other platforms is simplified by the compiler's support for reading and writing big-endian files. The Intel® Fortran Compiler will also provide Intel® EM64T support beginning with version 8.1.

Developers who wish to continue using the GNU C/C++ and FORTRAN compilers should check with their Linux vendor regarding porting issues from the 32-bit versions to the 64-bit versions of GNU tools.


Code-Cleaning Guidelines Provide the Porting Framework

Once you’ve chosen your development tools, you can continue with the porting process. The first step is to ensure that your source code is 64-bit 'code-clean'. This requirement means that your code must be modified to support the data model used by the Intel® architecture 64-bit operating environment, as compared to the data model used by IA-32 applications. Here are the basic facts about 64-bit data types:

Type          Size (bytes)         Alignment (bytes)
char          1                    1
short         2                    2
int           4                    4
long          8                    8
long long     8                    8
float         4                    4
double        8                    8
long double   10 (padded to 16)    16
void *        8                    8

Simply put, in the IA-32 data model, the fundamental data types int, long and pointer are each 32 bits in length. This is due to the natural 32-bit word size supported by IA-32 processors. The 64-bit Intel® architecture differs fundamentally in that it has a natural word size of 64 bits. Under the LP64 data model used by 64-bit Linux, int remains 32 bits, while long and pointer types are 64 bits. Because of these differences in data sizes, you are likely to encounter a number of issues during porting; you can avoid them by modifying your code to follow the guidelines discussed here.
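
A quick way to confirm which data model a build uses is to print the sizes directly. The sketch below assumes a C11 compiler (for _Alignof); the helper names data_model_name and print_type_layout are hypothetical. It prints the size and alignment, in bytes, of each type in the table above and classifies the build as ILP32 (IA-32) or LP64 (64-bit Linux).

```c
#include <stdio.h>

/* Classify the build's data model from the sizes of int, long and void *.
   ILP32: int, long and pointers are all 32 bits (IA-32 Linux).
   LP64:  int is 32 bits, long and pointers are 64 bits (64-bit Linux). */
const char *data_model_name(void)
{
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return "ILP32";
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return "LP64";
    return "other";
}

/* Print the size and alignment (in bytes) of the fundamental types. */
void print_type_layout(void)
{
#define SHOW(T) printf("%-12s %2zu %2zu\n", #T, sizeof(T), _Alignof(T))
    SHOW(char);
    SHOW(short);
    SHOW(int);
    SHOW(long);
    SHOW(long long);
    SHOW(float);
    SHOW(double);
    SHOW(long double);
    SHOW(void *);
#undef SHOW
    printf("data model: %s\n", data_model_name());
}
```

On a 64-bit Linux build, the last line printed should read "data model: LP64"; built with gcc -m32 on the same system, it should report ILP32.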

Use ANSI const Instead of #define for Hex Constants

With an ANSI const, the compiler knows the constant's type and can perform the necessary type checking, so you get a warning if the constant is misused. For example, consider the following code:

#define OFFSET1 0xFFFFFFFF

#define OFFSET2 0x100000000

When these values are stored in a long, OFFSET1 is -1 and OFFSET2 is 0 (truncated) in the 32-bit world; in the 64-bit world, however, OFFSET1 is 4,294,967,295 and OFFSET2 is 4,294,967,296. This difference will likely cause an error unless it is identified and addressed during the porting process. To avoid the error, use a typed ANSI const instead and qualify it explicitly as signed or unsigned:

const signed int OFFSET1 = 0xFFFFFFFF;
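
The sketch below makes the two behaviors concrete. It uses the fixed-width types from <stdint.h>, so the result does not depend on the build's data model; the helper names as_ia32_long and as_lp64_long are hypothetical and simply mimic storing each constant in a 32-bit or a 64-bit long.

```c
#include <stdint.h>

/* Typed constants: width and signedness are part of the declaration,
   so the compiler can warn when a value is narrowed. */
static const uint32_t OFFSET1 = 0xFFFFFFFFu;     /* 4,294,967,295 everywhere */
static const uint64_t OFFSET2 = 0x100000000ull;  /* needs more than 32 bits */

/* What a value becomes when squeezed into a 32-bit signed long, as on
   IA-32. Converting an out-of-range value to a signed type is
   implementation-defined; GCC and Clang wrap it modulo 2^32. */
int32_t as_ia32_long(uint64_t value)
{
    return (int32_t)(uint32_t)value;
}

/* The same value held in a 64-bit long, as on 64-bit Linux. */
int64_t as_lp64_long(uint64_t value)
{
    return (int64_t)value;
}
```

With GCC, as_ia32_long(OFFSET1) yields -1 and as_ia32_long(OFFSET2) yields 0, while the 64-bit versions keep the full values.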

Guideline 1: Use Integers and Pointers Intelligently

There are four more things that you should be aware of in dealing with integers and pointers, their sizes, and alignment:

  • Don’t cast pointers to long or int. Use uintptr_t (declared in <stdint.h>) when you need to store a pointer in an integer.
  • Use sizeof() and offsetof() instead of hard-coded numbers to locate a piece of data within a structure.
  • Use variables of type size_t, which is defined in the standard headers, especially for C runtime library calls. In ANSI/ISO C/C++, library functions that work with sizes and counts, such as strlen() and malloc(), use size_t rather than int or long. When in doubt, look at the header file and ask yourself, “does this function return a size_t?” If so, declare a variable of type size_t rather than int.
  • Use explicitly sized types for external, on-the-wire, and shared-memory structures. For example, instead of:

struct on_disk { int reclen; };

use

struct on_disk { int32_t reclen; };
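
The sketch below extends the on_disk example to show explicit-width fields together with sizeof() and offsetof(). The flags field, the helper names read_reclen and read_flags, and the assumption that stored records use the host's byte order are all hypothetical, chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk record header: explicit-width fields give the
   same layout on IA-32 and on 64-bit Intel architecture. */
struct on_disk {
    int32_t  reclen;   /* record length in bytes */
    uint32_t flags;    /* hypothetical flag word */
};

/* Fail the build if the layout ever changes. */
_Static_assert(sizeof(struct on_disk) == 8, "on_disk must be 8 bytes");

/* Read reclen from a raw buffer, locating it with offsetof() rather than
   a hard-coded number; memcpy() avoids a misaligned access. */
int32_t read_reclen(const unsigned char *buf)
{
    int32_t value;
    memcpy(&value, buf + offsetof(struct on_disk, reclen), sizeof value);
    return value;
}

/* Read flags from the raw buffer in the same way. */
uint32_t read_flags(const unsigned char *buf)
{
    uint32_t value;
    memcpy(&value, buf + offsetof(struct on_disk, flags), sizeof value);
    return value;
}
```

If the structure were declared with long instead, the _Static_assert would fail on a 64-bit build, flagging the layout change at compile time rather than at run time.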

Following are additional common examples. This code assigns long values to an int (before porting):

long l; //l is a 64-bit data type
int i; //i is a 32-bit data type
long func(long l); //func returns a 64-bit value
i = (int)l + func(l); //64-bit to 32-bit assignment

Here is the same code after porting:

long l; //l is a 64-bit data type
long i; //i is a 64-bit data type
long func(long l); //func returns a 64-bit value
i = l + func(l); //64-bit to 64-bit assignment
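
When a value really must be stored in a 32-bit int (for example, to satisfy an existing interface), checking the range first turns silent truncation into a detectable error. A minimal sketch, in which the helper name long_to_int_checked is hypothetical:

```c
#include <limits.h>
#include <stdbool.h>

/* Narrow a long to an int only if the value fits.
   Returns true and stores the result on success; returns false
   (leaving *out untouched) if the value would be truncated, which
   can only happen where long is wider than int, as under LP64. */
bool long_to_int_checked(long value, int *out)
{
    if (value < INT_MIN || value > INT_MAX)
        return false;
    *out = (int)value;
    return true;
}
```

The caller can then decide how to report or handle an out-of-range value instead of silently losing the upper 32 bits.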

The following code demonstrates variables receiving pointers (before porting):

char p; //&p is a 64-bit value
int i = (int)&p; //64-bit to 32-bit assignment: the address is truncated

Here is the same code after porting:

#include <stdint.h> //declares uintptr_t
char p; //&p is a 64-bit value
uintptr_t i = (uintptr_t)&p; //64-bit to 64-bit assignment
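
uintptr_t is also the appropriate type when you genuinely need arithmetic on an address, such as an alignment check. The following sketch assumes a C11 compiler and the flat address space of IA-32 and 64-bit Intel architecture (the mapping from pointers to integers is implementation-defined in C); roundtrip_pointer and is_aligned are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>

/* Round-trip a pointer through uintptr_t. Unlike int, uintptr_t is wide
   enough to hold a pointer on both IA-32 and 64-bit builds, so converting
   back yields the original pointer. */
void *roundtrip_pointer(void *p)
{
    uintptr_t bits = (uintptr_t)p;
    return (void *)bits;
}

/* Check whether p is aligned to `alignment` bytes (a power of two)
   by masking the low bits of the address's integer value. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p & (uintptr_t)(alignment - 1)) == 0;
}
```

Doing the same masking through an int would discard the upper half of every 64-bit address.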

Guideline 2: Watch for Packing, Padding and Alignment Issues

Different data models have different packing, padding, and alignment rules. It is important to gain a clear picture of how your structures will be laid out in a 64-bit system’s memory and to feel comfortable working with them.

If you are writing new software, or at least have the flexibility to change structure layouts, sorting structure members is a way to optimize for speed and to minimize the padding required by 64-bit Intel® architecture. Putting the larger, more highly aligned data objects at the front and the less-aligned objects toward the end results in tighter packing and more efficient structures. The improvement comes from the smaller structure size: more of your data fits in each cache line.

For example, the following structure is 12 bytes in the 32-bit world and 24 bytes in the 64-bit world:

struct dim_t { int height; long width; int weight; };

Reordering the fields gives the following:

struct dim_t { long width; int height; int weight; };

The new structure is 12 bytes in the 32-bit world as before, but only 16 bytes in the 64-bit world, a 33% savings. Reordering fields is a good weapon for fighting data bloat.
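
The size difference can be checked directly. This sketch assumes the LP64 model described above and uses long for the wide member; dim_unordered and dim_ordered are hypothetical names for the two versions of dim_t, and the helper functions are illustrative only.

```c
#include <stddef.h>
#include <stdio.h>

/* Original order: under LP64 the compiler inserts 4 bytes of padding
   before width (to align the long) and 4 bytes after weight (to round
   the size up to a multiple of 8), giving 24 bytes. */
struct dim_unordered { int height; long width; int weight; };

/* Widest member first: no padding, 16 bytes under LP64. */
struct dim_ordered { long width; int height; int weight; };

/* Bytes of padding the compiler added to each layout. */
size_t padding_unordered(void)
{
    return sizeof(struct dim_unordered) - (sizeof(long) + 2 * sizeof(int));
}

size_t padding_ordered(void)
{
    return sizeof(struct dim_ordered) - (sizeof(long) + 2 * sizeof(int));
}

/* Report each structure's size and the offset of its long member. */
void print_dim_layouts(void)
{
    printf("unordered: %zu bytes (width at offset %zu)\n",
           sizeof(struct dim_unordered), offsetof(struct dim_unordered, width));
    printf("ordered:   %zu bytes (width at offset %zu)\n",
           sizeof(struct dim_ordered), offsetof(struct dim_ordered, width));
}
```

On 64-bit Linux, print_dim_layouts() should report 24 and 16 bytes, matching the figures above; on IA-32 both layouts come out at 12 bytes.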

Furthermore, grouping pointers together, shorts together, and chars together improves packing. Although data are naturally aligned by default, as described above, this alignment can be overridden with #pragma pack.
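
The sketch below shows #pragma pack overriding natural alignment. It assumes a compiler that accepts the pack(push, n) / pack(pop) form, as GCC does; the structure and function names are hypothetical.

```c
#include <stddef.h>

/* Natural alignment: under LP64 the long must start on an 8-byte
   boundary, so 7 bytes of padding follow the char. */
struct natural { char tag; long value; };

/* pack(push, 1) removes the padding. Members may now be misaligned,
   which makes access slower, so packing is best reserved for external
   formats such as files and network messages. */
#pragma pack(push, 1)
struct packed { char tag; long value; };
#pragma pack(pop)

size_t natural_size(void) { return sizeof(struct natural); }
size_t packed_size(void)  { return sizeof(struct packed); }
```

On 64-bit Linux, natural_size() should return 16 and packed_size() 9.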


Additional Porting Guidelines to Bear in Mind

Finally, these five additional guidelines will smooth the course for developers as they port Linux applications to 64-bit platforms:

  • Watch for Truncation: In the 32-bit world, a double, with its 53-bit significand (52 bits stored explicitly), can hold every value of a 32-bit long exactly. In the 64-bit world, a double cannot represent every 64-bit integer exactly, which can be an issue if your application assigns a long to a double. If the loss of significance causes a problem, use long double or REAL*10 instead.
  • Use Portable I/O Format Specifiers: printf() and scanf() support %p, which handles a pointer at its full width on both 32-bit and 64-bit systems. You do not have to guess the size of a pointer or introduce non-portable code such as “%08lX”. Use %ld if you know that you need to print a long data type. The following code represents the pre-porting state:

printf("%x", ptr_value); //%x is 32-bit and truncates the pointer

The ported version is as follows:

printf("%p", (void *)ptr_value); //use %p to display the full pointer

  • Use Predefined Parameters with System Calls: Use API calls to get system parameters, and use predefined mnemonics from the system include files. For example, use getpagesize(), sysconf(_SC_CLK_TCK), SEEK_END (not 2) and INT_MIN (not -2147483648).
  • Remove Inline Assembler: Inline assembler is not supported by compilers for the Itanium processor. To continue using assembler, you must rewrite it in Itanium instructions and place it in a callable routine in its own source file. Certain limitations also apply to assembler under Intel® 64 (Intel® EM64T); these are detailed in the Intel® 64 and IA-32 Architectures Software Developer's Manuals. As compilers have become more sophisticated, the speed-based justification for assembler has grown steadily weaker. If you want to perform a processor-specific operation, first check whether a standard runtime library function does the same thing. In short, inline assembly code is a porting hazard, and you should eliminate it where possible.
  • Listen to the Compiler: The compiler will keep reporting warnings and errors until the 64-bit porting issues are resolved. The GNU and Intel compilers predefine the macro __LP64__ when building for a 64-bit target, so you can write code that is conditionally compiled (using #ifdef) depending on whether you are building for IA-32 or 64-bit Intel® architecture. The Intel compilers also offer the -Wp64 option, which produces additional diagnostics to facilitate 64-bit migration.
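
The truncation, format-specifier, system-parameter and __LP64__ guidelines above can be illustrated with a few small helpers. This is a sketch under stated assumptions: it needs a POSIX system for sysconf() from <unistd.h> and a GCC-compatible compiler that predefines __LP64__ on 64-bit targets, and all function names are hypothetical.

```c
#include <inttypes.h>
#include <stdio.h>
#include <unistd.h>

/* Truncation: does a 64-bit integer survive conversion to double?
   double has a 53-bit significand, so larger integers may be rounded. */
int survives_double(int64_t v)
{
    double d = (double)v;
    if (d >= 9223372036854775808.0)  /* 2^63 is outside int64_t's range */
        return 0;
    return (int64_t)d == v;
}

/* Portable format specifiers: PRId64 for int64_t, %zu for size_t.
   Returns what snprintf() returns (the length of the formatted text). */
int format_id_len(char *buf, size_t n, int64_t id, size_t len)
{
    return snprintf(buf, n, "id=%" PRId64 " len=%zu", id, len);
}

/* %p prints a pointer at its full width; the argument must be a void *. */
void print_pointer(void *p)
{
    printf("%p\n", p);
}

/* System parameters come from the system, not from hard-coded numbers. */
long page_size(void)   { return sysconf(_SC_PAGESIZE); }
long clock_ticks(void) { return sysconf(_SC_CLK_TCK); }

/* Compile-time data model check using the predefined __LP64__ macro. */
int built_as_lp64(void)
{
#if defined(__LP64__)
    return 1;
#else
    return 0;
#endif
}
```

For example, survives_double() returns 0 for 2^53 + 1, because the nearest double is 2^53.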


Final Steps: Test and Tune the Application

Once your code is 64-bit clean, it is time to run a thorough regression test to verify that the application is functionally equivalent to the IA-32 version. Next, recompile it for 64-bit operating systems running on processors that support Intel® EM64T, and verify that it is fully functional in IA-32e operating mode. After those tests are successfully completed, you can move into the optimization phase.

A useful optimization tool that developers should examine is the Intel® VTune™ Performance Analyzer, which provides detailed "hot spot" analysis and graphs of code performance during execution. More than any other tool, it points to the places in code where the application spends the majority of its time, so that changes to critical sections of code have the maximum performance impact. The VTune environment will provide Intel® EM64T support beginning with version 7.2.

Typically, these hot spots occur in small loops. Because of the nature of the Itanium microarchitecture, it is frequently better to unroll small loops than to let them iterate one element at a time. The VTune Performance Analyzer can help determine which loops will run faster with this technique.
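
To illustrate what unrolling means, the sketch below shows a simple summation loop and a version unrolled by four with independent partial sums, which gives a wide-issue processor more independent operations per iteration. The function names are hypothetical, and optimizing compilers can often perform this transformation themselves, so it is worth confirming with a profiler that hand-unrolling actually helps.

```c
#include <stddef.h>

/* Straightforward loop: one addition and one loop test per element. */
long sum_simple(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Unrolled by four: four additions per loop test, accumulated in
   independent partial sums so they do not depend on each other.
   A remainder loop handles counts that are not a multiple of four. */
long sum_unrolled4(const int *a, size_t n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)   /* leftover 0 to 3 elements */
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}
```

Because integer addition is exact, both functions return the same result for any input; only the instruction schedule differs.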


About The Author

Allan McNaughton is a patent-holding technologist and veteran writer with more than fifteen years of industry experience. He is the president of Technical Insight LLC*, a firm specializing in the composition of high-technology white papers. Mr. McNaughton is a frequent contributor to leading technology publications.


For more complete information about compiler optimizations, see our Optimization Notice.

Comments

Can you please elaborate on the statement,
... "This is due to the natural 32-bit word size supported by IA-32 processors". ...
Based on the Intel IA-32 Software Developer's manual I am of the impression that a `word' is 2 bytes and `doubleword' is 4 bytes. Why is the term `word' associated with 4 bytes in most articles?