By Robert Godley
When developing applications to be run on Opteron* X86-64 or Intel® Extended Memory 64 Technology (Intel® EM64T) platforms, developers have a choice of several brands of compilers. Both GNU and PGI compilers offer a switch that lets developers optimize object code for a specific platform's architecture. When the switch is set for a particular architecture, the compiler generates code that takes advantage of the architecture’s unique features.
Developers often wonder whether using the architecture-selection switch provides substantial improvement in the application's performance. This paper provides the answer, using a set of performance benchmarks compiled with the GNU and PGI compilers.
Data was collected from a set of 52 benchmark programs. By language, these include the following:
- 24 programs written in C
- 5 written in C++
- 23 written in Fortran
By source, the same 52 programs break down as follows:
- 9 tests from www.netlib.org/performance/html*
- 7 from http://www.nas.nasa.gov/Resources/Software/npb.html *
- 23 from www.spec.org*
- 12 Intel proprietary tests
- 1 test from Bernt Arne Ødegaard (financial numerical recipes in C++)
Some of the benchmark programs would not successfully compile. For example, the GNU g77 compiler sometimes failed to compile a Fortran 90 program. The results outlined below are based on the subset of tests that successfully compiled and linked with each compiler.
The hardware test platform was a dual Intel® Xeon® processor system using 3.6 GHz Intel® processors, code-named "Irwindale," with Hyper-Threading Technology (HT Technology) turned off in the BIOS, 8 GB of RAM, and a 4-disk RAID-0 array for secondary storage. The operating system was Red Hat Linux*, version EL3-Update 3 for Intel EM64T. The same platform was used to compile the programs and to execute them.
Performance Computation Methodology
Each benchmark was compiled three times with each brand of compiler. The following table identifies the specific switch combinations used for each compilation.
Six Unique Compilations for Each Benchmark
| Compiler | Switch settings |
|---|---|
| GNU | -O switch, letting the compiler use the default setting for the architecture-selection switch |
| GNU | -O switch and the Opteron setting for the architecture-selection switch |
| GNU | -O switch and the Intel processor code-named "Nocona" setting for the architecture-selection switch |
| PGI | -O switch, letting the compiler use the default setting for the architecture-selection switch |
| PGI | -O switch and the K8-64 setting for the architecture-selection switch |
| PGI | -O switch and the P7-64 setting for the architecture-selection switch |
For each benchmark, the executable built from each compilation and switch setting was executed three times. An average execution time was derived from these three execution times.
The data was then examined from another direction. For each of the six combinations of compiler brand and switch setting, the geometric mean of these average execution times was computed across all benchmarks that compiled successfully. In other words, for all benchmarks compiled successfully with the GNU compilers, the geometric mean execution time was computed for the builds that used only the -O switch. The process was then repeated for each of the other two switch combinations, and repeated again for the benchmarks compiled with the PGI compilers. This yielded a set of six performance numbers, one for each row in the preceding table.
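The two-step reduction described above (an average over three runs per benchmark, then a geometric mean across benchmarks) can be sketched in Python as follows. This is a minimal illustration, not the tooling used for the study: the function names, the benchmark names, and the timings are all hypothetical.

```python
import math
from statistics import mean


def geometric_mean(values):
    """Geometric mean, computed through logarithms to avoid overflow."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("execution times must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def summarize(runs_by_benchmark):
    """Reduce raw run times to one number for a compiler/switch combination.

    runs_by_benchmark maps a benchmark name to its list of run times in
    seconds (three runs per benchmark in the study).
    """
    # Step 1: arithmetic average of the runs of each benchmark.
    averages = {name: mean(times) for name, times in runs_by_benchmark.items()}
    # Step 2: geometric mean of those averages across all benchmarks.
    return averages, geometric_mean(list(averages.values()))


# Hypothetical timings for two made-up benchmarks (not data from the study).
example_runs = {
    "bench_a": [10.0, 10.0, 10.0],
    "bench_b": [39.0, 40.0, 41.0],
}
averages, score = summarize(example_runs)
print(averages)
print(round(score, 3))
```

A geometric mean is a common choice for this kind of summary because a benchmark with a long run time does not dominate the result the way it would in an arithmetic mean.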
The GNU Version 3.4.2 compilers were used for this test. The GNU compilers offer the -march switch to specify the intended platform architecture. The settings used in this study are -march=nocona and -march=opteron. The GNU documentation states that, when the platform architecture is not specified, the optimization defaults to the Opteron architecture.
Each benchmark was compiled three times with GNU compilers, once for each of these combinations of switches:
- -O
- -O -march=opteron
- -O -march=nocona
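As an illustration of these GNU switch combinations, the following Python sketch assembles one compile command per combination for a single source file. It only prints the commands and does not run a compiler. The driver name `gcc`, the source file name `bench.c`, and the output naming scheme are assumptions made for the example; only the -O and -march settings come from the text.

```python
import shlex

# Switch combinations described in the text for the GNU compilers.
GNU_SWITCH_SETS = {
    "default": ["-O"],
    "opteron": ["-O", "-march=opteron"],
    "nocona": ["-O", "-march=nocona"],
}


def gnu_build_commands(source, compiler="gcc"):
    """Return one compile command (as an argument list) per switch set.

    The compiler name and the '<stem>_<label>' output naming are
    hypothetical conventions chosen for this sketch.
    """
    stem = source.rsplit(".", 1)[0]
    return {
        label: [compiler, *switches, "-o", f"{stem}_{label}", source]
        for label, switches in GNU_SWITCH_SETS.items()
    }


# Print the three command lines for a hypothetical C benchmark.
for label, cmd in gnu_build_commands("bench.c").items():
    print(shlex.join(cmd))
```

Keeping each build in a separately named executable makes it straightforward to time all three variants on the same machine.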
The GNU compilers successfully compiled and linked 38 of the benchmarks with these sets of switches. As expected, there were no performance differences between the -O and the -O -march=opteron cases. However, when the architecture-selection switch was set to -march=nocona, performance improved by 17%. This chart shows the performance results:
The PGI Version 5.2.4 compilers were used for this study. The PGI compilers offer the -tp k8-64 and -tp p7-64 switches to specify the Opteron and Intel architectures, respectively. The PGI documentation states that in the absence of the -tp switch, the compiler defaults to the type of CPU on which it is executing. Since these tests were compiled on an Intel platform, the Intel architecture was the default.
Each benchmark was compiled three times with PGI compilers, once for each of these combinations of switches:
- -O
- -O -tp k8-64
- -O -tp p7-64
The PGI compilers successfully compiled and linked 44 of the benchmarks with these sets of switches. As expected, there were no differences between the -O and -O -tp p7-64 cases. Both of these cases executed the benchmarks 10% faster than the case where the architecture-selection switch was set to the Opteron microprocessor. This chart shows the performance results:
When compiled on these GNU or PGI compilers, the benchmarks running on 64-bit Intel Xeon processor-based platforms run substantially faster when the architecture-selection switch specifies Intel architecture.
The benchmarks assembled for this study are examples of High-Performance Computing programs, and their overall performance on a 64-bit Intel Xeon processor-based platform was significantly slower when they were built with the architecture-selection switch set for Opteron.
Application builders who use either of these compilers should set the architecture-selection switch to specify the intended target. For applications that will be run in a mixed-architecture environment, developers should build and test performance using both settings of the architecture-selection switch. Developers should then choose the switch setting that provides the best overall performance for the environment in which their application will run.
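For the mixed-architecture case, one possible way to compare switch settings is sketched below in Python. It assumes that execution times have already been measured for each build on each platform, and it treats "best overall" as the lowest geometric mean of execution time with every platform weighted equally. That weighting is an assumption of this sketch, not a rule from the study, and the platform names and timings are hypothetical.

```python
import math


def geometric_mean(values):
    """Geometric mean, computed through logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def best_switch(times):
    """Pick the switch setting with the lowest overall execution time.

    times maps a switch label to {platform name: execution time in seconds}.
    Every platform is weighted equally here (an assumption); a real
    deployment might weight platforms by how many machines of each kind
    are in use.
    """
    overall = {
        label: geometric_mean(list(per_platform.values()))
        for label, per_platform in times.items()
    }
    best = min(overall, key=overall.get)
    return best, overall


# Hypothetical measurements on two made-up platforms (not study data).
measured = {
    "opteron": {"platform_a": 100.0, "platform_b": 100.0},
    "nocona": {"platform_a": 80.0, "platform_b": 110.0},
}
best, overall = best_switch(measured)
print(best)
for label, value in overall.items():
    print(label, round(value, 1))
```

In this made-up case one setting is faster on one platform and slower on the other, so the choice depends on the mix of machines; changing the weighting could change the result.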