February 4, 2010 11:30 AM PST
HPCC Application Note
Step 1 - Overview
This guide is intended to help current HPCC users get better benchmark performance by utilizing Intel® Math Kernel Library (Intel® MKL).
HPCC stands for the High Performance Computing Challenge benchmark. It is a suite of benchmarks that measure the performance of the CPU, memory subsystem, and interconnect. It consists of 7 benchmark tests: HPL (High Performance LINPACK), DGEMM (Double-precision GEneral Matrix-Matrix multiply), STREAM, PTRANS (Parallel TRANSpose), RandomAccess, FFT (Fast Fourier Transform), and communication bandwidth/latency.
More information on HPCC is available at: http://icl.cs.utk.edu/hpcc/*.
Version Information
This application note was created to help users who benchmark clusters using HPCC to make use of the latest versions of Intel MKL on Linux platforms on Xeon systems. Please note that previous versions of MKL may require other steps to successfully compile and link with HPCC.
Step 2 - Downloading HPCC Source Code
The HPCC source code can be downloaded from: http://icl.cs.utk.edu/hpcc/software/index.html*.
Prerequisites
1. Intel MKL, which contains highly optimized FFT routines as well as wrappers for FFTW. It can be obtained in either of the following ways:
• Download a FREE evaluation version of the Intel MKL product.
• Download the FREE non-commercial* version of the Intel MKL product.
Both are available from the Intel® Math Kernel Library product web page.
Intel® MKL is also bundled with the following products
2. Intel MPI can be obtained from Intel® Cluster Tools. Open source MPI (MPICH2) can be obtained from http://www.mcs.anl.gov/research/projects/mpich2/*.
Step 3 - Configuration
Use the following commands to extract the HPCC tar files from the downloaded hpcc-x.x.x.tar.gz.
$gunzip hpcc-x.x.x.tar.gz
$tar -xvf hpcc-x.x.x.tar
The above will create a directory named hpcc-x.x.x.
Make sure that the MPI, C++, and FORTRAN compilers are installed and are in PATH. Also add the directories of your compiler (C++ and FORTRAN), MPI, and MKL libraries to LD_LIBRARY_PATH.
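The environment check described above can be sketched as two small shell functions. This is only an illustration: the function names check_tools and prepend_ld_path are hypothetical, and the tool names and directory you pass in depend on your own installation.

```shell
# Report which of the given tools can be found on PATH.
# Returns 0 when every tool is found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Prepend a directory to LD_LIBRARY_PATH without leaving a stray colon
# when the variable is empty or unset.
prepend_ld_path() {
    LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH
}
```

For example, with the Intel compilers and Intel MPI you might run check_tools icc ifort mpiicc, and then prepend_ld_path /opt/intel/mkl/lib/intel64 if MKL is installed under /opt/intel/mkl. Both the tool names and the path are assumptions about your setup.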
Step 4 - Building HPCC
• Build MPI MKL FFTW library.
Change the directory to <your MKL installation>/interfaces/fftw2x_cdft.
From the fftw2x_cdft directory, run the following command:
$make libintel64 PRECISION=MKL_DOUBLE interface=ilp64
This builds the wrappers for the Intel64 architecture with Intel MPI (the Makefile default; you may use a different MPI), the Intel compilers, DOUBLE precision, and the ilp64 interface. It creates the MKL MPI FFTW interface library libfftw2x_cdft_DOUBLE_ilp64.a in the lib/intel64 directory.
Note: Setting the interface parameter to ilp64 builds FFTW MPI wrappers that accept 64-bit parameters in their interface, to match the calls from HPCC. These 64-bit-aware wrappers are not meant for ordinary applications that follow the traditional FFTW interfaces. Run $make to see the full set of options.
• Build FFTW C wrapper library
Change the directory to <your MKL installation>/interfaces/fftw2xc.
Then build the FFTW C wrapper by running the following command:
$make libintel64 PRECISION=MKL_DOUBLE
This creates the libfftw2xc_intel.a library in the <your MKL installation>/lib/intel64 directory.
• Build HPCC
Change directory to hpcc-x.x.x/hpl
Create a makefile based on an existing one, for example Make.intel. You can reuse one from the hpl/setup directory.
Edit Make.intel so that the LAdir and LAlib lines point to the MKL libraries, as shown below.
LAdir = /opt/intel/mkl/lib/intel64
LAlib = -Wl,--start-group $(LAdir)/libfftw2x_cdft_DOUBLE_ilp64.a $(LAdir)/libfftw2xc_intel.a $(LAdir)/libmkl_intel_lp64.a $(LAdir)/libmkl_intel_thread.a $(LAdir)/libmkl_core.a $(LAdir)/libmkl_blacs_intelmpi_lp64.a $(LAdir)/libmkl_cdft_core.a -Wl,--end-group -lpthread -lm
Please make sure that the following compiler options are on the compile line:
-DUSING_FFTW -DMKL_INT=long -DLONG_IS_64BITS
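Before starting the build, it may be worth confirming that the archives named in LAlib actually exist in LAdir, since a missing wrapper library only shows up at link time. The helper below is a minimal sketch; the function name missing_archives is hypothetical.

```shell
# Print every archive from the list that is not present in directory $1.
# Returns 0 when nothing is missing, 1 otherwise.
missing_archives() {
    dir=$1
    shift
    status=0
    for lib in "$@"; do
        if [ ! -f "$dir/$lib" ]; then
            echo "$lib"
            status=1
        fi
    done
    return "$status"
}
```

For instance, missing_archives /opt/intel/mkl/lib/intel64 libfftw2x_cdft_DOUBLE_ilp64.a libfftw2xc_intel.a checks for the two wrapper libraries built in the previous steps, assuming MKL is installed under /opt/intel/mkl.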
Build HPCC with the following command:
$make all arch=intel
This creates an executable named hpcc in the hpcc-x.x.x directory, along with a file named _hpccinf.txt, which is a template input file for hpcc. Rename the file to hpccinf.txt.
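The last step can be scripted as follows. Note that this sketch copies the template instead of renaming it, so that the original _hpccinf.txt stays available for reference, and it refuses to overwrite an input file you may already have edited. The function name install_input is hypothetical.

```shell
# Copy the generated template to the input file name used by hpcc.
# $1: template (default _hpccinf.txt), $2: destination (default hpccinf.txt).
install_input() {
    src=${1:-_hpccinf.txt}
    dst=${2:-hpccinf.txt}
    if [ ! -f "$src" ]; then
        echo "template not found: $src" >&2
        return 1
    fi
    if [ -e "$dst" ]; then
        echo "already exists, not overwriting: $dst" >&2
        return 1
    fi
    cp "$src" "$dst"
}
```

Run install_input from the hpcc-x.x.x directory after the build has finished.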
Step 5 - Running HPCC
Modify the configuration parameters in the hpccinf.txt file.
Run hpcc by executing the following command:
$mpirun -np 4 hpcc
hpccinf.txt is the same as the standard HPL input file, with a few additional lines. Please refer to our HPL application note for guidance on tuning the parameters in the configuration file.
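One of the parameters that usually needs adjusting is the process grid (the Ps and Qs entries of the standard HPL input file), whose product should match the number of MPI processes. A common rule of thumb, not a requirement of this note, is to pick P and Q as close to square as possible with P no larger than Q. The sketch below applies that heuristic; the function name grid_for is hypothetical.

```shell
# Print "P Q" such that P * Q = n, P <= Q, and P is the largest
# divisor of n that does not exceed sqrt(n).
grid_for() {
    n=$1
    p=1
    i=1
    while [ $((i * i)) -le "$n" ]; do
        if [ $((n % i)) -eq 0 ]; then
            p=$i
        fi
        i=$((i + 1))
    done
    echo "$p $((n / p))"
}
```

For the 4-process run shown above, grid_for 4 prints "2 2". For 4320 processes it prints "60 72", which is the grid reported in Appendix A.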
Appendix A - Performance Results
Below are the HPCC benchmark results for the Intel Endeavor cluster, which can also be found on the HPCC website*.
HPC Challenge Benchmark Record

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| System Information | | | |
| Affiliation | Intel Corporation | URL | http://www.intel.com/ |
| Location | USA, Washington, DuPont | System Use | Vendor |
| System Manufacturer | Intel | System Name | Intel Endeavor cluster |
| Interconnect Manufacturer | Mellanox | Interconnect Type | QDR Infiniband (40 Mellanox MTS3600Q-1UNC switches, Mellanox MHGH28-XTC adapters on nodes, only one port used per adapter, slot type is PCIe x8 Gen2) |
| Operating System | Red Hat EL 5.4, kernel 2.6.18-164 | MPI | Intel MPI 4.0 |
| MPI Wtick | 0.000001 | BLAS | Intel MKL 10.3 |
| Language | C | Compiler | Intel C/C++ Compiler 11.1.064 |
| Compiler Flags | -O2 -xSSE4.2 -ip -ansi-alias -fno-alias -DUSING_FFTW -DMKL_INT=long -DLONG_IS_64BITS -DRA_SANDIA_OPT2 -DHPCC_FFT_235 (and "-opt-streaming-stores always" for stream.c) | Processor Type | Xeon X5670 (SMT OFF, Turbo OFF, DDR3-1333) |
| Processor Speed | 2.93 GHz | Total Processors | 4320 |
| Processors Entered | 4320 | Processors determined | 4320 |
| Cores per chip | 6 | HPL Processes | 4320 |
| MPI Processes | 4320 | Threads Entered | 1 |
| Threads determined | 1 | FLOPs per cycle | |
| Theoretical peak | 50.6304 TFlop/s | Total memory | 8640 GiB |
| FFT library | Intel MKL 10.3 | | |
| HPL | | | |
| HPL | 43.722 Tflop/s | HPL time | 14002.6 |
| HPL eps | 2.22045e-16 | HPL Rnorm1 | 0.000000140367 |
| HPL Anorm1 | 243723 | HPL AnormI | 243671 |
| HPL Xnorm1 | 1011140 | HPL XnormI | 6.36053 |
| HPL N | 972000 | HPL NB | 168 |
| HPL NProw | 60 | HPL NPcol | 72 |
| HPL depth | 0 | HPL NBdiv | 2 |
| HPL NBmin | 4 | HPL CPfact | R |
| HPL CRfact | C | HPL CPtop | 1 |
| HPL order | R | | |
| HPL dMach EPS | 2.220446e-16 | HPL sMach EPS | 0.0000001192093 |
| HPL dMach sfMin | 2.2250739999999997e-308 | HPL sMach sfMin | 1.1754939999999999e-38 |
| HPL dMach Base | 2 | HPL sMach Base | 2 |
| HPL dMach Prec | 4.440892e-16 | HPL sMach Prec | 0.0000002384186 |
| HPL dMach mLen | 53 | HPL sMach mLen | 24 |
| HPL dMach Rnd | 0 | HPL sMach Rnd | 0 |
| HPL dMach eMin | -1021 | HPL sMach eMin | -125 |
| HPL dMach rMin | 2.2250739999999997e-308 | HPL sMach rMin | 1.1754939999999999e-38 |
| HPL dMach eMax | 1025 | HPL sMach eMax | 129 |
| HPL dMach rMax | 0 | HPL sMach rMax | 0 |
| dweps | 1.110223e-16 | sweps | 0.00000005960464 |
| PTRANS | | | |
| PTRANS | 549.988 GB/s | PTRANS time | 3.43075 seconds |
| PTRANS residual | 0 | PTRANS N | 486000 |
| PTRANS NB | 232 | PTRANS NProw | 60 |
| PTRANS NPcol | 72 | | |
| STREAM | | | |
| S-STREAM Copy | 8.30307 GB/s | S-STREAM Scale | 8.2778 GB/s |
| S-STREAM Add | 11.0563 GB/s | S-STREAM Triad | 11.0009 GB/s |
| EP-STREAM Copy | 3.33023 GB/s | EP-STREAM Scale | 3.32376 GB/s |
| EP-STREAM Add | 3.48553 GB/s | EP-STREAM Triad | 3.5357 GB/s |
| STREAM Vector Size | 72900000 | STREAM Threads | 1 |
| RandomAccess | | | |
| S-RandomAccess | 0.035379 Gup/s | EP-RandomAccess | 0.0166186 Gup/s |
| G-RandomAccess | 10.8309 Gup/s | G-RandomAccess N | 549755813888 |
| G-RandomAccess time | 203.033 seconds | G-RandomAccess Check Time | 187.382 seconds |
| G-RandomAccess Errors | 1343419 | G-RandomAccess Errors Fraction | 0.00000244366 |
| G-RandomAccess TimeBound | -1 | G-RandomAccess ExeUpdates | 2199023255552 |
| RandomAccess N | 134217728 | | |
| FFT | | | |
| S-FFT | 2.3047 GFlop/s | EP-FFT | 1.14392 GFlop/s |
| MPIFFT | 1173.89 GFlop/s | MPIFFT N | 116640000000 |
| MPIFFT Max Error | 0.00000000000000431742 | MPIFFT time0 | 0 seconds |
| MPIFFT time1 | 0 seconds | MPIFFT time2 | 0 seconds |
| MPIFFT time3 | 0 seconds | MPIFFT time4 | 0 seconds |
| MPIFFT time5 | 0 seconds | MPIFFT time6 | 0 seconds |
| FFTEnblk | 16 | FFTEnp | 8 |
| FFTEl2size | 1048576 | | |
| DGEMM | | | |
| S-DGEMM | 11.0582 GFlop/s | EP-DGEMM | 10.9366 GFlop/s |
| DGEMM N | 8537 | | |
| RandomRing Latency/Bandwidth | | | |
| RandomRing Latency | 6.43059 usec | RandomRing Bandwidth | 0.131166 GB/s |
| NaturalRing Latency/Bandwidth | | | |
| NaturalRing Latency | 3.44515 usec | NaturalRing Bandwidth | 0.962355 GB/s |
| PingPong Latency/Bandwidth | | | |
| Maximum PingPong Latency | 4.36604 usec | Maximum PingPong Bandwidth | 4.02814 GB/s |
| Minimum PingPong Latency | 0.238419 usec | Minimum PingPong Bandwidth | 1.48091 GB/s |
| Average PingPong Latency | 3.62335 usec | Average PingPong Bandwidth | 1.80222 GB/s |
| Size of Data Types | | | |
| char | 1 byte | short | 2 bytes |
| int | 4 bytes | long | 8 bytes |
| void ptr | 8 bytes | float | 4 bytes |
| double | 8 bytes | size t | 8 bytes |
| s64Int | 8 bytes | u64Int | 8 bytes |
| OpenMP | | | |
| M OpenMP | -1 | OpenMP Num Threads | 0 |
| OpenMP Num Procs | 0 | OpenMP Max Threads | 0 |
| Memory | | | |
| MemProc | -1 | MemSpec | -1 |
| MemVal | -1 | | |
| CPS | | | |
| CPS_HPCC_FFT_235 | 1 | CPS_HPCC_FFTW_ESTIMATE | 0 |
| CPS_HPCC_MEMALLCTR | 0 | CPS_HPL_USE_GETPROCESSTIMES | 0 |
| CPS_RA_SANDIA_NOPT | 0 | CPS_RA_SANDIA_OPT2 | 1 |

Version: 1.4.1.b - Run Type: base
Created: 2010-11-01 - Exported: Thu Mar 17 06:32:04 2011
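The summary figures in the record can be cross-checked with simple arithmetic. The sketch below assumes 4 double-precision floating-point operations per cycle per core. The record leaves "FLOPs per cycle" blank, so this value is an assumption, chosen because it reproduces the reported theoretical peak. The function names peak_tflops and hpl_efficiency are hypothetical.

```shell
# Theoretical peak in TFlop/s = cores * clock (GHz) * flops per cycle / 1000.
peak_tflops() {
    awk -v c="$1" -v f="$2" -v w="$3" \
        'BEGIN { printf "%.4f\n", c * f * w / 1000 }'
}

# HPL efficiency in percent = measured HPL / theoretical peak * 100.
hpl_efficiency() {
    awk -v m="$1" -v p="$2" \
        'BEGIN { printf "%.1f\n", m / p * 100 }'
}
```

With the values from the record, peak_tflops 4320 2.93 4 prints 50.6304, matching the reported theoretical peak, and hpl_efficiency 43.722 50.6304 prints 86.4, so this run reached roughly 86% of peak under that assumption.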
Article Attachments
This article applies to: Intel® C++ Compiler for Linux* Knowledge Base, Intel® C++ Compiler for Mac OS X* Knowledge Base, Intel® C++ Compiler for Windows* Knowledge Base, Intel® Cluster Toolkit for Linux* Knowledge Base, Intel® Cluster Toolkit for Windows* Knowledge Base, Intel® Fortran Compiler for Linux* Knowledge Base, Intel® Fortran Compiler for Mac OS X* Knowledge Base, Intel® Math Kernel Library Knowledge Base, Software Products General
Comments (12) 
December 27, 2009 9:38 PM PST
Vipin Kumar E K (Intel)
We have modified this KB to work around a bug in HPCC 1.3.1, in which memory is allocated with malloc() but de-allocated with fftw_free(), and have provided a patch to fix it. Please download and apply the patch and follow the steps in "Build FFTW C wrapper library". --Vipin

December 29, 2009 4:51 AM PST
patirot
Thank you very much!

February 3, 2010 1:37 PM PST
Andres More (Intel)
As a side note, the performance figures of STREAM can be increased on Intel architecture by explicitly using the '-opt-streaming-stores always' option of the Intel C Compiler when compiling the STREAM source.

September 30, 2010 1:38 AM PDT
xuzheng97
Hello, I tested as guided, including the "Build FFTW C wrapper library" part, but I still got the following error:
*** glibc detected *** ./hpcc: free(): invalid pointer: 0x0000003ce8352d48 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3ce80722ef]
/lib64/libc.so.6(cfree+0x4b)[0x3ce807273b]
./hpcc[0x4cbb0a]
I am using Intel compiler 11.1.072 and the corresponding MKL in the compiler directory. MPI is 4.0.0.028. I tried both HPCC 1.4.1 and 1.3.1 with the same error. Thanks

January 19, 2011 5:08 AM PST
Anna Labutina
I tried to build HPCC with MKL and msmpi on Windows. I'm using Intel Composer 2011.1.127 and MKL 10.3.1.127 in VS2010. I successfully built fftw2x_cdft_DOUBLE_ilp64.lib and fftw2xc_intel.lib for the ilp64 interface. I link HPCC with msmpi.lib fftw2x_cdft_DOUBLE_ilp64.lib fftw2xc_intel.lib mkl_blacs_msmpi_ilp64.lib mkl_cdft_core.lib mkl_core.lib mkl_intel_ilp64.lib mkl_intel_thread.lib. When I use just two compiler flags, /DUSING_FFTW /DMKL_ILP64, everything seems to work fine (the only thing that annoys me is that StarFFT gets very memory-consuming). I cannot use the /DMKL_INT=long /DLONG_IS_64BITS flags: when I build with these flags the code won't start, and the output message I get from HPCC is "No 64-bit integer type available". How do I fix it? And will the solution help me with the memory consumption issues?

April 18, 2011 5:54 PM PDT
roger golliver
You build fftw_cdft ilp64, but in the link section you pull in lp64 (not ilp64) for everything. Could you update the instructions for the 12.0 compilers and Intel MPI 4.0.1.007? Could you also include the LAinc parameter and CCFLAGS, etc.? Maybe you could post your hpl/Make.intel file. Thanks, Roger

June 7, 2011 6:34 PM PDT
Robert Hernan
Would you have the kindness to send me the parameters file used for the test reported on this page, please? Thanks in advance

August 24, 2011 4:21 AM PDT
Vipin Kumar E K (Intel)
Hi Roger, The Make.intel is attached in the article attachments section above. The LAinc and CCFLAGS are:
LAdir = $(MKL)
LAinc = -I$(MKL)/include
CCFLAGS = $(HPL_DEFS) -DASYOUGO -O2 -xSSE4.2 -ip -ansi-alias -fno-alias -DUSING_FFTW -DMKL_INT=long -L$MKL/lib/intel64 -I$MKL/include/fftw -L$MPI/lib64 -DLONG_IS_64BITS -DRA_SANDIA_OPT2 -DHPCC_FFT_235
We are in the process of updating this article to the latest versions of Intel MKL, Compilers and MPI. Thanks, Vipin

August 24, 2011 4:33 AM PDT
Vipin Kumar E K (Intel)
Hi Anna, Maybe this article can be of some help: http://software.intel.com/en-us/articles/64-bit-int-support-on-win64-mkl/ --Vipin

August 24, 2011 4:36 AM PDT
Vipin Kumar E K (Intel)
Roger, The article has already been updated with a note in Step 4 - Building HPCC, under "Build MPI MKL FFTW library":
Note: Please note that by setting the interface parameter to ilp64 we require to build the FFTW MPI wrappers which admit 64-bit parameters in their interface to match the calls from HPCC. These 64-bit aware wrappers are not to be used with usual applications complying with traditional FFTW interfaces. Please execute $make to see the full set of options.
--Vipin

January 6, 2012 3:00 AM PST
hilgeman
Vipin, When I use MKL 10.3.7.256, StarFFT aborts with the following message:
fftw_die: DftiCompute returned error in fftwnd()
MKL 10.3.0.084 is fine, though. I am using the 11.1.073 compilers. regards, -Martin

Author: Vipin Kumar E K (Intel)

patirot
------------my LAlib------------------
LAlib = -Wl,--start-group $(LAdir)/libmkl_intel_lp64.a $(LAdir)/libmkl_sequential.a $(LAdir)/libmkl_core.a $(LAdir)/libmkl_blacs_intelmpi_lp64.a $(LAdir)/libfftw2x_cdft_DOUBLE_ilp64.a $(LAdir)/libfftw2xc_intel.a $(LAdir)/libmkl_cdft_core.a -Wl,--end-group $(LAdir)/libiomp5.a -lpthread -lm
------------------------------
HPL_OPTS = -DUSING_FFTW -DMKL_ILP64 -DLONG_IS_64BITS
------------------------------
CCFLAGS = $(HPL_DEFS) $(MKLINCDIR) -O2 -xSSE4.2 -ansi-alias -ip
------------------------------
*** glibc detected *** ./hpcc: double free or corruption (out): 0x00002b81f80007d0 ***
======= Backtrace: =========
/lib64/libc.so.6[0x387e271ce2]
/lib64/libc.so.6(cfree+0x8c)[0x387e27590c]
*** glibc detected *** ./hpcc: double free or corruption (out): 0x00002b409c000700 ***
======= Backtrace: =========
/lib64/libc.so.6[0x387e271ce2]
./hpcc[0x4bfdba]
./hpcc[0x46d636]
./hpcc[0x46d036]
./hpcc[0x46d698]
./hpcc[0x46d04d]
/lib64/libc.so.6(cfree+0x8c)[0x387e27590c]
./hpcc[0x4bfdba]
./hpcc[0x46d636]
*** glibc detected *** ./hpcc: double free or corruption (out): 0x00002b95f8000790 ***