April 19, 2010 11:30 AM PDT
HPL User Note
Step 1 - Overview
This guide is intended to help current HPL users get better benchmark performance by utilizing BLAS from the Intel® Math Kernel Library (Intel® MKL).
HPL (High Performance LINPACK), an industry standard benchmark for HPC, is a software package that solves a (random) dense linear system in double precision (64 bits) arithmetic on distributed-memory computers.
This note explains three ways to get HPL running:
1. Using the Intel® optimized HPL binary directly (mp_linpack)
2. Building and using HPL from source provided in MKL package
3. Building and using open source HPL by linking with MKL
Version Information
This application note was created to help users who benchmark clusters using HPL to make use of the latest versions of Intel® MKL on Linux platforms. Specifically we'll address Intel® MKL version 10.3 update 2.
Step 2 - Downloading HPL Source Code
Download the open source HPL package (hpl-2.0.tar.gz, available from netlib).
If you have installed MKL, HPL is included in MKL and can be found at
<MKL installation dir>/benchmarks/mp_linpack
Prerequisites
1. BLAS
BLAS (Basic Linear Algebra Subprograms) DGEMM is the core high performance routine exercised by HPL. Intel® MKL BLAS is highly optimized for maximum performance on Intel® Xeon® processor-based systems.
BLAS from MKL can be obtained in either of the following ways:
• A. Download a FREE evaluation version of the Intel MKL product
• B. Download the FREE non-commercial* version of the Intel MKL product.
Intel® MKL is also bundled with the following products
• Intel® Parallel Studio XE 2011
• Intel® Composer XE 2011
• Intel® Cluster Studio 2011
FREE Intel Optimized LINPACK Benchmark packages
The Intel MKL team provides FREE Intel® Optimized LINPACK Benchmark packages that are binary implementations of the LINPACK benchmarks which include Intel® MKL BLAS. Not only are these SMP and Distributed Memory packages free, they are also much easier to use than HPL (no compilation needed, just run the binaries). We highly recommend HPL users consider switching from HPL to the Free Intel Optimized LINPACK benchmark packages.
2. MPI
Download Intel® MPI or an open source MPI (MPICH2).
You may either run the pre-built binaries from the FREE Intel® Optimized LINPACK Benchmark packages, or build HPL by following the steps below and then run it. The hybrid (MPI + OpenMP) parallel versions of the HPL binaries are also included in the package.
If you are building the HPL source that ships with Intel® MKL, skip Steps 3 and 4 below. Two makefiles, Make.ia32 and Make.intel64, are provided for the IA-32 and Intel64 platforms. With these makefiles you can build either the serial or the hybrid version of HPL.
If you downloaded hpl-2.0.tar.gz (from netlib), follow the instructions below.
Step 3 - Configuration
1) Extract the tar file
Use the following commands to extract the tar file from the downloaded hpl-2.0.tar.gz file
$gunzip hpl-2.0.tar.gz
$tar -xvf hpl-2.0.tar
This creates the hpl-2.0 directory, which is referred to below as the top-level directory.
2) Makefile Creation
Create a file Make.<arch> in the top-level directory. For this purpose, you may want to reuse one of the files in the setup directory (hpl-2.0/setup/). Let us use Make.Linux_PII_CBLAS. This file essentially specifies the compilers and libraries to be used, together with their paths.
Copy this file to the top-level directory.
$cp hpl-2.0/setup/Make.Linux_PII_CBLAS hpl-2.0/
Change to the top-level directory and rename the file.
$cd hpl-2.0
$mv Make.Linux_PII_CBLAS Make.intel64
This user note explains how to build HPL for Intel64 platform.
Make sure that the Intel® C++ and Fortran compilers are installed and in PATH. Also set LD_LIBRARY_PATH to include the compiler (C++ and Fortran), MPI, and MKL library directories.
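Before editing the makefile, it can help to confirm that the tools are actually visible in PATH. The following is a minimal Python sketch, not part of HPL or MKL. The tool names (icc and ifort for the Intel compiler drivers, mpirun for the MPI launcher) are assumptions based on the commands used in this note, so adjust them to your installation.

```python
import shutil

# Assumed tool names: Intel C/C++ driver, Intel Fortran driver, MPI launcher.
REQUIRED_TOOLS = ("icc", "ifort", "mpirun")


def find_tools(tools=REQUIRED_TOOLS):
    """Map each tool name to its resolved path, or None if it is not in PATH."""
    return {tool: shutil.which(tool) for tool in tools}


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of the tools that could not be found in PATH."""
    return [tool for tool, path in find_tools(tools).items() if path is None]


if __name__ == "__main__":
    for tool, path in find_tools().items():
        print(f"{tool:8s} {path or 'NOT FOUND in PATH'}")
```

An empty list from missing_tools() only shows that the executables resolve. It does not check LD_LIBRARY_PATH.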
Step 4 - Modifying Makefile
The steps below explain how to modify the makefile for building HPL.
Edit Make.intel64
1) Change the value of ARCH to intel64 (or whatever value you have chosen for <arch>)
# ----------------------------------------------------------------------
# - Platform identifier ------------------------------------------------
# ----------------------------------------------------------------------
#
ARCH = intel64
2) Point to your MPI library
MPdir = /opt/intel/mpi
MPinc = -I$(MPdir)/include64
MPlib = $(MPdir)/lib64/libmpi_mt.a
Here, we selected the multi-threaded version of the Intel® MPI library.
If you are using the open source MPI (MPICH2), the library is libmpich.a instead of libmpi_mt.a.
Intel® MPI is recommended for better performance.
3) Point to the math library, MKL
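LAdir = /opt/intel/mkl/lib/intel64
LAinc = -I/opt/intel/mkl/include
LAlib = -Wl,--start-group $(LAdir)/libmkl_intel_lp64.a $(LAdir)/libmkl_intel_thread.a $(LAdir)/libmkl_core.a -Wl,--end-group -lpthread -lm
Note that LAinc must carry the -I prefix. Without it, the compiler and linker treat the include directory as an input file and the build fails.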
4) Modify Compiler flag to include OpenMP
Add the following to CCFLAGS
-openmp
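An include entry without a leading -I is easy to miss when editing MPinc and LAinc by hand. The following is a small Python sketch that flags such entries in makefile text. The function name and the sample text are hypothetical, and the LAinc line in the sample is deliberately wrong to show what gets reported.

```python
import re


def bad_include_entries(makefile_text, variables=("MPinc", "LAinc")):
    """Return (variable, token) pairs where an include entry lacks a -I prefix."""
    problems = []
    for line in makefile_text.splitlines():
        # Match simple "NAME = value" assignments; comment lines do not match.
        match = re.match(r"\s*(\w+)\s*=\s*(.*)$", line)
        if not match or match.group(1) not in variables:
            continue
        for token in match.group(2).split():
            if not token.startswith("-I"):
                problems.append((match.group(1), token))
    return problems


# Hypothetical sample: MPinc is fine, LAinc is deliberately missing -I.
example = """\
MPinc = -I$(MPdir)/include64
LAinc = /opt/intel/mkl/include
"""

print(bad_include_entries(example))
```

To check a real file, pass the contents of Make.intel64 to bad_include_entries() instead of the sample string.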
Step 5 - Building HPL
To build the executable use "make arch=<arch>". This should create an executable in the bin/<arch> directory called xhpl.
In our example, execute
$make arch=intel64
This creates the executable file bin/intel64/xhpl. It also creates an HPL configuration file, HPL.dat, in the same directory.
Step 6 - Running HPL
Case 1: If you have downloaded the Intel® Optimized LINPACK package
Extract the package and run the script for your platform.
For example, to run hybrid HPL on Intel64 Xeon machines:
$runme_hybrid_intel64
Please refer to lpk_notes_lin.htm, provided with this package, for more details.
Case 2 & 3: If you have built HPL from the MKL package or from the open source HPL
Go to the directory where the executable was built.
For example, for a test run of HPL, use the following commands.
$cd bin/<arch>
$mpirun -np 4 xhpl
Create a machines file with node names. For example, the machines file contains names such as:
front-end-0
compute-0
compute-1
.....................
.....................
compute-128
Running with the machines file:
$mpirun -np 8 -nodes 4 -machinefile machines xhpl
Please refer to your MPI documentation for the other arguments you can use.
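For larger clusters the machines file is usually generated rather than typed. The following is a minimal Python sketch that builds the file contents and an mpirun argument list. The helper names are hypothetical, the host names follow the naming pattern shown above, and only the -np and -machinefile options already used in this note appear in the command.

```python
def machines_file_text(hosts):
    """Return machines-file contents: one host name per line."""
    return "\n".join(hosts) + "\n"


def mpirun_command(num_procs, machines_path, executable="xhpl"):
    """Build an mpirun argument list using only -np and -machinefile."""
    return ["mpirun", "-np", str(num_procs),
            "-machinefile", str(machines_path), executable]


# Hypothetical four-node example following the naming pattern in this note.
hosts = ["front-end-0"] + [f"compute-{i}" for i in range(3)]

print(machines_file_text(hosts), end="")
print(" ".join(mpirun_command(8, "machines")))
```

Writing the returned text to a file named machines and running the printed command from bin/<arch> corresponds to the manual steps above.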
Tuning:
Most of the performance parameters can be tuned by modifying the input file bin/<arch>/HPL.dat. See the file TUNING in the top-level directory for more information.
Note: If you use Intel® Optimized LINPACK, you have to change the input files provided with that package, for example HPL_hybrid.dat. Refer to the extended help file xhelp.lpk for more information on modifying the input file.
Main parameters you need to consider while running HPL.
Problem size (N): For best performance, use the largest problem size that fits in memory. For example, 10 nodes with 1 GB of RAM each give 10 GB of total memory, which holds nearly 1342 M double precision (8-byte) elements. The square root of that number, 36635, is the upper bound for N. You need to leave some memory for the operating system and other processes, so as a rule of thumb start with a problem size that uses about 80% of the total memory (in this case, about 33000). If the problem size is too large, it is swapped out and performance degrades.
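The arithmetic above can be reproduced with a short Python sketch. The function name and its parameters are hypothetical. It assumes 1 GB means 2^30 bytes, which is what yields the 1342 M figure, and it can optionally round N down to a multiple of the block size NB, a common convenience that this note does not require.

```python
import math


def suggest_problem_size(nodes, gib_per_node, percent=80, nb=None):
    """Largest N whose N x N matrix of doubles fits in `percent` of total RAM."""
    total_bytes = nodes * gib_per_node * 2**30
    elements = total_bytes * percent // 100 // 8  # 8 bytes per double
    n = math.isqrt(elements)
    if nb:
        n -= n % nb  # optional: round down to a multiple of NB
    return n


print(suggest_problem_size(10, 1, percent=100))  # upper bound: 36635
print(suggest_problem_size(10, 1))               # 80% of memory: 32768
print(suggest_problem_size(10, 1, nb=168))       # multiple of NB=168: 32760
```

The 80% result, 32768, is consistent with the starting point of about 33000 suggested above.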
Block size (NB): HPL uses the block size NB for the data distribution as well as for the computational granularity. A very small NB limits computational performance because little data reuse occurs, and it also increases the number of messages. "Good" block sizes are almost always in the [32 .. 256] interval, and the best value depends on the cache size. The following block sizes have been found to work well: 80-216 for IA32; 128-192 for IA64 with 3M cache; 400 for IA64 with 4M cache; and 130 for Woodcrest.
Process grid ratio (P x Q): This depends on the physical interconnection network. P and Q should be approximately equal, with Q slightly larger than P. For example, for a 480-processor cluster, 20 x 24 is a good ratio.
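One simple way to find such a grid is to take the largest divisor of the process count that does not exceed its square root. The Python sketch below does that. The function name is hypothetical, and the result is only a starting point, since the best grid also depends on the interconnect.

```python
import math


def process_grid(num_procs):
    """Return (P, Q) with P * Q == num_procs, P <= Q, and P as large as possible."""
    for p in range(math.isqrt(num_procs), 0, -1):
        if num_procs % p == 0:
            return p, num_procs // p


print(process_grid(480))   # (20, 24)
print(process_grid(4320))  # (60, 72)
```

The second call reproduces the 60 x 72 grid (NProw x NPcol) listed in Appendix A.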
Tips: You can also try changing the node order in the machines file to check whether performance improves. Choose all of the above parameters by trial and error to get the best performance.
You can also use a simple PHP web tool: enter your system specifications and it suggests input parameters for your HPL file before you run the benchmark on the cluster. The tool is hosted on SourceForge at the URL below:
http://hpl-calculator.sourceforge.net
Appendix A - Performance comparison
The following are performance results from a run of the serial version of HPL on systems based on Intel® Xeon 5670 series (Westmere) processors.
Hardware Configuration
Processor: Intel® Xeon X5670 (Westmere), 2.93 GHz / 6.4 QPI 1333 95 W, 32KB L1/256KB L2/12MB L3 cache
RAM: 24 GB total/node ( 6*4GB 1333MHz )
Number of processors: 4320
Cores per chip: 6
Theoretical Peak: 50.6304 Tflop/s
Total Memory: 8640 GB
Software Configuration
Intel® Compiler 11.1
Intel® MPI 4.0
Intel® MKL 10.3
Results
HPL: 43.722 Tflop/s
HPL time: 14002.6
HPL eps: 2.22045e-16
HPL Rnorm1: 0.000000140367
HPL Anorm1: 243723
HPL AnormI: 243671
HPL Xnorm1: 1011140
HPL XnormI: 6.36053
HPL N: 972000
HPL NB: 168
HPL NProw: 60
HPL NPcol: 72
HPL depth: 0
HPL NBdiv: 2
HPL NBmin: 4
HPL CPfact: R
HPL CRfact: C
HPL CPtop: 1
HPL order: R

HPL dMach EPS: 2.220446e-16
HPL sMach EPS: 0.0000001192093
HPL dMach sfMin: 2.2250739999999997e-308
HPL sMach sfMin: 1.1754939999999999e-38
HPL dMach Base: 2
HPL sMach Base: 2
HPL dMach Prec: 4.440892e-16
HPL sMach Prec: 0.0000002384186
HPL dMach mLen: 53
HPL sMach mLen: 24
HPL dMach Rnd: 0
HPL sMach Rnd: 0
HPL dMach eMin: -1021
HPL sMach eMin: -125
HPL dMach rMin: 2.2250739999999997e-308
HPL sMach rMin: 1.1754939999999999e-38
HPL dMach eMax: 1025
HPL sMach eMax: 129
HPL dMach rMax: 0
HPL sMach rMax: 0
dweps: 1.110223e-16
sweps: 0.00000005960464
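The stated theoretical peak and the achieved HPL efficiency can be cross-checked with a few lines of Python. This sketch makes two assumptions that are not stated in the note: that the figure 4320 counts cores (it equals the 60 x 72 process grid), and that each core performs 4 double precision floating-point operations per cycle. Together these assumptions reproduce the stated peak of 50.6304.

```python
def theoretical_peak_tflops(cores, ghz, flops_per_cycle=4):
    """Peak in Tflop/s; flops_per_cycle=4 is an assumption, not from the note."""
    return cores * ghz * flops_per_cycle / 1000.0


peak = theoretical_peak_tflops(4320, 2.93)  # assumes 4320 is a core count
efficiency = 43.722 / peak                  # measured HPL result / peak

print(f"peak = {peak:.4f} Tflop/s, efficiency = {efficiency:.1%}")
```

Under these assumptions the run reaches roughly 86% of theoretical peak.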
Appendix B - Known Issues and Limitations
If you are building HPL rather than using the binary from Intel® Optimized LINPACK, make sure that MPI is running properly, that the Fortran, C++, MPI, and MKL libraries are in LD_LIBRARY_PATH, and that the Fortran, C++, and MPI binaries are in PATH.
Comments (7) 
June 11, 2009 12:39 AM PDT
Vipin Kumar E K (Intel)
We have not seen such error earlier. Looks like, it is some error related to the way you built HPL. --Vipin
January 11, 2010 10:22 PM PST
Warner Yuen
Is it possible to build HPL with MKL running on Mac OS X v10.6? I have both ICC v11.1 and the latest Mac OS X MKL. I haven't gotten an HPL that will run.
April 28, 2010 4:44 AM PDT
Vipin Kumar E K (Intel)
Yes, you should be able to build HPL in that configuration. You may also check the pre-built binaries with latest Intel & MKL and Intel compiler from http://software.intel.com/en-us/articles/intel-math-kernel-l.....-download/.
November 10, 2010 12:15 PM PST
carlostacc.utexas.edu
Hi, could you share with us the NB you used for the runs in X5670 chips? I have seen good results with NB around 192, but I would like to know what you think is best since you have tested in larger systems than I have (10 x 2 socket is my largest). Thanks!
July 6, 2011 4:41 PM PDT
drMikeT
Hi Vipin, could you provide us with the settings you used to achieve this performance ? thanks --Michael
October 6, 2011 2:12 PM PDT
Evaldo
The file Make.intel64, when try to run, have the error
The file:
#
SHELL = /bin/sh
#
CD = cd
CP = cp
LN_S = ln -s
MKDIR = mkdir
RM = /bin/rm -f
TOUCH = touch
#
ARCH = intel64
#
TOPdir = /home/intel/hpl-2.0
INCdir = $(TOPdir)/include
BINdir = $(TOPdir)/bin/$(ARCH)
LIBdir = $(TOPdir)/lib/$(ARCH)
#
HPLlib = $(LIBdir)/libhpl.a
#
MPdir = /home/intel/impi/4.0.1.007
MPinc = -I$(MPdir)/include64
MPlib = $(MPdir)/lib64/libmpi_mt.a
#
LAdir = /home/intel/mkl/lib/intel64
LAinc = /home/intel/mkl/include
LAlib = $(LAdir)/libmkl_intel_lp64.a $(LAdir)/libmkl_intel_thread.a $(LAdir)/libmkl_core.a -lpthread -l m
#
F2CDEFS =
#
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib)
#
HPL_OPTS = -DHPL_CALL_CBLAS
#
HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
#
CC = /home/intel/bin/icc
CCNOOPT = $(HPL_DEFS)
CCFLAGS = $(HPL_DEFS)
#
LINKER = /home/intel/bin/ifort
LINKFLAGS = $(CCFLAGS)
#
ARCHIVER = ar
ARFLAGS = r
RANLIB = echo
The error:
make[2]: Leaving directory `/home/intel/hpl-2.0/testing/ptimer/intel64'
( cd testing/ptest/intel64; make )
make[2]: Entering directory `/home/intel/hpl-2.0/testing/ptest/intel64'
/home/intel/bin/icc -o HPL_pddriver.o -c -DHPL_CALL_CBLAS -I/home/intel/hpl-2.0/include -I/home/intel/hpl-2.0/include/intel64 /home/intel/mkl/include -I/home/intel/impi/4.0.1.007/include64 ../HPL_pddriver.c
icc: warning #10147: no action performed for specified file(s)
/home/intel/bin/icc -o HPL_pdinfo.o -c -DHPL_CALL_CBLAS -I/home/intel/hpl-2.0/include -I/home/intel/hpl-2.0/include/intel64 /home/intel/mkl/include -I/home/intel/impi/4.0.1.007/include64 ../HPL_pdinfo.c
icc: warning #10147: no action performed for specified file(s)
/home/intel/bin/icc -o HPL_pdtest.o -c -DHPL_CALL_CBLAS -I/home/intel/hpl-2.0/include -I/home/intel/hpl-2.0/include/intel64 /home/intel/mkl/include -I/home/intel/impi/4.0.1.007/include64 ../HPL_pdtest.c
icc: warning #10147: no action performed for specified file(s)
/home/intel/bin/ifort -DHPL_CALL_CBLAS -I/home/intel/hpl-2.0/include -I/home/intel/hpl-2.0/include/intel64 /home/intel/mkl/include -I/home/intel/impi/4.0.1.007/include64 -o /home/intel/hpl-2.0/bin/intel64/xhpl HPL_pddriver.o HPL_pdinfo.o HPL_pdtest.o /home/intel/hpl-2.0/lib/intel64/libhpl.a /home/intel/mkl/lib/intel64/libmkl_intel_lp64.a /home/intel/mkl/lib/intel64/libmkl_intel_thread.a /home/intel/mkl/lib/intel64/libmkl_core.a -lpthread -lm /home/intel/impi/4.0.1.007/lib64/libmpi_mt.a
ipo: warning #11010: file format not recognized for /home/intel/mkl/include
/home/intel/mkl/include: file not recognized: Is a directory
make[2]: *** [dexe.grd] Error 1
make[2]: Leaving directory `/home/intel/hpl-2.0/testing/ptest/intel64'
make[1]: *** [build_tst] Error 2
make[1]: Leaving directory `/home/intel/hpl-2.0'
make: *** [build] Error 2
ananthnarayan_s
10
MKL ERROR: Parameter 10 was incorrect on entry to cblas_dtrsm
followed by SEGFAULT messages. Has this been encountered earlier?