HowTo – HPL Over Intel MPI

Download the PDF: HPL using Intel MPI

This is a step-by-step procedure for running the High Performance Linpack (HPL) benchmark on a Linux cluster using Intel MPI. It was done on a 128-node Linux cluster of Intel Nehalem processors at 2.93 GHz with 12 GB of RAM per node. The operating system used was Red Hat Enterprise Linux 5.3. The nodes were interconnected via InfiniBand 4x DDR using the standard Red Hat EL 5.3 drivers.

You can also use a simple PHP web tool: enter your system specs and it will suggest optimal input parameters for your HPL input file before you run the benchmark on the cluster.
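The tool's exact formula isn't given here, but the usual sizing heuristic is to choose the problem size N so that the N×N double-precision matrix fills roughly 80% of total cluster memory, rounded down to a multiple of the block size NB. A minimal sketch of that heuristic (the 80% fraction and the default NB=192 are assumptions, not the tool's published values):

```python
import math

def suggest_hpl_n(nodes, mem_gib_per_node, nb=192, mem_fraction=0.80):
    """Suggest an HPL problem size N: the largest multiple of nb such
    that the N x N matrix of 8-byte doubles uses about mem_fraction
    of the cluster's total memory."""
    total_bytes = nodes * mem_gib_per_node * 2**30
    n_max = math.isqrt(int(mem_fraction * total_bytes) // 8)
    return (n_max // nb) * nb

# Example: the 128-node cluster from this article, 12 GB RAM per node.
n = suggest_hpl_n(nodes=128, mem_gib_per_node=12)
print(n)
```

Leaving ~20% of memory free avoids swapping during the run, which would destroy HPL performance far more than the slightly smaller N costs.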

Download the HPL tool from SourceForge:



drMikeT:

In my smaller-scale experiments I managed to obtain, with 128 total cores in use, 1.258e+03 GFlops on HPL 2.00, that is, 87.75% of theoretical peak (9.8281 GFlops/core attained versus 11.2 GFlops/core theoretical).

Experiment: 16 nodes x 8 Nehalem cores/node at 2.8 GHz, with 22 GiB usable DRAM per node; CentOS 5.4; QDR InfiniBand
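The efficiency figure can be sanity-checked directly: a Nehalem core retires 4 double-precision flops per cycle, so at 2.8 GHz the per-core peak is 11.2 GFlops. A quick check of the numbers above:

```python
# Nehalem: 4 double-precision flops per cycle (SSE add + mul each cycle).
flops_per_cycle = 4
ghz = 2.8
peak_per_core = flops_per_cycle * ghz          # 11.2 GFlops/core

cores = 16 * 8                                 # 16 nodes x 8 cores/node
measured_gflops = 1.258e3                      # reported HPL 2.00 result
per_core = measured_gflops / cores             # attained GFlops/core
efficiency = per_core / peak_per_core * 100

print(f"{per_core:.4f} GFlops/core, {efficiency:.2f}% of peak")
```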

HPL-2.00 : N=199864, Nb=172, PxQ=4x4

export I_MPI_FABRICS=shm:dapl
export I_MPI_DAPL_PROVIDER=ofa-v2-mlx4_0-1
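These HPL.dat values are internally consistent: N is an exact multiple of Nb, and the matrix fills most but not all of the usable memory. A quick check (assuming the full 22 GiB/node counts as available):

```python
N, NB = 199864, 172
nodes, mem_gib = 16, 22

assert N % NB == 0            # tiles divide the matrix evenly
matrix_bytes = N * N * 8      # double precision
total_bytes = nodes * mem_gib * 2**30
print(f"matrix uses {matrix_bytes / total_bytes:.1%} of cluster RAM")
```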

I should add that I used MKL with OMP_NUM_THREADS=8 per MPI task, so the layout was 1 MPI rank/node and 8 OpenMP threads/node.

I also tried 2, 4, and 8 ranks/node with correspondingly fewer OpenMP threads per rank, and the results were not much worse.
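For those multi-rank runs the process grid has to change with the rank count; HPL generally does best with a near-square grid where P <= Q. A small sketch of that selection (the specific grids below are derived from the node count, not quoted from the comment):

```python
def near_square_grid(nranks):
    """Pick an HPL process grid P x Q with P <= Q and P as close to
    sqrt(nranks) as a divisor allows."""
    p = int(nranks ** 0.5)
    while nranks % p:
        p -= 1
    return p, nranks // p

# 16 nodes at 1, 2, 4, and 8 ranks/node:
for ranks_per_node in (1, 2, 4, 8):
    nranks = 16 * ranks_per_node
    p, q = near_square_grid(nranks)
    print(f"{nranks:3d} ranks -> P x Q = {p} x {q}")
```

For 16 ranks this reproduces the 4x4 grid used in the run above.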

Intel Compilers 11.1.073 and Intel MPI 4.0.0 (028); newer versions may give better performance ... :)

jnzhoun:

can download this white paper...

