- P and Q - the number of rows and columns in the process grid, respectively. P*Q must be the number of MPI processes that HPL is using. Choose P ≤ Q.
- NB - the block size of the data distribution. The table below shows recommended values of NB for different Intel® processors:

| Processor | NB |
| --- | --- |
| Intel® Xeon® Processor X56*/E56*/E7-*/E7*/X7* (codenamed Nehalem or Westmere) | 256 |
| Intel Xeon Processor E26*/E26* v2 (codenamed Sandy Bridge or Ivy Bridge) | 256 |
| Intel Xeon Processor E26* v3/E26* v4 (codenamed Haswell or Broadwell) | 192 |
| Intel® Core™ i3/i5/i7-6* Processor (codenamed Skylake Client) | 192 |
| Intel® Xeon Phi™ Processor 72* (codenamed Knights Landing) | 336 |
| Intel Xeon Processor supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions (codenamed Skylake Server) | 384 |
- N - the problem size:
Increasing N usually increases performance, but the size of N is bounded by memory. In general, you can compute the memory that each MPI process needs to store its part of the matrix (not counting internal buffers) as 8*N*N/(P*Q) bytes, where N is the problem size and P and Q are the process grid dimensions set in HPL.dat. A general rule of thumb is to choose a problem size that fills 80% of memory.
- For homogeneous runs, choose N divisible by NB*LCM(P,Q), where LCM is the least common multiple of the two numbers.
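The sizing rules above can be combined into a small calculation. The following Python sketch is illustrative only and is not part of HPL or Intel's tooling: the function names are hypothetical, and it assumes that `total_mem_bytes` is the memory summed over all P*Q processes and that the 80% rule applies to that total. It requires Python 3.9 or later for `math.lcm`.

```python
import math


def process_grids(num_procs):
    """List every (P, Q) with P * Q == num_procs and P <= Q."""
    return [(p, num_procs // p)
            for p in range(1, math.isqrt(num_procs) + 1)
            if num_procs % p == 0]


def matrix_bytes_per_process(n, p, q):
    """Matrix storage per process, 8*N*N/(P*Q) bytes (internal buffers excluded)."""
    return 8 * n * n / (p * q)


def suggest_n(total_mem_bytes, nb, p, q, fill=0.80):
    """Largest N divisible by NB*LCM(P,Q) whose matrix fits in fill * memory.

    Assumption: total_mem_bytes is the memory available across all P*Q
    processes, so the whole matrix (8*N*N bytes) must fit in fill * total.
    """
    n_max = math.isqrt(int(fill * total_mem_bytes / 8))  # 8 bytes per element
    step = nb * math.lcm(p, q)                           # homogeneous-run rule
    return (n_max // step) * step                        # round down to a multiple


if __name__ == "__main__":
    # Hypothetical system: 8 MPI processes sharing 64 GiB, NB = 192.
    print(process_grids(8))
    n = suggest_n(64 * 2**30, nb=192, p=2, q=4)
    print(n, matrix_bytes_per_process(n, 2, 4))
```

For this hypothetical case of 8 processes, 64 GiB of total memory, NB = 192, and a 2 x 4 grid, the sketch suggests N = 82176. Treat the result as a starting point: internal buffers and other memory consumers are not included in the estimate, so a smaller N may be needed in practice.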
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804