The Scalable Heterogeneous Computing Benchmark Suite (SHOC) for Intel® Xeon Phi™

The Scalable Heterogeneous Computing Benchmark Suite (SHOC) may be used to measure the performance and stability of coprocessor-based systems. The benchmark has been ported to support Intel® Xeon Phi™ using offload programming constructs implemented in the Intel® Compiler, which is available as part of the Intel® Composer XE 2013 package.

You can get more information about the benchmark from

Benchmark Download

The Intel Xeon Phi-specific modifications to the benchmark can be downloaded from the git repository


The SHOC benchmark for Intel Xeon Phi consists of the following components. Where relevant, each benchmark reports performance numbers both with and without the data transfer overhead.
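
To illustrate the difference between the two reported figures, here is a minimal Python sketch. It is not SHOC's code. It assumes the kernel time and the host-to-coprocessor transfer time have been measured separately, and the function name `rates_with_and_without_transfer` and its parameters are hypothetical.

```python
def rates_with_and_without_transfer(work_units, kernel_seconds, transfer_seconds):
    """Return (rate excluding transfer time, rate including transfer time).

    work_units is whatever the benchmark counts (flops, bytes, elements).
    All names here are illustrative assumptions, not SHOC identifiers.
    """
    if kernel_seconds <= 0 or transfer_seconds < 0:
        raise ValueError("kernel time must be positive, transfer time non-negative")
    kernel_only = work_units / kernel_seconds
    with_transfer = work_units / (kernel_seconds + transfer_seconds)
    return kernel_only, with_transfer


# Example: 2e9 floating point operations, 0.5 s of compute, 0.5 s of transfers.
kernel_only, with_transfer = rates_with_and_without_transfer(2e9, 0.5, 0.5)
print(kernel_only / 1e9, with_transfer / 1e9)  # GFLOP/s without and with transfer
```

The gap between the two numbers grows as transfers take a larger share of the total time, which is why a benchmark with little computation per byte can look much slower once the transfer is included.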

Level 0 Benchmarks: Measures 'feeds and speeds' of the coprocessor hardware 

  1. BusSpeedDownload and BusSpeedReadback: These benchmarks measure the data transfer speed between the host and the Intel Xeon Phi coprocessor for various data sizes.
  2. DeviceMemory: Measures the read/write speed to GDDR5 memory from the coprocessor cores.
  3. MaxFlops: Measures the maximum floating point computation rate for double precision and single precision arithmetic. Note: In this version, the compiler optimizes out part of the code, so some of the results are incorrect. Other results, such as the MADD8-SP and MADD8-DP numbers, are reliable and give a feel for pure computational speed.
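
As a rough illustration of how a transfer speed is derived for several data sizes, the Python sketch below times a buffer copy and converts it to GB/s. It is an assumption-laden stand-in: it copies within host memory rather than across the bus to a coprocessor, and the helper names `bandwidth_gb_per_s` and `time_host_copy` are hypothetical, not part of SHOC.

```python
import time


def bandwidth_gb_per_s(num_bytes, seconds):
    """Convert a byte count and an elapsed time into GB/s (1 GB = 1e9 bytes)."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return num_bytes / seconds / 1e9


def time_host_copy(num_bytes, repeats=5):
    """Best-of-N time to copy a buffer in host memory.

    This only stands in for a bus transfer; a real measurement would time
    the host-to-coprocessor copy instead.
    """
    src = bytearray(num_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)  # one full copy of the buffer
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


if __name__ == "__main__":
    for size_mb in (1, 4, 16):
        num_bytes = size_mb * 1024 * 1024
        seconds = time_host_copy(num_bytes)
        print(f"{size_mb:>3} MB: {bandwidth_gb_per_s(num_bytes, seconds):.2f} GB/s")
```

Taking the best of several repeats is one common way to reduce noise from other activity on the system; reporting the spread as well would show how steady the measurement is.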

Level 1 Benchmarks: Measures device performance for low level compute tasks.

    1. GEMM: Measures the performance of the general matrix-matrix multiply operation on single precision and double precision numbers using the Intel® Math Kernel Library on Intel Xeon Phi.
    2. FFT: Measures performance of
    3. MD: Measures the performance of the Lennard-Jones potential computations used in molecular dynamics.
    4. Reduction: Measures the performance of a sum reduction over floating point numbers.
    5. Scan: Measures the performance of a parallel prefix sum over floating point numbers.
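
For readers unfamiliar with the Reduction, Scan, and MD operations, the following Python sketch gives serial reference versions of the underlying math. It is not taken from SHOC. The function names are hypothetical, the scan is shown in its inclusive form as an assumption (the text does not say which variant the benchmark uses), and the Lennard-Jones parameters `epsilon` and `sigma` default to 1.0 purely for illustration.

```python
from itertools import accumulate


def sum_reduction(values):
    """Sum reduction: collapse a sequence of numbers into their total."""
    return sum(values)


def inclusive_scan(values):
    """Prefix sum: element i of the result is values[0] + ... + values[i]."""
    return list(accumulate(values))


def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair potential V(r) = 4*epsilon*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


data = [1.0, 2.0, 3.0, 4.0]
print(sum_reduction(data))   # 10.0
print(inclusive_scan(data))  # [1.0, 3.0, 6.0, 10.0]
print(lennard_jones(1.0))    # 0.0, the potential crosses zero at r = sigma
```

Serial versions like these are useful mainly as a correctness check; the benchmarks measure how fast parallel implementations of the same operations run on the coprocessor.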

Level 2 Benchmark:  Measures performance of real application kernels

  1. S3D: The S3D application simulates combustion processes. This benchmark measures the performance of the 'getrates' kernel, which computes the rate of chemical reactions across a regular 3D grid.



1 comment

dl h.

I ran a test with BusSpeedReadback.

When the data size is not small, its performance is 6.xx GB/s.

But when the data size reaches 1 GB, the performance is not steady.

Sometimes it is 6.xx GB/s, sometimes 0.3xx GB/s.

Why are BusSpeedReadback and BusSpeedDownload not steady?
