GROMACS recipe for symmetric Intel® MPI using PME workloads


This package (scripts with instructions) delivers a build and run environment for symmetric Intel® MPI runs; this file is the package's README. "Symmetric" means that a Xeon® executable and a Xeon Phi™ executable run together, exchanging MPI messages and collective data via Intel MPI.

  • Developers
  • Partners
  • Students
  • Linux*
  • Server
  • C/C++
  • Advanced
  • Intel® Parallel Studio XE Cluster Edition
  • symmetric MPI
  • native MPI
  • cmake
  • heterogeneous clusters
  • Intel® Many Integrated Core (Intel® MIC) Architecture
  • Message Passing Interface
  • OpenMP*
  • Academic
  • Cluster Computing
  • Intel® Core™ Processors
  • Intel® Many Integrated Core Architecture
  • Optimization
  • Parallel Computing
  • Porting
  • Threading

    OpenMP Shared Arrays

    I have two questions about WRITE/READ operations on shared arrays.
     1) In my program, each iteration of an OpenMP-parallelized DO loop writes a different element of a given shared array. The results I get seem to be right, but I'm wondering whether this is fine or whether I should enclose the READ/WRITE section in a CRITICAL block. I also READ elements from a shared array without modifying them, and that seems to work. Are these procedures correct?
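
The pattern described in this question can be sketched in C++ (the post itself is about Fortran, so this is an illustration, not the poster's code). The function name `scale_elements` and its parameters are hypothetical. The sketch assumes that iteration `i` writes only element `i` of the output array and that the input array is never modified inside the loop; under those assumptions no two threads touch the same element, so no critical section is needed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: each iteration reads in[i] and writes out[i].
// Both arrays are shared between threads, but every element of out is
// written by exactly one iteration, and in is only read.
std::vector<double> scale_elements(const std::vector<double>& in, double factor) {
    std::vector<double> out(in.size());
    const long n = static_cast<long>(in.size());

    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        out[i] = factor * in[i];  // exclusive write to out[i], read-only access to in
    }
    return out;
}
```

A critical section (or an atomic update or reduction) would become necessary only if several iterations could write the same element, for example when accumulating into `out[k]` with an index `k` that repeats across iterations.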

    [Bug] OSX Yosemite 10.10 fails when compiling

    # ProductName:    Mac OS X
    # ProductVersion:    10.10.3
    # BuildVersion:    14D136

    curl -O
    gunzip -c libomp_20150401_oss.tgz | tar xopf -
    cd libomp_oss

    in line 124..126 of libomp_oss/src/
    ifeq "$(os)" "mac"
        mac_os_new := $(shell /bin/sh -c 'if ; then echo "1"; else echo "0"; fi')

    Elusive Algorithms - Parallel Scan

    Last month there was a query on the IDZ MIC forum, "how to perform inclusive scan in C cilk", to which my initial reply was:

    Parallelizing this is problematic due to the next result being dependent upon the prior result. While this is not impossible, it is rather difficult and it introduces some redundant additions.
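
One common way to work around that dependency is a two-pass blocked scan; the sketch below is a generic illustration and not necessarily the approach the article goes on to describe. It uses OpenMP pragmas in C++ rather than Cilk, and the function name `inclusive_scan_blocked` and the `num_blocks` parameter are hypothetical. Each block is scanned independently, the block totals are combined serially, and a second pass adds each block's offset. That second pass is where the redundant additions come from.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Two-pass blocked inclusive scan (prefix sum).
// num_blocks is a tuning parameter; one block per thread is a typical choice.
std::vector<long> inclusive_scan_blocked(const std::vector<long>& in,
                                         std::size_t num_blocks) {
    const std::size_t n = in.size();
    std::vector<long> out(n);
    if (n == 0) return out;

    // Clamp the block count to the range [1, n].
    num_blocks = std::max<std::size_t>(1, std::min(num_blocks, n));
    const std::size_t chunk = (n + num_blocks - 1) / num_blocks;
    const long blocks = static_cast<long>(num_blocks);
    std::vector<long> block_total(num_blocks, 0);

    // Pass 1: scan each block independently, starting from zero.
    #pragma omp parallel for
    for (long b = 0; b < blocks; ++b) {
        const std::size_t lo = static_cast<std::size_t>(b) * chunk;
        const std::size_t hi = std::min(n, lo + chunk);
        long running = 0;
        for (std::size_t i = lo; i < hi; ++i) {
            running += in[i];
            out[i] = running;
        }
        block_total[b] = running;
    }

    // Serial step: offset[b] is the sum of all elements before block b.
    std::vector<long> offset(num_blocks, 0);
    for (std::size_t b = 1; b < num_blocks; ++b) {
        offset[b] = offset[b - 1] + block_total[b - 1];
    }

    // Pass 2: add each block's offset (the redundant additions).
    #pragma omp parallel for
    for (long b = 0; b < blocks; ++b) {
        const std::size_t lo = static_cast<std::size_t>(b) * chunk;
        const std::size_t hi = std::min(n, lo + chunk);
        for (std::size_t i = lo; i < hi; ++i) {
            out[i] += offset[b];
        }
    }
    return out;
}
```

The result is the same as a serial scan, at the cost of roughly twice as many additions in total, which is why the parallel version only pays off for large inputs.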

    Inconsistent Speedup


    I'm new to OpenMP. I would like to ask about the speedup ratio.

    I am running C source code with OpenMP on an Intel Core i5-2410M.

    Based on my understanding, speedup = (execution time of the code using one thread) / (execution time of the code using N threads).

    The execution time recorded is time_diff in the attached code.
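
The speedup definition above can be written down as a small C++ sketch. The attached code is not shown here, so the helper names `time_seconds`, `speedup`, and `efficiency` are hypothetical and only illustrate the arithmetic. The sketch assumes wall-clock timing; CPU-time functions such as `clock()` can add up the time of all threads and so hide any speedup.

```cpp
#include <chrono>

// Wall-clock duration of one call to work(), in seconds.
template <class F>
double time_seconds(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Speedup as defined in the post: T(1 thread) / T(N threads).
double speedup(double t_one_thread, double t_n_threads) {
    return t_one_thread / t_n_threads;
}

// Parallel efficiency: speedup divided by the number of threads used.
double efficiency(double t_one_thread, double t_n_threads, int n_threads) {
    return speedup(t_one_thread, t_n_threads) / n_threads;
}
```

For example, a run that takes 8 s on one thread and 2 s on four threads has a speedup of 4 and an efficiency of 1. Repeating each measurement several times and comparing the best or median times usually gives more consistent ratios than single runs.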

    Basic OMP Parallelized Program Not Scaling As Expected

    #include <iostream>
    #include <vector>
    #include <stdexcept>
    #include <sstream>
    #include <omp.h>
    std::vector<int> col_sums(std::vector<std::vector<short>>& data) {
        unsigned int height = data.size(), width = data[0].size();
        std::vector<int> totalSums(width, 0), threadSums(width, 0);
        #pragma omp parallel firstprivate(threadSums)
            #pragma omp parallel for
            for (unsigned int i = 0; i < height; i++) {
      [0:width] += data[i].data()[0:width];
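
The excerpt is cut off, so the rest of the poster's code is unknown. One thing visible in the fragment is that `#pragma omp parallel for` sits directly inside `#pragma omp parallel`, which requests a nested parallel region; with nested parallelism inactive, each outer thread may then run the whole loop instead of a share of it. The sketch below shows one possible restructuring under that reading, using `#pragma omp for` inside a single parallel region. It replaces the array-notation line with a plain inner loop and merges the per-thread sums in a critical section; both choices are assumptions, not the poster's code.

```cpp
#include <cstddef>
#include <vector>

// Column sums with one private accumulator per thread.
std::vector<int> col_sums(const std::vector<std::vector<short>>& data) {
    if (data.empty()) return {};
    const long height = static_cast<long>(data.size());
    const std::size_t width = data[0].size();  // assumes all rows have this width
    std::vector<int> totalSums(width, 0);

    #pragma omp parallel
    {
        // Declared inside the region, so each thread gets its own copy.
        std::vector<int> threadSums(width, 0);

        // Work-sharing loop: rows are divided among the existing threads.
        #pragma omp for nowait
        for (long i = 0; i < height; ++i) {
            for (std::size_t j = 0; j < width; ++j) {
                threadSums[j] += data[i][j];
            }
        }

        // Merge step: one thread at a time adds its partial sums.
        #pragma omp critical
        for (std::size_t j = 0; j < width; ++j) {
            totalSums[j] += threadSums[j];
        }
    }
    return totalSums;
}
```

Whether this scales still depends on the data size and memory bandwidth, since each element is read once and only added.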

    Problems with Intel MPI

    I have trouble running Intel MPI on a cluster whose nodes have different numbers of processors (12 and 32).

    I use Intel MPI 4.0.3. It works correctly on 20 nodes with 12 processors each (Intel(R) Xeon(R) CPU X5650 @ 2.67), and all processors work correctly. When I then run Intel MPI on the other 3 nodes with 32 processors each (Intel(R) Xeon(R) CPU E5-4620 v2 @ 2.00), they work correctly too.
