Cluster Computing

Poor performance of MPI even with Infiniband


I am benchmarking how much network bandwidth MPI can exploit for my use cases.

I tested the bandwidth between two machines using raw TCP socket communication and confirmed that they can send/receive 2–3 GB of data per second.

However, I cannot reach those numbers in the MPI benchmark results (Intel MPI Benchmarks, the MVAPICH/OSU benchmarks, ...), nor in my application.

In my use case, each machine has one MPI process dedicated to network communication, and that process performs only inter-machine communication.
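For comparison, the kind of point-to-point benchmark run being described can be sketched as follows. This is a hedged sketch: `node1`/`node2` are placeholder hostnames, and the fabric values follow the Intel MPI 5.x-era `I_MPI_FABRICS` convention, which should be checked against the reference manual for your installed version.

```
# Hypothetical IMB point-to-point run between the two machines,
# one rank per node, with the fabric forced explicitly so the TCP
# and InfiniBand paths can be compared directly.
mpirun -np 2 -ppn 1 -hosts node1,node2 \
       -genv I_MPI_FABRICS shm:tcp  IMB-MPI1 PingPong   # TCP path, to match the socket test
mpirun -np 2 -ppn 1 -hosts node1,node2 \
       -genv I_MPI_FABRICS shm:dapl IMB-MPI1 PingPong   # InfiniBand (DAPL) path
```

Comparing the large-message PingPong bandwidth across the two fabrics against the raw-socket figure shows whether the gap is in MPI itself or in the fabric selection.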

How to run Intel mp_linpack pre-compiled with Hyper-threading enabled?

Hello all!

I have managed to successfully run mp_linpack on my cluster with hyper-threading disabled. All cores were running at 100%.

I want to experiment with hyper-threading enabled, but the runme_intel64 script seems to place threads only on the physical cores.

I was looking through the Intel MPI reference guide and tried a few parameters, but they didn't help. Does anyone have an idea?

I tried adding these lines to runme_intel64:

export I_MPI_PIN=on

export I_MPI_PIN_CELL=unit

export I_MPI_PIN_DOMAIN=auto
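For what it's worth, one combination sometimes suggested for spanning logical (hyper-threaded) CPUs is sketched below. This is a hedged sketch: the variable values should be verified against your Intel MPI version's reference manual, and note that HPL-type workloads often run slower with hyper-threading enabled, since the FPU is shared between sibling threads.

```
export I_MPI_PIN=on
# "all" pins across all logical processors, not just physical cores;
# check the exact accepted values in your Intel MPI reference manual.
export I_MPI_PIN_PROCESSOR_LIST=all
```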

Shipping products with Intel MPI


One of our developers has asked me a question:

" We have an application that contains a component that uses the Intel MPI library.  In terms of installing our product, we would also like to install the Intel MPI run-time components so that our customers do not have to do this manually.  Are we able to do this by simply packaging the Intel run-time installer within our installer?




I am going to run a CFD simulation whose memory usage will exceed 100 GB.
I use MKL PARDISO to solve the linear system that arises.

I recently learned about the Intel Xeon Phi coprocessor and its capabilities.
To accelerate the solve, I was wondering whether PARDISO uses Intel Xeon Phi technology internally,
so that I would not have to modify my code.
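As a point of reference on the offload question: MKL has an Automatic Offload mode toggled by an environment variable, sketched below. This is a hedged note — Automatic Offload historically covered mainly dense BLAS/LAPACK kernels, so whether the sparse PARDISO solver participates in your MKL version should be verified against that release's documentation.

```
# MKL Automatic Offload toggle -- no source changes required for the
# routines it supports. Whether PARDISO benefits is version-dependent;
# verify against the release notes for your MKL version.
export MKL_MIC_ENABLE=1
```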

Thank you in advance.

Intel® VTune™ Amplifier XE 2016 Update 2 Fixes List

NOTE: Defects and feature requests described below represent specific issues with specific test cases. It is difficult to succinctly describe an issue and how it impacted the specific test case. Some of the issues listed may impact multiple architectures, operating systems, and/or languages. If you have any questions about the issues discussed in this report, please post on the user forums or submit an issue to Intel® Premier Support.

  • Developers
  • Android*
  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Microsoft Windows* 8.x
  • C/C++
  • Fortran
  • Intel® VTune™ Amplifier
  • Cluster Computing

Multi-process service tool for Xeon Phi

I was wondering whether there is any resource-management tool for Xeon Phi like CUDA MPS (which allows CUDA kernels from multiple processes to run concurrently on the same GPU; this can benefit performance when a single application process underutilizes the GPU's compute capacity). Since that tool improves GPU utilization, is there a comparable multi-process service tool for Xeon Phi?

Strange errors when resetting mic configuration

Hi all,


I have installed MPSS 3.3.3 on my CentOS 7.2 machine. After rebuilding the kernel modules, I could install it without problems.

I had another MPSS configuration on the machine which caused problems (in short, it did not function), so I removed those packages and reinstalled.

When I try to reset the configuration / initialize the default configuration, I get these errors.
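For context, the reset/re-initialize operations referred to here are presumably the standard micctrl ones — a hedged sketch; the exact flags should be checked against the MPSS user guide for your release:

```
# Hypothetical MPSS reset/re-init sequence (run as root):
micctrl --cleanconfig    # remove the existing per-card configuration
micctrl --initdefaults   # regenerate the default configuration files
```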

micinfo displays my Xeon Phi and I can access it via ssh.

I just want to know whether these errors have any influence on my system, and how to fix them.


MPI having bad performance in user mode, runs perfectly as root

Hi everyone,

This is very probably an installation issue with Parallel Studio 2016 (update 1) on my system... So here are the details:

I have installed Intel Parallel Studio 2016 (update 1) on my server: two sockets with Intel Xeon processors (Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz), running Ubuntu 12.04 (I know this is not supposed to be a supported configuration, at least according to the requirements check in the Parallel Studio installer...).
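As background for anyone debugging a similar root-versus-user gap: one classic cause on InfiniBand hardware (offered as a hedged guess, not a confirmed diagnosis for this system) is the per-user locked-memory limit, which caps how much memory can be registered for RDMA and silently forces MPI onto a slower path for ordinary users.

```shell
# Check the locked-memory (memlock) limit for the current user.
# Root is typically "unlimited"; ordinary users often default to a
# small value such as 64 kB, which can block the fast RDMA path.
ulimit -l
```

A common remedy is raising the limit in /etc/security/limits.conf (e.g. `* soft memlock unlimited` and `* hard memlock unlimited`), then logging in again so the new limit takes effect.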

The problem is really simple:
