Threading on Intel® Parallel Architectures

Responsive OpenMP Threads in Hybrid Parallel Environment

I have a Fortran code that uses both MPI and OpenMP. I have profiled it on an 8-core Windows laptop, varying the number of MPI tasks versus OpenMP threads, and have some understanding of where the performance bottlenecks for each parallel method might surface. The problem arises when I port the code to a Linux cluster with several 8-core nodes: there, my OpenMP thread parallelism performs very poorly.

Optimizing reduce_by_key implementation using TBB

Hello Everyone,

I'm quite new to TBB and have been trying to optimize a reduce_by_key implementation using TBB constructs. However, the serial STL code always outperforms the TBB code! It would be helpful to get an idea of how reduce_by_key can be improved using tbb::parallel_scan. Any help at the earliest would be much appreciated.
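
For readers unfamiliar with the operation, here is a minimal serial sketch of what reduce_by_key is usually taken to mean. It assumes Thrust-style semantics (each run of consecutive equal keys is collapsed to one key and the sum of its values); the post does not show its own code, so the function name reduce_by_key_serial and its signature are hypothetical. A baseline like this is what a tbb::parallel_scan version would have to beat.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical serial baseline, not the poster's code.
// Sums the values of each run of consecutive equal keys.
// Assumes keys.size() == values.size().
template <typename K, typename V>
std::pair<std::vector<K>, std::vector<V>>
reduce_by_key_serial(const std::vector<K>& keys, const std::vector<V>& values) {
    std::vector<K> out_keys;
    std::vector<V> out_vals;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (i == 0 || !(keys[i] == keys[i - 1])) {
            // A new run starts here: emit the key and seed its sum.
            out_keys.push_back(keys[i]);
            out_vals.push_back(values[i]);
        } else {
            // Same key as the previous element: accumulate into the current run.
            out_vals.back() += values[i];
        }
    }
    return {out_keys, out_vals};
}
```

With keys {1, 1, 2, 2, 2, 3, 1} and values {10, 20, 1, 2, 3, 7, 5}, this returns keys {1, 2, 3, 1} and sums {30, 6, 7, 5}. The loop is a single streaming pass over the input, which may be part of why a parallel version has trouble outperforming it on small inputs.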


Performance Loss


I observed an interesting performance loss in my measurements.

I have a system with two sockets, each holding an E5-2680 processor. Each processor has 8 cores with hyper-threading; the hyper-threading was ignored.

On this system, I started a program 16 times simultaneously, pinning each instance to a different core. At first, I set all cores to 2.7 GHz and saw:

Program 0 Runtime 7.7s

Program 8 Runtime 7.63s

Then I set the cores on the second socket to 1.2 GHz and saw:

Program 0 Runtime 12.18s
