I was wondering whether there are Intel tools that support performance monitoring and tuning of hybrid MPI + OpenMP (or otherwise multi-threaded) code.
Suppose I have MPI code that also calls multi-threaded MKL routines, so that each MPI task really consists of a number of MKL threads.
What would be a good way of investigating the performance of this hybrid code? Can I collect performance data with hardware performance counters for each task and then combine it afterwards to get the complete picture?
I am familiar with the Intel Trace Analyzer and VTune tools, but it is not clear to me how I could combine thread-level and task-level performance observations for an entire hybrid MPI + OpenMP/MKL code.
Could I, for instance, use mpirun to start the VTune command-line binary, which in turn launches the regular MPI code, and then combine the results?
It would be most useful to be able to combine the hardware performance counter data of each thread with that of all the other threads in a hybrid MPI code.
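To make the kind of combination I have in mind concrete, here is a rough Python sketch. The record layout (rank, thread id, counter name, value) and the function name `combine_counters` are my own assumptions for illustration; this is not the export format of VTune or any other Intel tool.

```python
from collections import defaultdict


def combine_counters(samples):
    """Sum hardware counter values per MPI rank and over the whole job.

    `samples` is an iterable of (rank, thread_id, counter_name, value)
    tuples. This layout is hypothetical, not a real tool's output format.
    """
    per_rank = defaultdict(lambda: defaultdict(int))
    total = defaultdict(int)
    for rank, _thread_id, counter, value in samples:
        # Fold every thread of a rank into that rank's subtotal ...
        per_rank[rank][counter] += value
        # ... and every rank into the job-wide total.
        total[counter] += value
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {rank: dict(c) for rank, c in per_rank.items()}, dict(total)
```

Given one record per thread per counter, the function returns a per-task subtotal (all threads of a rank summed) and a job-wide total, which is roughly the "complete picture" I would like to get out of the tools.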
thanks -- michael