Threading on Intel® Parallel Architectures

Haswell TSX using RTM (beginner student)

Hello,

I am just getting started with Haswell's TSX infrastructure using RTM. I downloaded the rtm.h header file online and tried writing a simple counter. Unfortunately, every time I compile and run the program, the code inside the transaction started by _xbegin() never executes. I would greatly appreciate your help. Thanks.

#include <stdio.h>
#include <stdlib.h>
#include "rtm.h"

int main(void){
    int N=5;
    int i;
    int status;
    int counter = 0;

    status = _xbegin();

Using thread_local on C++ throws error

I have been trying to convert a C++ MPI code to OpenMP. There are a large number of static member variables (mostly dynamic lists of class objects), and I am trying to use thread_local to make sure there are no conflicts between threads. But the file does not compile and throws the error: "error: expected a ";"". I was using ICC 14.

When I tried the ICC 15 beta, the file where I used thread_local compiled, but building the whole application failed later, at the link step, with: "undefined reference to '__cxa_thread_atexit'".

HTM/STM and Scheduling

Hi,

I have a question about Hardware and Software Transactional Memory.

Given the types of versioning (eager and lazy) and conflict detection (optimistic and pessimistic), let's say that two or more threads are performing transactions that read/write the same memory location.

Could the scheduling of the threads affect the ability to detect a conflict? Which combination of versioning and conflict detection would be best for always catching conflicts?

Hope my question is clear.

Thanks.
Best Regards,
Simone

Responsive OpenMP Threads in Hybrid Parallel Environment

I have a Fortran code that uses both MPI and OpenMP. I have done some profiling of the code on an 8-core Windows laptop, varying the number of MPI tasks vs. OpenMP threads, and have some understanding of where performance bottlenecks for each parallel method might surface. The problem appears when I port the code to a Linux cluster with several 8-core nodes: specifically, my OpenMP thread-parallel performance is very poor.

Optimizing reduce_by_key implementation using TBB

Hello Everyone,

I'm quite new to TBB and have been trying to optimize a reduce_by_key implementation using TBB constructs. However, the serial STL code always outperforms the TBB code! It would be helpful to get some ideas on how reduce_by_key could be implemented with tbb::parallel_scan. Any help as soon as possible would be much appreciated.

Thanks.
