Intel® Moderncode for Parallel Architectures

Function Tracing Tool

I am working on a tracing tool for a complex multi-threaded distributed application (C, C++, and Python, running on VxWorks and Linux systems).

I want to achieve:

-- Tracing for Linux (CentOS) and VxWorks (preferably a single solution for both).

-- Dynamic runtime tracing, preferably without source code modification.

-- Trace functions and log function names and parameters.

-- Preferably trace functions per process/thread.

-- Store trace data in a buffer.

-- Dump the trace data to a file on request.
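The last two requirements (buffer the trace data, dump it on request) could be sketched roughly as below. This is a minimal illustration, not a complete tracing tool: `TraceRecord` and `TraceBuffer` are hypothetical names, the buffer assumes one instance per thread, and it stores only function addresses, not names or parameters. One possible way to feed it without editing the sources is GCC's `-finstrument-functions` option, which makes the compiler call `__cyg_profile_func_enter` and `__cyg_profile_func_exit` hooks around every function; those hooks, and everything they call, would then have to be excluded from instrumentation (for example with `__attribute__((no_instrument_function))`). That approach still needs a rebuild, and its availability on a given VxWorks toolchain is an assumption to verify.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical record: one entry or exit event of one traced function.
struct TraceRecord {
    const void* fn;  // address of the traced function
    bool enter;      // true on entry, false on exit
};

// Fixed-size ring buffer. Assumption: one instance per thread, so there is
// no locking. When the buffer is full, the oldest records are overwritten.
template <std::size_t N>
class TraceBuffer {
public:
    // Store one event, overwriting the oldest slot once N events exist.
    void record(const void* fn, bool enter) {
        records_[next_ % N] = TraceRecord{fn, enter};
        ++next_;
    }

    // Number of records currently held (at most N).
    std::size_t size() const {
        return next_ < N ? static_cast<std::size_t>(next_) : N;
    }

    // Write the held records, oldest first, and return how many were written.
    std::size_t dump(std::ostream& out) const {
        const std::uint64_t begin = next_ < N ? 0 : next_ - N;
        for (std::uint64_t i = begin; i < next_; ++i) {
            const TraceRecord& r = records_[i % N];
            out << (r.enter ? "enter " : "exit  ") << r.fn << '\n';
        }
        return size();
    }

    // Convenience wrapper: the dump as a string.
    std::string to_string() const {
        std::ostringstream out;
        dump(out);
        return out.str();
    }

private:
    std::array<TraceRecord, N> records_{};
    std::uint64_t next_ = 0;  // total number of events recorded so far
};
```

Dumping to a file on request would then amount to passing a `std::ofstream` to `dump`. The addresses could be resolved to function names afterwards with a symbol tool; capturing parameters is outside what this sketch covers.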

Possible opportunities within an OpenMP parallel for loop

Dear all,

I want to know which techniques I can use inside an already parallelized OpenMP for loop to gain performance.

For example, I am working on a code (snippet shown below) in which a for loop (already parallelized) calls a function that Intel VTune reports as a hotspot. How can I reduce the execution time of that function? For instance, can I use #pragma simd, or will that make it slower?

There is also another for loop inside the already parallelized OpenMP for loop, and it has hotspots too.
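Since the original snippet is not included here, the following is only a hypothetical stand-in for the pattern described: an OpenMP parallel outer loop that calls a hot function containing an inner loop. It shows one option, the standard `#pragma omp simd` directive (OpenMP 4.0), applied to the inner loop. `row_sum` and `all_row_sums` are invented names, and whether vectorization helps or hurts depends on the real loop body, so the result would need to be measured again in VTune.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical hot function: scaled sum over one contiguous row.
// The simd directive asks the compiler to vectorize the inner loop;
// the reduction clause tells it that `sum` is an accumulator.
inline double row_sum(const double* row, std::size_t n, double scale) {
    double sum = 0.0;
    #pragma omp simd reduction(+:sum)
    for (std::size_t j = 0; j < n; ++j) {
        sum += scale * row[j];
    }
    return sum;
}

// Outer loop: threads split the rows, each row is vectorized internally.
// `m` is assumed to be a row-major rows x cols matrix, `out` has `rows` slots.
void all_row_sums(const std::vector<double>& m, std::size_t rows,
                  std::size_t cols, double scale, std::vector<double>& out) {
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(rows); ++i) {
        out[i] = row_sum(m.data() + static_cast<std::size_t>(i) * cols, cols, scale);
    }
}
```

Without OpenMP enabled at compile time the pragmas are ignored and the code runs serially with the same result, which makes it easy to compare the variants.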

Memory access pattern for threads

I have a basic question about the memory access pattern of threads. Suppose the computer/system/node has two sockets, each socket has its own block of memory (shared between the two sockets), and each socket has 4 cores. Two threads are running (forked from a single process, possibly pthreads or OpenMP threads), with thread 1 on socket 1 and thread 2 on socket 2. If thread 1 accesses data in socket 2's block of memory, is the access time the same as for data in its own block of memory, or different?

Haswell Transactional Memory read/write-set information

Recently, Intel released Haswell machines that support hardware transactional memory, called Transactional Synchronization Extensions (TSX).

According to the Intel manual, speculative memory operations (the write-set and the read-set) are buffered in the L1 cache and the L2 cache respectively (not exactly).

Given that, can I track transactional memory operations and get information such as the addresses and values of the read-set and write-set?

I have a problem with igzip

I am studying compression algorithms and software.
I have a question about igzip. I downloaded the igzip library from the Intel homepage,
but I don't know how to write a wrapper for it.
Can you send me an example wrapper, example code, or a manual?
I read the homepage and saw a simple application.
I don't know how to pass in the target file for compression, how to write out the compressed file,
or how to decompress it.
Do I have to write the code for the 'fast_lz' and 'init_stream' functions myself?
Please help me.
Thank you.

PCM reporting lower than expected memory read counts

I have a piece of code on which I'm running PCM (Performance Counter Monitor). It is essentially the following:

uint64_t *a,*b;
a = new uint64_t[LEN];
b = new uint64_t[LEN];
for( int i=0;i<LEN;i++ ) a[i] = b[i];

With LEN set to 402,653,184 (384 Mi), PCM is reporting 0.72 GB under READ and 6.30 GB under WRITE. Given that each array is 3 GiB, I would expect both arrays to be read (since the processor uses write-allocate), giving a READ of about 6 GiB. I would expect only array "a" to be written back, giving a WRITE of about 3 GiB.
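The expected figures in that paragraph can be checked with a few lines of arithmetic. This sketch only restates the poster's reasoning (one read of `b`, one write-allocate read of `a`, one write-back of `a`); the constant names are made up for illustration, and it does not model what PCM actually measures.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kLen = 402653184ULL;  // 384 Mi elements, as in the post
constexpr std::uint64_t kGiB = 1ULL << 30;

// Each array holds kLen 8-byte elements.
constexpr std::uint64_t kArrayBytes = kLen * sizeof(std::uint64_t);

// Expectation stated in the post: b is read, a is read on write-allocate,
// and a is written back once.
constexpr std::uint64_t kExpectedReadBytes = 2 * kArrayBytes;
constexpr std::uint64_t kExpectedWriteBytes = kArrayBytes;

static_assert(kArrayBytes == 3 * kGiB, "each array is 3 GiB");
static_assert(kExpectedReadBytes == 6 * kGiB, "expected READ is 6 GiB");
static_assert(kExpectedWriteBytes == 3 * kGiB, "expected WRITE is 3 GiB");

// Convert a byte count to GiB for printing or comparison.
constexpr double to_gib(std::uint64_t bytes) {
    return static_cast<double>(bytes) / static_cast<double>(kGiB);
}
```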
