Threading on Intel® Parallel Architectures

I have a problem with igzip

I am studying compression algorithms and software.
I have a question about igzip. I downloaded the igzip library from the Intel homepage,
but I don't know how to write a wrapper for it.
Can you send me an example wrapper, example code, or a manual?
I read the homepage and saw a simple application,
but I don't know how to supply the target file for compression, how to write the compressed output file,
or how to decompress it.
Do I have to write the code around the 'fast_lz' and 'init_stream' functions myself?
Please help me.
Thank you.

PCM reporting lower than expected memory read counts

I have a piece of code on which I'm running PCM (Performance Counter Monitor). It is essentially the following:

uint64_t *a,*b;
a = new uint64_t[LEN];
b = new uint64_t[LEN];
for( int i=0;i<LEN;i++ ) a[i] = b[i];

With LEN set to 402,653,184 (384 Mi), PCM is reporting 0.72 GB under READ and 6.30 GB under WRITE. Given that each array is 3 GiB, I would expect both arrays to be read (since the processor uses write-allocate), giving a READ of about 6 GiB. I would expect only array "a" to be written back, giving a WRITE of about 3 GiB.
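
The expected figures above can be checked with a short calculation. This is only a sketch of the arithmetic in the post, assuming plain write-allocate caching with no non-temporal stores; the names array_bytes, expected_copy_traffic, and to_gib are hypothetical helpers and are not part of PCM.

```cpp
#include <cstdint>

// Bytes occupied by one array of `len` 64-bit elements.
constexpr std::uint64_t array_bytes(std::uint64_t len) {
    return len * sizeof(std::uint64_t);
}

// Expected memory traffic for the loop a[i] = b[i].
struct ExpectedTraffic {
    std::uint64_t read_bytes;
    std::uint64_t write_bytes;
};

// Assumption: with write-allocate, each line of `a` is read before it is
// written, so both arrays are read once and only `a` is written back.
constexpr ExpectedTraffic expected_copy_traffic(std::uint64_t len) {
    return {2 * array_bytes(len), array_bytes(len)};
}

// Convert a byte count to GiB (2^30 bytes).
constexpr double to_gib(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
}
```

For LEN = 402,653,184, array_bytes gives 3,221,225,472 bytes, which is exactly 3 GiB, so this model predicts 6 GiB read and 3 GiB written.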

A problem with global variable declarations

I am trying to use an Intel Phi coprocessor, but I have a problem with global variable declarations. I declared A, B, and C as global variables, but their values all come out equal: A=5, B=5, C=5, and AA=30. The correct value of AA is 17. Any help would be appreciated. Thanks.


#include <stdio.h>
#include <math.h>
#include <omp.h>
#pragma offload_attribute(push,target(mic))
float *A;
float *B;
float *C;
#pragma offload_attribute(pop)

//__attribute__((target(mic))) float *A,*B,*C;

How to get individual core L1 and L2 cache hit/miss data with hyper-threading in a multi-core environment

Scenario: two processes are executing on two different cores of a processor. How can I measure L1 and L2 cache hits and misses for each individual core, assuming hyper-threading is disabled? I believe the performance counter monitors do not give me a per-core breakdown. Is there any way to measure L1 and L2 cache hits and misses for each core individually?

Parallel Image Processing in OpenMP - Image Blocks

I'm doing my first steps in the OpenMP world.

I have an image I want to apply a filter on.
Since the image is large, I want to break it into non-overlapping parts and apply the filter to each part independently, in parallel.
Namely, I'm creating 4 sub-images that I want to process on different threads.

I'm using Intel IPP to handle the images and for the function applied to each sub-image.
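
The block-splitting idea can be sketched roughly as follows. This is not the poster's code and it does not use IPP: invert_rows is a hypothetical stand-in for whatever filter is applied to each sub-image, and filter_in_blocks is a hypothetical driver that splits the image into horizontal bands and gives one band to each loop iteration. It assumes an 8-bit, single-channel, row-major image and num_blocks > 0.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical per-block filter: inverts the 8-bit pixels in rows
// [row_begin, row_end). Stands in for the real filter call.
void invert_rows(std::vector<unsigned char>& img, int width,
                 int row_begin, int row_end) {
    for (int y = row_begin; y < row_end; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t idx = static_cast<std::size_t>(y) * width + x;
            img[idx] = static_cast<unsigned char>(255 - img[idx]);
        }
    }
}

// Split the image into num_blocks non-overlapping horizontal bands and
// filter each band in its own loop iteration. Assumes num_blocks > 0.
void filter_in_blocks(std::vector<unsigned char>& img, int width, int height,
                      int num_blocks) {
    // Round up so that the bands together cover every row.
    const int rows_per_block = (height + num_blocks - 1) / num_blocks;

    // Bands do not overlap, so iterations write to disjoint memory.
    #pragma omp parallel for
    for (int b = 0; b < num_blocks; ++b) {
        const int row_begin = std::min(height, b * rows_per_block);
        const int row_end = std::min(height, row_begin + rows_per_block);
        invert_rows(img, width, row_begin, row_end);
    }
}
```

Calling filter_in_blocks(img, width, height, 4) corresponds to the four sub-images described above. Without OpenMP enabled at compile time the pragma is ignored and the loop runs serially. A filter that reads neighbouring pixels would also need the border rows of adjacent bands as input, which this point-wise example does not handle.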

I described the code here:
