Optimization

Analyzing an .optrpt file

Hi,

To check whether a given loop was parallelized, I generated an .optrpt file.

I used the following flags:   -parallel -opt-report-phase=par -opt-report:5

However, I still have three questions that the auto-parallelization report did not answer.

1) I had a two-level nested loop like this:
do k=1,km-1
   do kk=1,2
      .
      .
      <code is here>
      .
      .
   enddo
enddo

Haswell Transactional Memory read/write-set information

Recently, Intel released Haswell machines that support hardware transactional memory through Transactional Synchronization Extensions (TSX).

According to the Intel manual, speculative memory operations are buffered in the caches: roughly speaking, the write-set in the L1 cache and the read-set in the L2 cache (though not exactly).

Given that, can I track transactional memory operations and get information such as the addresses and values in the read/write-set?

I have a problem with igzip

Hi!
I am studying compression algorithms and software.
I have a question about igzip. I downloaded the igzip library from the Intel website,
but I don't know how to write a wrapper for it.
Could you send me an example wrapper, example code, or a manual?
I read the web page and saw a simple application, but
I don't know how to pass in the target file for compression, how to write out the compressed file,
or how to decompress it.
Do I have to write the code around the 'fast_lz' and 'init_stream' functions myself?
Please help me.
Thank you.

PCM reporting lower than expected memory read counts

I have a piece of code on which I'm running PCM (Performance Counter Monitor). It is essentially the following:

uint64_t *a,*b;
a = new uint64_t[LEN];
b = new uint64_t[LEN];
for( int i=0;i<LEN;i++ ) a[i] = b[i];

With LEN set to 402,653,184 (384 Mi), PCM is reporting 0.72 GB under READ and 6.30 GB under WRITE. Given that each array is 3 GiB, I would expect both arrays to be read (since the processor uses write-allocate), giving a READ of about 6 GiB. I would expect only array "a" to be written back, giving a WRITE of about 3 GiB.

GPU monitoring API

Hello,

I have installed Media SDK and successfully run the samples, as well as FFmpeg with hardware support for Intel Quick Sync technology.

Now I want to estimate the performance of my solution, and I need to get load metrics from the GPU.

The only tool I've found is /opt/intel/mediasdk/tools/metrics_monitor/_bin/metrics_monitor, but its output format is strange and it does not quite suit my needs.

Is there an API that I can use in my own program to read GPU metrics?

Thanks!

Extracting results based on time ranges

Hello, I am interested in using VTune to profile a system. I have run a project and gathered the results. I am looking at the hardware event samples for a specific CPU, for example all hardware events for CPU 0. The problem is that I want to look at the results in small time intervals: basically, I want to see the results for every 15 ms.

compiler option O0 and O2 generate different results

Hi all,

I am optimizing my code. When I compile with the -O2 option and run the program twice, the results differ between the two runs. But when I switch to the -O0 option, the results are the same. What might be the reason?

BTW, I am using CentOS 6.5 and the command line is as follows:

icpc -O2 -DALIGN_OPT=16 -DSSE -ipo main_turbo_decoding_cfunc.cpp turbo_decoding_cfunc.cpp -o turbo_decoding -lrt

Thank you!
