Software Tuning, Performance Optimization & Platform Monitoring

How do branches in the loop body affect performance when unrolling?


I want to know how branches in the loop body affect performance when the loop is unrolled, so I ran some tests. The code is in the attachment.

Compiler: icc 15.0.3; options: -O3; platform: Ivy Bridge I5 3337U and Sandy Bridge E5-2670.

I use #pragma unroll(n) to unroll the innermost loop with different unroll factors, such as 2, 4 and 8.

On both platforms, the execution time with unroll(8) is nearly 100% higher than with unroll(2)!
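For readers without the attachment, here is a minimal sketch of what unrolling does to a branch in the loop body. The kernel (summing only the positive elements) and both function names are hypothetical and are not taken from the attached code. The second function is unrolled by hand with factor 2, which is roughly what a request such as #pragma unroll(2) asks the compiler to do: the body, including its branch, is replicated once per unrolled copy.

```c
/* Hypothetical kernel: one data-dependent branch per iteration. */
long sum_positive(const int *a, int n)
{
    long s = 0;
    for (int i = 0; i < n; i++) {
        if (a[i] > 0)
            s += a[i];
    }
    return s;
}

/* The same loop unrolled by hand with factor 2.
 * The branch now appears twice per trip through the main loop,
 * so an unroll factor of 8 would produce eight copies of it. */
long sum_positive_unroll2(const int *a, int n)
{
    long s = 0;
    int i = 0;
    for (; i + 1 < n; i += 2) {
        if (a[i] > 0)
            s += a[i];
        if (a[i + 1] > 0)
            s += a[i + 1];
    }
    for (; i < n; i++) {  /* remainder iteration when n is odd */
        if (a[i] > 0)
            s += a[i];
    }
    return s;
}
```

Both functions return the same sum for the same input; only the number of static branch instructions and the code size differ. Whether that explains the measured slowdown would have to be checked against the generated assembly.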

Intel Xeon hardware cache events not supported

I am trying to use the perf tool to measure the performance of a program. For some reason, perf stat doesn't support hardware cache events. I'm using an Intel Xeon E5-2620 (Haswell) processor. I read in a thread on this forum that the event codes might have changed for this CPU and that this is why perf doesn't support these events. I tried using perfmon2 to find the raw events, but with no luck.

Does anybody know how to find the correct raw events for hardware cache events for this cpu?
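As a starting point, here is a small sketch of how an Intel event select and unit mask are packed into the raw code that perf accepts in its rNNNN syntax (unit mask in bits 8-15, event select in bits 0-7). The helper names raw_config and format_perf_raw are made up for this illustration. The only values used are the architectural last-level-cache events (event 0x2E with umask 0x41 for misses and 0x4F for references); any L1 or TLB event codes for a particular CPU would still have to be looked up in the event tables for that model.

```c
#include <stdint.h>
#include <stdio.h>

/* Pack an Intel event select (bits 0-7) and unit mask (bits 8-15)
 * into the value perf expects after the 'r' prefix. */
uint64_t raw_config(uint8_t event_select, uint8_t umask)
{
    return ((uint64_t)umask << 8) | event_select;
}

/* Write the perf event string, e.g. "r412e", into buf.
 * Returns the number of characters snprintf would have written. */
int format_perf_raw(char *buf, size_t len, uint8_t event_select, uint8_t umask)
{
    return snprintf(buf, len, "r%llx",
                    (unsigned long long)raw_config(event_select, umask));
}
```

With the two architectural events, the resulting command would look like: perf stat -e r412e,r4f2e ./program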

RDPMC Fast Mode

Hi all,
I am currently writing a C++ class which measures performance using the RDPMC instruction.

Everything works as expected, but I noticed in the manual that some processors support a "fast" mode of the RDPMC instruction (reading only the lower 32 bits of the counter). When I try to use it on mine (i.e. by setting ECX[31]), the code segfaults.

This mode is supported on processors with 40-bit counters, and the counters on my machine are 48-bit. The model name of my processor is "Intel(R) Xeon(R) CPU W3580".
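To make the ECX encoding concrete, here is a minimal sketch of how the counter selector could be built. The names rdpmc_selector and rdpmc_read are hypothetical and are not from the poster's class. Bit 31 is the fast-mode flag described above, and bit 30 selects a fixed-function counter. RDPMC raises a general-protection fault when ECX does not name a counter the processor accepts, and Linux delivers that fault to a user process as SIGSEGV, which would be consistent with the crash on a CPU that does not implement fast mode.

```c
#include <stdint.h>

#define RDPMC_FAST_MODE (UINT32_C(1) << 31) /* ECX[31]: 32-bit "fast" read, only where supported */
#define RDPMC_FIXED_CTR (UINT32_C(1) << 30) /* ECX[30]: select a fixed-function counter */

/* Build the value to load into ECX before executing RDPMC. */
uint32_t rdpmc_selector(uint32_t index, int fixed, int fast)
{
    uint32_t sel = index;
    if (fixed)
        sel |= RDPMC_FIXED_CTR;
    if (fast)
        sel |= RDPMC_FAST_MODE;
    return sel;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Execute RDPMC with the given selector and combine EDX:EAX.
 * Faults unless user-mode RDPMC is enabled (CR4.PCE) and the
 * selector is valid for this processor. */
static inline uint64_t rdpmc_read(uint32_t selector)
{
    uint32_t lo, hi;
    __asm__ volatile("rdpmc" : "=a"(lo), "=d"(hi) : "c"(selector));
    return ((uint64_t)hi << 32) | lo;
}
#endif
```

On a processor without fast mode, the selector would be built with fast set to 0 and the full EDX:EAX result used instead.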

DRAM Memory reads and writes

Hello All,

I am using an Intel Core i7-2600 CPU with DDR3-1333 SDRAM. I want to count the number of reads from and writes to DRAM. I found model-specific registers such as UNC_DRAM_READ_CAS.CHx and UNC_DRAM_WRITE_CAS.CHx, but they are not supported on my system (64-ia-32-architectures-software-developer-system-programming-manual-325384.pdf).
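If CAS counts can be obtained from some source, converting them to traffic is simple arithmetic. This is a sketch under the assumption that each CAS command transfers one 64-byte cache line (a burst of 8 on a 64-bit DDR3 channel). The function names are hypothetical, and the code does not read any counter itself.

```c
#include <stdint.h>

/* Assumption: one CAS command moves a full 64-byte cache line. */
#define BYTES_PER_CAS 64u

/* Convert a CAS command count into bytes transferred. */
uint64_t cas_to_bytes(uint64_t cas_count)
{
    return cas_count * BYTES_PER_CAS;
}

/* Average bandwidth in MB/s (1 MB = 10^6 bytes) over a measured interval. */
double bandwidth_mb_per_s(uint64_t cas_count, double seconds)
{
    return (double)cas_to_bytes(cas_count) / 1e6 / seconds;
}
```

Read and write traffic would be computed separately, one call per counter, from the difference between two samples of each count.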

PCM / QPI error


We want to monitor the QPI links (the number of remote and local accesses) to evaluate our application. Later we want to report our evaluation in a scientific publication. To perform this monitoring we are using the Intel PCM 2.8 tool to extract these numbers. However, Intel PCM returns errors that make monitoring impossible. We traced the error online, and it seems that this functionality (accessing the QPI counters) is not provided by the mainboard.

Like here:

