Software Tuning, Performance Optimization & Platform Monitoring

Finite difference computation blowup with Intel compiler 14, but not 12


I have a finite difference code for wave propagation. Because there are many temporary mixed-derivative terms, I defined a temporary memory buffer and split it into chunks to store each derivative term for memory efficiency. The code looks like:

float *Wrk = malloc(2*(4*nxe*(2*ne+1) + 15*nxe)*sizeof(float));

computing function:

float *dudz = Wrk + NE;
float *dqdz = dudz + nxe;

for (int i = ix0_1; i < ixh_1; i++)
    dudz[i] = hdzi*(u[i+nxe] - u[i-nxe]);

Branch Address Calculator raises both BTCLEAR and BACLEAR signals?


In this patent:

there are two types of branch misprediction detection prior to the Execution stage. I believe the two mispredictions raise the BTCLEAR and BACLEAR signals. However, I am a little unsure exactly what the difference is between the two events, and which is more costly in terms of flushing the pipeline.

What port(s) are referred to by "Cycles of Port X Utilized"?

Using the General Exploration analysis in VTune 2015 delivers several columns that refer to port utilization, for example "Cycles of 1 Port Utilized". The documentation on these columns is, well, less than helpful. What are the ports? What do they do? If code is heavily using the ports, what can be done about it?

Event for speculatively executed instructions

Hi All,

We need to measure, on Intel machines, the instructions executed speculatively but not committed. We need to measure how many instructions are discarded (over a period of time) to see how speculative execution is working. We checked Intel's Performance Monitoring Events manual but can't figure out which event to monitor. Could you please tell us where to look for it, or under which name we should search?

Precision lost when compiled with -xAVX or -xHOST

Hi all,

My program outputs double and floating point values. When I compile without the -xAVX or -xHOST options, the results are correct, but most of the loops aren't getting vectorized. When I use -xAVX or -xHOST, most of the loops are vectorized and performance improves, but precision is lost. When I execute the same program on a larger dataset, this small precision loss results in wrong output. I've even tried the -fp-model precise/strict options along with -xHOST, but I'm still getting wrong output.

Little bug in IntelPerformanceCounterMonitorV2.8


I tried make, and something went wrong.

In the Makefile, line 7 seems like it should be vpath %.o .. rather than vpath %.cpp ..

Besides, line 9 (msr.o cpucounters.o pci.o client_bw.o) is missing utils.o.

Is that so?
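If both observations are correct, the two lines would end up reading roughly as below. This is only a guess at the fragment: the variable name OBJS and the rest of the Makefile are not quoted in the post and are assumed here:

```make
# line 7: look up prerequisites in the parent directory (as reported)
vpath %.o ..

# line 9: object list with the reportedly missing utils.o appended
# (OBJS is a hypothetical name; the post only quotes the file list)
OBJS = msr.o cpucounters.o pci.o client_bw.o utils.o
```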



Floating point exceptions

Hi, I want to understand the IEEE 754 format (for example, I chose the addition operation), but I have a problem: the standard does not describe precisely how the status bits are formed. For example, I found an algorithm that, by extending the mantissa of the result by three bits on the right, lets you detect an inexact result (rounding mode: to nearest). I decided to simulate the algorithm and compare the results against my Intel i7 processor (control register "cwr"), and I get different results.

Xeon E5 MSR_PP1_ENERGY_STATUS read/write Error


I am using the above utility to determine the power consumption of a Xeon E5 chip. When I execute the code on my machine, the output is:


Found Haswell CPU
Checking core #0
Power units = 0.125W
Energy units = 0.00006104J
Time units = 0.00097656s
