How can I understand the result of PMU

Chenjie Y.

Hi, all.

My CPU is a Core-architecture part (T7100). In the datasheet I found an event, FP_COMP_OPS_EXE, for monitoring floating-point micro-ops.

I wrote a very simple benchmark, test.c, to test this counter:

int main(void)
 {
   float i;
   i=i+0.01;

 }

Then I compiled it with: gcc -o test.out test.c
Then I used perf to monitor it. The command was (0010 is Umask|Event_number): perf stat -e r0010 ./test.out &

The result was:

 Performance counter stats for './test.out':

             1,398 raw 0x10                                                   

       0.001437684 seconds time elapsed

My question is: how should I interpret the number 1,398? As far as I can tell, my code contains only one FADD operation. Does that mean the FADD is translated into 1,398 micro-ops, or have I misunderstood what micro-ops are?

Thank you.

Patrick Fay (Intel)

Hello Chenjie,

The monitoring utility 'perf' doesn't start counting at your main(); it starts before your program is even loaded. So it (probably) counts some uops in perf itself, some from loading your program, and some from initializing everything your program needs, and only after all that, the instructions in your program. Then come the uops for cleaning up after your program and returning to perf. Since your program uses floating point, Linux may (I'm 99.999% sure it does) also set up the extended save/restore area to hold the SSE2 register state in case of context switches.

Lastly, you need to look at the disassembly of the binary to see what your program is actually doing. It may not be doing what you think, especially since you don't return a value or print anything out; the compiler may (as an optimization) just execute a return. And since you are using an uninitialized variable 'i', you might be getting exceptions.

You could try inserting a loop to see whether there is a base number of uops that always gets executed (say, when the loop count is 0) and a number of uops that increases in proportion to the loop count. That would probably provide more insight.

Pat

Chenjie Y.

Dear Patrick, thank you. I see what you mean.

Login to leave a comment.