I can compute GFLOP from the instruction counts and instruction types.
For example, I have 17694720000 FMA3 instructions, so I have 17694720000 * 2 * 8 ≈ 283 GFLOP (2 floating-point operations per FMA, 8 elements per vector). Great! Intel Advisor gives the same number.
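The arithmetic above can be sketched in a few lines of Python. This is only a sketch: the factors of 2 floating-point operations per FMA and 8 elements per vector are assumptions (they are the combination that reproduces the 283 figure), and the function name `fma_gflop` is made up for illustration.

```python
# Assumed per-instruction factors (not confirmed by the post):
FLOP_PER_FMA = 2        # one multiply plus one add per vector element
LANES_PER_VECTOR = 8    # e.g. 8 single-precision floats in a 256-bit register


def fma_gflop(fma_count, flop_per_fma=FLOP_PER_FMA, lanes=LANES_PER_VECTOR):
    """Return giga floating-point operations for a count of vector FMA instructions."""
    return fma_count * flop_per_fma * lanes / 1e9


fma_instructions = 17_694_720_000  # FMA3 instruction count quoted in the post
print(f"{fma_gflop(fma_instructions):.1f} GFLOP")  # prints "283.1 GFLOP"
```

If the loop also contains non-FMA floating-point instructions, each instruction type would need its own factor and the results would be summed.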
However, how do I get the "Data transfers between CPU and memory sub-system (total traffic, including L1, L2, LLC and DRAM traffic)" value (the number in the bottom right)? It does not match the number of memory load instructions.
[Screenshot: Dynamic Instruction Mix Summary. Metric format: (XXXX = total count of instructions, YYYY = average count of instructions per iteration). Warning: currently Dynamic Instruction Mix doesn't reflect instructions inside of non-inlined function calls.]
[Screenshot: Statistics for FLOPS And Data Transfers. Tooltip text from the screenshot:]
Self GFLOPS: giga floating-point operations per second. Self GFLOPS = Self GFLOP / Self Elapsed Time.
Self AI: self arithmetic intensity, the ratio of self floating-point operations to self L1 transferred bytes.
Self GFLOP: giga floating-point operations, not including GFLOP for functions called in the loop or function.
Self Elapsed Time: the exclusive (self-time-based) wall time from the beginning to the end of loop/function execution. For single-threaded applications, elapsed time is equal to self-time.
Total Elapsed Time: the inclusive (total-time-based) wall time from the beginning to the end of loop/function execution. For single-threaded applications, total elapsed time is equal to total-time.
Data transfers between CPU and memory sub-system (total traffic, including L1, L2, LLC and DRAM traffic): reported in gigabytes (not including transfers for functions called in the loop or function) and in gigabytes per second.
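The metric definitions quoted from the tooltips can be written out as a small Python sketch. The function names and the sample inputs (elapsed time and transferred gigabytes) are hypothetical placeholders, not values from the post or from Intel Advisor; only the 283 GFLOP figure comes from the question.

```python
def self_gflops(self_gflop, self_elapsed_s):
    """Self GFLOPS = Self GFLOP / Self Elapsed Time (seconds)."""
    if self_elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return self_gflop / self_elapsed_s


def self_ai(self_gflop, self_gigabytes):
    """Self AI = floating-point operations per transferred byte.

    Both inputs use the giga prefix, so the prefixes cancel.
    """
    if self_gigabytes <= 0:
        raise ValueError("transferred bytes must be positive")
    return self_gflop / self_gigabytes


def self_gb_per_s(self_gigabytes, self_elapsed_s):
    """Data transfer rate in gigabytes per second."""
    if self_elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return self_gigabytes / self_elapsed_s


gflop = 283.0        # from the post
elapsed_s = 10.0     # hypothetical: replace with the reported Self Elapsed Time
gigabytes = 70.0     # hypothetical: replace with the reported data transfers in GB

print("Self GFLOPS:", self_gflops(gflop, elapsed_s))
print("Self AI (FLOP/byte):", self_ai(gflop, gigabytes))
print("GB/s:", self_gb_per_s(gigabytes, elapsed_s))
```

Plugging in the values shown in the Advisor summary should make it easy to check whether the reported GB figure is consistent with the reported AI and GFLOP.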