I can compute GFLOP from the number of instructions and the instruction type.
For example, I have 17694720000 FMA3 instructions, so I have 17694720000*2*8 = 283 GFLOP. Great! Intel Advisor gives the same number.
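The conversion from instruction count to GFLOP can be sketched as follows. This is only a sanity check, assuming each FMA counts as 2 floating-point operations and works on 8 single-precision lanes (a 256-bit vector); the constant names are illustrative and do not come from Advisor.

```python
# Assumptions (not stated in the post): 2 FLOPs per FMA (multiply + add)
# and 8 single-precision lanes per 256-bit vector register.
FMA_INSTRUCTIONS = 17_694_720_000
FLOPS_PER_FMA = 2
LANES = 8

# Total floating-point operations executed by the FMA instructions.
total_flop = FMA_INSTRUCTIONS * FLOPS_PER_FMA * LANES
gflop = total_flop / 1e9

print(f"{gflop:.5f} GFLOP")
```

Under those assumptions the result agrees with the Self GFLOP value that Advisor reports.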
However, how do I get the data transfers between the CPU and the memory subsystem (total traffic, including L1, L2, LLC and DRAM traffic)? That is the number in the bottom right. It does not match the number of memory load instructions.
Dynamic Instruction Mix Summary
|Self GFLOPS||808.89193||Giga Floating-Point Operations Per Second. Self GFLOPS = Self GFLOP / Self Elapsed Time|
|Self AI||2.18182||Self AI - Self Arithmetic Intensity - Ratio Of Self Floating-Point Operations To Self L1 Transferred Bytes|
|Self GFLOP||283.11552||Giga Floating-Point Operations, Not Including GFLOP For Functions Called In The Loop Or Function|
|Self Elapsed Time||0.350s||Elapsed Time Is The Exclusive (Self-Time-Based) Wall Time From The Beginning To The End Of Loop/Function Execution. For Single-Threaded Applications Elapsed Time Is Equal To Self-Time|
|Total Elapsed Time||0.350s||Total Elapsed Time Is The Inclusive (Total-Time-Based) Wall Time From The Beginning To The End Of Loop/Function Execution. For Single-Threaded Applications Total Elapsed Time Is Equal To Total-Time|
| ||129.76128||In Giga Bytes, Not Including Transfers For Functions Called In The Loop Or Function|
| ||370.74213||In Giga Bytes Per Second|
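The metrics in the table are tied together by simple ratios, which gives one way to cross-check the data-transfer figure. Here is a minimal sketch using only the values above, assuming the 129.76128 GB row is the L1 transferred bytes that Self AI is defined against (the variable names are illustrative):

```python
# Values copied from the Advisor table above.
self_gflop = 283.11552      # Self GFLOP
self_ai = 2.18182           # Self AI: FLOP per L1 transferred byte
self_gb = 129.76128         # data transferred, in GB
self_gflops = 808.89193     # Self GFLOPS
self_gb_per_s = 370.74213   # data transferred per second, in GB/s

# AI = GFLOP / GB, so the traffic implied by AI should match the GB row.
implied_gb = self_gflop / self_ai

# GFLOPS = GFLOP / time and GB/s = GB / time, so each pair yields an elapsed time.
time_from_flops = self_gflop / self_gflops
time_from_bytes = self_gb / self_gb_per_s

print(f"traffic implied by AI:    {implied_gb:.3f} GB")
print(f"elapsed time from GFLOPS: {time_from_flops:.4f} s")
print(f"elapsed time from GB/s:   {time_from_bytes:.4f} s")
```

Both elapsed times come out at about 0.350 s, and the traffic implied by Self AI agrees with the 129.76 GB row to within rounding. That suggests the reported figure is the byte count used in the Self AI ratio, not something derived from the count of load instructions alone.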