What is the formula for data transfers between CPU and memory sub-system (total traffic, including L1, L2, LLC and DRAM traffic)?

I can compute GFLOP from the number of instructions and the instruction type.

For example, I have 17694720000 FMA3 instructions, so I get 17694720000*2*8 = 283 GFLOP. Great! Intel Advisor gives the same number.
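The arithmetic above can be reproduced with a minimal sketch. The factors are assumptions, not something the report states: 2 floating-point operations per FMA (multiply plus add) and 8 lanes per vector, which would correspond to single-precision values in a 256-bit AVX register. Together they give the factor of 16 that matches the Self GFLOP value reported further down.

```python
# FMA instruction count taken from the Dynamic Instruction Mix in this post.
fma_instructions = 17_694_720_000

# Assumptions (not stated in the report): an FMA counts as 2 floating-point
# operations, and each vector holds 8 lanes (single precision, 256-bit AVX).
flop_per_fma = 2
lanes_per_vector = 8

total_flop = fma_instructions * flop_per_fma * lanes_per_vector
print(f"{total_flop / 1e9:.5f} GFLOP")  # prints "283.11552 GFLOP"
```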

However, how do I get the data transfers between CPU and memory sub-system (total traffic, including L1, L2, LLC and DRAM traffic)? That is the number at the bottom right of the report. It does not match the number of memory load instructions.
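To make the mismatch concrete, here is a rough check using only numbers from the report in this post. It assumes "Giga Bytes" means 10^9 bytes, and it compares against a hypothetical case in which every memory instruction moves a full 256-bit (32-byte) vector. Neither assumption comes from the Advisor documentation.

```python
# Values copied from the report in this post.
memory_instructions = 11_796_480_000   # "Memory" row of the instruction mix
reported_bytes = 129.76128e9           # reported data transfers, assuming 1 GB = 1e9 bytes

# Average number of bytes attributed to each memory instruction.
avg_bytes = reported_bytes / memory_instructions
print(f"average bytes per memory instruction: {avg_bytes:.2f}")

# Hypothetical comparison: traffic if every memory instruction moved 32 bytes.
full_vector_bytes = memory_instructions * 32
print(f"traffic at 32 bytes per instruction: {full_vector_bytes / 1e9:.2f} GB")
```

The average comes out well below 32 bytes, so the reported figure cannot be the instruction count times one fixed vector width. Which operand sizes Advisor actually attributes to each access is not something these numbers settle.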

Instruction Set: AVX; FMA
Self time: 3.842s

Dynamic Instruction Mix Summary

Metric format: (XXXX – Total count of instructions, YYYY – average count of instructions per iteration).
Warning: Currently Dynamic Instruction Mix doesn’t reflect instructions inside of non-inlined function calls.

Memory: 34% (11796480000)
    Vector: 34% (11796480000)
        AVX: 34% (11796480000)
Compute: 57% (19722240000)
    Vector: 51% (17694720000)
        FMA: 51% (17694720000)
    Scalar: 6% (2027520000)
        x86: 6% (2027520000)
Other: 9% (3133440000)

Statistics for FLOPS And Data Transfers

Self GFLOPS: 808.89193
Giga Floating-point Operations Per Second
Self GFLOPS = Self GFLOP / Self Elapsed Time

Self AI: 2.18182
Self AI - Self Arithmetic Intensity - Ratio Of Self Floating-Point Operations To Self L1 Transferred Bytes

Self GFLOP: 283.11552
Giga Floating-Point Operations, Not Including GFLOP For Functions Called In The Loop Or Function

Self Elapsed Time: 0.350s
Elapsed Time Is The Exclusive (Self-Time-Based) Wall Time From The Beginning To The End Of Loop/Function Execution. For Single-Threaded Applications Elapsed Time Is Equal To Self-Time

Total Elapsed Time: 0.350s
Total Elapsed Time Is The Inclusive (Total-Time-Based) Wall Time From The Beginning To The End Of Loop/Function Execution. For Single-Threaded Applications Total Elapsed Time Is Equal To Total-Time

Data transfers between CPU and memory sub-system (total traffic, including L1, L2, LLC and DRAM traffic)
In Giga Bytes, Not Including Transfers For Functions Called In The Loop Or Function: 129.76128
In Giga Bytes Per Second: 370.74213
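The reported statistics can be cross-checked against each other with a small sketch. It uses the formula the report states (Self GFLOPS = Self GFLOP / Self Elapsed Time) and assumes that Self AI and the bytes-per-second figure are derived from the same data-transfer total. That second part is an inference from the numbers, not something the report says. The elapsed time is shown rounded to three decimals, so only approximate agreement should be expected.

```python
# Values copied from "Statistics for FLOPS And Data Transfers" in this post.
self_gflop = 283.11552
self_elapsed_s = 0.350      # displayed value, rounded to three decimals
transfers_gb = 129.76128

# Stated in the report: Self GFLOPS = Self GFLOP / Self Elapsed Time.
gflops = self_gflop / self_elapsed_s

# Assumed relations (inferred, not documented here).
arithmetic_intensity = self_gflop / transfers_gb   # FLOP per transferred byte
bandwidth_gb_s = transfers_gb / self_elapsed_s     # GB per second

print(f"GFLOPS:               {gflops:.2f} (reported 808.89193)")
print(f"arithmetic intensity: {arithmetic_intensity:.5f} (reported 2.18182)")
print(f"bandwidth:            {bandwidth_gb_s:.2f} GB/s (reported 370.74213)")
```

The arithmetic intensity reproduces the reported Self AI to the displayed precision, which suggests that Self AI here is Self GFLOP divided by the 129.76128 GB figure, even though one label says "L1 Transferred Bytes" and the other says "total traffic".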

 
