I'm working on code optimisation. I run on Linux machines with Broadwell E5-2650 v4 CPUs. I optimised the code so that the roofline model now shows 284.38 GFLOPS of performance, versus 35.94 GFLOPS for the previous version of the optimisation. That is 7.9 times higher. However, the elapsed time improved only by a factor of around 2. I'm not sure this is the sort of difference I should expect... I would expect the roofline model to also show a 2 times difference in performance. What could be wrong?
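A quick arithmetic check of the numbers above, as a sketch only. It assumes the elapsed-time speedup is exactly 2.0 (the measured value is "around 2") and that the roofline GFLOPS value is FLOPs divided by the time of whatever region Intel Advisor measured. Under those assumptions the 7.9x rate ratio and the 2x time ratio are only consistent if either the counted FLOPs changed between versions, or the measured region covers only part of the total runtime (Amdahl's law).

```python
# Sketch: relate the two roofline GFLOPS values to the elapsed-time speedup.
# Assumption: the "around 2x" elapsed-time speedup is taken as exactly 2.0.

OLD_GFLOPS = 35.94    # roofline performance of the previous version
NEW_GFLOPS = 284.38   # roofline performance of the optimised version
TIME_SPEEDUP = 2.0    # approximate elapsed-time speedup (assumed value)


def gflops_ratio(old: float, new: float) -> float:
    """Ratio of achieved FLOP rates (new / old)."""
    return new / old


def implied_flop_count_ratio(rate_ratio: float, time_speedup: float) -> float:
    """FLOPs = rate * time, so FLOPs_new / FLOPs_old = rate_ratio / time_speedup.
    Only meaningful if both rates were measured over the whole elapsed time."""
    return rate_ratio / time_speedup


def amdahl_fraction(local_speedup: float, overall_speedup: float) -> float:
    """Fraction p of the old runtime that was sped up by local_speedup, given
    that the whole program sped up by overall_speedup and the rest is unchanged:
    1 / ((1 - p) + p / local_speedup) = overall_speedup."""
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / local_speedup)


if __name__ == "__main__":
    r = gflops_ratio(OLD_GFLOPS, NEW_GFLOPS)
    print(f"GFLOPS ratio: {r:.2f}")
    print(f"Implied FLOP-count ratio (rates cover whole run): "
          f"{implied_flop_count_ratio(r, TIME_SPEEDUP):.2f}")
    print(f"Share of old runtime in the sped-up region (same FLOP count): "
          f"{amdahl_fraction(r, TIME_SPEEDUP):.2f}")
```

With these inputs the first reading says the new version would be counted as doing roughly 4 times more FLOPs, and the second says the faster region would have to account for a bit over half of the old runtime. Which one applies depends on what the roofline dots in the screenshots actually cover.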
I'm attaching two screenshots of roofline models built with Intel Advisor.