Hello, everyone,
Recently I have been using VTune to profile my BSDE code in hotspot mode, and I found something interesting.
int a,a1,a2,a3;
float trans[4];
_mm_store_ps(trans,a_sse);
Below are the four lines of code in question:
- a= (int)*(trans);
- a1= (int)*(trans+1);
- a2= (int)(trans[2]);
- a3 = (int )trans[3];
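In case anyone wants to reproduce this, here is a minimal self-contained sketch of the same pattern. The function name convert_lanes and the out array are made up for illustration and are not in my real code. I also added _Alignas(16) to trans here, because _mm_store_ps requires a 16-byte-aligned destination; that is an assumption about what the real code should do, not something shown in the snippet above.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_store_ps */

/* Hypothetical helper: store the four float lanes of a_sse to memory,
 * then convert each lane to int using the same four expressions as in
 * the post. The C cast truncates toward zero. */
void convert_lanes(__m128 a_sse, int out[4])
{
    /* _mm_store_ps needs a 16-byte-aligned destination (C11 _Alignas). */
    _Alignas(16) float trans[4];
    _mm_store_ps(trans, a_sse);

    out[0] = (int)*(trans);      /* lane 0, pointer dereference */
    out[1] = (int)*(trans + 1);  /* lane 1, pointer arithmetic  */
    out[2] = (int)(trans[2]);    /* lane 2, array indexing      */
    out[3] = (int)trans[3];      /* lane 3, array indexing      */
}
```

Calling it with _mm_set_ps(4.9f, 3.5f, 2.2f, 1.0f) puts 1.0f in lane 0 (the arguments of _mm_set_ps go from the highest lane to the lowest), so out ends up as 1, 2, 3, 4 after truncation.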
Compiled with gcc at -O0 (no optimization), the time each line costs increases as shown below:
