Hi everybody!
This question is loosely related to my other thread, posted here: http://software.intel.com/en-us/forums/showthread.php?t=105474
While porting my library of elementary and special functions from Java to C, I implemented polynomial approximation, as a few posters advised. I now use polynomial approximation in my library wherever it is applicable. I wanted to compare the performance of the same implementation written in managed code and in native code. To my great surprise, the Java code always executed faster than the native code.
After studying the asm code, and knowing that the Intel C++ compiler uses security cookie checking and fills a 128-byte buffer with 0xCC (int 3) bytes right after the function's prologue, I came to the conclusion that this compiler-induced overhead is responsible for the slower execution of the C code.
Here are the tests taken from my thread http://software.intel.com/en-us/forums/showthread.php?t=105474
Can anybody help me understand why native code can be so slow compared to Java code?
Results for native code, 1 million iterations:
start value of fastsin(): 39492698 // Native code
end value of fastsin() : 39492760
delta of fastsin() is : 62 millisec
sine is: 0.841470444509448080000000
java -server
C:\Program Files\Java\jdk1.7.0\bin>java -server SineFunc
start value : 1339596068015
end value : 1339596068045
running time of fastsin() is :30 milisec
java -client
C:\Program Files\Java\jdk1.7.0\bin>java -client SineFunc
start value : 1339596081083
end value : 1339596081130
running time of fastsin() is :47 milisec
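For anyone who wants to reproduce numbers like these, here is a minimal sketch of a timing loop in C. It is not the harness used for the results above: the names time_calls_ms and square are hypothetical, and clock() is only one possible timer. The volatile variables are there so an optimizing compiler cannot drop the calls or hoist them out of the loop.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in workload; pass fastsin here instead. */
double square(double x)
{
    return x * x;
}

/* Hypothetical helper: calls fn(arg) `iterations` times and returns the
 * elapsed CPU time in milliseconds. The accumulated sum is stored in
 * *sink (if non-NULL) so the work has an observable result. */
double time_calls_ms(double (*fn)(double), double arg, long iterations, double *sink)
{
    assert(iterations > 0);

    volatile double in = arg;   /* re-read every iteration, so the call cannot be hoisted */
    volatile double acc = 0.0;  /* written every iteration, so the call cannot be removed */

    clock_t start = clock();
    for (long i = 0; i < iterations; ++i) {
        acc += fn(in);
    }
    clock_t end = clock();

    if (sink) {
        *sink = acc;
    }
    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC;
}
```

A call such as time_calls_ms(fastsin, 1.0, 1000000, &sum) would then cover the same 1 million iterations. With run times of only tens of milliseconds, timer resolution matters, so repeating the measurement several times or raising the iteration count may give steadier numbers.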
Here is the fastsin() prologue
00000 55                 push ebp
00001 8b ec              mov ebp, esp
00003 81 ec 80 00 00 00  sub esp, 128 ; 00000080H
00009 57                 push edi
0000a 8d 7d 80           lea edi, DWORD PTR [ebp-128]
0000d b9 20 00 00 00     mov ecx, 32 ; 00000020H
00012 b8 cc cc cc cc     mov eax, -858993460 ; ccccccccH
00017 f3 ab              rep stosd <-- Could this be the culprit for the slower code execution?
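To make it clearer what the prologue does on every call, here is a rough C equivalent of the lea / mov ecx / mov eax / rep stosd sequence. The function name fill_debug_pattern is hypothetical and only for illustration; the compiler emits the fill inline rather than calling anything.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration of the prologue's fill:
 *   edi = dst, ecx = count, eax = 0xCCCCCCCC, rep stosd
 * stores `count` 32-bit words of 0xCCCCCCCC starting at dst. */
void fill_debug_pattern(uint32_t *dst, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        dst[i] = 0xCCCCCCCCu;   /* four 0xCC bytes, i.e. four int 3 opcodes */
    }
}
```

With count = 32 this writes 32 * 4 = 128 bytes, which matches the sub esp, 128 in the listing. A fill like this usually shows up in debug builds with run-time stack checks enabled, so it may be worth checking the build settings. The listing alone does not show whether these 32 stores per call account for the measured difference.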
And here is the code. The Java implementation is identical to this code.
double fastsin(double x){
    double sum = 0;
    double half_pi,zero_arg;
    half_pi = Pi/2;
    zero_arg = Zero;
    if(x > half_pi){ // simple input range check: valid range is 0 to Pi/2
        return (x-x)/(x-x); // 0/0, i.e. NaN for out-of-range input
    }else if (x < zero_arg){
        return (x-x)/(x-x);
    }else{
        double coef1,coef2,coef3,coef4,coef5,coef6,coef7,coef8,coef9,coef10,coef11,rad,sqr;
        coef1 = -0.16666666666666666666666666666667;// -1/3!
        coef2 = 0.00833333333333333333333333333333;// 1/5!
        coef3 = -1.984126984126984126984126984127e-4;// -1/7!
        coef4 = 2.7557319223985890652557319223986e-6;// 1/9!
        coef5 = -2.5052108385441718775052108385442e-8;// -1/11!
        coef6 = 1.6059043836821614599392377170155e-10;// 1/13!
        coef7 = -7.6471637318198164759011319857881e-13;// -1/15!
        coef8 = 2.8114572543455207631989455830103e-15;// 1/17!
        coef9 = -8.2206352466243297169559812368723e-18;// -1/19!
        coef10 = 1.9572941063391261230847574373505e-20;// 1/21!
        coef11 = -3.8681701706306840377169119315228e-23;// -1/23!
        rad = x;
        sqr = x*x; // x^2
        // Horner's scheme in x^2: x + x^3*(coef1 + x^2*(coef2 + ...))
        sum = rad+rad*sqr*(coef1+sqr*(coef2+sqr*(coef3+sqr*(coef4+sqr*(coef5+sqr*(coef6+sqr*(coef7+sqr*(coef8+sqr*(coef9+sqr*(coef10+sqr*(coef11)))))))))));
    }
    return sum;
}
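For comparison, the same Horner evaluation can be written as a loop over a coefficient table. This is only a sketch, not the library code: the names fastsin_table and SIN_COEF are hypothetical, the Pi and Zero globals are replaced by literal constants (an assumption about their values), and NAN from math.h stands in for (x-x)/(x-x).

```c
#include <math.h>    /* NAN */
#include <stddef.h>

/* Taylor coefficients of sin(x) from -1/23! down to -1/3!, highest order first. */
static const double SIN_COEF[] = {
    -3.8681701706306840377169119315228e-23, /* -1/23! */
     1.9572941063391261230847574373505e-20, /*  1/21! */
    -8.2206352466243297169559812368723e-18, /* -1/19! */
     2.8114572543455207631989455830103e-15, /*  1/17! */
    -7.6471637318198164759011319857881e-13, /* -1/15! */
     1.6059043836821614599392377170155e-10, /*  1/13! */
    -2.5052108385441718775052108385442e-8,  /* -1/11! */
     2.7557319223985890652557319223986e-6,  /*  1/9!  */
    -1.984126984126984126984126984127e-4,   /* -1/7!  */
     0.00833333333333333333333333333333,    /*  1/5!  */
    -0.16666666666666666666666666666667     /* -1/3!  */
};

/* Hypothetical table-driven variant of fastsin(); valid for 0 <= x <= Pi/2. */
double fastsin_table(double x)
{
    const double half_pi = 1.5707963267948966; /* assumed value of Pi/2 */
    if (x < 0.0 || x > half_pi) {
        return NAN;                            /* out-of-range input */
    }

    const double sqr = x * x;
    double p = 0.0;
    /* Horner's scheme in x^2, starting from the highest-order coefficient. */
    for (size_t i = 0; i < sizeof SIN_COEF / sizeof SIN_COEF[0]; ++i) {
        p = p * sqr + SIN_COEF[i];
    }
    return x + x * sqr * p;                    /* x + x^3 * P(x^2) */
}
```

The arithmetic is the same as in the nested expression above, so the table form is mainly a readability choice. Whether it is faster or slower than the unrolled version would have to be measured in an optimized build.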




