vsSin(..) much slower than sinf(..)??



Hi!

I have a small problem. I tested the two functions vsSin and sinf because I wanted to know which of them is faster.

Here is my code:


Code:

float value;
__int64 time1, time2, time3, time4;
float a[10000];
float b[10000];
int n = 10000;
int mode;

mode = VML_LA | VML_FLOAT_CONSISTENT | VML_ERRMODE_IGNORE;
vmlSetMode(mode);

for (int j = 0; j < 10000; j++)
    a[j] = (float)(rand() % 8);

QueryPerformanceCounter((LARGE_INTEGER*)&time1);
for (int i = 0; i < 10000; i++)
    value = sinf(a[i]);
QueryPerformanceCounter((LARGE_INTEGER*)&time2);

QueryPerformanceCounter((LARGE_INTEGER*)&time3);
vsSin(n, a, b);
QueryPerformanceCounter((LARGE_INTEGER*)&time4);

printf("time: %d ", time2 - time1);
printf("time: %d ", time4 - time3);



And now the result:

sinf(..) took 1608 ticks (or whatever QueryPerformanceCounter returns ;) )
vsSin(..) took 192344 ticks at best


Why is vsSin so slow?
Did I do something wrong?

Thanks for any answers.

GoreProducers

TimP (Intel)

I suppose the compiler may be able to replace your first loop by
value=sinf(a[9999]);
or may do nothing there, since you don't use value.

You could check (e.g. by saving .asm) to see whether that loop produces an svml library call or a single evaluation, if even that.

Andrey Kolesov (Intel)

Hi!

The compiler does in fact eliminate the sinf loop as "dead code", because the sinf results are never used.
Look at the generated asm:

=============================================================
call DWORD PTR __imp__QueryPerformanceCounter@4

.B1.6:
lea eax, DWORD PTR [esp+16]
push eax
call DWORD PTR __imp__QueryPerformanceCounter@4

.B1.7:
lea eax, DWORD PTR [esp+24]
push eax
call DWORD PTR __imp__QueryPerformanceCounter@4

.B1.8:
lea edx, DWORD PTR [esp+40]
lea eax, DWORD PTR [esp+40040]
push eax
push edx
push 10000
call _vsSin

.B1.17:
add esp, 12

.B1.9:
lea eax, DWORD PTR [esp+32]
push eax
call DWORD PTR __imp__QueryPerformanceCounter@4
=============================================================

As one can see, there is no sinf loop between the first two QueryPerformanceCounter calls.
To avoid this situation in the future, use one of the following two methods (or a combination):
1) Compile your timing routine with optimization disabled (the /Od compiler switch).
2) Make sure the results of the timed function are actually used. For example, just print the sinf values:

======================================================
QueryPerformanceCounter((LARGE_INTEGER*)&time1);
for (int i=0;i<n;i++) b[i]=sinf(a[i]);
QueryPerformanceCounter((LARGE_INTEGER*)&time2);

QueryPerformanceCounter((LARGE_INTEGER*)&time3);
vsSin(n,a,b);
QueryPerformanceCounter((LARGE_INTEGER*)&time4);

for(int i=0; i < n; i++)
{
printf("%f ", b[i]);
}
======================================================
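
Printing 10000 values is slow and floods the console, so a lighter way to "use" the results may be worth having. The following is only a sketch, not something from MKL or from the code above: the names `g_sink` and `consume` are hypothetical. It folds the output array into a single sum and stores it in a volatile variable, which the compiler is not allowed to optimize away.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sink: a volatile store counts as an observable side
   effect, so the values feeding it cannot be discarded as dead code. */
volatile float g_sink;

/* Hypothetical helper: sums the n results in x and writes the sum to
   g_sink. Call it after the timed region, once per timed variant. */
void consume(const float *x, size_t n)
{
    assert(x != NULL);
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    g_sink = sum;
}
```

In the timing code above this would be called as consume(b, n) after time2 is read and again after time4 is read, in place of the printf loop.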

By the way, your vsSin timing almost agrees with actual VML performance (see the VML notes).

One more hint for accurate timing: repeat your timing procedure several times (10-20)
and take the best result.

======================================================
besttime = INT_MAX;
curtime = 0;

for(int repeat = 0; repeat < 15; repeat++)
{
QueryPerformanceCounter((LARGE_INTEGER*)&time3);
vsSin(n,a,b);
QueryPerformanceCounter((LARGE_INTEGER*)&time4);
curtime = time4 - time3;
if(curtime < besttime)
besttime = curtime;
}

printf("time: %d\n",besttime);
======================================================

This hint helps you avoid two issues: the "cold cache" effect and the operating system's impact on the measurement.
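
If you want to handle the cold cache explicitly, one option is an untimed warm-up call before the timed repeats. The following is a minimal sketch under stated assumptions: `kernel_fn`, `best_of_warm` and `count_call` are hypothetical names, and the standard clock() stands in for QueryPerformanceCounter so that the example does not depend on Windows. clock() is much coarser, so with a short kernel it may well report 0.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical signature for the code under test. */
typedef void (*kernel_fn)(void *ctx);

/* Calls kernel once without timing it (warm-up), then times it
   `repeats` times and returns the smallest elapsed clock() value. */
clock_t best_of_warm(kernel_fn kernel, void *ctx, int repeats)
{
    assert(kernel != NULL && repeats > 0);

    kernel(ctx);                     /* warm-up: fills caches, not timed */

    clock_t best = 0;
    for (int r = 0; r < repeats; r++) {
        clock_t start = clock();
        kernel(ctx);
        clock_t elapsed = clock() - start;
        if (r == 0 || elapsed < best)   /* first run initializes best */
            best = elapsed;
    }
    return best;
}

/* Stand-in kernel for illustration: counts how often it was called. */
void count_call(void *ctx)
{
    ++*(int *)ctx;
}
```

For the case in this thread, the kernel would be a small wrapper that calls vsSin(n, a, b), and the clock() calls would be replaced by QueryPerformanceCounter. Initializing the best time from the first run also avoids comparing a 64-bit tick count against INT_MAX.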

Best regards and good luck!

Andrey K.
