vsSin(..) much slower than sinf(..)??

Hi!

I have a small problem. I tested the two functions vsSin and sinf because I wanted to know which of them is faster.

Here is my code:

Code:

	float value;
	__int64 time1,time2,time3,time4;

	float a[10000];
	float b[10000];
	int n=10000;
	int mode;

  mode=VML_LA|VML_FLOAT_CONSISTENT|VML_ERRMODE_IGNORE;
  vmlSetMode(mode);

  for (int j=0;j<10000;j++)
     a[j] = (float)(rand()%8);


  QueryPerformanceCounter((LARGE_INTEGER*)&time1);
    for (int i=0;i<10000;i++)
      value=sinf(a[i]);
  QueryPerformanceCounter((LARGE_INTEGER*)&time2);

  QueryPerformanceCounter((LARGE_INTEGER*)&time3);
     vsSin(n,a,b);
  QueryPerformanceCounter((LARGE_INTEGER*)&time4);

  printf("time: %I64d\n",time2-time1);
  printf("time: %I64d\n",time4-time3);



And now the result:

sinf(..) took 1608 ticks (or whatever QueryPerformanceCounter returns ;) )
vsSin(..) took 192344 ticks at best

Why is vsSin so slow?
Did I do something wrong?

Thanks for any answers.

GoreProducers


I suppose the compiler may be able to replace your first loop by
value=sinf(a[9999]);
or may do nothing there, since you don't use value.

You could check (e.g. by saving .asm) to see whether that loop produces an svml library call or a single evaluation, if even that.

Hi!

The compiler does indeed eliminate the sinf loop as dead code, because the sinf results are never used.
Look at the generated asm:

=============================================================
call DWORD PTR __imp__QueryPerformanceCounter@4

.B1.6:
lea eax, DWORD PTR [esp+16]
push eax
call DWORD PTR __imp__QueryPerformanceCounter@4

.B1.7:
lea eax, DWORD PTR [esp+24]
push eax
call DWORD PTR __imp__QueryPerformanceCounter@4

.B1.8:
lea edx, DWORD PTR [esp+40]
lea eax, DWORD PTR [esp+40040]
push eax
push edx
push 10000
call _vsSin

.B1.17:
add esp, 12

.B1.9:
lea eax, DWORD PTR [esp+32]
push eax
call DWORD PTR __imp__QueryPerformanceCounter@4
=============================================================

As you can see, there is no sinf loop between the first two QueryPerformanceCounter calls.
To avoid this in the future, use one of these two methods (or both):
1) Compile your timing routine with optimization disabled (the /Od compiler switch).
2) Make sure the results of the timed function are actually used. For example, store the sinf values and print them:

======================================================
QueryPerformanceCounter((LARGE_INTEGER*)&time1);
for (int i=0;i<n;i++) b[i]=sinf(a[i]);
QueryPerformanceCounter((LARGE_INTEGER*)&time2);

QueryPerformanceCounter((LARGE_INTEGER*)&time3);
vsSin(n,a,b);
QueryPerformanceCounter((LARGE_INTEGER*)&time4);

for(int i=0; i < n; i++)
{
printf("%f ", b[i]);
}
======================================================
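A portable sketch of method 2, for readers without the Windows timer at hand. It assumes std::chrono::steady_clock as a stand-in for QueryPerformanceCounter; the helper names time_sinf_loop and checksum are hypothetical and not part of VML or the original code. The point is only that every sinf result is stored and later consumed, so the compiler has no grounds to discard the loop.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: computes b[i] = sin(a[i]) in single precision and
// returns the elapsed time in nanoseconds. Writing into b gives the loop
// an observable effect.
long long time_sinf_loop(const std::vector<float>& a, std::vector<float>& b)
{
    b.resize(a.size());
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < a.size(); ++i)
        b[i] = std::sin(a[i]);   // float overload, same job as sinf
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

// Hypothetical helper: consumes the results so they cannot be proven unused.
double checksum(const std::vector<float>& b)
{
    double sum = 0.0;
    for (float x : b)
        sum += x;
    return sum;
}
```

Printing the value returned by checksum(b) after the timed section plays the same role as the printf loop above, with far less output.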

By the way, your timing results roughly agree with actual VML performance (see the VML notes).

Another hint for accurate timing: repeat your timing procedure several times (10-20) and take the best result.

======================================================
besttime = INT_MAX;
curtime = 0;

for(int repeat = 0; repeat < 15; repeat++)
{
QueryPerformanceCounter((LARGE_INTEGER*)&time3);
vsSin(n,a,b);
QueryPerformanceCounter((LARGE_INTEGER*)&time4);
curtime = time4 - time3;
if(curtime < besttime)
besttime = curtime;
}

printf("time: %d\n",besttime);
======================================================
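The same best-of-N idea can be wrapped in a small reusable helper. This is a sketch under assumptions, not the VML way of doing it: best_time_ns is a hypothetical name, and std::chrono::steady_clock again stands in for QueryPerformanceCounter.

```cpp
#include <chrono>
#include <limits>

// Hypothetical helper: runs work() `repeats` times and returns the fastest
// single run in nanoseconds. The first (cold-cache) run and any run disturbed
// by the operating system simply lose to a faster one.
template <typename Work>
long long best_time_ns(Work work, int repeats)
{
    long long best = std::numeric_limits<long long>::max();
    for (int r = 0; r < repeats; ++r) {
        auto start = std::chrono::steady_clock::now();
        work();
        auto stop = std::chrono::steady_clock::now();
        long long cur =
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
        if (cur < best)
            best = cur;
    }
    return best;
}
```

A call might look like best_time_ns([&]{ vsSin(n, a, b); }, 15), assuming a, b and n are set up as in the code above.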

This will help you avoid two issues: the "cold cache" effect and the operating system's impact on the measurement.

Best regards and good luck!

Andrey K.
