vsSin(..) much slower than sinf(..)??

Hi!

I have a little problem. I tested the two functions vsSin and sinf because I wanted to know which of the two is faster.

Here is my code:

Code:

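#include <windows.h>   /* QueryPerformanceCounter */
#include <stdio.h>
#include <stdlib.h>    /* rand */
#include <math.h>      /* sinf */
#include "mkl_vml.h"   /* vsSin, vmlSetMode, VML_* mode flags */

int main(void)
{
	float value;
	__int64 time1,time2,time3,time4;

	float a[10000];
	float b[10000];
	int n=10000;
	int mode;

	mode=VML_LA|VML_FLOAT_CONSISTENT|VML_ERRMODE_IGNORE;
	vmlSetMode(mode);

	for (int j=0;j<10000;j++)
		a[j] = (float)(rand()%8);

	QueryPerformanceCounter((LARGE_INTEGER*)&time1);
	for (int i=0;i<10000;i++)
		value=sinf(a[i]);
	QueryPerformanceCounter((LARGE_INTEGER*)&time2);

	QueryPerformanceCounter((LARGE_INTEGER*)&time3);
	vsSin(n,a,b);
	QueryPerformanceCounter((LARGE_INTEGER*)&time4);

	/* __int64 needs a 64-bit format specifier, not %d */
	printf("sinf time: %lld\n", (long long)(time2-time1));
	printf("vsSin time: %lld\n", (long long)(time4-time3));
	return 0;
}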



And now the results:

sinf(..) took 1608 ticks (or whatever QueryPerformanceCounter returns ;) )
vsSin(..) took 192344 ticks at best

Why is vsSin so slow???
Did I do something wrong?

Thanks for any answers.

GoreProducers


I suspect the compiler may replace your first loop with
value=sinf(a[9999]);
or drop it entirely, since you never use value.

You could check (e.g. by saving the generated .asm listing) whether that loop produces an SVML library call, a single evaluation, or nothing at all.

Hi!

The compiler does indeed eliminate the sinf loop as "dead code", because the sinf results are never used.
Look at the generated asm:

=============================================================
call DWORD PTR __imp__QueryPerformanceCounter@4

.B1.6:
lea eax, DWORD PTR [esp+16]
push eax
call DWORD PTR __imp__QueryPerformanceCounter@4

.B1.7:
lea eax, DWORD PTR [esp+24]
push eax
call DWORD PTR __imp__QueryPerformanceCounter@4

.B1.8:
lea edx, DWORD PTR [esp+40]
lea eax, DWORD PTR [esp+40040]
push eax
push edx
push 10000
call _vsSin

.B1.17:
add esp, 12

.B1.9:
lea eax, DWORD PTR [esp+32]
push eax
call DWORD PTR __imp__QueryPerformanceCounter@4
=============================================================

As you can see, there is no sinf loop between the first two QueryPerformanceCounter calls.
To avoid this in the future, use one of these two methods (or both):
1) compile your timing routine with optimization disabled (the /Od compiler switch);
2) make sure the results of the timed function are actually used. For example, just print the sinf values:

======================================================
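float c[10000]; /* separate buffer so vsSin does not overwrite the sinf results */

QueryPerformanceCounter((LARGE_INTEGER*)&time1);
for (int i=0;i<n;i++) c[i]=sinf(a[i]);
QueryPerformanceCounter((LARGE_INTEGER*)&time2);

QueryPerformanceCounter((LARGE_INTEGER*)&time3);
vsSin(n,a,b);
QueryPerformanceCounter((LARGE_INTEGER*)&time4);

/* using both result arrays keeps the compiler from discarding either computation */
for (int i=0; i < n; i++)
{
printf("%f %f ", c[i], b[i]);
}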
======================================================

By the way, your vsSin timing roughly agrees with the actual VML performance (see the VML performance notes).
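Per-element VML figures and raw QueryPerformanceCounter ticks are not directly comparable: the length of a tick depends on the machine and is reported by QueryPerformanceFrequency (ticks per second). The conversion is plain division, sketched below in portable C; the helper names are hypothetical, and on Windows the frequency would come from QueryPerformanceFrequency rather than a constant.

```c
#include <assert.h>

/* Converts a tick difference into nanoseconds. ticks_per_second is the
   counter frequency, e.g. the value QueryPerformanceFrequency reports. */
static double ticks_to_ns(long long ticks, long long ticks_per_second)
{
    assert(ticks_per_second > 0);
    return (double)ticks * 1e9 / (double)ticks_per_second;
}

/* Average time per element for a call that processed n elements,
   e.g. vsSin(n, a, b) with n = 10000. */
static double ns_per_element(long long ticks, long long ticks_per_second, int n)
{
    assert(n > 0);
    return ticks_to_ns(ticks, ticks_per_second) / n;
}
```

For example, with a made-up frequency of 1,000,000 ticks per second, 1000 ticks over 10000 elements gives `ns_per_element(1000, 1000000, 10000)` = 100 ns per element. Multiplying nanoseconds by the CPU clock rate in GHz turns this into clocks per element.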

Another hint for accurate timing: repeat your timing procedure several times (10-20)
and take the best result.

======================================================
__int64 besttime = LLONG_MAX; /* LLONG_MAX comes from <limits.h> */
__int64 curtime = 0;

for(int repeat = 0; repeat < 15; repeat++)
{
    QueryPerformanceCounter((LARGE_INTEGER*)&time3);
    vsSin(n,a,b);
    QueryPerformanceCounter((LARGE_INTEGER*)&time4);
    curtime = time4 - time3;
    if(curtime < besttime)
        besttime = curtime;
}

printf("time: %lld\n", (long long)besttime);
======================================================

This helps you avoid two issues: the "cold cache" effect and the operating system's interference with the measurement.
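The best-of-N loop can also be factored into a reusable helper so that the sinf loop and vsSin are measured the same way. This is a portable sketch under assumptions: `clock()` replaces QueryPerformanceCounter, and `best_of`, `workload_fn`, `sum_ctx` and `sum_workload` are hypothetical names. In the original program the workload would be a small wrapper around vsSin(n,a,b) or the sinf loop.

```c
#include <assert.h>
#include <time.h>

/* A timed workload: any function taking an opaque context pointer. */
typedef void (*workload_fn)(void *ctx);

/* Runs fn(ctx) `repeats` times and returns the smallest elapsed time in
   clock() ticks (processor time, a portable stand-in for QPC). */
static clock_t best_of(workload_fn fn, void *ctx, int repeats)
{
    assert(repeats > 0);
    clock_t best = 0;
    for (int r = 0; r < repeats; r++) {
        clock_t t0 = clock();
        fn(ctx);
        clock_t elapsed = clock() - t0;
        if (r == 0 || elapsed < best)   /* first run seeds the minimum */
            best = elapsed;
    }
    return best;
}

/* Hypothetical example workload: sums an array. */
struct sum_ctx { const float *a; int n; double result; };

static void sum_workload(void *p)
{
    struct sum_ctx *c = p;
    double s = 0.0;
    for (int i = 0; i < c->n; i++)
        s += c->a[i];
    c->result = s;   /* store the result so the loop is not dead code */
}
```

Seeding the minimum with the first measurement avoids choosing a sentinel such as INT_MAX whose type may not match the timer's type.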

Best regards and good luck!

Andrey K.
