I wrote this small subroutine that compares simple vector operations, performed with an explicit loop:

f(i) = a(i) + b(i)

or as a direct whole-array assignment:

f = a + b

or using Intel MKL VML:

vdAdd(n,a,b,f)

The timing results for n=50000000 are:

VML     0.9 sec
direct  0.4 sec
loop    0.4 sec

I don't understand why VML takes twice as long as the other methods. (The loop is sometimes faster than the direct form.)

I used threaded MKL with either 1 or 2 threads on an Intel Core 2 Duo, but the result stays the same.

Flags: /O3 /MT /Qopenmp /heap-arrays0

The subroutine can be found under http://paste.ideaslabs.com/show/L6dVLdAOIf and is called via

program test
  use vmltests
  implicit none

  call vmlTest()
end program
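
In case the paste link is not reachable, here is a minimal sketch of the kind of timing harness described above. It is not the pasted code itself: the names vmltests_sketch, vmlTestSketch, time_report and dp are only illustrative, and the vdAdd call is left as a comment because it needs the MKL interfaces and libraries linked in.

```fortran
module vmltests_sketch
  implicit none
  integer, parameter :: dp  = kind(1.0d0)            ! double precision kind
  integer, parameter :: i8  = selected_int_kind(18)  ! wide integer for clock counts
contains

  subroutine vmlTestSketch(n)
    integer, intent(in) :: n
    real(dp), allocatable :: a(:), b(:), f(:)
    integer(i8) :: t0, t1, rate
    integer :: i

    allocate(a(n), b(n), f(n))
    call random_number(a)
    call random_number(b)

    ! Variant 1: explicit loop
    call system_clock(t0, rate)
    do i = 1, n
      f(i) = a(i) + b(i)
    end do
    call system_clock(t1)
    call time_report('loop  ', t0, t1, rate, f(n))

    ! Variant 2: direct whole-array assignment
    call system_clock(t0)
    f = a + b
    call system_clock(t1)
    call time_report('direct', t0, t1, rate, f(n))

    ! Variant 3: Intel MKL VML (assumption: MKL is available and linked);
    ! time it the same way as the two variants above.
    ! call vdAdd(n, a, b, f)

    deallocate(a, b, f)
  end subroutine vmlTestSketch

  ! Print elapsed wall-clock time; printing one result element keeps the
  ! compiler from optimizing the computation away.
  subroutine time_report(label, t0, t1, rate, check)
    character(len=*), intent(in) :: label
    integer(i8), intent(in) :: t0, t1, rate
    real(dp), intent(in) :: check
    print '(a,1x,f8.3,a,es12.4)', label, real(t1 - t0, dp) / real(rate, dp), &
          ' sec, f(n) = ', check
  end subroutine time_report

end module vmltests_sketch
```

It would be called like vmlTest() above, for example with call vmlTestSketch(50000000) after use vmltests_sketch.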