OS X 10.4.10
Intel Core Duo 1.66 GHz
I'm just trying out MKL. I'm doing some simple comparisons of VML with a simple complex vector math lib I wrote. My test program crunches a 1000-element complex vector 500k times in a simple loop, first using VML, then my lib.
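For reference, a timing loop of this kind can be sketched roughly as follows. This is only an illustration, not the actual test program: the names `binary_kernel`, `sample_add`, and `time_kernel` are hypothetical, and `sample_add` stands in for whichever routine is being timed.

```c
#include <complex.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical signature shared by the kernels being compared. */
typedef void (*binary_kernel)(size_t n, const double complex *a,
                              const double complex *b, double complex *y);

/* Stand-in kernel so the sketch compiles on its own. */
void sample_add(size_t n, const double complex *a,
                const double complex *b, double complex *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a[i] + b[i];
}

/* Returns CPU seconds spent in `reps` calls of `fn` on n-element vectors,
   or -1.0 if allocation fails. */
double time_kernel(binary_kernel fn, size_t n, long reps)
{
    double complex *a = malloc(n * sizeof *a);
    double complex *b = malloc(n * sizeof *b);
    double complex *y = malloc(n * sizeof *y);
    double elapsed = -1.0;

    if (a && b && y) {
        /* Fill the inputs with arbitrary, non-trivial values. */
        for (size_t i = 0; i < n; ++i) {
            a[i] = (double)i + 1.0 * I;
            b[i] = 2.0 - (double)i * I;
        }
        fn(n, a, b, y);                 /* untimed warm-up call */
        clock_t start = clock();
        for (long r = 0; r < reps; ++r)
            fn(n, a, b, y);
        elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
    }
    free(a);
    free(b);
    free(y);
    return elapsed;
}
```

Calling `time_kernel(sample_add, 1000, 500000)` matches the sizes above. Note that `clock()` reports CPU time, so a wall-clock timer may be more appropriate if the routine under test runs on more than one thread.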
My lib simply loops through a complex input array and performs element-by-element addition, multiplication, or absolute value, depending on the function. Multiplication and absolute value are the full complex-number operations.
Here are the timing results:
As you can see, VML is slightly faster for addition, but *much* slower for multiplication and absolute value.
I'm compiling with gcc (from within Matlab) using these options:
-fno-common -no-cpp-precomp -fexceptions -march=pentium4 -O3 -ftree-vectorize -mfpmath=sse -msse -msse2 -Wall -pipe -DHAVE_INLINE -Wno-long-long
I figure gcc 4.0 is doing loop vectorization, so my simple code could be vectorized, but why would it be faster than VML?
I've tried building with different architecture and optimization flags, but those don't affect the VML timings, which makes sense.
I've messed around with the various MKL options, such as the accuracy modes and the number of threads, but saw no changes.
Does anyone have any thoughts on this? Thanks, I'm very confused.