I have a C++ Vector4d class and was hoping to improve its performance using IPP. The class used to store its values in 4 doubles, and I exchanged them for a vector (which I allocate with ippsMalloc).
I only perform regular operations (+, -, dot product), and I tested the class against its old implementation using the dot product (it's the operation I use the most):
Using g++ without optimization flags, I got slightly better results with IPP. With compiler optimization, though, the code without IPP was a lot faster (!!!).
What I tried next was to compute the dot product by hand in the IPP class (accessing the vector values directly from the allocated array). That gave a significant performance boost, and the IPP class overtook the other one.
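A minimal sketch of the two layouts described above, not the actual class. The names `Vector4dInline` and `Vector4dHeap` are made up for illustration, and `std::aligned_alloc` (C++17) is only a stand-in for `ippsMalloc` so that the example compiles without IPP:

```cpp
#include <cstdlib>
#include <new>

// Hypothetical "old" layout: four doubles stored inline in the object.
struct Vector4dInline {
    double x, y, z, w;

    double dot(const Vector4dInline& a) const {
        return x * a.x + y * a.y + z * a.z + w * a.w;
    }
};

// Hypothetical "new" layout: the values live in a separately allocated,
// aligned buffer. std::aligned_alloc is an assumption standing in for
// ippsMalloc; the 32-byte alignment is likewise illustrative.
class Vector4dHeap {
public:
    Vector4dHeap(double x, double y, double z, double w)
        : v(static_cast<double*>(std::aligned_alloc(32, 4 * sizeof(double)))) {
        if (!v) throw std::bad_alloc();
        v[0] = x; v[1] = y; v[2] = z; v[3] = w;
    }
    ~Vector4dHeap() { std::free(v); }

    // Copying is disabled to keep the sketch short.
    Vector4dHeap(const Vector4dHeap&) = delete;
    Vector4dHeap& operator=(const Vector4dHeap&) = delete;

    // Hand-written dot product reading directly from the allocated array.
    double dot(const Vector4dHeap& a) const {
        return v[0] * a.v[0] + v[1] * a.v[1] + v[2] * a.v[2] + v[3] * a.v[3];
    }

private:
    double* v;
};
```

Both `dot` members are defined in the class body, so the compiler can see the whole computation at the call site, which is not the case for a call into a separately compiled library function.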
Why is that? Memory alignment? Why isn't the ippmDotProduct_vv_64f_4x1 function faster than the direct calculation (v[0]*a.v[0] + v[1]*a.v[1] + v[2]*a.v[2] + v[3]*a.v[3])?
Why isn't it faster?