I need to compute something like A = B + C * D, where A, B, C, and D are vectors and the operation is done element-wise.

Currently I am using vdMul followed by vdAdd. This is not entirely efficient (even if my processor has no FMA instruction set) because of cache locality and how the instructions are issued: all my adders sit idle while I do the multiplies, and all my multipliers sit idle while I do the adds.

Is there a more efficient way to do this?

Side question: is dger the only way to do the "outer product" of two vectors in MKL?
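To make the operation concrete, here is a minimal sketch of the same computation written as a single pass in plain C, so that each element is multiplied and added while it is still in registers instead of going through an intermediate vector. The function name `fused_mul_add` is hypothetical and is not an MKL routine; the `restrict` qualifiers assume the output does not alias any input. Whether this actually beats vdMul plus vdAdd depends on the compiler, its vectorization flags, and the vector length, so treat it as something to benchmark rather than a known improvement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of MKL): computes a[i] = b[i] + c[i] * d[i]
 * in one pass over the data, so no temporary vector for c * d is needed.
 * Assumption: a does not overlap b, c, or d (hence restrict). */
void fused_mul_add(size_t n,
                   const double *restrict b,
                   const double *restrict c,
                   const double *restrict d,
                   double *restrict a)
{
    assert(a != NULL && b != NULL && c != NULL && d != NULL);

    for (size_t i = 0; i < n; ++i) {
        /* Multiply and add are issued together for each element; an
         * optimizing compiler may vectorize this loop and, where the
         * target supports it, contract it into FMA instructions. */
        a[i] = b[i] + c[i] * d[i];
    }
}
```

Calling `fused_mul_add(n, B, C, D, A)` would replace the vdMul/vdAdd pair from the question. For long vectors the loop is likely limited by memory bandwidth rather than by the adders or multipliers, which is worth keeping in mind when comparing timings.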