Here is a simple question. I'm new to IPP and I'm trying to understand how to use it for solving the following problem:
A += B*C + D*E + F*G + ...
A, B, C, D, E, F, G, ... are all square matrices of the same size, and * denotes standard matrix multiplication. The matrices are small, typically between 3x3 and 35x35.
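To make the target operation concrete, here is a minimal plain-C sketch of what I want to compute. It uses no IPP calls at all; the name accumulate_products, the row-major contiguous layout, and the arrays of matrix pointers are my own assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference helper (NOT an IPP function).
 * Computes A += L[0]*R[0] + L[1]*R[1] + ... + L[count-1]*R[count-1],
 * where every matrix is n x n, row-major, and stored contiguously. */
static void accumulate_products(double *A,
                                const double *const *L,
                                const double *const *R,
                                size_t count, size_t n)
{
    assert(A != NULL && L != NULL && R != NULL);
    for (size_t k = 0; k < count; ++k) {
        const double *B = L[k];   /* left factor of the k-th product  */
        const double *C = R[k];   /* right factor of the k-th product */
        for (size_t i = 0; i < n; ++i) {
            for (size_t p = 0; p < n; ++p) {
                const double b = B[i * n + p];
                /* Innermost loop walks a row of C and a row of A. */
                for (size_t j = 0; j < n; ++j)
                    A[i * n + j] += b * C[p * n + j];
            }
        }
    }
}
```

With L = {B, D, F} and R = {C, E, G} this writes directly into A, with no temporaries. This is the behaviour I would like to get from IPP.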
IPP provides a routine for this kind of operation (I'm looking at the ippmMul_mama_64f function) that operates on two source arrays of matrices, in our case [B, D, F] and [C, E, G]. As far as I understand, it produces three output matrices A1, A2, and A3, holding the results of B*C, D*E, and F*G, respectively. Now I have two related questions:
- In my problem, there's a single output matrix A. Is there a function in IPP, or a safe way of using ippmMul_mama_64f, such that the results are *accumulated* in a single output matrix A, rather than in three different matrices A1, A2, and A3?
- If this is not possible, how do I best combine the three temporaries A1, A2, and A3?
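For the second question, the fallback I have in mind is a plain elementwise sum of the temporaries, roughly as in the sketch below. Again this is plain C with no IPP calls; the name add_temporaries and the assumption that A1, A2, and A3 sit back to back in one buffer are mine, not something taken from the IPP documentation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (NOT an IPP function).
 * T holds `count` matrices of n x n doubles stored back to back
 * (A1, then A2, then A3, ...). Adds all of them into A in place. */
static void add_temporaries(double *A, const double *T,
                            size_t count, size_t n)
{
    assert(A != NULL && T != NULL);
    const size_t len = n * n;            /* elements per matrix */
    for (size_t k = 0; k < count; ++k)
        for (size_t e = 0; e < len; ++e)
            A[e] += T[k * len + e];
}
```

This costs an extra pass over count*n*n elements plus the storage for the temporaries, which is why I would prefer an accumulating variant if one exists.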
Ah, incidentally: is there any document I can look at that compares the performance of IPP MX to hand-crafted implementations? I've done a bit of research and I couldn't find much.