I have a function that computes the matrix product A^T B, which I have placed in a MODULE. In some simple timing tests (using CPU_TIME) I found that the same function written as an internal procedure is about twice as fast as the MODULE procedure. Is there a simple explanation for this?
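For concreteness, a function of this kind might look like the following minimal sketch. The names `atb_mod`, `atb`, and `dp` are illustrative assumptions, not the code that was actually timed; it simply forms each element of A^T B as a dot product of two columns, so TRANSPOSE(A) is never built.

```fortran
module atb_mod
  implicit none
  ! Double precision kind (assumed; matches the "double precision" in the PS).
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical helper: returns C = A^T B without forming TRANSPOSE(A).
  function atb(a, b) result(c)
    real(dp), intent(in) :: a(:,:), b(:,:)
    real(dp) :: c(size(a,2), size(b,2))
    integer :: i, j
    do j = 1, size(b,2)
      do i = 1, size(a,2)
        ! c(i,j) = sum over k of a(k,i)*b(k,j); both operands are whole
        ! columns, which are contiguous in Fortran's column-major storage.
        c(i,j) = dot_product(a(:,i), b(:,j))
      end do
    end do
  end function atb
end module atb_mod
```

The internal variant would be the same function body placed after a `contains` statement inside the main program instead of inside the module.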
Also, I found that computing MATMUL(A,B) takes the same time as MATMUL(A,TRANSPOSE(B)), implying that the TRANSPOSE operation is optimized away, whereas MATMUL(TRANSPOSE(A),B) takes a lot longer (which is the reason I wrote my own function in the first place). Is there a reason the transposition is not optimized away in that case?
PS: The matrices I used for the timing were 600x600 in double precision.
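A comparison of this kind could be reproduced with something like the sketch below. The program name, the variable names, and the use of random matrices are assumptions made for illustration; the actual timings will depend on the compiler and its optimization flags.

```fortran
program time_matmul
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 600          ! matrix size from the PS
  real(dp), allocatable :: a(:,:), b(:,:), c(:,:)
  real :: t0, t1

  allocate(a(n,n), b(n,n), c(n,n))
  call random_number(a)
  call random_number(b)

  ! Each result prints c(1,1) so the compiler cannot discard the product.
  call cpu_time(t0)
  c = matmul(a, b)
  call cpu_time(t1)
  print *, 'MATMUL(A,B)            :', t1 - t0, c(1,1)

  call cpu_time(t0)
  c = matmul(a, transpose(b))
  call cpu_time(t1)
  print *, 'MATMUL(A,TRANSPOSE(B)) :', t1 - t0, c(1,1)

  call cpu_time(t0)
  c = matmul(transpose(a), b)
  call cpu_time(t1)
  print *, 'MATMUL(TRANSPOSE(A),B) :', t1 - t0, c(1,1)
end program time_matmul
```

A single run per case is noisy at this size, so repeating each product several times and averaging would give more reliable numbers.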