I have two matrices A(100K,100K) and B(100K,50) and I want to get C=[B^{T}AB]

A is a matrix from finite differences, but is stored in a vector to exploit its sparsity. It has two referance variables, S which refers to the location in A where the first value of row I is stored. So the range of values in A for ROW I would be S(I) to S(I+1)-1

The second variable is IDX which says the column location of a value in the vector A. For example S(I)+1 would be the second value stored in noncompressed Row I and IDX(S(I)+1) its column.

Below is my current code where I multiply AB first through loops and then multiply B^{T}AB through the MKL/BLAS.

DO J=1,COL DO I=1,ROW DO IJ= S(I), S(I+1)-1 AB(I,J)= AB(I,J) + A(IJ)*B(IDX(IJ),J) END DO END DO END DO CALL DGEMM('T','N',COL,COL,ROW,1D0,B,ROW,AB,ROW,0D0,C,COL)

Any suggestions how I can improve this code/speed it up would be greatly appreciated. Right now it is a major choke point in a program I have been working on for the past six months.

Thanks in advance!