Hi all, I'm testing the vectorization capabilities of the icc compiler using a matrix-multiply kernel. Whether the compiler vectorizes depends on the order of the loops. I'm therefore compiling with -O3, expecting icc to perform loop interchange so that it can always vectorize the inner loop of the matrix multiply. I declare the matrices inside main and perform the multiplication inside main as well (to avoid the need for interprocedural information). Here is the code:
#include <stdlib.h> /* malloc */

/* inic_m, inic_m2 and inic_m3 are my initialization helpers (not shown here). */

int main()
{
    int dimi = 148;
    int dimj = 148;
    int dimk = 148;
    int i, j, k;

    /* c is dimi x dimj: one contiguous block plus an array of row pointers */
    float **c = (float **)malloc(sizeof(float *) * dimi);
    c[0] = (float *)malloc(dimi * dimj * sizeof(float));
    for (i = 1; i < dimi; i++)
        c[i] = c[0] + i * dimj;

    /* b is dimk x dimj, same layout */
    float **b = (float **)malloc(sizeof(float *) * dimk);
    b[0] = (float *)malloc(dimk * dimj * sizeof(float));
    for (i = 1; i < dimk; i++)
        b[i] = b[0] + i * dimj;

    /* a is dimi x dimk, same layout */
    float **a = (float **)malloc(sizeof(float *) * dimi);
    a[0] = (float *)malloc(dimi * dimk * sizeof(float));
    for (i = 1; i < dimi; i++)
        a[i] = a[0] + i * dimk;

    inic_m(a, dimi, dimk);
    inic_m2(b, 2.0, dimk, dimj);
    inic_m3(c, 3.1, dimi, dimj);

    /* IKJ order: the innermost loop runs over j */
    for (i = 0; i < dimi; i++)
    {
        for (k = 0; k < dimk; k++)
        {
            for (j = 0; j < dimj; j++)
            {
                c[i][j] += a[i][k] * b[k][j];
            }
        }
    }
    return 0;
}
But the compiler doesn't do loop interchange. Even with -O3, icc leaves the loop order exactly as I wrote it in the code. When I change the order of the loops, -vec-report3 gives different information. For the orders JIK, JKI, KJI and IJK, -vec-report3 reports the following (116 is the line number of the inner loop):
matrizXmatrizTests.c(112) : (col. 2) remark: loop was not vectorized: not inner loop.
matrizXmatrizTests.c(114) : (col. 3) remark: loop was not vectorized: not inner loop.
matrizXmatrizTests.c(116) : (col. 4) remark: vector dependence: assumed FLOW dependence between reference at line 118 and reference at line 118.
matrizXmatrizTests.c(116) : (col. 4) remark: vector dependence: assumed FLOW dependence between reference at line 118 and reference at line 118.
matrizXmatrizTests.c(116) : (col. 4) remark: vector dependence: assumed FLOW dependence between reference at line 118 and reference at line 118.
matrizXmatrizTests.c(116) : (col. 4) remark: vector dependence: assumed FLOW dependence between reference at line 118 and reference at line 118.
matrizXmatrizTests.c(116) : (col. 4) remark: vector dependence: assumed FLOW dependence between reference at line 118 and reference at line 118.
matrizXmatrizTests.c(116) : (col. 4) remark: vector dependence: assumed FLOW dependence between reference at line 118 and reference at line 118.
matrizXmatrizTests.c(116) : (col. 4) remark: vector dependence: assumed FLOW dependence between reference at line 118 and reference at line 118.
matrizXmatrizTests.c(116) : (col. 4) remark: vector dependence: assumed FLOW dependence between reference at line 118 and reference at line 118.
matrizXmatrizTests.c(116) : (col. 4) remark: loop was not vectorized: existence of vector dependence.
For the orders KIJ and IKJ I get:
matrizXmatrizTests.c(112) : (col. 2) remark: loop was not vectorized: not inner loop.
matrizXmatrizTests.c(114) : (col. 3) remark: loop was not vectorized: not inner loop.
matrizXmatrizTests.c(116) : (col. 4) remark: LOOP WAS VECTORIZED.

Why isn't icc able to do loop interchange? Is icc unable to perform loop interchange when the inner loop contains pointers? I'm lost: I've searched the icc documentation for an explanation of why icc doesn't interchange the loops, but I didn't find anything. I need to test the capabilities of the icc compiler as part of a performance study of compilers and nested loops. I am working on a Linux server with icc (ICC) 9.1, but soon I will have Intel C++ Compiler Professional Edition 11.1. Thanks in advance.
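In case it helps anyone reproduce the comparison without my test file, here is a minimal sketch of two of the loop orders as standalone functions. It is not my actual code: it assumes flat row-major arrays instead of the float ** row pointers above, square n x n matrices, and C99 restrict qualifiers (which icc may only accept with a C99 or restrict option enabled). The names matmul_ikj, matmul_ijk and orders_agree are made up for this post. It only checks that both orders compute the same result, which is what would make an interchange legal; I have not confirmed what -vec-report3 says for it.

```c
#include <stdlib.h>

/* IKJ order: the innermost j loop walks c and b with unit stride. */
void matmul_ikj(int n, float *restrict c,
                const float *restrict a, const float *restrict b)
{
    for (int i = 0; i < n; i++)
        for (int k = 0; k < n; k++)
            for (int j = 0; j < n; j++)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}

/* IJK order: the innermost k loop accumulates into a single element of c
   and reads b with stride n. */
void matmul_ijk(int n, float *restrict c,
                const float *restrict a, const float *restrict b)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}

/* Hypothetical check: fill the inputs, run both orders, compare the outputs.
   Returns 1 if every element agrees within a small tolerance, 0 otherwise
   (including when allocation fails). */
int orders_agree(int n)
{
    size_t count = (size_t)n * (size_t)n;
    float *a  = malloc(count * sizeof *a);
    float *b  = malloc(count * sizeof *b);
    float *c1 = malloc(count * sizeof *c1);
    float *c2 = malloc(count * sizeof *c2);
    int ok = (a && b && c1 && c2);

    if (ok) {
        for (size_t idx = 0; idx < count; idx++) {
            a[idx] = (float)(idx % 7) * 0.5f; /* arbitrary test values */
            b[idx] = 2.0f;
            c1[idx] = c2[idx] = 3.1f;
        }
        matmul_ikj(n, c1, a, b);
        matmul_ijk(n, c2, a, b);
        for (size_t idx = 0; idx < count && ok; idx++) {
            float d = c1[idx] - c2[idx];
            if (d < 0.0f)
                d = -d;
            if (d > 1e-2f)
                ok = 0;
        }
    }
    free(a);
    free(b);
    free(c1);
    free(c2);
    return ok;
}
```

Calling orders_agree(148) from a small main matches the matrix size in my test. Compiling each function with -O3 -vec-report3 would then show, per loop order, whether the inner loop is reported as vectorized.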


