cblas_sgemm_pack result is not consistent with cblas_sgemm

Hello,

I wrote a short program that calls cblas_sgemm_pack and cblas_sgemm_compute to speed up a matrix multiplication, but the result is not consistent with cblas_sgemm.

For example,

Matrix A (2 x 2): [1.0, 2.0, 3.0, 4.0]

Matrix B (2 x 1): [1.0, 2.0]

In row-major layout, Matrix C (2 x 1) = A * B = [1*1 + 2*2, 3*1 + 4*2] = [5, 11]. But with cblas_sgemm_pack + cblas_sgemm_compute, the result is [0.0, 0.0].
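
For reference, the expected value can be checked without MKL using a plain triple loop. This is only a minimal sketch: naive_sgemm is a hypothetical helper name (not an MKL routine), and it assumes row-major storage with no transposition, the same as in the example above.

```c
#include <stddef.h>

/* Reference row-major GEMM: C = alpha * A * B + beta * C.
 * A is m x k (leading dimension lda), B is k x n (ldb), C is m x n (ldc).
 * naive_sgemm is a hypothetical helper for checking results by hand;
 * it is not part of MKL. */
void naive_sgemm(int m, int n, int k, float alpha,
                 const float *a, int lda,
                 const float *b, int ldb,
                 float beta, float *c, int ldc)
{
  for (int i = 0; i < m; i++) {
    for (int j = 0; j < n; j++) {
      float acc = 0.0f;
      for (int p = 0; p < k; p++) {
        acc += a[i * lda + p] * b[p * ldb + j];
      }
      /* Follow the BLAS convention: when beta is 0, C is not read,
       * so an uninitialized output buffer is safe. */
      if (beta == 0.0f) {
        c[i * ldc + j] = alpha * acc;
      } else {
        c[i * ldc + j] = alpha * acc + beta * c[i * ldc + j];
      }
    }
  }
}
```

Calling naive_sgemm(2, 1, 2, 1.0f, a, 2, b, 1, 0.0f, c, 1) with the A and B above fills c with 5 and 11.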

Could you please take a look? Any advice is welcome.

Thanks

---

Environment: Intel Parallel Studio XE, version 2017.1.132.

Build command: icc gemm_pack.c -I${MKLROOT}/include -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a ${MKLROOT}/lib/intel64/libmkl_sequential.a ${MKLROOT}/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm -ldl -std=c99

---

The sample code:

#include <stdio.h>
#include <stdlib.h>  /* malloc */
#include <mkl.h>

void print(float* a, int length, const char* name)
{
  int i = 0;
  for (i = 0; i < length; i++) {
    printf("%s[%d] = %f\n", name, i, a[i]);
  }
}

int main(void)
{
  int m = 2;
  int n = 1;
  int k = 2;

  float *a, *b, *c;
  a = (float*)malloc(sizeof(float) * m * k);
  b = (float*)malloc(sizeof(float) * k * n);
  c = (float*)malloc(sizeof(float) * m * n);

  int i = 0;
  for (i = 0; i < m *k; i++) {
    a[i] = i + 1;
  }
  for (i = 0; i < k * n; i++) {
    b[i] = i + 1;
  }

  float alpha = 1.0f;
  float beta = 0.0f;
  int lda = k;
  int ldb = n;
  int ldc = n;

  printf("========================SGEMM_PACK========================\n");
  print(a, m * k, "a");
  print(b, k * n, "b");
  float *packA = cblas_sgemm_alloc(CblasAMatrix, m, n, k);
  cblas_sgemm_pack(CblasRowMajor, CblasAMatrix, CblasNoTrans, m, n, k, alpha, a, lda, packA);

  cblas_sgemm_compute(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, k, packA, lda, b, ldb, beta, c, ldc);

  cblas_sgemm_free(packA);
  print(c, m * n, "c");

  printf("========================SGEMM========================\n");
  print(a, m * k, "a");
  print(b, k * n, "b");
  cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);
  print(c, m * n, "c");

  return 0;
}

 


Hello,

Since A is already packed, please pass CblasPacked instead of CblasNoTrans as the transa argument of cblas_sgemm_compute:

   cblas_sgemm_compute(CblasRowMajor, CblasPacked, CblasNoTrans, m, n, k, packA, lda, b, ldb, beta, c, ldc);

Thanks.
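
Once the packed path uses CblasPacked, one way to confirm that both paths agree is an element-wise comparison with a small tolerance, since the two routines are not guaranteed to round identically. The sketch below is an assumption-laden illustration: results_match and its tolerance scheme are hypothetical, and because the sample reuses c for both calls, it assumes the cblas_sgemm result is written to a second buffer first.

```c
/* Returns 1 when x and y agree element-wise within a relative tolerance,
 * 0 otherwise. results_match is a hypothetical helper, not an MKL routine.
 * The tolerance is scaled by max(1, |x[i]|, |y[i]|) so it behaves sensibly
 * for both small and large values. */
int results_match(const float *x, const float *y, int length, float tol)
{
  for (int i = 0; i < length; i++) {
    float ax = x[i] < 0.0f ? -x[i] : x[i];
    float ay = y[i] < 0.0f ? -y[i] : y[i];
    float scale = 1.0f;
    if (ax > scale) scale = ax;
    if (ay > scale) scale = ay;

    float diff = x[i] - y[i];
    if (diff < 0.0f) diff = -diff;

    if (diff > tol * scale) {
      return 0;
    }
  }
  return 1;
}
```

For the 2 x 1 example, results_match(c_packed, c_ref, m * n, 1e-5f) should return 1 after the fix, where c_packed and c_ref are assumed names for the two separate output buffers.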

 
