cblas_dgemm slows down a lot for Linux on Pentium 4 machine

Hi,
I've found in my benchmark that, compared with MKL 7.2, cblas_dgemm in release 10.1.1.019 for Linux slows down a lot on a Pentium 4 machine when the matrix size is small and beta is set to 0.
Is this a known issue? Has it been fixed?
Thanks a lot!

Quoting - xearthl

I've found in my benchmark that, compared with MKL 7.2, cblas_dgemm in release 10.1.1.019 for Linux slows down a lot on a Pentium 4 machine when the matrix size is small and beta is set to 0.
Is this a known issue? Has it been fixed?

Did MKL 7.2 have a library specifically optimized for P4? It would not be entirely surprising if a recent MKL was not tuned for an out-of-production CPU. Of course, P4 covered a fairly wide range, from the original 32-bit parts to the later 64-bit versions. "Small" may also be in the eye of the beholder; I doubt there was ever an effort to optimize MKL for cases such as 6x6 or less, where MKL never could compete with Fortran MATMUL.

Hello,

First of all, the newest version, MKL 10.2 Update 2, is now available at the Registration Center.

Could you please tell me the size of your matrix?

Thanks,
Art

Hi,
The execution time doubles when the matrix is 32*32 in my benchmark.
Thanks,
xearthl

Hi xearthl,

I compared MKL 7.2 and MKL 10.2 Update 2 and measured about a 15% increase in performance with MKL 10.2 Update 2.

I multiplied 10000 random matrices of size 32*32 using code like this:

start_time = dsecnd();

for (i = 0; i < 10000; i++)
{
    cblas_dgemm(...);
}

end_time = dsecnd();

printf("Execution time in seconds: %f\n", end_time - start_time);

Could you please provide an example of code that causes the performance slow down?

Thanks,
Art

Hi Art,

Thanks a lot for your great effort!

I haven't tried the latest version yet. The slowdown used to appear only when beta was set to 0.

Thanks,

xearthl
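
The comparison discussed in this thread (32*32 matrices, beta = 0 versus a nonzero beta) could be set up roughly as in the sketch below. It is only an illustration, not the benchmark either poster ran: ref_dgemm is a hypothetical plain-C stand-in for cblas_dgemm (row-major, square, no transposes) so that the example compiles without MKL, time_gemm is a hypothetical helper, and the standard clock() call replaces MKL's dsecnd(). It follows the BLAS convention that C need not be set on input when beta is zero. To test a real MKL build, the call inside the timing loop would be replaced with cblas_dgemm.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for cblas_dgemm: computes C = alpha*A*B + beta*C
   for n x n row-major matrices without transposes. */
static void ref_dgemm(int n, double alpha, const double *a, const double *b,
                      double beta, double *c)
{
    assert(n > 0);
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            /* BLAS convention: when beta is 0, C is overwritten, not read. */
            c[i * n + j] = (beta == 0.0) ? alpha * sum
                                         : alpha * sum + beta * c[i * n + j];
        }
    }
}

/* Times `reps` products of random n x n matrices with the given beta.
   Returns CPU seconds, or -1.0 if allocation fails. One element of the
   result is stored in *checksum so the loop cannot be optimized away. */
static double time_gemm(int n, int reps, double beta, double *checksum)
{
    size_t count = (size_t)n * (size_t)n;
    double *a = malloc(count * sizeof *a);
    double *b = malloc(count * sizeof *b);
    double *c = malloc(count * sizeof *c);
    if (a == NULL || b == NULL || c == NULL) {
        free(a);
        free(b);
        free(c);
        return -1.0;
    }

    for (size_t i = 0; i < count; i++) {
        a[i] = rand() / (double)RAND_MAX;
        b[i] = rand() / (double)RAND_MAX;
        c[i] = 0.0;
    }

    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        ref_dgemm(n, 1.0, a, b, beta, c);
    clock_t end = clock();

    *checksum = c[0];
    free(a);
    free(b);
    free(c);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_gemm(32, 10000, 0.0, &sum) and time_gemm(32, 10000, 1.0, &sum) against each library version, with identical compiler flags, would show whether a slowdown is specific to beta = 0 or affects small matrices in general.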
