MKL 6.1 - slow BLAS (zgemm) performance with small matrices

I have found that MKL 6.1 BLAS zgemm seems to perform significantly more slowly than the netlib BLAS source for small (6x6) matrices. One source of the slowdown appears (according to Rational Quantify) to be a memcpy operation occurring inside MKL zgemm.
I understand that MKL is much faster when the matrices get larger (and in other parts of my code it does help quite a lot), but perhaps the MKL engineers could look at some way to avoid time-consuming set-up and skip to simple serial code when matrices are 'small'.

It is true that our focus is on much larger matrices. We have made a very significant effort in MKL 7.0 (which should be available shortly) to address the small matrix case as well, where we tend to use different and simpler algorithms. For the most part, we use hybrid algorithms with special cases for inputs that are too small. But we usually try matrix multiply strategies in DGEMM before we try them on ZGEMM. I suspect that DGEMM in 7.0 will respond better to 6x6 matrices. We would certainly like to find the best solution for all cases.
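
To make the hybrid idea concrete, here is a minimal sketch of a size-based dispatch in C. It is not MKL's implementation: the names `zmm_fn`, `zmm_small`, `zmm_hybrid` and the cutoff `SMALL_N` are hypothetical, the cutoff value is an assumption (no number is given in this thread), and the "large" path is simply whatever routine the caller passes in.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical signature for an n x n complex multiply, C = A * B,
   with all matrices stored column-major as in BLAS. */
typedef void (*zmm_fn)(int n, const double complex *a,
                       const double complex *b, double complex *c);

enum { SMALL_N = 8 };  /* assumed cutoff, chosen only for illustration */

/* Simple serial path: plain triple loop, no workspace, no copying. */
static void zmm_small(int n, const double complex *a,
                      const double complex *b, double complex *c)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            double complex sum = 0.0;
            for (int k = 0; k < n; ++k)
                sum += a[i + k * n] * b[k + j * n];
            c[i + j * n] = sum;
        }
    }
}

/* Hybrid entry point: small inputs (or a missing large-path routine)
   take the serial loop; everything else goes to the caller-supplied
   routine, which stands in for an optimized kernel. */
static void zmm_hybrid(int n, const double complex *a,
                       const double complex *b, double complex *c,
                       zmm_fn large_path)
{
    if (n <= SMALL_N || large_path == NULL)
        zmm_small(n, a, b, c);
    else
        large_path(n, a, b, c);
}
```

The point of the sketch is only that the size check happens before any set-up work, so a 6x6 call never pays for the machinery meant for large matrices.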

The best algorithms for large matrices tend to have enormous overheads for small matrices. On a 6x6 matrix, for example, the interface itself seems to consume half the time (that is, one could do significantly better than netlib BLAS simply by inlining 6x6x6 loops). Having a malloc and some of the other tricks we use certainly doesn't help.
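
As an illustration of what "inlining 6x6x6 loops" could look like, the following is a rough sketch in C of a fixed-size complex multiply. The helper name `zmul6` and the constant `DIM` are made up for this example; it assumes column-major storage (as BLAS uses), computes only C = A * B without zgemm's alpha/beta scaling or transpose options, and makes no claim about actual speed.

```c
#include <complex.h>

enum { DIM = 6 };  /* fixed size taken from the 6x6 case discussed here */

/* C = A * B for 6x6 double-complex matrices in column-major order.
   With the bounds known at compile time there is no argument checking,
   no workspace allocation, and no copying; the compiler is free to
   unroll the loops. Hypothetical helper, not an MKL or netlib routine. */
static void zmul6(const double complex *a, const double complex *b,
                  double complex *c)
{
    for (int j = 0; j < DIM; ++j) {
        for (int i = 0; i < DIM; ++i) {
            double complex sum = 0.0;
            for (int k = 0; k < DIM; ++k)
                sum += a[i + k * DIM] * b[k + j * DIM];
            c[i + j * DIM] = sum;
        }
    }
}
```

A caller would replace a zgemm call on 6x6 operands with `zmul6(a, b, c)` where `a`, `b` and `c` each point to 36 contiguous `double complex` values.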

I think your idea is a good one. I will bring it to the attention of the rest of the developers. Unfortunately, it is too late for 7.0, and possibly 7.0.1. I honestly do not know when we will address your specific concern, but I can assure you that we will continue to improve the small matrix cases. It is very much a topic of our attention, as you will see by comparing small DGEMMs between 6.1 and 7.0 when it is released.

Thank you again for your suggestion.

- Greg Henry

Hi Greg,
I am thrilled to hear that I will see improvements in MKL 7.0 for small matrices. I will download the latest 7.0 beta and try it out.
Thanks for your reply.