Matrix Multiplication, Performance, and Scalability in OpenMP: Student Challenge

This year, we posed a simple matrix multiplication problem to the students and set up an internal contest to obtain the fastest serial code. Many versions were submitted, and the best one ran 20x faster than the most naïve implementation. The students learned a lot about compiler optimizations and, above all, about the effect of caches on code performance.
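The contest entries themselves are not reproduced here. As an illustration of the kind of cache effect involved, the following is a minimal sketch in C, assuming square, row-major matrices of `double`. The function names `matmul_naive` and `matmul_ikj` are hypothetical. The sketch contrasts the textbook i-j-k loop order with an interchanged i-k-j order whose innermost loop uses unit stride. It makes no claim about the speedup of either version on any particular machine.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major n x n matrices: element (i, j) is stored at m[i * n + j]. */

/* Naive i-j-k order: the innermost loop walks down a column of B,
   so for large n nearly every access to B touches a different cache line. */
void matmul_naive(size_t n, const double *A, const double *B, double *C)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* Interchanged i-k-j order: the innermost loop streams through a row of B
   and a row of C with unit stride, which reuses each fetched cache line
   and is easier for the compiler to vectorize. */
void matmul_ikj(size_t n, const double *A, const double *B, double *C)
{
    assert(n > 0);
    for (size_t i = 0; i < n * n; i++)
        C[i] = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++) {
            const double a = A[i * n + k];   /* reused across the whole j loop */
            for (size_t j = 0; j < n; j++)
                C[i * n + j] += a * B[k * n + j];
        }
}
```

Both functions compute the same product. Only the order of the memory accesses changes, which is exactly the kind of transformation that cache-aware tuning explores. Blocking (tiling) the loops is a common next step beyond simple interchange.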

The objective of this exercise was to extend this work to a massively multicore architecture. Performing the matrix multiplication on 32 cores connected through the QuickPath memory architecture provided a scenario complex enough to explore different solutions.
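As a starting point for the multicore version, the following is a minimal OpenMP sketch in C. It assumes the same row-major `double` layout and parallelizes the cache-friendly i-k-j loop over rows of C. The name `matmul_omp` is hypothetical, and this is not presented as the solution the students arrived at. Because each thread writes only to its own rows of C, the loops need no locks or reductions.

```c
#include <assert.h>
#include <stddef.h>

/* Parallel i-k-j multiply: rows of C are divided among OpenMP threads.
   Each thread owns whole rows of C, so no two threads ever write the same
   element and no synchronization is needed inside the loops.
   Build with -fopenmp (GCC/Clang). Without it the pragma is ignored and
   the function runs serially, producing the same result. */
void matmul_omp(size_t n, const double *A, const double *B, double *C)
{
    assert(n > 0);
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; i++) {
        double *c_row = &C[i * n];
        for (size_t j = 0; j < n; j++)
            c_row[j] = 0.0;
        for (size_t k = 0; k < n; k++) {
            const double a = A[i * n + k];
            const double *b_row = &B[k * n];
            for (size_t j = 0; j < n; j++)   /* unit-stride inner loop */
                c_row[j] += a * b_row[j];
        }
    }
}
```

The thread count can be set at run time, for example with `OMP_NUM_THREADS=32`. Pinning threads with `OMP_PROC_BIND` is one of the knobs worth trying on a multi-socket machine, since where threads and data live relative to each other affects memory traffic across the interconnect.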

Downloads are available under the Creative Commons License.