Intel MKL 11.3 introduced support for Intel Threading Building Blocks (Intel TBB).
NumPy UMath Optimizations
Intel® Math Kernel Library Improved Small Matrix Performance Using Just-in-Time (JIT) Code Generation for Matrix Multiplication (GEMM)
The most commonly used and performance-critical Intel® Math Kernel Library (Intel® MKL) functions are the general matrix multiply (GEMM) functions.
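For readers unfamiliar with the operation, GEMM computes C ← alpha·A·B + beta·C for matrices A (m×k), B (k×n), and C (m×n). The following is a minimal pure-Python sketch of that arithmetic only. The function name `gemm` and the list-of-lists layout are assumptions made for illustration; this is not the Intel MKL interface, which exposes the operation through BLAS routines such as `cblas_dgemm`, and it makes no attempt at performance.

```python
def gemm(alpha, a, b, beta, c):
    """Reference GEMM: return alpha * (a @ b) + beta * c.

    Hypothetical helper for illustration; matrices are row-major lists of lists.
    """
    m, k, n = len(a), len(b), len(b[0])
    # Check that the shapes are compatible: a is m x k, b is k x n, c is m x n.
    assert all(len(row) == k for row in a)
    assert len(c) == m and all(len(row) == n for row in c)
    return [
        [alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
         for j in range(n)]
        for i in range(m)
    ]


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
result = gemm(2.0, a, b, 0.5, c)
print(result)  # [[38.5, 44.5], [86.5, 100.5]]
```

The triple loop performs m·n·k multiply-adds, which is why optimized GEMM implementations matter so much for overall application performance.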
By Taylor Kidd, Intel Corporation
This article is essentially a collection of blog posts I wrote on the same subject; the only difference is the degree of formality.