In Intel MKL, the Level 3 BLAS functions that support Automatic Offload (?GEMM, ?TRMM, ?TRSM) can divide their computation between the host CPU and Intel Xeon Phi coprocessors. Users can override the default work division chosen by the Intel MKL runtime either by setting an environment variable or by calling a function.
The table below gives a few examples showing how to set and manage the division of work between the host and coprocessor(s).
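
As a rough illustration of the environment-variable route, the sketch below sets the work division from inside a C program before any Intel MKL call is made. The helper `request_mic_workdivision` is hypothetical and not part of Intel MKL. The variable names `MKL_MIC_ENABLE` and `MKL_MIC_WORKDIVISION` are the Automatic Offload controls as commonly documented, and the sketch assumes that Intel MKL reads them when it initializes, so the helper has to run before the first BLAS call.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of Intel MKL): asks that `mic_fraction`
 * of the work in each Automatic Offload call go to the coprocessor(s)
 * by setting MKL_MIC_ENABLE and MKL_MIC_WORKDIVISION.
 * Assumption: call this before the first Intel MKL function.
 * Returns 0 on success, -1 on an invalid fraction or a setenv failure. */
int request_mic_workdivision(double mic_fraction)
{
    char value[32];
    int written;

    /* A work division is a fraction of the total work, so it must lie in [0, 1]. */
    if (mic_fraction < 0.0 || mic_fraction > 1.0)
        return -1;

    written = snprintf(value, sizeof value, "%.2f", mic_fraction);
    assert(written > 0 && (size_t)written < sizeof value);

    /* Enable Automatic Offload, then set the coprocessor share. */
    if (setenv("MKL_MIC_ENABLE", "1", 1) != 0)
        return -1;
    return setenv("MKL_MIC_WORKDIVISION", value, 1) == 0 ? 0 : -1;
}
```

Calling `request_mic_workdivision(0.5)` at the start of `main` would request an even split between the host and the coprocessors. The function-call route should be equivalent in effect: with `mkl.h` included, a call such as `mkl_mic_set_workdivision(MKL_TARGET_MIC, 0, 0.5)` requests the same share for coprocessor 0. Check the exact signature against the `mkl.h` shipped with your Intel MKL version.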
