Using Intel® Math Kernel Library for Embedded

By Noah Clemons

Published: 07/17/2014   Last Updated: 07/17/2014

This article describes how to use Intel® Math Kernel Library within the many embedded target environments that Intel® System Studio offers.

Brief overview of Intel® Math Kernel Library

Intel® Math Kernel Library (Intel® MKL) is a high-performance library of mathematical functionality (linear algebra, FFT, mathematical functions, RNGs, various solvers, and more) optimized for Intel® Architecture. For each platform, Intel® MKL supports 32-bit and 64-bit applications and uses only generic OS functionality, which avoids complications in supporting particular flavors of an OS. Intel® MKL can use multiple cores in its computational functions.

Intel® MKL supports three linking models: static, dynamic, and ‘mkl_rt’ (single dynamic library). Mixing the linking models is not recommended. In each model, Intel® MKL needs at least three layers: interface, threading, and core. With ‘mkl_rt’ linking, the layers are selected at run time (hence the name), while static and dynamic linking fix the layers at link time.

Because of the layered structure of Intel® MKL and the dependencies between the layers, deriving the correct link line may be nontrivial, so Intel® MKL provides the MKL Link Line Advisor.

A simple rule, however, is this: take mandatory parts, maybe add optional parts, and wrap this in a linking group. For example (a static link line):

icc -o app.exe … -Wl,--start-group,-Bstatic -lmkl_intel -lmkl_intel_thread -lmkl_core -liomp5 -Wl,-Bdynamic,--end-group -lpthread -lm

If linking fails with unresolved symbols because those symbols are not available on the target platform, they can be provided either by the application itself or by a stub library whose source code is given below.

Intel® Math Kernel Library vs. Intel® Integrated Performance Primitives (Intel® IPP)

The biggest use case for embedded applications is media processing primitives (sound processing, image/video manipulation, computer vision). This type of functionality is provided by Intel® Integrated Performance Primitives, not by Intel® MKL.

Functional content aside, the following table shows the main usability differences between Intel® MKL and Intel® IPP for embedded applications.

| Usability feature | Intel® MKL | Intel® IPP |
| --- | --- | --- |
| Size of statically linked executable. Many embedded applications need a small binary. | Large. Intel® MKL does not let you link only one CPU-specific code branch into your application (for example, the FFT optimized for SSE4.2 without the generic SSE2 branch). | Small. You can link in only the optimizations for the target platform. |
| Threading. Harnessing CPU cores to solve tasks faster. | Intel® MKL may employ all CPU cores via internal parallelization. This creates an OpenMP runtime library dependency on the target. | Intel® IPP contains heavily vectorized single-core implementations. |
| Memory allocation. Embedded applications should be able to control the memory they use: recognize and respond gracefully to a memory shortage, and release memory sooner. | Intel® MKL allocates memory internally using its own fast memory allocator. An error is returned when internal memory runs short. In many cases the size of internal memory cannot be determined in advance. | Intel® IPP relies on user-allocated memory. Functions are available to query the amount of memory needed. |
| 64-bit integers. With increasing embedded RAM size and CPU performance, larger problems with datasets of more than 4 GB may be processed. Using 64-bit integers also matches the register size and avoids redundant conversion instructions. | Intel® MKL supports ILP64 interfaces, so you can solve huge problems using 64-bit array indexing. | Intel® IPP uses 32-bit integers only. |
| Messages for embedded debugging. If a library reports an issue, the message should be human-readable, preferably localized. | Intel® MKL may occasionally print information on stdout or stderr. Message catalogues are supported (currently en_US and ja_JP). | Intel® IPP does not print messages on stdout. A function to convert an error code into a human-readable message is provided. Message catalogues are supported (currently en_US and ja_JP). |
| Runtime environment variables. All embedded execution environments support this, so embedded applications may use environment variables with no restrictions. | Intel® MKL may alter its behavior depending on environment variables, directly (e.g. MKL_DYNAMIC) or indirectly (e.g. OMP_NUM_THREADS). | Intel® IPP does not use and is not affected by environment variables. |
| Graceful error processing. A library shall not call exit() in any case. On a resource shortage, a library function shall return an error code. | Intel® MKL may print a message and call exit() if some resource is not available (e.g. a dynamic library cannot be loaded by the dispatcher). Such situations are rare and are carefully documented. | Intel® IPP functions always return an error code and never call exit() or abort(). |
| Thread Local Storage (TLS). The internal state of a library (e.g. accuracy mode) should be maintained per thread in TLS variables. | Intel® MKL uses several global variables with Linux TLS conventions. This may cause problems on an OS with incompatible TLS. | Intel® IPP does not use TLS variables. |
| File I/O. Many embedded devices provide ‘disk storage’ in addition to RAM. A library may use it for temporary or persistent storage. | A few Intel® MKL functions need file I/O, namely out-of-core PARDISO. | Intel® IPP does not do file I/O. |
| Progress/interrupt/resume. While computing large problems, an embedded application should be able to suspend execution, release resources, and later resume the computation. | A few Intel® MKL functions call the user-defined mkl_progress() function during a lengthy computation. In most cases, large computations cannot be suspended; the application has to break the work into smaller pieces itself. | Not supported. In all cases the application has to break the work into smaller pieces itself. |
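
The 64-bit integer entry above can be made concrete with a small calculation. The sketch below is plain C and does not call Intel® MKL; the helper names `element_count` and `fits_in_int32` are made up for illustration. It shows why a 32-bit index stops being sufficient once a matrix has more than 2^31 - 1 elements, which is the situation the ILP64 interfaces are meant for.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of elements in a rows x cols matrix, computed in 64-bit arithmetic.
 * Hypothetical helper; it does not guard against 64-bit overflow itself. */
int64_t element_count(int64_t rows, int64_t cols)
{
    return rows * cols;
}

/* True if 'count' can be held in a 32-bit signed integer, i.e. if an
 * interface with 32-bit integer arguments could still describe the array. */
bool fits_in_int32(int64_t count)
{
    return count >= 0 && count <= INT32_MAX;
}
```

For instance, a 40000 x 40000 matrix has 1,600,000,000 elements and still fits a 32-bit index, while a 50000 x 50000 matrix has 2,500,000,000 elements and exceeds INT32_MAX (2,147,483,647).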

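Since Intel® MKL may change its behavior based on environment variables, an embedded application may want to inspect or log those settings at startup. The following is a minimal sketch, assuming the target's C library provides `getenv`; the helpers `parse_int_or_default` and `env_int_or_default` are hypothetical names and are not part of Intel® MKL.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse 'text' as a decimal int; return 'fallback' if the text is missing,
 * empty, not a number, has trailing characters, or is out of range. */
int parse_int_or_default(const char *text, int fallback)
{
    char *end = NULL;
    long value;

    if (text == NULL || *text == '\0')
        return fallback;

    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0' || value < INT_MIN || value > INT_MAX)
        return fallback;

    return (int)value;
}

/* Read an integer-valued environment variable such as OMP_NUM_THREADS. */
int env_int_or_default(const char *name, int fallback)
{
    return parse_int_or_default(getenv(name), fallback);
}
```

An application could call `env_int_or_default("OMP_NUM_THREADS", 1)` during initialization and record the result, so that a thread count inherited from the environment does not come as a surprise later.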

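As an illustration of the progress entry in the table above, the following sketch shows a user-defined `mkl_progress()` that lets an application ask for an early stop. The signature, and the convention that a non-zero return value asks the library to stop, are assumptions based on the Intel® MKL documentation and should be checked against the headers of your Intel® MKL version. The cancel flag and the two helper functions are hypothetical. The code compiles without Intel® MKL because the callback is supplied by the application.

```c
#include <stddef.h>

/* Hypothetical application state: set when the application wants to stop. */
static volatile int cancel_requested = 0;
/* Last step number reported through the callback, or -1 if none yet. */
static int last_step = -1;

/* Hypothetical helpers for the rest of the application. */
void request_cancel(void) { cancel_requested = 1; }
int last_reported_step(void) { return last_step; }

/*
 * User-defined progress callback. Assumed signature: thread number, step
 * number, stage name (not necessarily NUL-terminated) and its length.
 * Returning non-zero is assumed to request that the computation stop.
 */
int mkl_progress(int *thread, int *step, char *stage, int lstage)
{
    (void)thread; (void)stage; (void)lstage;   /* unused in this sketch */
    if (step != NULL)
        last_step = *step;
    return cancel_requested ? 1 : 0;
}
```

Stopping this way only abandons the computation; it does not save intermediate state, so resuming later still requires the application to split the problem into restartable pieces.
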
Intel® MKL embedded compatibility library

Intel® MKL for Linux uses some Linux Standard Base functionality that embedded OSes do not provide. As a result, an application cannot link Intel® MKL statically on such targets without workarounds. The following table lists the missing symbols together with proposed stub implementations of the missing functionality.

Missing symbols


Stub implementation


__strtol_internal

An alias for the standard ‘strtol’ function.

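#include <stdlib.h>

/* Forward to the standard strtol(); the extra 'group' argument is ignored. */
long __strtol_internal(const char *nptr, char **endptr, int base, int group)
{ (void)group; return strtol(nptr, endptr, base); }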

catopen, catgets

Message catalogue functions. With this stub implementation, Intel® MKL prints its messages in English regardless of LANG.

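/* Always fail to open a catalogue, so callers fall back to the default text. */
void *catopen(const char *name, int flag) { /* errno = EACCES; */ return (void*)(-1); }

/* Return the caller-supplied default (English) message unchanged. */
char *catgets(void *catalog, int setno, int msgno, const char *message)
{ return (char *)message; }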


stderr

Reference to the standard error stream.

#include <stdio.h>

#if defined(stderr)
#pragma push_macro("stderr")
#undef stderr

FILE *stderr =

#pragma pop_macro("stderr")



The stub implementations can simply be copied into the source code of your application to make it statically linkable with Intel® MKL. Alternatively, they can be compiled for the target platform as a separate stub library.
