No Cost Options for Intel Math Kernel Library (MKL), Support Yourself, Royalty-Free

Here is a guide to the various ways to obtain the latest version of the Intel® Math Kernel Library (Intel® MKL) for free, without access to Intel® Premier Support (get support by posting to the Intel Math Kernel Library forum). The full suite of tools (Intel® Parallel Studio XE), with Intel® Premier Support and access to previous library versions, can be purchased worldwide at any time.

Multithreading (BLAS sgemm)

Dear Forum,

I am trying to make MKL accelerate a matrix multiplication for me. It works, but MKL insists on using a single thread. I have experimented a bit, but regardless of what I do (even when multiplying two randomly initialized 10000x10000 matrices), MKL does not use multiple threads. Am I missing something?


BLAS sgemm, via


Environment settings:
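One common reason MKL stays on a single thread is the environment or the link line: MKL honors `MKL_NUM_THREADS`, which takes precedence over `OMP_NUM_THREADS` for MKL calls, and linking the sequential layer (`mkl_sequential`) instead of a threaded layer such as `mkl_intel_thread` disables threading altogether. The sketch below only inspects the environment with the standard library, so it can be called before the sgemm to see what thread count is being requested. The helper names `requested_threads` and `report_thread_env` are hypothetical, and the precedence rule is a simplification: it ignores `mkl_set_num_threads()` calls and `MKL_DYNAMIC`.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <thread>

// Hypothetical helper: parse a positive integer from an environment value.
// Returns 0 if the value is missing, empty, non-numeric, or not positive.
static int parse_positive(const char* value) {
    if (value == nullptr || *value == '\0') return 0;
    char* end = nullptr;
    long n = std::strtol(value, &end, 10);
    if (*end != '\0' || n <= 0) return 0;
    return static_cast<int>(n);
}

// Simplified model of the environment precedence: MKL_NUM_THREADS wins over
// OMP_NUM_THREADS; if neither is usable, fall back to the hardware thread count.
// Does NOT model mkl_set_num_threads() or MKL_DYNAMIC.
int requested_threads(const char* mkl_env, const char* omp_env, unsigned hw) {
    if (int n = parse_positive(mkl_env)) return n;
    if (int n = parse_positive(omp_env)) return n;
    return hw > 0 ? static_cast<int>(hw) : 1;
}

// Print what the current process environment asks for.
void report_thread_env() {
    const char* mkl_env = std::getenv("MKL_NUM_THREADS");
    const char* omp_env = std::getenv("OMP_NUM_THREADS");
    std::printf("MKL_NUM_THREADS=%s OMP_NUM_THREADS=%s -> %d thread(s) requested\n",
                mkl_env ? mkl_env : "(unset)", omp_env ? omp_env : "(unset)",
                requested_threads(mkl_env, omp_env,
                                  std::thread::hardware_concurrency()));
}
```

If the report shows a single thread, the environment is the first suspect; if it shows many threads but the multiplication still runs on one core, the link line (sequential vs. threaded layer) is worth checking next.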

Makefile and MKL


I am using MKL (the student version) with MPICH2. In my Makefile, the paths for MKL are hardcoded. How can I make them more general? My professor will check the project, so assuming he has MKL installed on his system, how can he compile it? I would like to provide a Makefile that is (almost) ready to run.

source returns error

Platform: Linux Fedora 22

Intel compilers_and_libraries_2016.0.109

Building R-3.2.2 from source with ICC

% source /opt/intel/bin/ intel64
get_library_directory:1: no matches found: s/^ //


R builds fine when I ignore this error, but I wonder whether I should deal with it.

Thank you for your help.

offload_transfer: array of variables?


I would like to pre-allocate a number of buffers for later data transfers from CPU to MIC, using explicit offloading in C++.

It works nicely if each buffer corresponds to an explicitly named variable, as in the double-buffering examples. However, I would like a configurable number of such buffers (more than two), i.e. an array of buffers. The buffers are used for asynchronous processing on the MIC, and I need quite a few of them.
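One way to make the number of buffers configurable is to keep the host-side buffers in a container and refer to them by index rather than by individual variable names. The sketch below is plain host C++ and performs no offloading: `BufferPool`, `acquire`, and `preallocate_all` are hypothetical names, and the loop in `preallocate_all` only marks where a per-buffer `offload_transfer` would go. Whether the offload pragma accepts an indexed expression directly, or needs a local pointer copy as assumed here, should be checked against the compiler documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side pool of equally sized float buffers. The number of
// buffers is a runtime parameter instead of a fixed set of variable names.
class BufferPool {
public:
    BufferPool(std::size_t count, std::size_t length)
        : storage_(count, std::vector<float>(length, 0.0f)), next_(0) {
        assert(count > 0 && "the pool needs at least one buffer");
    }

    std::size_t count() const { return storage_.size(); }
    std::size_t length() const { return storage_[0].size(); }

    // Raw pointer to buffer i, e.g. for a per-buffer transfer in a loop.
    float* data(std::size_t i) { return storage_.at(i).data(); }

    // Hand out buffer indices round-robin, as an asynchronous pipeline might.
    std::size_t acquire() {
        const std::size_t i = next_;
        next_ = (next_ + 1) % storage_.size();
        return i;
    }

private:
    std::vector<std::vector<float>> storage_;
    std::size_t next_;
};

// Pre-allocation pass: visit every buffer once. In an offload build this loop
// is where one transfer per buffer would be issued (assumption: the pragma is
// given a local pointer such as `p` rather than the expression pool.data(i)).
std::size_t preallocate_all(BufferPool& pool) {
    std::size_t visited = 0;
    for (std::size_t i = 0; i < pool.count(); ++i) {
        float* p = pool.data(i);
        (void)p;  // placeholder: no data is actually transferred here
        ++visited;
    }
    return visited;
}
```

Because the inner vectors are sized once in the constructor and never resized, the pointers returned by `data()` stay valid for the lifetime of the pool, which matters if they are handed to a transfer that outlives a single call.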

ippiConvFull_32f_C1R error in IPP7.0

Hi, I am testing ippiConvFull_32f_C1R with VS2010 and IPP 7.0 on my computer. The CPU is an i5-3470 CPU @ 3.20GHz.

I find that when the kernel size is larger than 10x10, the result is incorrect. The code is as follows:

    int   nWidth = 81;
    int   nHeight = 80;
    float *pfsrc = new float[nWidth*nHeight];

    for(int i = 0; i < nWidth*nHeight; i++)
        pfsrc[i] = i;

    int nKWidth = 11;
    int nKHeight = 11;
    float psKernel[200];
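To decide whether the library result is actually wrong, it helps to compare it against a straightforward reference. The sketch below is our own naive full 2D convolution, not IPP's algorithm: the output is (W + KW - 1) x (H + KH - 1), the kernel is flipped as convolution (unlike correlation) requires, and the arrays are tightly packed row-major, whereas the IPP functions take row steps in bytes. The names `conv_full_ref` and `max_abs_diff` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference: full 2D convolution of a w x h image with a
// kw x kh kernel, both stored row-major without padding.
// Output is (w + kw - 1) x (h + kh - 1), where
//   dst(x, y) = sum over (i, j) of src(x - i, y - j) * ker(i, j)
// restricted to source indices that fall inside the image.
std::vector<float> conv_full_ref(const std::vector<float>& src, int w, int h,
                                 const std::vector<float>& ker, int kw, int kh) {
    const int ow = w + kw - 1;
    const int oh = h + kh - 1;
    std::vector<float> dst(static_cast<std::size_t>(ow) * oh, 0.0f);
    for (int y = 0; y < oh; ++y) {
        for (int x = 0; x < ow; ++x) {
            double acc = 0.0;  // accumulate in double to limit rounding error
            for (int j = 0; j < kh; ++j) {
                const int sy = y - j;
                if (sy < 0 || sy >= h) continue;
                for (int i = 0; i < kw; ++i) {
                    const int sx = x - i;
                    if (sx < 0 || sx >= w) continue;
                    acc += static_cast<double>(src[sy * w + sx]) * ker[j * kw + i];
                }
            }
            dst[static_cast<std::size_t>(y) * ow + x] = static_cast<float>(acc);
        }
    }
    return dst;
}

// Largest absolute difference between two results of the same size.
float max_abs_diff(const std::vector<float>& a, const std::vector<float>& b) {
    float m = 0.0f;
    for (std::size_t k = 0; k < a.size() && k < b.size(); ++k)
        m = std::fmax(m, std::fabs(a[k] - b[k]));
    return m;
}
```

With inputs as large as those in the test above (values up to about 6480, 121 kernel taps), single-precision results will not match a double-accumulated reference bit for bit, so a tolerance scaled to the output magnitude is needed; a genuine bug typically shows up as differences far larger than rounding.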

Diagnostic 3180: unrecognized OpenMP #pragma

The test code below worked with Intel Composer 2013 but fails with SP1 Update 5,
giving me "Diagnostic 3180: unrecognized OpenMP #pragma".
Thanks in advance.

// OpenMPTest.cpp : Defines the entry point for the console application.

#include "stdafx.h"
#include <cstdlib>
#include <map>
#include <omp.h>

int _tmain(int argc, _TCHAR* argv[])
{
	std::map<int, int> box;
	box[0] = 0;
	box[1] = 0;
	box[2] = 0;

#pragma omp parallel for
	for (auto iter = box.begin(); iter != box.end(); ++iter) {
		(*iter).second = rand();
	}
	return 0;
}
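The OpenMP specification (3.0 onward) only accepts iterator-based loops in a `parallel for` when the iterator is random-access, and `std::map` iterators are bidirectional, so the loop above is not a conforming OpenMP loop even if an older compiler accepted it. A minimal sketch of a conforming alternative, assuming the goal is simply to update every value in parallel: gather pointers to the map's elements into a `std::vector`, then run the `parallel for` over an integer index. The function name `fill_values` is hypothetical, and it writes `key * 10` in place of `rand()`, which is not guaranteed to be thread-safe.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical helper: update every value of the map in parallel.
// std::map iterators are not random-access, so first gather pointers to the
// elements; the parallel loop then runs over a plain signed integer index.
void fill_values(std::map<int, int>& box) {
    std::vector<std::pair<const int, int>*> items;
    items.reserve(box.size());
    for (auto& kv : box) items.push_back(&kv);

    const long n = static_cast<long>(items.size());
#pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        // Each iteration writes a different element, so there is no data race.
        // key * 10 stands in for rand(), which is not guaranteed thread-safe.
        items[i]->second = items[i]->first * 10;
    }
}
```

The map itself is not modified structurally during the parallel loop (no insertions or erasures), so the gathered pointers remain valid; only the mapped values are written.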

