Managing Multi-core Performance

Developer Guide

Developer Guide for Intel® oneAPI Math Kernel Library Linux*

ID 766690

Date 11/07/2023

You can obtain the best performance on systems with multi-core processors by requiring that threads do not migrate from core to core. To do this, bind threads to the CPU cores by setting an affinity mask for the threads. Use one of the following options:

OpenMP facilities (if available), for example, the KMP_AFFINITY environment variable using the Intel OpenMP library
A system function, as explained below
Intel TBB facilities (if available), for example, the tbb::affinity_partitioner class (for details, see https://www.threadingbuildingblocks.org/documentation)

Consider the following performance issue:

The system has two sockets with two cores each, for a total of four cores (CPUs).
The application sets the number of OpenMP threads to two and calls Intel® oneAPI Math Kernel Library (oneMKL) to perform a Fourier transform. This call takes considerably different amounts of time from run to run.

To resolve this issue, before calling Intel® oneAPI Math Kernel Library (oneMKL), set an affinity mask for each OpenMP thread using the KMP_AFFINITY environment variable or the sched_setaffinity system function. The following code example, built with the Intel compiler, shows how to resolve the issue by setting an affinity mask through the operating system. The code calls the sched_setaffinity function to bind the threads to cores on different sockets. Then the Intel® oneAPI Math Kernel Library (oneMKL) FFT function is called:

        
#define _GNU_SOURCE //for using the GNU CPU affinity
// (works with the appropriate kernel and glibc)
// Set affinity mask
#include <sched.h>
#include <stdio.h>
#include <unistd.h>
#include <omp.h>
int main(void) {
	int NCPUs = sysconf(_SC_NPROCESSORS_CONF);
	printf("Using thread affinity on %i NCPUs\n", NCPUs);
#pragma omp parallel default(shared)
	{
		cpu_set_t new_mask;
		cpu_set_t was_mask;
		int tid = omp_get_thread_num();
		
		CPU_ZERO(&new_mask);
		
		// 2 packages x 2 cores/pkg x 1 threads/core (4 total cores)
		CPU_SET(tid==0 ? 0 : 2, &new_mask);
		
		if (sched_getaffinity(0, sizeof(was_mask), &was_mask) == -1) {
			printf("Error: sched_getaffinity(%d, sizeof(was_mask), &was_mask)\n", tid);
		}
		if (sched_setaffinity(0, sizeof(new_mask), &new_mask) == -1) {
			printf("Error: sched_setaffinity(%d, sizeof(new_mask), &new_mask)\n", tid);
		}
		printf("tid=%d new_mask=%08X was_mask=%08X\n", tid,
						*(unsigned int*)(&new_mask), *(unsigned int*)(&was_mask));
	}
	// Call Intel MKL FFT function
	return 0;
}

Compile the application with the Intel compiler using the following command:

icx test_application.c -qopenmp

where test_application.c is the filename for the application.

After building the application, run it with two threads, for example, by using an environment variable to set the number of threads:

env OMP_NUM_THREADS=2 ./a.out
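
The example prints each mask by reinterpreting the first bytes of a cpu_set_t as an unsigned int, which covers only the first 32 CPUs and depends on the internal layout of the type. As a hedged alternative, the sketch below walks the set with the glibc CPU_ISSET macro and writes the CPU indices as text. The helper name format_cpu_set is hypothetical and is not part of oneMKL, OpenMP, or glibc.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* exposes cpu_set_t and the CPU_* macros in glibc */
#endif
#include <sched.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes the CPU indices contained in `set` to `buf`
 * as a comma-separated list (for example "0,2").
 * Returns the number of CPUs in the set, or -1 if `buf` is too small. */
int format_cpu_set(const cpu_set_t *set, char *buf, size_t len)
{
    size_t used = 0;
    int count = 0;

    if (len == 0)
        return -1;
    buf[0] = '\0';

    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (!CPU_ISSET(cpu, set))
            continue;
        /* Prefix every entry except the first with a comma. */
        int n = snprintf(buf + used, len - used, count ? ",%d" : "%d", cpu);
        if (n < 0 || (size_t)n >= len - used)
            return -1;   /* output would have been truncated */
        used += (size_t)n;
        ++count;
    }
    return count;
}
```

Inside the parallel region of the example, the helper could be called once for new_mask and once for was_mask, and the resulting strings printed with %s in place of the two %08X fields.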

For details on the sched_setaffinity function used in the example, see the Linux Programmer's Manual (man pages).
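
The example hard-codes the thread-to-core mapping for two threads on a system with two sockets and two cores per socket. A minimal sketch of how that mapping might be generalized is shown below. The helper name cpu_for_thread is hypothetical, and the sketch assumes that logical CPUs are numbered socket by socket, as in the example, where CPUs 0 and 2 sit on different sockets. Many systems enumerate CPUs differently, so check the actual topology (for example, with lscpu) before relying on it.

```c
/* Hypothetical helper: picks a logical CPU for OpenMP thread `tid` so that
 * consecutive threads land on different sockets.
 *
 * Assumption: logical CPUs are numbered socket by socket, that is, socket s
 * owns CPUs s*cores_per_socket through (s+1)*cores_per_socket - 1.
 *
 * Returns the CPU index, or -1 if an argument is invalid. */
int cpu_for_thread(int tid, int sockets, int cores_per_socket)
{
    if (tid < 0 || sockets <= 0 || cores_per_socket <= 0)
        return -1;

    int socket = tid % sockets;                     /* round-robin over sockets */
    int core = (tid / sockets) % cores_per_socket;  /* then over cores in a socket */

    return socket * cores_per_socket + core;
}
```

With this helper, the line CPU_SET(tid==0 ? 0 : 2, &new_mask) in the example could be written as CPU_SET(cpu_for_thread(tid, 2, 2), &new_mask), which yields CPU 0 for thread 0 and CPU 2 for thread 1 and also spreads any additional threads over the remaining cores.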

Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex. Notice revision #20201201
