Memory Allocation and First-Touch

Compiler Methodology for Intel® MIC Architecture

Memory allocation is more expensive on the coprocessor than on the Intel® Xeon processor, so it is prudent to reuse already-allocated memory wherever possible. For example, if a function is called repeatedly (say, inside a loop) and uses an array for temporary storage, allocate the array at the maximum size needed on the first call and reuse it in later calls:

static real *temp_array = 0;   // persists across calls to foo

void foo(..) {
    if (temp_array == 0) {
        // first call only: allocate the maximum size ever needed
        temp_array = my_malloc(MAX_SIZE);
    }
    ... // use of temp_array
}
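
For readers who want something they can compile, the following is a minimal sketch of the same allocate-once pattern. Several names are assumptions made for illustration and do not come from the guide: `real` is taken to be a `float` typedef, the standard `malloc` stands in for `my_malloc`, `MAX_SIZE` is an arbitrary element count, and `sum_squares` and `alloc_count` are hypothetical names added only to show that the allocation happens once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_SIZE 1024             /* assumed upper bound on elements ever needed */

typedef float real;               /* assumption: 'real' is a floating-point typedef */

static real *temp_array = NULL;   /* scratch buffer, allocated once and then reused */
static int alloc_count = 0;       /* illustration only: counts actual malloc calls */

/* Sums the squares of the first n inputs, using temp_array as scratch space.
   Returns -1 if the one-time allocation fails. */
real sum_squares(const real *in, size_t n) {
    assert(n <= MAX_SIZE);        /* the buffer is sized for the worst case */

    if (temp_array == NULL) {     /* true on the first call only */
        temp_array = malloc(MAX_SIZE * sizeof *temp_array);
        if (temp_array == NULL) {
            return (real)-1;
        }
        alloc_count++;
    }

    for (size_t i = 0; i < n; i++) {
        temp_array[i] = in[i] * in[i];
    }

    real sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += temp_array[i];
    }
    return sum;
}
```

Calling `sum_squares` many times in a loop performs a single allocation, after which `alloc_count` stays at 1. Note that a shared static buffer like this is not safe if the function is called from several threads at once; a per-thread buffer would be one way around that, at the cost of more memory.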

Also keep in mind that on Linux, physical memory is allocated at first touch, not at the point of the malloc call. So if you have a loop that traverses a previously malloc'ed (but untouched) array, the first pass pays the cost of the page faults that map each page and may take longer than later passes.
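
One possible way to keep that first-touch cost out of a timed or performance-critical loop is to touch every page of the buffer right after allocating it. The sketch below is an illustration under stated assumptions, not a method prescribed by this guide: `prefault` and `alloc_touched` are hypothetical helper names, and the page size is queried with the POSIX call `sysconf(_SC_PAGESIZE)`, with 4096 used as a fallback guess.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>               /* sysconf, _SC_PAGESIZE (POSIX) */

/* Writes a zero at one byte per page-sized stride of buf, plus the last byte,
   so the OS backs the whole range with physical pages now.
   Overwrites those bytes, so use it only on freshly allocated memory.
   Returns the number of stride writes performed. */
size_t prefault(void *buf, size_t bytes) {
    assert(buf != NULL || bytes == 0);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) {
        page = 4096;              /* fallback assumption if the query fails */
    }

    /* volatile keeps the compiler from discarding the writes */
    volatile char *p = buf;
    size_t writes = 0;
    for (size_t off = 0; off < bytes; off += (size_t)page) {
        p[off] = 0;
        writes++;
    }
    if (bytes > 0) {
        p[bytes - 1] = 0;         /* malloc'ed memory need not be page aligned,
                                     so the tail may sit in one more page */
    }
    return writes;
}

/* Allocates bytes with malloc and touches every page before returning. */
void *alloc_touched(size_t bytes) {
    void *buf = malloc(bytes);
    if (buf != NULL) {
        prefault(buf, bytes);
    }
    return buf;
}
```

This does not remove the page-fault cost; it only moves it to allocation time, which fits well with the allocate-once pattern above because the cost is then paid a single time.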

Takeaways

Memory reuse is important for good performance on Intel MIC Architecture. Be mindful of how temporary arrays are allocated and used in your code.


It is essential that you read this guide from start to finish, using the built-in hyperlinks to guide you along a path to a successful port and tuning of your application(s) on Intel® Xeon Phi™ Coprocessors. The paths provided in this guide reflect the steps necessary to get the best possible application performance.


