Use THP, enabled by default in the MPSS operating system
MPSS versions later than 2.1.4982-15 support "Transparent Huge Pages" (THP), which automatically promote 4KB pages to 2MB pages for stack- and heap-allocated data. In practice, this means that the uOS automatically converts the 4KB pages holding static and dynamic data into 2MB pages when that data has a contiguous access pattern. You can find more details here: https://software.intel.com/en-us/blogs/2013/07/09/transparent-huge-pages-on-intel-xeon-phi-coprocessors
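One way to check whether THP is actually promoting memory is to watch the AnonHugePages counter that Linux kernels with THP support report in /proc/meminfo. The sketch below assumes the uOS exposes this standard field; the helper names parse_anon_huge_kb, read_anon_huge_kb and alloc_and_touch are hypothetical and not part of MPSS.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse one /proc/meminfo line such as "AnonHugePages:   4096 kB".
 * Returns 1 and stores the value (in kB) in *kb if the line is the
 * AnonHugePages entry; returns 0 for any other line and leaves *kb as is. */
static int parse_anon_huge_kb(const char *line, long *kb)
{
    long value;

    if (sscanf(line, "AnonHugePages: %ld", &value) != 1)
        return 0;
    *kb = value;
    return 1;
}

/* Return AnonHugePages from /proc/meminfo in kB, or -1 if the file cannot
 * be opened or has no such field (e.g. a kernel built without THP). */
static long read_anon_huge_kb(void)
{
    char line[256];
    long kb = -1;
    FILE *f = fopen("/proc/meminfo", "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (parse_anon_huge_kb(line, &kb))
            break;
    }
    fclose(f);
    return kb;
}

/* Allocate a large heap buffer and write to every byte so the pages are
 * actually faulted in, giving THP a chance to back them with 2MB pages.
 * Returns NULL if malloc fails; the caller frees the buffer. */
static char *alloc_and_touch(size_t bytes)
{
    char *p = malloc(bytes);

    if (p != NULL)
        memset(p, 1, bytes);
    return p;
}
```

Reading the counter before and after calling alloc_and_touch with a buffer of several megabytes gives a rough indication of whether promotion took place. The counter is system-wide, so other processes can also change it.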
“Transparent huge pages” is a Linux kernel feature introduced in kernel version 2.6.38. The LWN article at http://lwn.net/Articles/423584/ gives an overview of how Linux hands out huge pages where they are useful without starving applications of the pages they need.
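On a standard Linux kernel, the THP policy can be read from /sys/kernel/mm/transparent_hugepage/enabled, which lists the modes always, madvise and never and marks the active one with square brackets. The following sketch extracts that mode; it assumes the uOS exposes the same sysfs file, and thp_active_mode and read_thp_mode are hypothetical helper names.

```c
#include <stdio.h>
#include <string.h>

/* Extract the active mode from the contents of the THP sysfs file,
 * e.g. "always [madvise] never" yields "madvise".
 * Copies the mode into out (truncated to outsz - 1 characters) and
 * returns 1, or returns 0 if no bracketed token is present. */
static int thp_active_mode(const char *contents, char *out, size_t outsz)
{
    const char *open = strchr(contents, '[');
    const char *close = open ? strchr(open, ']') : NULL;
    size_t len;

    if (open == NULL || close == NULL || outsz == 0)
        return 0;
    len = (size_t)(close - open - 1);
    if (len >= outsz)
        len = outsz - 1;
    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 1;
}

/* Read the current THP setting from sysfs (assumed path).
 * Returns 1 on success, 0 if the file is missing or unreadable. */
static int read_thp_mode(char *out, size_t outsz)
{
    char buf[128];
    FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");

    if (f == NULL)
        return 0;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return 0;
    }
    fclose(f);
    return thp_active_mode(buf, out, outsz);
}
```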
User programs can use mmap with special arguments to allocate data directly in 2MB pages
User programs can allocate dynamic data directly in 2MB pages by calling mmap with special arguments instead of malloc/new. This helps when the program would still benefit from 2MB pages but its data access pattern does not trigger THP in the uOS. The following macros show how to get 2MB pages using mmap:
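#include <sys/mman.h>

/* Map size bytes of anonymous memory backed by 2MB pages.
   Like mmap itself, this yields MAP_FAILED (not NULL) on error. */
#define my_malloc(size) \
    mmap(NULL, (size), PROT_READ | PROT_WRITE, \
         MAP_PRIVATE | MAP_HUGETLB | MAP_ANONYMOUS, -1, 0)

/* Unmap memory obtained from my_malloc; size must match the request. */
#define my_free(addr, size) munmap((addr), (size))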
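The macros above do no error handling. A slightly more defensive sketch follows: it rounds the request up to a multiple of 2MB, checks for MAP_FAILED, and falls back to ordinary 4KB pages if the huge-page mapping fails. On stock Linux, MAP_HUGETLB only succeeds when huge pages have been reserved (for example through /proc/sys/vm/nr_hugepages); whether the uOS behaves the same way is an assumption here. The names huge_alloc, huge_free, round_up_2mb and HUGE_2MB are hypothetical, and the code assumes the default huge page size is 2MB.

```c
#define _GNU_SOURCE /* exposes MAP_ANONYMOUS and MAP_HUGETLB on glibc */
#include <stddef.h>
#include <sys/mman.h>

/* Assumed default huge page size. */
#define HUGE_2MB ((size_t)2 * 1024 * 1024)

/* Round n up to the next multiple of 2MB (0 stays 0). */
static size_t round_up_2mb(size_t n)
{
    return (n + HUGE_2MB - 1) & ~(HUGE_2MB - 1);
}

/* Map at least len bytes, preferring 2MB pages. If the MAP_HUGETLB
 * mapping fails, retry with normal pages. *used_huge is set to 1 only
 * if the huge-page mapping succeeded. Returns NULL on failure (including
 * len == 0). Release the memory with huge_free(p, len). */
static void *huge_alloc(size_t len, int *used_huge)
{
    size_t maplen = round_up_2mb(len);
    void *p = mmap(NULL, maplen, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

    *used_huge = (p != MAP_FAILED);
    if (p == MAP_FAILED)
        p = mmap(NULL, maplen, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Unmap a region from huge_alloc; len must be the value passed to it.
 * Returns 0 on success, -1 on error (as munmap does). */
static int huge_free(void *p, size_t len)
{
    return munmap(p, round_up_2mb(len));
}
```

Rounding both calls the same way keeps the munmap length consistent with the mapping, which matters for hugetlb mappings whose length must be a multiple of the huge page size.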
Use library solutions such as libhugetlbfs
Alternatively, a library such as libhugetlbfs can automatically place all malloc-ed data and static data in 2MB pages; this also works for Fortran programs. Refer to the article Optimizing Memory Bandwidth on Stream Triad for more information and tips on using libhugetlbfs.
Huge Pages in offload programs
In offload programs, THP automatic promotion applies to static data defined on the Intel® Xeon Phi™ coprocessor and to dynamic data allocated inside an offload region with malloc or new.
THP does not apply to the buffers that #pragma offload allocates for pointer variables in in, out, or nocopy clauses. For these, set the environment variable MIC_USE_2MB_BUFFERS on the host to a threshold size; buffers larger than the threshold are allocated in 2MB pages. See the article Effective Use of the Intel Compiler's Offload Features for more details.
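The two cases can be sketched in C with the Intel compiler's offload pragma. This is an illustrative example under assumptions: offload_sum_doubled is a hypothetical function, and which pages back each buffer follows the description in the text rather than anything verified here. Compilers that do not support #pragma offload ignore it, in which case the whole function simply runs on the host.

```c
#include <stddef.h>
#include <stdlib.h>

/* Double each of the n input values in a scratch buffer, then sum them.
 * Uses Intel compiler offload (LEO) syntax; other compilers ignore the
 * pragma and execute the block on the host. */
static double offload_sum_doubled(double *in_buf, int n)
{
    double sum = 0.0;

    /* in_buf is copied into a buffer that the offload runtime allocates on
     * the coprocessor. THP does not apply to that buffer; setting
     * MIC_USE_2MB_BUFFERS on the host can place it in 2MB pages instead. */
    #pragma offload target(mic) in(in_buf : length(n)) out(sum)
    {
        /* Allocated with malloc inside the offload region, i.e. on the
         * coprocessor, so THP automatic promotion can apply to it. */
        double *tmp = malloc((size_t)n * sizeof *tmp);
        double s = 0.0;

        if (tmp != NULL) {
            for (int i = 0; i < n; i++)
                tmp[i] = 2.0 * in_buf[i];
            for (int i = 0; i < n; i++)
                s += tmp[i];
            free(tmp);
        }
        sum = s;
    }
    return sum;
}
```

In this sketch, MIC_USE_2MB_BUFFERS would influence only the buffer created for in_buf, while the scratch buffer tmp relies on THP.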
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804