ippsCopy Vs. ippiCopyManaged

This article explains when to use ippsCopy and ippiCopyManaged from Intel® IPP (Intel® Integrated Performance Primitives).

ippsCopy is recommended when the data fits into the L2 cache: after the first pass, copying is very fast. If the data exceeds the L2 cache size, however, the copy constantly thrashes the whole cache, so ippsCopy is not recommended in that case. The ippsCopy function switches to non-temporal stores once a threshold is reached: (src_len + dst_len) >= L2. Also note that if you compile your code with the Intel® C/C++ Compiler, memcpy and ippsCopy will mostly perform the same, since the Intel compiler uses the same optimized kernel as ippsCopy.
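The threshold condition can be illustrated with a small check. This is only a sketch: exceeds_l2_threshold is a hypothetical helper, not part of Intel IPP, and the L2 size is assumed to be supplied by the caller (for example, from a CPU query or a configuration value).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an Intel IPP function).
 * Returns 1 when (src_len + dst_len) >= l2_bytes, i.e. when the copy
 * would cross the threshold described in the text, and 0 otherwise.
 * The comparison is arranged so the sum cannot overflow size_t. */
static int exceeds_l2_threshold(size_t src_len, size_t dst_len, size_t l2_bytes)
{
    assert(l2_bytes > 0);
    if (src_len >= l2_bytes) {
        return 1;
    }
    return dst_len >= l2_bytes - src_len;
}
```

With an assumed 256 KB L2 cache, copying 128 KB into a 128 KB destination reaches the threshold, while a 64 KB to 64 KB copy stays below it.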

For big memory transfers (larger than the LLC size), the choice between memcpy and ippsCopy rarely matters, because the copy speed is limited by the bus speed. In such cases, it is recommended to use ippiCopyManaged. If the data is uncacheable, you can call this function with the IPP_NONTEMPORAL_STORE flag to force non-temporal stores; if it is cacheable, you can use the IPP_TEMPORAL_COPY flag, which performs a standard (temporal) copy.
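A minimal sketch of how the flag choice might look in code follows. It assumes the 8-bit, single-channel variant ippiCopyManaged_8u_C1R and treats a flat buffer as a one-row image. The wrapper copy_large and the USE_IPP build macro are hypothetical names introduced for illustration. Without USE_IPP the sketch falls back to memcpy so that it still compiles on a machine without Intel IPP.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifdef USE_IPP            /* hypothetical macro: define it when building against Intel IPP */
#include <ipp.h>
#endif

/* Hypothetical wrapper: copies len bytes from src to dst.
 * cacheable != 0 selects IPP_TEMPORAL_COPY (standard copy);
 * cacheable == 0 selects IPP_NONTEMPORAL_STORE (destination bypasses the cache).
 * Returns 0 on success, -1 on failure. */
static int copy_large(const unsigned char *src, unsigned char *dst,
                      int len, int cacheable)
{
    assert(src != NULL && dst != NULL && len > 0);
#ifdef USE_IPP
    IppiSize roi = { len, 1 };   /* one row of len bytes */
    int flags = cacheable ? IPP_TEMPORAL_COPY : IPP_NONTEMPORAL_STORE;
    IppStatus st = ippiCopyManaged_8u_C1R(src, len, dst, len, roi, flags);
    return st == ippStsNoErr ? 0 : -1;
#else
    (void)cacheable;             /* the fallback ignores the cache hint */
    memcpy(dst, src, (size_t)len);
    return 0;
#endif
}
```

Whether non-temporal stores actually help depends on the buffer size and on whether the destination is read again soon, so it is worth measuring both flags on the target hardware.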

For more complete information about compiler optimizations, see our Optimization Notice.