IPP Crypto Sample Performance for OpenSSL too Slow on Hyper-Threading Systems

Problem:

When running the Intel IPP crypto sample for OpenSSL (ipp-samples/cryptography/openssl-ipp) on processors with Intel® Hyper-Threading Technology (Intel HT Technology), users may find that the AES benchmark application reports lower performance than the non-IPP version of OpenSSL.

Cause:

This is caused by conflicting OpenMP thread model settings for Intel HT Technology systems.

By default, the Intel IPP crypto sample for OpenSSL links with the dynamic Intel IPP libraries, which enables internal threading (OpenMP) in the threaded Intel IPP AES functions. The problem stems from how the OpenMP threading mechanism allocates threads within the IPP library; to overcome it, one must adjust the number of available threads and specify the thread affinity model.

Intel HT is most effective when each thread that is sharing an HT-enabled core is performing different types of operations and processor resources on the core are under-utilized. The threaded AES functions in the Intel IPP library execute at high efficiency, consuming most of the available processor resources and performing identical operations on each thread. Thus, these multi-threaded functions generally do not fare well if they are allocated to run side-by-side on a single HT core. It is better to have these threads run on separate cores.

To accommodate this need for thread isolation, one must ensure that the number of threads available to the OpenMP threading mechanism is equal to the number of cores, not the number of logical threads. (The default number of logical threads equals 2x the number of cores on an Intel HT processor or 1x the number of cores on a non-HT processor.) Moreover, when running the Intel IPP crypto sample on an Intel HT system, the thread scheduler must also be told to assign each crypto thread to a single core, ignoring the second HT thread available on each core. The net effect of applying these two guidelines is to direct OpenMP to limit threading to the number of available cores and to spread those threads evenly over those cores.

Resolution:

There are several ways to address this issue. There is no single “best” or “worst” solution; the right choice depends on your application and system needs.

If the multi-core processor on your system includes Intel HT (each core supports two hardware threads), for example an Intel Core i7 or Westmere processor, you should use one of the following solutions:

  1. disable multi-threading by linking with the static single-threaded version of the Intel IPP library
  2. disable multi-threading within the multi-threaded Intel IPP libraries by calling ippSetNumThreads(1)
  3. disable Intel HT Technology on your system (usually done via a configuration switch in the BIOS)
  4. configure OpenMP to use 1/2 of the available logical threads
    and set the KMP_AFFINITY environment variable as follows: KMP_AFFINITY=granularity=fine,compact,1,0

Note that the Intel IPP library overrides the OMP_NUM_THREADS environment variable, so you must use the following technique within your application to set the number of available logical threads to 1/2 the number of hardware threads:

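int numThreads = 0 ;                /* receives the current IPP thread count */
ippGetNumThreads( &numThreads ) ;
ippSetNumThreads( numThreads/2 ) ;  /* half the logical threads on an HT system */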

See the IPP library documentation for more information regarding the above functions.
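
The halving step uses integer division, so a reported count of 1 would yield a request for 0 threads. The following is a minimal sketch of a guard against that case. The helper name threads_for_physical_cores is hypothetical (it is not part of Intel IPP), and the threads_per_core argument is assumed to be 2 on an Intel HT system and 1 otherwise, following the 2x/1x rule described under Cause.

```c
#include <assert.h>

/* Hypothetical helper, not part of Intel IPP.
 * logical_threads:  the count reported by ippGetNumThreads()
 * threads_per_core: assumed 2 on an Intel HT system, 1 on a non-HT system
 * Returns the thread count to request so that each thread gets its own core. */
static int threads_for_physical_cores(int logical_threads, int threads_per_core)
{
    assert(threads_per_core > 0);       /* guard the division below */

    int target = logical_threads / threads_per_core;

    /* Integer division can produce 0 (for example 1 / 2); never go below 1. */
    return target < 1 ? 1 : target;
}
```

In an application linked with Intel IPP, the returned value would be passed to ippSetNumThreads in place of numThreads/2.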

If the multi-core processor on your system does not include Intel HT (each core only supports a single hardware thread), for example an Intel Core 2 Duo or Core 2 Quad processor, you do not need to do anything. The IPP library will automatically configure OpenMP to utilize no more hardware threads than you have available on your system. You may still want to set the OpenMP affinity to a value that will optimize the IPP library for best operation on your system by setting the KMP_AFFINITY environment variable as follows:

KMP_AFFINITY=compact
or
KMP_AFFINITY=granularity=fine,compact,0,0

This will configure OpenMP to more readily exploit a shared cache, and is especially important on those processors that have multiple physical dies within a single package or those systems that have multiple sockets on the motherboard (which results in a dedicated cache per core).
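
Where setting the variable in the shell is inconvenient, a launcher program (or the application itself, early in startup) could export it instead. The following is a minimal sketch, assuming a POSIX system that provides setenv. The helper name export_kmp_affinity is hypothetical. It is also an assumption that the OpenMP runtime has not yet initialized when the helper runs; a value set after initialization may be ignored, so setting the variable before the process starts remains the safer route.

```c
#include <stdlib.h>
#include <string.h>

/* Affinity strings quoted from this article. */
#define KMP_AFFINITY_HT     "granularity=fine,compact,1,0"  /* Intel HT systems */
#define KMP_AFFINITY_NO_HT  "granularity=fine,compact,0,0"  /* non-HT systems   */

/* Hypothetical helper: export KMP_AFFINITY for this process and its children.
 * Returns 0 on success, -1 if setenv fails or the value cannot be read back. */
static int export_kmp_affinity(int has_hyper_threading)
{
    const char *value = has_hyper_threading ? KMP_AFFINITY_HT
                                            : KMP_AFFINITY_NO_HT;

    if (setenv("KMP_AFFINITY", value, 1) != 0)   /* 1 = overwrite existing */
        return -1;

    /* Read the variable back to confirm the environment now holds it. */
    const char *current = getenv("KMP_AFFINITY");
    return (current != NULL && strcmp(current, value) == 0) ? 0 : -1;
}
```

A launcher would call export_kmp_affinity and then start the benchmark as a child process, which inherits the environment.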

More Information:

For more information on how these OpenMP settings work, please see the Intel Compiler documentation pages. The section on the Thread Affinity Interface, in particular, should be helpful.

See this Intel Developer Zone blog entry for more information about OpenMP and the IPP library.

A new Intel IPP interface to address thread affinity is being considered for inclusion in future releases of the Intel IPP library.

To get details about the specific processor on your system, such as how many cores it contains and whether or not those cores support Intel HT, you can compile and run the CPU Information Utility found in the IPP samples (advanced-usage\cpuinfo) or visit ark.intel.com and locate your specific processor in this on-line database.

How to Download the Cryptography Library Add-on for the Intel IPP Library

The cryptography component of the IPP library is subject to US Export Administration Regulations and other US laws. To obtain the Intel IPP cryptography libraries, which must be downloaded separately, register for eligibility and follow the instructions you receive in the registration email. If you have additional questions, review this knowledge base article on how to download the cryptography library component of the IPP library.

You must have a valid Intel IPP license key to install and use the Intel IPP libraries.

Please refer to the Optimization Notice page for more detailed information regarding performance and optimization in Intel software products.