AES-NI performance degraded on SMP, Linux

Joong:
Hi, I am an Intel instruction newbie, and I encountered a weird situation.

I'm developing a kernel module for a VPN, and I decided to integrate AES-NI for the AES cipher. The platform is Linux with a non-preemptible SMP kernel.

I wrote assembly code almost identical to the code from the AES-NI Sample Library provided by Intel. I also figured out that I had to call kernel_fpu_begin() and kernel_fpu_end() to use AES-NI in a kernel module. The module works correctly as far as the encryption/decryption operations are concerned.

Here is the question. I tested the module with the CPU affinity set. I confirmed that performance improved as I used more CPU cores, as long as all the cores in use were on the same CPU. However, when I used additional cores on the other CPU (two CPUs in total), performance degraded. How can I explain this situation? More cores, less performance?? My machine has two CPUs (Intel Xeon CPU E5645). Please give me an answer or any suggestion. Thanks in advance.

Summary: AES-NI in a kernel module. Non-preemptive SMP kernel. With one CPU, performance looks good. With two CPUs, performance is worse than with one CPU.

Max Locktyukhin (Intel):

One thing is certain: AES-NI is not to blame. Look for higher-level issues, such as inter-process synchronization.

Adrian Hoban (Intel):

I agree with Max that AES New Instructions are not likely to be the root cause of the issue. Without seeing the actual implementation in question, the following list contains some areas for consideration:

  • Ensure you are not confusing the use of the kernel_fpu_begin/end primitives with the use of SMP-safe synchronisation primitives such as spin_lock_.... You may need both.
  • When moving to a NUMA system, it is best if your BIOS is configured for NUMA, your memory management code is NUMA-aware, and you use a NUMA-aware Network Interface Card driver.
  • Reduce the amount of lock contention on global variables. Consider refactoring the code to use more per-cpu variables with appropriate levels of pre-emption and SMP protection.
  • Take note of the mapping of logical cores in Linux to physical cores/packages. This mapping may differ from what you expect between the single and dual processor configurations.

To help with debugging it is worth trying the following:

  • Confirm that the remote side of the VPN tunnel is not silently dropping corrupted packets. This could be another pointer to an implementation issue on the transmitting side.
  • Try checking the multi-core/processor scaling ability with a regular C code implementation.
  • Consider using the Linux kernel crypto implementations.
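On the last point, a sketch of what the kernel's own crypto API looks like on modern kernels (the skcipher interface; this is a hedged kernel-module fragment, not tested code). When the aesni-intel module is loaded, requesting "cbc(aes)" transparently selects the AES-NI backed implementation, and the crypto layer handles the FPU state itself:

```c
#include <crypto/skcipher.h>
#include <linux/scatterlist.h>

/* Sketch: one synchronous in-place CBC-AES-128 encryption.
   The "cbc(aes)" template resolves to the AES-NI driver when available,
   and the crypto layer does kernel_fpu_begin()/end() internally. */
static int encrypt_one(u8 *buf, unsigned int len, const u8 *key, u8 *iv)
{
    struct crypto_skcipher *tfm;
    struct skcipher_request *req;
    struct scatterlist sg;
    DECLARE_CRYPTO_WAIT(wait);
    int ret;

    tfm = crypto_alloc_skcipher("cbc(aes)", 0, 0);
    if (IS_ERR(tfm))
        return PTR_ERR(tfm);

    ret = crypto_skcipher_setkey(tfm, key, 16);
    if (ret)
        goto out_free_tfm;

    req = skcipher_request_alloc(tfm, GFP_KERNEL);
    if (!req) {
        ret = -ENOMEM;
        goto out_free_tfm;
    }

    sg_init_one(&sg, buf, len);
    skcipher_request_set_callback(req,
            CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
            crypto_req_done, &wait);
    skcipher_request_set_crypt(req, &sg, &sg, len, iv);

    /* crypto_wait_req() sleeps until an async backend completes. */
    ret = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);

    skcipher_request_free(req);
out_free_tfm:
    crypto_free_skcipher(tfm);
    return ret;
}
```

Besides saving the hand-written assembly, this route also gives a well-tested baseline to compare the custom module's multi-socket scaling against.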
