KMP_AFFINITY=bunged_up

My system is Linux CentOS 3.6 with Parallel Studio XE 2013.

Compiling for MIC (I have a Xeon Phi), KMP_AFFINITY=compact and KMP_AFFINITY=scatter work as expected: threads are pinned properly.

Compiling for the host processor (E5-2620 v2), things are different.

KMP_AFFINITY=scatter with OMP_NUM_THREADS=12 will at times distribute threads across all 12 hardware threads, but not in the expected order.
*** Most of the time, different OpenMP threads get assigned to the same hardware thread.

On subsequent runs the thread placement is arbitrary, and occasionally different software threads get assigned to the same hardware thread.

It is behaving as if pinning were not occurring.

Using KMP_AFFINITY=compact with OMP_NUM_THREADS=12 will most of the time, if not always (don't trust it), assign the even/odd hardware threads of each core to even/odd pairs of OpenMP threads; however, the evenness and oddness of the pairing changes:

OpenMP APIC
0, 1 (note swap of even/odd here)
1, 0
2, 3
3, 0
4, 5
5, 4
6, 6 (note even/even)
7, 7 (note odd/odd)
8, 8
9, 9
10, 10
11, 11

Note, I have performed "sudo yum install schedutils" (it reports it is already installed).

Any hints to fix the affinity issue on the host processor under CentOS 3.6 would be appreciated.

Jim Dempsey

www.quickthreadprogramming.com

After further investigation, it was determined that KMP_AFFINITY=compact and KMP_AFFINITY=scatter bind the threads to the logical processors sharing the L1 cache. My logical processor association code (enumeration of placement) was written under the assumption that there would be a one-to-one placement across logical processors, as opposed to the two-to-two placement actually observed. Note, different OpenMP libraries may bind differently.

With this knowledge, I can now change my placement enumeration code to permit sets, and then adapt the tuning parameters accordingly.

I thought I would pass this on to a) save you from spending any unnecessary time on this, and b) pass on the information regarding two-to-two placement.

Jim Dempsey

www.quickthreadprogramming.com

I believe the 2 logical processors on a given core are treated as interchangeable. You could check that in the "verbose" echoes. Maybe that's what you meant in your later post.

I don't know how you could have any expectations of CentOS 3.6 (if there is such an old OS) on a current CPU. Maybe 6.3? Even under CentOS 6.3, I think you would need newer OpenMP libraries than CentOS provides (e.g. Intel 13.1 libiomp5, or gcc 4.7 or newer libgomp) in order to support the OpenMP 3.1 names for affinity setting (OMP_PROC_BIND, ...).

Thanks for your response Tim. You are right about the 3.6. The actual version is 6.4 (I wrote the message in haste without checking).

I did use the verbose output after I posted the first message, and that was the reason for the quick response of the second posting. The verbose output pointed out that the bindings were to core (or L1, which is the same on the E5-2620 v2).

Though I know I can use OMP_PROC_BIND=..., the routine I am optimizing may be part of an application whose user finds it performs better with a different OMP_PROC_BIND=..., KMP_AFFINITY=..., or other setup. The code I am optimizing uses CPUID to survey the OpenMP thread placement within the system so that I can tune for optimal L1 cache hit ratios.

The nice diagram in the User's Guide illustrates granularity of fine/thread as opposed to granularity of core. It is my fault for not reading the complete document to notice that the default granularity is core; however, it would not hurt to modify the diagram to illustrate the impact of the default setting, and to place that annotation immediately following, or within, the diagram.

Jim Dempsey

www.quickthreadprogramming.com
