MPI locked to specific core on node

Hi all,

I am running a multi-core, single-node machine. I would like to run several instances of an MPI process (say 8 cores per instance) without using a scheduler. Is this possible? That is, will each MPI process stick to its assigned cores, or will I need a scheduler to assign the cores to the required tasks?

Thanks,

 

 


Hi Hob,

Yes, you can launch MPI processes without a job scheduler.

Could you share more details about what you are trying to achieve here?

Intel MPI allocates CPUs based on the number of ranks launched.

For example, if you have 80 logical CPUs and launch 10 processes, each rank will be allocated 8 of them.

sdp@sdp:~/prasanth/mpi$ cpuinfo -g

=====  Processor composition  =====
Processor name    : Intel(R) Xeon(R) Gold 6148
Packages(sockets) : 2
Cores             : 40
Processors(CPUs)  : 80
Cores per package : 20
Threads per core  : 2

sdp@sdp:~/prasanth/mpi$ I_MPI_DEBUG=5 mpirun  -n 10 ./test
[0] MPI startup(): libfabric version: 1.10.0a1-impi
[0] MPI startup(): libfabric provider: tcp;ofi_rxm
[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       70069    sdp        {0,1,2,3,40,41,42,43}
[0] MPI startup(): 1       70070    sdp        {4,5,6,7,44,45,46,47}
[0] MPI startup(): 2       70071    sdp        {8,9,10,11,48,49,50,51}
[0] MPI startup(): 3       70072    sdp        {12,13,14,15,52,53,54,55}
[0] MPI startup(): 4       70073    sdp        {16,17,18,19,56,57,58,59}
[0] MPI startup(): 5       70074    sdp        {20,21,22,23,60,61,62,63}
[0] MPI startup(): 6       70075    sdp        {24,25,26,27,64,65,66,67}
[0] MPI startup(): 7       70076    sdp        {28,29,30,31,68,69,70,71}
[0] MPI startup(): 8       70077    sdp        {32,33,34,35,72,73,74,75}
[0] MPI startup(): 9       70078    sdp        {36,37,38,39,76,77,78,79}
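The pin sets in that output follow a simple pattern on this box (cores 0-39 are physical, and CPU c's hyperthread sibling is c+40): rank r gets 4 consecutive physical cores plus their siblings. A small shell sketch (hypothetical, just reproducing the observed layout, not how Intel MPI computes it internally) generates the same sets:

```shell
# Reproduce the "Pin cpu" sets printed by I_MPI_DEBUG=5 above.
# Assumption: the default pinning gives rank r the physical cores
# [4r .. 4r+3] plus their hyperthread siblings (core id + 40).
pinset() {                 # pinset RANK -> prints the CPU set for that rank
  r=$1; cpr=4; cpus=""     # cpr: cores per rank (40 physical cores / 10 ranks)
  c=$((r * cpr))
  while [ $c -lt $(((r + 1) * cpr)) ]; do cpus="$cpus,$c"; c=$((c + 1)); done
  c=$((r * cpr))           # second pass: the hyperthread siblings
  while [ $c -lt $(((r + 1) * cpr)) ]; do cpus="$cpus,$((c + 40))"; c=$((c + 1)); done
  echo "{${cpus#,}}"
}
r=0
while [ $r -lt 10 ]; do
  echo "rank $r: $(pinset $r)"
  r=$((r + 1))
done
```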

Or do you want to bind a specific set of 8 cores to each process?

Can you give an example of the scenario you want? This will help us understand the problem better.

 

Regards

Prasanth


Quote:

Dwadasi, Prasanth (Intel) wrote:

Hi Hob,

Yes, you can launch MPI processes without a job scheduler. [...]

 

Hi Prasanth,

 

Thanks for the reply. Basically, I have MPI-based software that is launched from the command line as:

 

mpiexec -n 8 prog prog.extension

This launches 8 processes (as an example), which run at 100% usage on eight cores.

 

Some time later I would like to launch another 8-process run for a different simulation, again via mpiexec. The machine has 32 cores, so two instances each requiring 8 cores should not be an issue (16 cores total).

My question is how the load across the cores will be managed, and whether each mpiexec process will be dedicated to its own cores, i.e. it will not try to use a core that the already-running program is using at 100%. I want to make sure there is no cross-talk between two instances launched on the same computer.

 

Thanks,


On a Linux system, I would start with either "taskset" or "numactl" to provide an initial core binding for the mpiexec command (which should be inherited by its children).   E.g., 

taskset -c 8-15 mpiexec -n 8 FirstProgram
taskset -c 16-23 mpiexec -n 8 SecondProgram
taskset -c 24-31 mpiexec -n 8 ThirdProgram

You can use I_MPI_DEBUG=5 to check the bindings to see if this works....
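As a quick sanity check that a taskset binding really is inherited by child processes, on Linux you can compare the allowed-CPU list of a shell and of a child it spawns (this uses the standard /proc interface, nothing MPI-specific):

```shell
# Linux only: the kernel records each process's allowed CPUs in /proc.
# A child process inherits its parent's affinity mask, so a child shell
# reports the same Cpus_allowed_list as the process that launched it --
# which is exactly why a taskset prefix also binds the MPI ranks.
grep Cpus_allowed_list /proc/self/status          # this process's affinity
sh -c 'grep Cpus_allowed_list /proc/self/status'  # inherited by the child
```

Run the same two lines under `taskset -c 8-15 sh` and both should report `8-15`.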

"Dr. Bandwidth"

Quote:

McCalpin, John (Blackbelt) wrote:

On a Linux system, I would start with either "taskset" or "numactl" to provide an initial core binding for the mpiexec command. [...]

Hi John,

Thanks for that (for some reason my other account will not log in). I am running on Windows 10, and I am not sure whether process allocation is handled specifically by the OS. There are options for processor affinity, and I tried specifying them via PowerShell, but I'm not sure whether that would work.

It may be that I have to use something like Windows HPC Pack or PBS in order to allocate resources (a CPU range).

Regards,


The environment variables for process pinning of Intel MPI are explained in the Intel MPI Reference Guide at https://software.intel.com/content/www/us/en/develop/documentation/mpi-d...

If you are not(!) running a hybrid MPI/OpenMP application (or otherwise running threads per MPI process), the following easy approach using I_MPI_PIN_PROCESSOR_LIST should work. As an example I ran the benchmark IMB-MPI1 (included in the Intel MPI distribution) on a 4-core laptop.

A call of "cpuinfo" (part of Intel MPI) shows the core/hyperthread layout and numbering. The hyperthreads on a core are identified in parentheses:

cpuinfo
Intel(R) processor family information utility, Version 2019 Update 7 Build 20200312 (id: 5dc2dd3e9)
Copyright (C) 2005-2020 Intel Corporation.  All rights reserved.

=====  Processor composition  =====
Processor name    : Intel(R) Core(TM) i5-8350U
Packages(sockets) : 1
Cores             : 4
Processors(CPUs)  : 8
Cores per package : 4
Threads per core  : 2

=====  Processor identification  =====
Processor       Thread Id.      Core Id.        Package Id.
0               0               0               0
1               1               0               0
2               0               1               0
3               1               1               0
4               0               2               0
5               1               2               0
6               0               3               0
7               1               3               0
=====  Placement on packages  =====
Package Id.     Core Id.        Processors
0               0,1,2,3         (0,1)(2,3)(4,5)(6,7)

=====  Cache sharing  =====
Cache   Size            Processors
L1      32  KB          (0,1)(2,3)(4,5)(6,7)
L2      256 KB          (0,1)(2,3)(4,5)(6,7)
L3      6   MB          (0,1,2,3,4,5,6,7)

 

To execute two non-hybrid MPI applications in parallel, in a first window 2 MPI processes are started on the 1st hyperthreads of the first two cores, i.e. hyperthreads 0,2:

set I_MPI_DEBUG=5
mpiexec -env I_MPI_PIN_PROCESSOR_LIST 0,2 -n 2 IMB-MPI1

(Alternative:
set I_MPI_PIN_PROCESSOR_LIST=0,2
mpiexec -n 2 IMB-MPI1)

In a second window 2 MPI processes are started on the 1st hyperthreads of the third and fourth core, i.e. hyperthreads 4,6:

set I_MPI_DEBUG=5
mpiexec -env I_MPI_PIN_PROCESSOR_LIST 4,6 -n 2 IMB-MPI1

Both runs execute in parallel. The Intel MPI rank-to-core mapping is shown at the beginning of the debug output; see column "Pin cpu":

mpiexec -env I_MPI_PIN_PROCESSOR_LIST 0,2 -n 2 IMB-MPI1
[0] MPI startup(): Rank    Pid      Node name      Pin cpu
[0] MPI startup(): 0       54184    xxxxxxxx-MOBL  0
[0] MPI startup(): 1       26992    xxxxxxxx-MOBL  2

mpiexec -env I_MPI_PIN_PROCESSOR_LIST 4,6 -n 2 IMB-MPI1
[0] MPI startup(): Rank    Pid      Node name      Pin cpu
[0] MPI startup(): 0       41268    xxxxxxxx-MOBL  4
[0] MPI startup(): 1       45460    xxxxxxxx-MOBL  6

In case of a hybrid MPI/OpenMP code you have to specify domains using I_MPI_PIN_DOMAIN (instead of I_MPI_PIN_PROCESSOR_LIST). Exactly one MPI process is started per domain; the rest of the hyperthreads in a domain are used for the threads of that MPI process (NB: pinning of the threads has to be done by other means!). For the first MPI run the specification is quite easy:

mpiexec -env I_MPI_PIN_DOMAIN core -n 2 IMB-MPI1
[0] MPI startup(): Rank    Pid      Node name      Pin cpu
[0] MPI startup(): 0       48760    xxxxxxxx-MOBL  {0,1}
[0] MPI startup(): 1       33752    xxxxxxxx-MOBL  {2,3}

There is no easy specification of domain shifts available, so explicit domain masks (see the reference guide) have to be used for the second run. The 1st MPI process of this run should use hyperthreads 4,5; setting those bits gives the hexadecimal value 2^4+2^5=48=0x30. The 2nd MPI process should use hyperthreads 6,7; that bitmask evaluates to 2^6+2^7=192=0xC0. Therefore the domain mask is [30,C0]:

mpiexec -env I_MPI_PIN_DOMAIN [30,C0] -n 2 IMB-MPI1
[0] MPI startup(): Rank    Pid      Node name      Pin cpu
[0] MPI startup(): 0       39360    xxxxxxxx-MOBL  {4,5}
[0] MPI startup(): 1       27728    xxxxxxxx-MOBL  {6,7}
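The mask arithmetic above can be checked with plain shell arithmetic, no Intel MPI needed:

```shell
# Build the I_MPI_PIN_DOMAIN masks from the hyperthread numbers:
# one bit per logical CPU, printed as hex.
printf '0x%X\n' $(( (1 << 4) | (1 << 5) ))   # hyperthreads 4,5 -> 0x30
printf '0x%X\n' $(( (1 << 6) | (1 << 7) ))   # hyperthreads 6,7 -> 0xC0
```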

Final note: You can also use the I_MPI_PIN_DOMAIN approach instead of I_MPI_PIN_PROCESSOR_LIST for non-hybrid applications. Then the OS might move an MPI process between the hyperthreads of its domain (unless explicit thread pinning is applied to that single thread!).
 


Quote:

Klaus-Dieter Oertel (Intel) wrote:

The environment variables for process pinning of Intel MPI are explained in the Intel MPI Reference Guide

Hi Klaus,

Thanks, this is in fact the way to do it on Windows!

Unfortunately I thought I would need a task scheduler (Windows HPC Pack) and went the Server 2019 route on the PC. Not that I cannot revert to Windows 10, but the HPC scheduler is quite nice.

One question I now have: is there a way to exclude a specific core via a global setting or variable somewhere? Ideally I want the HPC scheduler to allocate cores automatically, without my having to specify I_MPI_PIN_PROCESSOR_LIST for each mpiexec job.

The problem at the moment is that the mpiexec tasks are assigned to dedicated cores (within the HPC Pack job scheduler), say core 0 to core 3 (-n 4), but the scheduler itself operates on core 0 and so cannot accept new requests while an mpiexec task is using that core.

I was wondering if there is an environment variable or similar that would specify the core range for the MPI processes to use?

Thanks for the help though; as a fallback, I_MPI_PIN_PROCESSOR_LIST is pretty much what I need.


On Linux there would be I_MPI_PIN_PROCESSOR_EXCLUDE_LIST, however this variable is not available on Windows.


Quote:

Klaus-Dieter Oertel (Intel) wrote:

On Linux there would be I_MPI_PIN_PROCESSOR_EXCLUDE_LIST, however this variable is not available on Windows.

Hi Klaus, thanks for that,

I actually found a way to do it using the Windows HPC scheduler. If I hand affinity assignment back to the HPC scheduler instead of the MPI process, and on first boot I submit a dummy job that runs on 1 core, it gets assigned to core 0 (scheduling is sequential within Windows). From that point on, the core range 1-31 is auto-assigned by the scheduler. Since the dummy job never completes or exits, it also uses no CPU, and therefore the OS/scheduler can continue to distribute cores 1-31 as required.


Hi Jason,

Has your issue/query been resolved?

If yes, please confirm so that we can close the case.

Thanks

Prasanth
