Getting started with MPI

Hi,

Just getting started with MPI so we haven't even built a cluster yet; I am running things on my desktop.

I have a really basic question about the simple hello world program provided as an example. I added some print statements before and after the MPI_INIT and MPI_FINALIZE calls and get this:
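(For context, the modified program presumably looks something like this Fortran sketch — the structure matches the output below, but it is a reconstruction, not the actual source:)

```fortran
program test
  implicit none
  include 'mpif.h'
  integer :: ierr, rank, nprocs, namelen
  character(MPI_MAX_PROCESSOR_NAME) :: hostname

  ! "serial" section before MPI
  print *, 'This is serial code before MPI'

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
  call MPI_GET_PROCESSOR_NAME(hostname, namelen, ierr)
  print *, 'Hello world: rank ', rank, ' of ', nprocs, ' running on'
  print *, trim(hostname)
  call MPI_FINALIZE(ierr)

  ! "serial" section after MPI
  print *, 'This is serial code after MPI'
end program test
```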

>mpiexec -n 4 test.exe
This is serial code before MPI
This is serial code before MPI
This is serial code before MPI
This is serial code before MPI
Hello world: rank 0 of 4 running on
lestrade-PC
Hello world: rank 1 of 4 running on
lestrade-PC
Hello world: rank 2 of 4 running on
lestrade-PC
Hello world: rank 3 of 4 running on
lestrade-PC
This is serial code after MPI
This is serial code after MPI
This is serial code after MPI
This is serial code after MPI

Why am I seeing multiple prints from the serial section of the code? I thought only the section between MPI_INIT and MPI_FINALIZE was parallelized. If I have to put the MPI_INIT call at the very beginning of my program and then add if (rank.eq.0) .... everywhere else to restrict all the initial setup work to the head node, that means a major overhaul of my source code, which is not very convenient. I must be missing something obvious....


Hi Michel,

Mpiexec simply creates as many processes as you request. They can be started on one node or on several nodes, but they are all identical processes. The MPI library is initialized in the MPI_Init() function, and only after that does each process know its rank and how many ranks have been started.

Please never put code before MPI_Init() or after MPI_Finalize() - the behaviour of such code is unpredictable.
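The usual way to handle this is to call MPI_INIT as early as possible, do the setup only on rank 0, and then broadcast the results to the other ranks rather than letting every process redo the work. A minimal Fortran sketch of that pattern (nsteps is an illustrative value, not from the original code):

```fortran
program setup_pattern
  implicit none
  include 'mpif.h'
  integer :: ierr, rank
  integer :: nsteps   ! illustrative setup value, computed once on rank 0

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

  if (rank .eq. 0) then
     ! expensive serial setup happens only on the head process
     nsteps = 100
  end if

  ! share the result with every rank instead of recomputing it everywhere
  call MPI_BCAST(nsteps, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)

  ! ... parallel work using nsteps ...

  call MPI_FINALIZE(ierr)
end program setup_pattern
```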

Regards!
---Dmitry

OK, I guess I was expecting a behavior similar to OpenMP but it seems I am wrong.

Re-writing our entire codebase so that it doesn't print messages and do all its other setup work multiple times is problematic. I guess my only other option is to spawn a separate program with mpiexec and pass all relevant data from the main caller to the rank 0 process.

This wouldn't be so bad if the MPI job were "run once and exit", but for performance reasons there are some things we want the MPI code to do only once so we can skip some steps on further calls. This means the program would have to launch and stay running after the initial call, with data passing to and from the main serial caller multiple times.

It seems hard to do this with only text files or by piping I/O between the two programs, so I guess I will need an inter-process mutex. Can I use TBB to accomplish this? I would like the code to be portable, so I want to avoid calling the Microsoft API directly.

Hi,

Sorry for replying to my own post but I have lots of questions and problems getting up and running. Since I cannot run wmpiconfig (separate post), I am using mpiexec and the command line to define hosts. Using the test.f90 sample program, I am having some trouble understanding what I am seeing.

The test systems are both Win7 Pro (x64) using i7 chips. If I launch on one node, either local or remote, I get the following (correct) behavior:

C:\mumps_michel\hello>mpiexec -env I_MPI_DEBUG 5 -host Win7PC \\crosslight1\scratch\tmp\test.exe
[0] MPI startup(): shm data transfer mode
[0] MPI startup(): I_MPI_DEBUG=5
[0] MPI startup(): NUMBER_OF_PROCESSORS=8
[0] MPI startup(): PROCESSOR_IDENTIFIER=Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
[0] MPI startup(): set domain to {0,1,2,3,4,5,6,7} on node Win7PC
[0] Rank Pid Node name Pin cpu
[0] 0 2356 Win7PC {0,1,2,3,4,5,6,7}
Hello world: rank 0 of 1 running on
Win7PC

But if I try to run two nodes at once, I get the following output:

C:\mumps_michel\hello>mpiexec -env I_MPI_DEBUG 5 -hosts 2 LESTRADE-PC 1 Win7PC 1 \\crosslight1\scratch\tmp\test.exe
[0] MPI startup(): tcp data transfer mode
[1] MPI startup(): tcp data transfer mode
[0] MPI startup(): I_MPI_DEBUG=5
[0] MPI startup(): NUMBER_OF_PROCESSORS=8
[0] MPI startup(): PROCESSOR_IDENTIFIER=Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
[1] MPI startup(): set domain to {0,1,2,3,4,5,6,7} on node Win7PC
[0] MPI startup(): set domain to {0,1,2,3,4,5,6,7} on node lestrade-PC

Then it hangs there for a few minutes and I have to Ctrl-C to escape.

mpiexec and smpd have both been added to the exception list of the Windows Firewall (on both machines), and wmpiregister has been used to define a common account on both workstations. I also tried setting shm:tcp in I_MPI_FABRICS, without success.

What else should I look for?

Hi Michel,

Could you try to add '-env I_MPI_PLATFORM 0' to the command line?
Do you have any other versions of MPI Library on these nodes?

Regards!
Dmitry

Hi Dmitry,

Tried it, no change. I am beginning to suspect it is a credentials issue related to our lack of a domain (we use local accounts and workgroups on all our workstations).

Since I have several ongoing questions which are starting to look related, should I stick to one thread?

Hi Michel,

Please be sure that you have added your account into Remote Desktop Users group on each machine.

Regards!
Dmitry

Hi Dmitry,

Since I am using Windows 7 machines and a workgroup without a domain, there doesn't appear to be a Remote Desktop Users group. Even with Win7 Pro, you only have the simplified User Account interface, with a lot fewer settings than what I remember from XP.

Nevertheless, the account in question is an Administrator and I turned on the Remote Desktop for each machine: doing so had a mention that Administrators were already allowed to access Remote Desktop.

Despite that, the problem remains.

Is it a problem if the machine is already logged in to a different account? I would have thought that you only needed to launch processes (i.e. "run as"), and it is not immediately obvious why you would need to be able to access the Desktop itself...

Hi Michel,

>Is it a problem if the machine is already logged in to a different account ?
That probably prevents you from logging in to the remote machine.

Win7 should have a Remote Desktop Users group. Please go to Start -> Run, enter "lusrmgr.msc" and press OK.
In the left column of the window that appears, select "Groups"; in the middle column choose "Remote Desktop Users", right-click on that line and open Properties. You need to add Administrator to the group.

Regards!
Dmitry

Hi Michel,

Did you ever get this going?
I have a similar situation to yours (except one machine is Win7 [my dev computer] and the other WinXP [a test computer]) and hit those problems too.

The solution for me was to: ensure the same user was defined on both machines; install the MPI runtime on the WinXP test machine (I already had the full MPI dev kit installed on the Win7 machine); register the username (the same one) on both machines; create mpiusr.txt with the logon identity to be used; set up the same directory on both machines with the same files; and run the test using a little BAT file (not an exe), as per the MPI validation test in the documentation.

The command looked like:
mpiexec -pwdfile mpiusr.txt -l -hosts 2 mine 3 theirs 2 -wdir C:\Devel\test a.bat

I can run the same command from the WinXP machine and get a similar result (i.e. it works), but to date I've had to turn off the Win7 Windows Firewall because, even though I've got entries for SMPD and MPIEXEC, the Firewall still blocks them for some reason.

My problem is that I still can't get an exe to run in this environment on the remote computer. I thought the default fabric should be TCP/IP, but that's probably the next thing to check. BTW, the exe runs from mpiexec provided I stay on the computer that actually runs mpiexec.

Cheers, Geoff

Hi Geoff,

You said that you are running on one Windows* 7 computer and one Windows* XP computer. That opens up another set of issues, as the Intel MPI Library is not supported on heterogeneous clusters.

Are the systems both 32-bit, 64-bit, or one of each? What about the executable you are trying to run? A 64-bit executable will not run on a 32-bit operating system.

Just to clarify: you are able to run the batch file across both systems, but not a binary executable. Is this correct?

Can you get the versions of the SMPD from each computer? Please run

smpd -get binary

on each of the systems and let me know what the output is from each.

What are the hostnames of the systems? Do they have a FQDN? Is there a DNS system that is giving them different names from what is on the system itself?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

Hi James,

Thanks for your response.

OK, I thought I had ensured everything was running 32-bit, but I discovered with smpd -get binary that the Win7 machine was running the 64-bit version. Both computers are now running

\Intel\MPI\4.0.3.009\ia32\bin\smpd.exe

Having fixed that, still no gold star.

The Win7 machine _is_ running a 64-bit O/S, but I have removed (I believe!) all the references to em64t in the paths that matter.

I have built the Hello (World) program with only 32-bit pieces, which I believe I can confirm are OK because that program runs successfully on the (32-bit only) WinXP machine using the command:

mpiexec -l -exitcodes -pwdfile mpiusr.txt -hosts 1 geoff 2 hello.exe

Did I mention that both computers give the error? They do.

e.g. on geoff, running:

mpiexec -l -exitcodes -pwdfile mpiusr.txt -hosts 1 study 2 hello.exe

results in:

rank: node: exit code
0: study: -1073741515
1: study: -1073741515
Can't load dynamic library

Running the mirror-image command on study gives the same result.

Both computers can run the exe on themselves via mpiexec.

Yes, to confirm, when I run a BAT file it works across the network as well as on the local computer, i.e.:

mpiexec -l -exitcodes -pwdfile mpiusr.txt -hosts 2 geoff 3 study 2 a.bat

works.

The systems are running in a Workgroup, not a Domain, and their names are just geoff and study. Since the BAT file works, I believe these names are not the issue.

One more observation: I tried to link the Hello program statically, but when I tried running it in a clean 32-bit Win7 environment (a VMware virtual machine) it failed because it couldn't find impi.dll. Impi.dll is present on geoff (via the dev install) and study (via the runtime install), so it should have been found anyway, but I thought it should have been linked into the exe in the first place!

Cheers, Geoff

Hi Geoff,

As a note, in Windows* the impi.dll library will always be linked dynamically (along with some basic system libraries), even if you specify static linkage. It is likely that the environment variables are not being set properly to find impi.dll from one system to another. This file's location is different on the two computers. I do have a few suggestions.

Try not passing any environment variables by adding -envnone to your mpiexec call. This should allow the remote system to find its own MPI location, rather than what is specified on the local computer.

Try specifically passing a unique I_MPI_ROOT for each system.

mpiexec -n 1 -host geoff -env I_MPI_ROOT "C:\Program Files (x86)\Intel\MPI\4.0.3.009\ia32\bin...." hello.exe : -n 1 -host study -env I_MPI_ROOT "C:\Program Files\Intel\MPI\4.0.3.009\ia32\bin...." hello.exe

This command should work from either computer, but if you want to simplify it, leave out the environment variable flag for whichever computer is the local computer. You may need to add another -env for your Path variable as well.

If these methods don't work, try installing the Intel MPI Library in the same location on each computer. You will need to uninstall it first in order to do this. Select a location that will have the same folder name on each computer.

Please try these suggestions and let me know what happens.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

Hi James,

Identifying the differences in PATH statements between 32- and 64-bit systems was the key. Thanks very much.

In testing your suggestions, though, I encountered an issue: "-envnone" doesn't seem to work as specified. Executing this on "geoff":
mpiexec -pwdfile mpiusr.txt -n 1 -host study -envnone a.bat
Where a.bat is:
@echo off
echo %computername%
echo %homepath%
cd
set
Some variables still seemed to get transferred (easily noticed because the "(x86)" stands out). I would need to do some more tests to be absolutely sure (because I realise there are still definitions on the remote machine that may look the same as on the local machine), but that was my initial observation.

Anyway, I have my program running across mpiexec, now, so thanks for your help :)

Cheers, Geoff

Some more info on "-envnone": the main issue I have with it is that it still transfers PATH rather than using the local machine's. It is also the same with "-envexcl Path" :(

Geoff

Ah, but "-genvnone" works!
Happy little vegemite now :)
Geoff

Hi Geoff,

I'm glad you've gotten it working. The difference between -genvnone and -envnone is that with the g, it is a global option: it applies to all commands running under the current mpiexec. Without the g, it is a local option, applied only to the current one. If you had put -envnone in the command for the remote computer, it (likely) would have worked correctly. That was a mistake on my part, but I'm glad you were able to work around it.

As a reference for anyone else reading this thread, the correct command should have been:

mpiexec -n 1 -host geoff -env I_MPI_ROOT "C:\Program Files (x86)\Intel\MPI\4.0.3.009\ia32\bin...." hello.exe : -n 1 -host study -env I_MPI_ROOT "C:\Program Files\Intel\MPI\4.0.3.009\ia32\bin...." hello.exe

Note also that the backslashes should be correct now.

Sincerely,
James Tullos
Technical Support Engineer
Intel Cluster Tools

Hi James,

Yeah I got the backslashes bit ;) (switching between Unix and Windows all the time can be a pain, right?)

What was not obvious from my earlier comment (maybe) was that -envnone failed / fails. Yes I know it's a local option; I used it in a local sense (well, I thought I did) and it didn't work. However, the saving grace was that -genvnone did, indeed, work.

What I also want to say is that once I did that, I didn't need any -env parameter because I_MPI_ROOT had already been set up on the remote computer because the runtime package had already been installed there. (It must be if you want to use MPI.) As soon as I was able to stop mpiexec from transferring the environment variables, in particular the PATH variable, from the local computer to the remote computer, all worked well.

Thanks for your assistance over the course of this thread.

Cheers, Geoff

Hi Geoff,

A few bits of clarification. The backslash issue is that in the post preview a backslash is removed, but the actual post keeps it. The "local" in -envnone isn't local in the sense of the local computer; it refers to that particular process. Here's an example. Let's say we have a program called printvar.exe that prints out the value of the environment variable VAR. If VAR is unset and we run this command:

mpiexec -n 1 -env VAR hello printvar.exe : -n 2 -env VAR hi printvar.exe : -n 1 printvar.exe

The first copy will print "hello", the second and third will print "hi", and the fourth will print nothing. "Local" means the option applies only to the current process(es) in the list. If the command were instead:

mpiexec -genv VAR hello -n 1 printvar.exe : -n 2 printvar.exe : -n 1 printvar.exe

All of the processes would print "hello" as it is now "globally" set for all.
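For reference, a printvar-style program is only a few lines of Fortran. This sketch uses the standard GET_ENVIRONMENT_VARIABLE intrinsic; it is an illustration, not code from the thread:

```fortran
program printvar
  implicit none
  character(len=256) :: val
  integer :: stat

  ! status is 0 when VAR is defined, 1 when it is not set
  call get_environment_variable('VAR', value=val, status=stat)
  if (stat == 0) then
     print *, trim(val)
  end if
end program printvar
```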

I hope that clarifies what I meant by local vs. global.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

Yep, that's what I was expecting. Thanks James.

Cheers, Geoff
