Problem with mpdboot

Hi everyone,

I'm trying to get the Intel MPI Library to work on a cluster with 16 nodes. I'm following the instructions as outlined in the "Getting_Started.pdf" file under the "Setting up MPD Daemons" section. I'm at the point where I am supposed to start the MPD daemons with mpdboot. I use the command:

mpdboot -v -n 16 -r ssh -f .mpd.hosts

Things start to boot properly, but then I get an error message saying that the syntax of the mpdboot.py file is incorrect:

mpdboot_rank_0 (mpdboot 256): starting local mpd on cluster2
mpdboot_rank_0 (mpdboot 308): starting remote mpd on c2n2
mpdboot_rank_0 (mpdboot 322): starting remote mpd on c2n3
File "/opt/intel_mpi_10/bin/mpdboot.py", line 84
argidx += 2
^
SyntaxError: invalid syntax
File "/opt/intel_mpi_10/bin/mpdboot.py", line 84
argidx += 2
^
SyntaxError: invalid syntax

Does anyone know what's wrong? Also, is there any way to use LAM (as in lamboot, lamstart, etc) instead of MPD?

Thanks,

Alexis


Hi Alexis,

Using lamboot instead of mpdboot isn't going to solve the problem. The error could be due to an incompatible Python version. Please submit this issue to Intel Premier Support.

What happens if you just type 'mpdboot' on the local host? Do you get the same error or does 'mpdtrace' show an mpd daemon running?
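That local sanity check might look like this (a sketch; the mpd* commands as shipped with the Intel MPI of that era):

```shell
# Run on the head node alone: start a single local mpd, verify it, shut it down.
mpdboot          # with no arguments, starts one mpd on the local host
mpdtrace         # should print the local hostname if the daemon is up
mpdallexit       # shut the ring down again
```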

Best regards,

Henry

Hi Henry,

Thanks for the reply. It turns out the problem is that node 0 of my cluster has Python 2.2 properly installed, whereas all of the other nodes only have Python 1.5. I am going to try installing an updated Python on the other nodes to fix the problem. Is there any way, though, to get the other nodes to use the Python on the main machine?

On a separate note, is there any way to get lamboot to work with Intel MPI? I can boot all my nodes with lamboot and run MPI with mpirun when I test with a simple C driver program. However, when I try to run something compiled with mpiifort under mpirun, I get an error message that there is no mpd running on this host. Is there a flag I can use (when compiling, maybe?) so that the code generated by mpiifort looks for the LAM topology?

I hope this makes some sense; I really don't know anything yet about writing/running parallel code!

Thanks,

Alexis

Hi Alexis,
You could export the python directory from the main node to all other nodes. Some clusters are configured to export a software directory on the main node to all other nodes. It's more convenient because software only has to be installed in one place. However, it increases network traffic to the main node. In a large cluster, this could become a performance bottleneck.
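For example, a common way to do that is an NFS export. A minimal sketch, where the main node name (cluster2) and the path /usr/local/python are illustrative assumptions, not details from your setup:

```shell
# On cluster2 (the NFS server): export the Python tree read-only to the compute nodes.
# Add a line like this to /etc/exports (pattern and options are illustrative):
#   /usr/local/python  c2n*(ro,sync)
exportfs -ra                                        # re-read /etc/exports

# On each compute node: mount the export at the same path.
mount cluster2:/usr/local/python /usr/local/python
```

With the mount in place, make sure the mounted bin directory is on PATH on every node so mpdboot finds the same interpreter everywhere.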

The LAM MPI and Intel MPI startup daemons are not the same. Substituting lamboot for mpdboot is not going to work. Sorry.

Best regards,
Henry

Hi everyone,

I finally got Python working on all my nodes, and things seem to be working fine now, except for one thing. The cluster is composed of 16 dual-processor machines, so in effect there are 32 processors available. Each node has a name like c2nX, where X is between 2 and 15. The first node is called cluster2. So here are the names:

cluster2
c2n2
c2n3
c2n4
...
c2n15

The problem is that each node has 2 processors, but I don't know how to boot both of them using mpd. With LAM, I just repeat each node's name twice in the host list, like so:

cluster2
cluster2
c2n2
c2n2
etc.

and when I boot with lamboot using this file, it knows to use two CPUs for each node. However, when I try to do the equivalent with mpdboot, it tells me that "there are not enough hosts on which to start all processes".
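For what it's worth, that duplicated host list can be generated rather than typed by hand. A small sketch (the lamhosts file name is arbitrary):

```shell
# Generate a LAM-style host list, repeating each node name once per CPU
# (2 CPUs per node in this cluster: cluster2 plus c2n2..c2n15).
{
  echo cluster2
  echo cluster2
  for i in $(seq 2 15); do
    echo "c2n$i"
    echo "c2n$i"
  done
} > lamhosts
wc -l < lamhosts   # 30 entries: 15 nodes x 2 CPUs
```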

So my question is: how would I go about booting both CPUs on each node? Similarly, these are Xeon boxes with Hyper-Threading. How would I boot 2 virtual CPUs × 2 real CPUs per node (thus having 4 effective CPUs per node)?

Thanks!

Clay Breshears (Intel)

Alexis -

What results are you getting with mpdtrace after you boot the daemons on the system? Do all of the nodes get included?

You should only need to have one daemon on each node of the cluster and the daemon should be able to determine that there are two processors on each node. However, if you're not getting all 16 nodes covered, you might be seeing that message.

If you are getting all daemons started, look into the Intel MPI documentation on how to target nodes with a specific number of processes and the application that should be run on those processes. Try starting your job that way. Use a configuration file; otherwise you'll have a lot of command-line typing to launch processes on 16 nodes.
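As a sketch of what such a file might look like (the host:n form and the -machinefile option come from my reading of the MPICH2-style mpiexec that Intel MPI ships; double-check against your version's reference manual):

```shell
# Build a hypothetical machine file; "host:n" asks for n processes on that host.
cat > machines.txt <<'EOF'
cluster2:2
c2n2:2
c2n3:2
EOF
cat machines.txt
```

You could then launch with something like `mpiexec -machinefile machines.txt -n 6 ./your_app`, extending the file to all 16 nodes for a full 32-process run.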

If you're still getting the error message, you should report your problem to Intel Premier Support.

--clay

Hi

I have a similar problem. I installed Intel MPI on a dual-processor, dual-core machine; in other words, the machine has 4 cores. However, when I try mpdboot, it tells me that "there are not enough hosts on which to start all processes".

How would I go about booting all the CPUs on my machine?

Thanks

mpdboot starts just one copy of mpd per node, taking the number of nodes you specify from the list in your mpd.hosts file. mpdboot doesn't set a limit on how many CPUs mpiexec will use.

Hi,

Thanks for your reply.

I am trying to run the example test.f from the MPI test directory.

My mpd.hosts is:

localhost
localhost
localhost
localhost

but when I try to boot using this file with "mpdboot -n 4", it tells me:

"totalnum=4 numhosts=1"

"there are not enough hosts on which to start all processes"

and when I run "mpirun -n 2 a.out" it tells me:

totalnum=2 numhosts=1

there are not enough hosts on which to start all processes

What is wrong?

mpdallexit should clean up the mess left by botched mpi commands.
Apparently you have just the one node visible, so only one copy of mpd could be started. Why not try to get the mpdboot right first; then mpiexec should work.
mpirun combines mpdboot and mpiexec. It may be confused by a node being listed more than once in mpd.hosts.
Can you ping localhost? On one of my machines, the only working entry in mpd.hosts is the current IP address, so I am at the mercy of the people running the LAN, as well as needing a working Ethernet driver. An active Ethernet connection appears to be required, even if you don't try to simulate multiple nodes on a single node.

Hi,

I have only one node, but it is a dual-processor, dual-core machine, so I have 4 cores on that one node.

I can ping and ssh to localhost. When I run "mpdboot -n 1" it works correctly, but when I use another number, for example "mpdboot -n 2" or "-n 3", it tells me:

"there are not enough hosts on which to start all processes"

Is it possible to simulate multiple nodes on a single node with Intel MPI? How?

Thanks

Hi everyone

mpdboot starts mpd daemons on the specified number of nodes, given a list of node names in the hosts file.

The mpd daemons are started using the rsh command by default. If rsh connectivity is not enabled, use the -r ssh option to switch over to ssh. Make sure that all nodes in the cluster can connect to each other via rsh without a password or, if the -r ssh option is used, via ssh without a password.
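A quick way to verify the passwordless requirement from the head node (a sketch; c2n2 is just an example target, and BatchMode makes ssh fail instead of prompting for a password):

```shell
# Succeeds silently only if key-based, passwordless ssh to c2n2 is set up.
ssh -o BatchMode=yes c2n2 true && echo "passwordless ssh to c2n2 works"
```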

The -1 option removes the restriction of starting only one mpd per machine.

I finally solved my problem with the following command:

mpdboot -n -r ssh -1

I tried the above and received the following.

[admin@localhost ready_GNU]$ mpdboot -n -r ssh -1 -verbose
Traceback (most recent call last):
  File "<stdin>", line 1068, in <module>
  File "<stdin>", line 386, in mpdboot
ValueError: invalid literal for int() with base 10: '-r'

I am using the following version

[admin@localhost ready_GNU]$ mpdboot --version
Intel(R) MPI Library for Linux, 64-bit applications, Version 4.1  Build 20120831
Copyright (C) 2003-2012 Intel Corporation.  All rights reserved.

Any help appreciated.

James Tullos (Intel)

Hi Maurice,

You should not need to use mpdboot anymore. Try just using mpirun; it will use the Hydra process manager, which does not need a daemon running ahead of time.
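With Hydra there is no separate boot step; a typical launch is just (a.out standing in for your MPI binary):

```shell
# Hydra spawns its helpers over ssh on demand; no mpd ring is required.
mpirun -n 4 ./a.out
```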

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
