coarray giving mpd error

Hello,

I am just trying to learn coarrays. The code, from Steve Blair-Chappell's book, is:

$ cat hello.f90 

!This is file : hello
Program hello
  Implicit None
  write(*,*) "Hello", this_image()
End Program hello

and I compiled it using

 ifort -coarray hello.f90

But I am getting this error:

$ ./a.out 

mpdallexit: cannot connect to local mpd (/tmp/mpd2.console_rudra_ICAF_2735); possible causes:

1. no mpd is running on this host

2. an mpd is running but was started without a "console" (-n option)

The error is correct, as there is no mpd running.
But do I really need MPI to run coarrays?
my fortran is :

$ ifort -v
ifort version 13.1.3

 


Possibly this is the cause:

$ mpirun

/opt/intel/composer_xe_2013.5.192/mpirt/bin/intel64/mpirun: line 96: /opt/intel/composer_xe_2013.5.192/mpirt/bin/intel64/mpivars.sh: No such file or directory

In my bashrc, I have 

source /opt/intel/bin/compilervars.sh intel64

but I don't have mpivars.sh. Kindly help.

If you have only the co-array runtime (not a full MPI) on PATH and LD_LIBRARY_PATH, there is no mpivars script; I see the same.  However:

./a.out

 Hello           1
 Hello           2
 Hello           4
 Hello           3
 Hello           6
 Hello          13
 Hello           7
 Hello          15
 Hello          10
 Hello           5
 Hello           8
 Hello           9
 Hello          19
 Hello          16
 Hello          17
 Hello          18
 Hello          20
 Hello          21
 Hello          12
 Hello          24
 Hello          14
 Hello          11
 Hello          22
 Hello          23

So I suspect something about your installation; perhaps you didn't install with root permissions, or you don't have permission on /tmp. You should be able to redirect the /tmp location to a local directory where you do have permission, via an environment variable.
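For instance (a sketch only; I'm assuming mpd honors the usual TMPDIR convention here, which you should confirm with mpd --help on your install):

    # create a temporary directory you own and point the mpd files at it
    mkdir -p $HOME/tmp
    export TMPDIR=$HOME/tmp   # assumption: consulted by mpd for its console/socket files
    ./a.out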

I expect the mpd would run only for the duration of your execution.

Does ldd show all the .so references?

ldd a.out
    linux-vdso.so.1 =>  (0x00007fff6c9ff000)
    libicaf.so => /home/opt/intel/composer_xe_2013.5.192/compiler/lib/intel64/libicaf.so (0x00007ff48b60f000)
    libm.so.6 => /lib64/libm.so.6 (0x000000334c200000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x000000334ca00000)
    libc.so.6 => /lib64/libc.so.6 (0x000000334be00000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003356e00000)
    libdl.so.2 => /lib64/libdl.so.2 (0x000000334c600000)
    libmpi_mt.so.4 => /home/opt/intel/composer_xe_2013.5.192/mpirt/lib/intel64/libmpi_mt.so.4 (0x00007ff48afbe000)
    libintlc.so.5 => /home/opt/intel/composer_xe_2013.5.192/compiler/lib/intel64/libintlc.so.5 (0x00007ff48ad6f000)
    /lib64/ld-linux-x86-64.so.2 (0x000000334b600000)
    librt.so.1 => /lib64/librt.so.1 (0x000000334d200000)

Quote:

TimP (Intel) wrote:

Does ldd show all the .so references?

yes.

$ ldd a.out
    linux-vdso.so.1 =>  (0x00007fffcb9fe000)
    libicaf.so => /opt/intel/composer_xe_2013.5.192/compiler/lib/intel64/libicaf.so (0x00007f76bea69000)
    libm.so.6 => /lib64/libm.so.6 (0x0000003845a00000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003845200000)
    libc.so.6 => /lib64/libc.so.6 (0x0000003844a00000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003847200000)
    libdl.so.2 => /lib64/libdl.so.2 (0x0000003844e00000)
    libmpi_mt.so.4 => /opt/intel/composer_xe_2013.5.192/mpirt/lib/intel64/libmpi_mt.so.4 (0x00007f76be40f000)
    libintlc.so.5 => /opt/intel/composer_xe_2013.5.192/compiler/lib/intel64/libintlc.so.5 (0x00007f76be1c0000)
    /lib64/ld-linux-x86-64.so.2 (0x0000003844200000)
    librt.so.1 => /lib64/librt.so.1 (0x0000003846600000)

Do I need to do anything to put the coarray runtime in PATH or LD_LIBRARY_PATH?

During my installation I did get an error that libc_osi is missing. I don't know if that is the reason.

If the .so references resolve OK, your LD_LIBRARY_PATH must be OK.

TimP,

I did not follow you.

Issue the command:
   which mpiexec

Coarray support uses MPI under the covers; built into the a.out executable is an invocation of mpiexec, and we provide the Intel MPI mpiexec on the Fortran kit.

We've seen cases where an existing installation of a different MPI causes a little confusion. You may need to modify your PATH variable to put our mpiexec before the one installed on your system.
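For example (illustrative only, using the install path mentioned in this thread):

    # put the Intel MPI runtime directory ahead of any other MPI on PATH
    export PATH=/opt/intel/composer_xe_2013.5.192/mpirt/bin/intel64:$PATH
    which mpiexec    # should now report the mpirt copy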

           --Lorri

$ which mpiexec

/opt/intel/composer_xe_2013.5.192/mpirt/bin/intel64/mpiexec

and I have no other mpi installed in my machine.

Any help please?

I would try uninstalling and reinstalling Fortran. Which distribution and version of Linux are you using?

Steve - Intel Developer Support

Citation :

Steve Lionel (Intel) a écrit :
Which distribution and version of Linux are you using?

Fedora 19, 64bit

Hmm - Fedora 19 is newer than what we have tested. It would not astonish me if it broke something - new versions of Linux distros often do. But Lorri is the real expert in this area and I'll ask her to try to help you further.

Steve - Intel Developer Support

The command that is built into your a.out looks something like this:
'mpd --daemon >/dev/null && mpiexec -genv I_MPI_DEVICE shm -genv I_MPI_FALLBACK disable -n 8 a.out ; mpdallexit'.
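As an aside, if I recall the runtime correctly, the image count baked into that command can be overridden at run time through the FOR_COARRAY_NUM_IMAGES environment variable, without recompiling:

    export FOR_COARRAY_NUM_IMAGES=4    # request 4 images instead of the built-in count
    ./a.out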

So, let's try a couple more things. 
First,
      which mpd
Is it in the same directory where you found your mpiexec?

Ultimately,  "mpd" is a python script, so can you please issue the following:
          python -V

Please note that it's an uppercase V; if you use a lowercase v, the output is noisy and unhelpful for this situation.

Let's try the "mpd" command (without sending errors to the null device) and see if that gives a clue:
    mpd --daemon

We'll figure out where to go after we see any errors, OK?

                 --Lorri

Hi Lorri,

Thanks for your reply. mpd does seem to give an error.

$ which mpd

/opt/intel/composer_xe_2013.5.192/mpirt/bin/intel64/mpd

$ python -V

Python 2.7.5

$ mpd --daemon

roddur_56505: mpd_uncaught_except_tb handling:
  <type 'exceptions.AttributeError'>: 'MPD' object has no attribute 'myIP'
    /opt/intel/composer_xe_2013.5.192/mpirt/bin/intel64/mpd  1677  run
        myIP=self.myIP,
    /opt/intel/composer_xe_2013.5.192/mpirt/bin/intel64/mpd  3676  <module>
        mpd.run()

Hi Lorri,

Anything else I should check?

Hi roddur -

   Sorry for the delay; I was off for a couple of days.
   Can you get your hands on an older version of Python? We have 2.6.6 here and coarrays work.
   If you *can* get a copy, installing it into the Composer XE directories instead of your system directories would be less disruptive to your system in general.

   If you cannot get an older Python installed on your system, we'll have to try a handful of undocumented/unsupported tricks to get you running so you can continue your experimentation. 

                 --Lorri

Hi Lorri,

Thanks for your reply. For a Fedora system, 2.6.6 is a bit old, but I have downloaded Python 2.6.6. Can I insert it directly somewhere under the ifort installation? I think I can have two different versions of Python on a Linux system, so that should not be a problem. But please let me know where and how to place it.

Quote:

Lorri Menard (Intel) wrote:
Can you get your hands on an older version of Python? We have 2.6.6 here and coarrays work.

Just FYI, I can run the example code with no problems and am using Python 2.7.5

Casey, that's interesting news ... what Linux distro are you using? And what version of Intel Fortran do you have installed? Maybe my concern about the Python incompatibility is ill-founded. I had based it on the errors that roddur listed ...

roddur, maybe you shouldn't install python just yet; I don't want to mess up your system, just in case.
Instead, can we try something else? 

I'd mentioned before that we build in a command that invokes "mpd", then "mpiexec", then "mpdallexit". I'm going to tell you how to shut off that built-in command, and then we can experiment with other commands.

    ifort -coarray -switch no_launch hello.f90   

This will create "a.out", and we can try some commands. First, let's try

              mpiexec.hydra a.out

Did that succeed?
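If it does, you could also try varying the rank count with hydra's standard -n flag (each MPI rank becomes one image):

    mpiexec.hydra -n 4 ./a.out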

Lorri,

I am using Gentoo Linux.

$ uname -a
Linux convect 3.8.13 #1 SMP Thu Jul 18 23:44:47 GMT 2013 x86_64 Intel(R) Core(TM) i7 CPU X 990 @ 3.47GHz GenuineIntel GNU/Linux

$ ifort -V
Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 13.1.3.192 Build 20130607
Copyright (C) 1985-2013 Intel Corporation. All rights reserved.

$ mpd --version
Intel(R) MPI Library for Linux, 64-bit applications, Version 4.1 Build 20120831
Copyright (C) 2003-2012 Intel Corporation. All rights reserved.

$ python --version
Python 2.7.5

If you would like any other information regarding my distro or environment, let me know.

Hi Lorri,
Based on Casey's first post, I have reinstalled my F19, and whoa, it's working:

$ mpd --version
Intel(R) MPI Library for Linux, 64-bit applications, Version 4.1  Build 20120831
Copyright (C) 2003-2012 Intel Corporation.  All rights reserved.

$ ifort -v
ifort version 13.1.3

$ python -V
Python 2.7.5

and, last but not least,

$ ./a.out
 Hello           1
 Hello           2
 Hello           3
 Hello           5
 Hello           4
 Hello           8
 Hello           6
 Hello           7

Sorry for wasting your time.

roddur - That is excellent news!  And no, never a waste of time.  Happy CAF'ing :-)

Casey, thank you for your help.

                  --Lorri

Hi Lorri and others,

I have found the (apparent) reason for the problem I reported.

If I change my machine name (using hostnamectl) to anything other than localhost.localdomain, the CAF programs fail:

$ sudo hostnamectl set-hostname roddur

[rudra@roddur ~]$ ./a.out 

mpdallexit: cannot connect to local mpd (/tmp/mpd2.console_rudra_ICAF_19938); possible causes:

  1. no mpd is running on this host

  2. an mpd is running but was started without a "console" (-n option)

$ sudo hostnamectl set-hostname localhost.localdomain

[rudra@roddur ~]$ ./a.out 

Enter your name: Ru

Hello Ru from image 1
Hello Ru from image 3
Hello Ru from image 7
Hello Ru from image 5
Hello Ru from image 2
Hello Ru from image 6
Hello Ru from image 4
Hello Ru from image 8

Roddur,

   This suggests that your networking is not set up correctly. I don't use a distro that has hostnamectl, but reading its man page it appears all it does is set your machine name; it is unrelated to name resolution. When you set your hostname to roddur, what does the command "ping roddur" do? If it does not work, then that is your issue. A quick fix is to open your /etc/hosts file and add "roddur" to the line that starts with "127.0.0.1", which will cause roddur to be mapped to localhost. If you have another IP you want that name to resolve to, you can just add a new line for it. When you get "ping roddur" to work, then your coarray code should also work.
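For example, the edited line might look like this (illustrative; keep whatever aliases are already present), and the ping check should then succeed:

    $ grep '^127.0.0.1' /etc/hosts
    127.0.0.1   localhost localhost.localdomain roddur
    $ ping -c 1 roddur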

Hi, I'm having the same error with the following code:

Program hello
  implicit none
  write(*,*) "Hello from image ", this_image()
End Program

which leads to

$  ifort -coarray hello.f90; ./a.out
mpdallexit: cannot connect to local mpd (/tmp/mpd2.console_Lopez_ICAF_2879); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)

Here is some information:

$  uname -a
Linux dell17 3.8.8-202.fc18.x86_64 #1 SMP Wed Apr 17 23:25:17 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
$ ifort -V
Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 13.1.3.192 Build 20130607
Copyright (C) 1985-2013 Intel Corporation.  All rights reserved.
FOR NON-COMMERCIAL USE ONLY
$ mpd --version
Intel(R) MPI Library for Linux, 64-bit applications, Version 4.1  Build 20120831
Copyright (C) 2003-2012 Intel Corporation.  All rights reserved.
$ python --version
Python 2.7.3
$ which mpd
/opt/intel/composer_xe_2013.5.192/mpirt/bin/intel64/mpd
$  which mpiexec
/opt/intel/composer_xe_2013.5.192/mpirt/bin/intel64/mpiexec
$  mpd --daemon
dell17_42934: mpd_uncaught_except_tb handling:
  <type 'exceptions.AttributeError'>: 'MPD' object has no attribute 'myIP'
    /opt/intel/composer_xe_2013.5.192/mpirt/bin/intel64/mpd  1677  run
        myIP=self.myIP,
    /opt/intel/composer_xe_2013.5.192/mpirt/bin/intel64/mpd  3676  <module>
        mpd.run()

This last command returns an error, as it did for Roddur.

I've also tried

  ifort -coarray -switch no_launch hello.f90; mpiexec.hydra a.out

which generates neither an error nor any output.

Any other clues as to how I could fix this?

Thanks

Quote:

flying_hermes wrote:

Hi, I'm having the same error with the following code:

Hi,

Possibly I know the reason. You probably don't have dell17 in your /etc/hosts.

Just append dell17 to the line that starts with 127.0.0.1 (as that is the name of your machine), and it should work.
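That is, something like this (illustrative; your existing aliases may differ), which you can then verify with ping:

    127.0.0.1   localhost localhost.localdomain dell17
    $ ping -c 1 dell17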

It works!!!

Thanks a lot, roddur.

Any idea why the trick is required?

Lorri, 

I have done some testing after the last post and confirmed that this exception is thrown when hostname resolution fails.
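You can reproduce the resolution failure outside mpd with the same Python 2 the script runs under; this one-liner raises socket.gaierror when the hostname does not resolve:

    $ python -c 'import socket; print socket.gethostbyname(socket.gethostname())'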

I'm attaching a simple patch to the mpd Python script that detects this condition, prints an error message, and raises a generic exception to halt mpd. How you choose to handle a hostname failing to resolve (defaulting to localhost or aborting) is up to you, but I would recommend at least applying this patch (or something based on it) to better catch the condition and inform the user of the problem with host networking.

*** mpd	Tue Aug 20 16:56:31 2013
--- mpd-new	Tue Aug 20 16:56:21 2013
***************
*** 1647,1652 ****
--- 1647,1656 ----
          if self.parmdb['MPD_MY_IP'] :
              self.myIP = self.parmdb['MPD_MY_IP']    # variable for convenience
+         if not hasattr(self,'myIP'):
+             print "Error: Your hostname '", self.myIfhn, "' does not resolve to an IP address."
+             raise Exception("Hostname resolution failure")
+ 
          self.ncpusTable = {}
          self.ncpusEquality = 0

Note: mpd.diff is renamed to mpd.diff.txt so the uploader will accept it.  
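To apply it (a sketch; this assumes you have write access to the installed script, and you should keep a backup):

    $ cd /opt/intel/composer_xe_2013.5.192/mpirt/bin/intel64
    $ cp mpd mpd.orig              # backup first
    $ patch mpd < ~/mpd.diff.txt   # apply the context diff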

Attachment: mpd.diff.txt (492 bytes)
