IFORT/OPENMPI EXECUTION PROBLEM

We have an evaluation installation of ifort (version 11.0), with which I am attempting to compile and run an MPI application. The target is a single, two-processor Intel Xeon machine running Mandriva Linux release 2008.1, on which we have installed OpenMPI (openmpi-1.2.8-1mdv2008.1). This MPI application has run successfully on this machine after being compiled with ifc, gfortran and lf95. After compiling it with ifort, however, the MPI error handler emits error messages when I attempt to run the application:

(bash) niwot.pts/13% mpirun -np 3 wrapper_ifort_omp_g.ex
wrapper: start
wrapper: start
wrapper: start
[niwot.cr.usgs.gov:21906] *** An error occurred in MPI_Comm_set_errhandler
[niwot.cr.usgs.gov:21906] *** on communicator MPI_COMM_WORLD
[niwot.cr.usgs.gov:21906] *** MPI_ERR_ARG: invalid argument of some other kind
[niwot.cr.usgs.gov:21906] *** MPI_ERRORS_ARE_FATAL (goodbye)
[niwot.cr.usgs.gov:21908] *** An error occurred in MPI_Comm_set_errhandler
[niwot.cr.usgs.gov:21908] *** on communicator MPI_COMM_WORLD
[niwot.cr.usgs.gov:21908] *** MPI_ERR_ARG: invalid argument of some other kind
[niwot.cr.usgs.gov:21908] *** MPI_ERRORS_ARE_FATAL (goodbye)
[niwot.cr.usgs.gov:21907] *** An error occurred in MPI_Comm_set_errhandler
[niwot.cr.usgs.gov:21907] *** on communicator MPI_COMM_WORLD
[niwot.cr.usgs.gov:21907] *** MPI_ERR_ARG: invalid argument of some other kind
[niwot.cr.usgs.gov:21907] *** MPI_ERRORS_ARE_FATAL (goodbye)
mpirun noticed that job rank 2 with PID 21908 on node niwot.cr.usgs.gov exited on signal 41 (Real-time signal 7).

These messages are not part of the MPI application. When I run under the TotalView debugger as follows, I get the impression that the debugger isn't attaching properly to the processes:

(bash) niwot.pts/13% mpirun -tv -np 3 wrapper_ifort_omp_g.ex
Copyright 2007-2009 by TotalView Technologies, LLC. ALL RIGHTS RESERVED.
Copyright 1999-2007 by Etnus, LLC.
Copyright 1999 by Etnus, Inc.
Copyright 1996-1998 by Dolphin Interconnect Solutions, Inc.
Copyright 1989-1996 by BBN Inc.
TotalView Technologies ReplayEngine
Copyright 2009 TotalView Technologies
ReplayEngine uses the UndoDB Reverse Execution Engine
Copyright 2005-2009 Undo Limited
Reading symbols for process 1, executing "mpirun"
Library /usr/bin/orterun, with 2 asects, was linked at 0x08048000, and initially loaded at 0x10000000
.
.
.
wrapper: start
wrapper: start
wrapper: start
Can't attach to group member - perhaps because the executable was not found: process not found
Couldn't attach to process 22141 in cluster 0, node 1 -- skipping it
Couldn't attach to process 22142 in cluster 0, node 1 -- skipping it
Couldn't attach to process 22143 in cluster 0, node 1 -- skipping it
[niwot.cr.usgs.gov:22131] [0,0,0]-[0,1,0] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)

These messages appear after I attempt to attach to all processes in TotalView and then start execution.

Is there something else, or something additional, that I should be doing here?

-- Rich Naff

Hi Rich,

Thanks for getting in touch with us. Are you able to run OpenMPI successfully with the Intel Fortran Compiler for a simpler application? Something like an MPI Hello World program, just to make sure that your installations of OpenMPI and the Intel Fortran Compiler are okay.

Additionally, have you compiled OpenMPI itself using the Intel Compilers? More information on how that's done is available here.

If you believe this might be an issue with the Intel Fortran Compilers, you can submit a request to the development team via the Intel Premier Support site.

Thanks and regards,
~Gergana

Gergana Slavova
Technical Consulting Engineer
Intel® Cluster Tools
E-mail: gergana.s.slavova_at_intel.com

Gergana: My Hello program appears to be functioning:

(bash) niwot.pts/13% mpirun -np 3 hello.ex
Hello, world! I am 1 of 3
Hello, world! I am 2 of 3
Hello, world! I am 0 of 3
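
The thread does not include the source of hello.ex. A test of this kind can be sketched as below, assuming Open MPI's `mpif90` wrapper and `mpirun` are on the PATH; the file names and the temporary work directory are illustrative, and the Fortran program is a generic hello world, not the one actually used here.

```shell
#!/bin/sh
# Sketch: write a minimal Fortran MPI hello world, then build and run it
# only if the Open MPI wrapper compiler is available.
# All file names below are hypothetical.
workdir="${TMPDIR:-/tmp}"
src="$workdir/hello.f90"
exe="$workdir/hello.ex"

cat > "$src" <<'EOF'
program hello
  implicit none
  include 'mpif.h'
  integer :: ierr, rank, nprocs
  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
  print '(a,i0,a,i0)', 'Hello, world! I am ', rank, ' of ', nprocs
  call MPI_Finalize(ierr)
end program hello
EOF

if command -v mpif90 >/dev/null 2>&1; then
    # Compile through the wrapper, then launch three ranks as in the thread.
    mpif90 "$src" -o "$exe" && mpirun -np 3 "$exe"
else
    echo "mpif90 not found; wrote $src only"
fi
```

The sketch uses `include 'mpif.h'` rather than `use mpi`, on the assumption that Fortran module files written by one compiler cannot generally be read by another.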

We have -not- compiled OpenMPI itself using the Intel Compilers; our version of OpenMPI is that which comes with the Mandriva release. I can request that our system administrator do so if you believe this to be the problem; please advise me accordingly.

--Rich

If you're reluctant to proceed without a second opinion, the OpenMPI team has a good FAQ and support mailing list.

Gergana: Okay, we followed your suggestion and rebuilt the OpenMPI installation using the Intel compilers; the MPI application now works. Now I can do the TotalView testing.

Thanks, Rich
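
The thread does not record the commands used for the rebuild. As a rough sketch, a source build of Open MPI is usually pointed at the Intel compilers through the `CC`, `CXX`, `F77` and `FC` variables on the `configure` command line. The install prefix below is a hypothetical location, and the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: print the configure/build commands for an Intel-compiled Open MPI.
# PREFIX is a hypothetical install location; nothing is built by this script.
PREFIX="${PREFIX:-$HOME/openmpi-intel}"

configure_cmd() {
    # $1: installation prefix
    echo "./configure CC=icc CXX=icpc F77=ifort FC=ifort --prefix=$1"
}

echo "Run inside the unpacked Open MPI source tree:"
configure_cmd "$PREFIX"
echo "make all install"
```

After installing, the new `mpif90` and `mpirun` need to come first on the PATH so that the application is compiled and launched against the rebuilt library and not the distribution package.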

Hi Rich,

I'm glad to hear everything worked out. If you do have further problems, in addition to the OpenMPI resources Tim mentioned, I can also suggest visiting the Intel Fortran Compiler forums.

Regards,
~Gergana

Gergana Slavova
Technical Consulting Engineer
Intel® Cluster Tools
E-mail: gergana.s.slavova_at_intel.com
