Compile .C and .f90 with MPI

Hi, I'm working with a .f90 program (compiled with ifort). I modified this program by including a C program (prog.f90 calls prog.C). In one subroutine the .f90 program calls the system command to create a directory. The modified program runs fine except for one detail: it calls the system command, but the command is never executed, which is very strange. The program runs but does not execute the command. The unmodified program (without the .C file) executes the system command correctly.
The .f90 program can also run with an MPI architecture. When I activate MPI (compiling with mpif90) the program compiles, but when I run it, it tells me:

forrtl: error (72): floating overflow
Image PC Routine Line Source
ramses3d 00000000005F3C48 Unknown Unknown Unknown
ramses3d 000000000040634F Unknown Unknown Unknown
ramses3d 0000000000410DC2 Unknown Unknown Unknown
ramses3d 0000000000423273 Unknown Unknown Unknown
ramses3d 0000000000426D48 Unknown Unknown Unknown
ramses3d 000000000043B6C1 cooling_module_mp 323 cooling_module.f90
ramses3d 0000000000450D04 init_time_ 60 init_time.f90
ramses3d 0000000000453801 adaptive_loop_ 21 adaptive_loop.f90
ramses3d 000000000052E1E2 MAIN__ 8 ramses.f90
ramses3d 0000000000404482 Unknown Unknown Unknown
libc.so.6 0000003F8381D974 Unknown Unknown Unknown
ramses3d 00000000004043A9 Unknown Unknown Unknown

The program without the modification (without the .C program) compiles and runs fine in both serial and parallel mode, but when I include the .C program, it does not execute the system command in serial mode and shows the error (72) above in parallel mode.
The problem is in my .C program, but what is the problem?
Can someone give me a clue?
Thank you.
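
For anyone reproducing this, here is a minimal, self-contained sketch of a directory-creation call with its exit status checked, so a silently failing mkdir at least gets reported. It assumes Intel Fortran's IFPORT extension and uses a placeholder directory name; it is not the actual RAMSES code.

program check_mkdir
   use ifport, only: system    ! Intel Fortran portability extension providing SYSTEM()
   implicit none
   integer :: istat
   ! 'output_00001' is only a placeholder directory name for illustration
   istat = system('mkdir -p output_00001')
   if (istat /= 0) then
      print *, 'mkdir failed, shell exit status = ', istat
   end if
end program check_mkdir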


Quoting - jpprieto

It's unclear from what you have sent. What is the statement at line 323 in cooling_module.f90?

I'd also try compiler options -g -traceback -fp-stack-check -check all -warn all

ron

Quoting - Ronald W. Green (Intel)

Hi, Ronald.
Line 323 is:
call evol_single_cell(astart,aend,dasura,h,omegab,omega0,omegaL,-1.0d0,T2end,mu,ne,.false.)
The types of the input variables are:
real(kind=8) :: astart,aend,dasura,T2end,mu,ne
real(kind=8) :: h,omegab,omega0,omegaL

About the compiler options -g -traceback -fp-stack-check -check all -warn all:
I'm currently compiling with
F90 = mpif90
FFLAGS = -O3 -g -traceback -fpe0 -ftrapuv -cpp -DNDIM=$(NDIM) -DNPRE=$(NPRE) -DSOLVER$(SOLVER) -DNOSYSTEM
I'll recompile with the options you suggested.

Quoting - jpprieto

The stacktrace into evol_single_cell() and all calls after that have no symbolic information. Is that code in C or in Fortran? Something down in evol_single_cell or something it calls is blowing up.

If evol_single_cell is Fortran, compile everything with:

-gen-interfaces -warn interfaces

to make sure that the calling sequence is correct.

ron

Quoting - Ronald W. Green (Intel)

evol_single_cell is a Fortran routine; evol_single_cell is what calls the C program.

Quoting - jpprieto

Hi, Ron. Thank you for your response.
I compiled the code with your options
F90 = mpif90
FFLAGS = -O3 -g -traceback -fp-stack-check -ftrapuv -warn all -check all -cpp -DNDIM=$(NDIM) -DNPRE=$(NPRE) -DSOLVER$(SOLVER) -DNOSYSTEM
Now, the error is:

forrtl: warning (402): fort: (1): In call to CMP_CHEM_NONEQ, an array temporary was created for argument #7
forrtl: error (65): floating invalid
Image PC Routine Line Source
ramses3d 0000000000410D55 Unknown Unknown Unknown
ramses3d 0000000000423EA4 Unknown Unknown Unknown
ramses3d 0000000000427B04 Unknown Unknown Unknown
ramses3d 0000000000439F7C cooling_module_mp 487 cooling_module.f90
ramses3d 0000000000463849 Unknown Unknown Unknown
ramses3d 00000000004A4C31 init_time_ 60 init_time.f90
ramses3d 00000000004AB0AD adaptive_loop_ 21 adaptive_loop.f90
ramses3d 0000000000AC83F6 MAIN__ 8 ramses.f90
ramses3d 0000000000404442 Unknown Unknown Unknown
libc.so.6 0000003BCA61D974 Unknown Unknown Unknown
ramses3d 0000000000404369 Unknown Unknown Unknown

Line 487 of cooling_module.f90 contains the call to the C program:
call cmp_chem_noneq(nH,T2,dt_cool,DT2,mu,aexp,uini(1,1:nvar-ndim-3),ini)
Argument number 7 holds some physical properties of the gas element. Is there something wrong in the way I pass uini to the C program?
uini is declared as
real(dp),allocatable,dimension(:,:)::uini
and in the C program it is received as a
double uini[]
argument.
I need to compile this program because I want to perform large hydrodynamical simulations in a cosmological context.
Thank you.

It's hard to say if uini is the cause. From the last trace we see that on the Fortran side an array temporary is created since you are passing a row vector that is discontiguous in memory. This is the correct thing to do, since the C is expecting a vector that is contiguous in memory.

Without a trace on the C code we don't have enough to go on. Are you compiling the C code with -g?

Do you have a debugger like TotalView you can use to debug MPI? If not, add some code to the C to check the arguments coming in. How have you declared cmp_chem_noneq within cooling_module.f90, and please don't say you just declare it EXTERNAL.

Someone needs to dig deep into this code, the error is not obvious from the little information I have.

ron
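
On the array-temporary point: one way to rule the automatic copy out as a suspect is to build the contiguous vector explicitly before the call. A rough sketch, written against the declarations and call quoted earlier in this thread (uini(:,:), nvar, ndim, and the cmp_chem_noneq argument list); the real code may of course differ:

! Hypothetical rewrite of the call at line 487: copy the discontiguous
! row slice into a contiguous 1-D buffer, call C, then copy any results back.
real(kind=8), allocatable :: ubuf(:)

allocate(ubuf(nvar-ndim-3))
ubuf = uini(1,1:nvar-ndim-3)                           ! contiguous copy of the slice
call cmp_chem_noneq(nH,T2,dt_cool,DT2,mu,aexp,ubuf,ini)
uini(1,1:nvar-ndim-3) = ubuf                           ! copy back if C modified it
deallocate(ubuf)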

Quoting - Ronald W. Green (Intel)

Hi.
No, I'm compiling the C code without -g. I'm compiling with
gcc -c coolinghd.c
to create the .o file and link it with the other .o files from the Fortran code.
I don't have a debugger, but I have checked the arguments coming into the C code and there is no problem; all the arguments have the correct values.
About cmp_chem_noneq: the coolinghd.c code starts with
int cmp_chem_noneq_(double *rhob,double *T2,double *dt,double *DT2,double *mu,double *a,double uini[],int *ii)
{
...
and inside cooling_module.f90 I call this routine as
...
call cmp_chem_noneq(nH,T2,dt_cool,DT2,mu,aexp,uini,ini)
...
I link the two codes together through the .o files.
Thank you very much.
I'm waiting for your reply.

Quoting - jpprieto

How is cmp_chem_noneq defined in the Fortran program? Do you have any sort of interface declaration for it, or are you simply calling it as shown?

thanks --

- Lorri
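
For reference, given the C prototype quoted above (int cmp_chem_noneq_(double *rhob, ...)), an explicit Fortran interface might look roughly like the sketch below. It relies on the compiler's default external-name mangling (lower case plus a trailing underscore) and simply ignores the C function's int return value; the assumed-size declaration for uini stands in for whatever the real length is.

interface
   subroutine cmp_chem_noneq(rhob, T2, dt, DT2, mu, a, uini, ii)
      real(kind=8) :: rhob, T2, dt, DT2, mu, a
      real(kind=8) :: uini(*)   ! assumed-size, matches "double uini[]" on the C side
      integer      :: ii
   end subroutine cmp_chem_noneq
end interface

With an interface like this in scope (in a module or in the calling routine), the compiler can diagnose at compile time a call that passes the wrong number or kind of arguments; making the C function void instead of int would also be slightly cleaner.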

Quoting - jpprieto

To get a traceback including the C code, you should use Intel icc and the traceback flag. Since that is where the problem occurs, getting a traceback of the Fortran code does not help. If you don't have icc, run the program under gdb, and you can probably get a stack trace of where the C code fails (GDB command 'bt').

Are you sure that you are not accessing UINI out of bounds? As a quick test, you could print the first and last value of UINI from Fortran before the call, and from C after the call.

Also, how does this floating error in a C call relate to a system command call?
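
On the quick test suggested above, the Fortran side could be as simple as the sketch below, using the slice bounds from the call quoted earlier; a matching printf of uini[0] and the last element inside the C routine would complete the comparison.

! Hypothetical check immediately before the call in cooling_module.f90
write(*,*) 'uini first/last before call: ', uini(1,1), uini(1,nvar-ndim-3)
call cmp_chem_noneq(nH,T2,dt_cool,DT2,mu,aexp,uini(1,1:nvar-ndim-3),ini)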

Quoting - krahn@niehs.nih.gov

Hi, Ron.
Now I'm compiling with
icc -c -g -traceback -w coolinghd.c
Everything seems OK. I followed uini and its values are fine, but the program ends with
...
rank 1 in job 1 geryon07_46495 caused collective abort of all ranks
exit status of rank 1: killed by signal 9

About the system call... this was the first problem, because the command doesn't work in the serial mode of the program. I have to create the files manually.
Thank you.

Which MPI, and which version, are you using?
And what version of Intel Fortran?

For MPI: what were your configure arguments when you built the MPI package?

ron

Quoting - Ronald W. Green (Intel)

I'm using MPICH2, and
mpif90 for 1.0.6 Version 10.0
I don't know how to find the last piece of information you need (the configure arguments for the MPI build).

Quoting - jpprieto

Important information includes how you set up mpif90 so that it uses ifort and ifort run-time libraries, rather than gnu libraries, or whatever would be in a default version of mpif90. An example of the most basic instructions:
http://www.contrib.andrew.cmu.edu/~milop/www1/mpif90.html
Note the reference to the installation manual for reconfiguring MPICH2.

Quoting - tim18

Now I'm testing with MPI running on 1 processor.
Everything seems OK, but suddenly the program stops.
All the calculated values are fine, but it stops in the middle of a function JJ21(nu,aexp). This function is called by IR1Gl(nu,aexp), which in turn is called by cmp_chem_noneq(...).
I'm compiling the C program with mpicc -g -w -c.
The output in ramses.pe is
rm: cannot remove `/tmp/70484.1.bigmem.q/rsh': No such file or directory
and the .e file contains a segmentation fault, with no information about routines.

Quoting - jpprieto

Here is the Makefile:

# Compilation time parameters
NDIM = 3
NPRE = 8
SOLVER = hydro
PATCH =
EXEC = ramses
# --- MPI, ifort syntax, additional checks -----------
F90 = mpif90
FFLAGS = -O3 -g -traceback -fpe0 -ftrapuv -cpp -DNDIM=$(NDIM) -DNPRE=$(NPRE) -DSOLVER$(SOLVER) -DNOSYSTEM
#############################################################################
MOD = mod
#############################################################################
# MPI libraries
LIBMPI =
#LIBMPI = -lfmpi -lmpi -lelan
LIBS = $(LIBMPI)
#############################################################################
# Sources directories are searched in this exact order
VPATH = $(PATCH):../$(SOLVER):../hydro:../pm:../poisson:../amr
#############################################################################
# All objects
MODOBJ = amr_parameters.o amr_commons.o random.o pm_parameters.o pm_commons.o poisson_parameters.o poisson_commons.o hydro_parameters.o hydro_commons.o coolinghd.o cooling_module.o bisection.o
AMROBJ = read_params.o init_amr.o init_time.o init_refine.o adaptive_loop.o amr_step.o update_time.o output_amr.o flag_utils.o physical_boundaries.o virtual_boundaries.o refine_utils.o nbors_utils.o hilbert.o load_balance.o title.o sort.o cooling_fine.o units.o
# Particle-Mesh objects
PMOBJ = init_part.o output_part.o rho_fine.o synchro_fine.o move_fine.o newdt_fine.o particle_tree.o add_list.o remove_list.o star_formation.o sink_particle.o feedback.o
# Poisson solver objects
POISSONOBJ = init_poisson.o phi_fine_cg.o interpol_phi.o force_fine.o multigrid_coarse.o multigrid_fine_commons.o multigrid_fine_fine.o multigrid_fine_coarse.o gravana.o boundary_potential.o rho_ana.o output_poisson.o
# Hydro objects
HYDROOBJ = init_hydro.o init_flow_fine.o write_screen.o output_hydro.o courant_fine.o godunov_fine.o uplmde.o umuscl.o interpol_hydro.o godunov_utils.o condinit.o hydro_flag.o hydro_boundary.o boundana.o read_hydro_params.o synchro_hydro_fine.o
# All objects
AMRLIB = $(MODOBJ) $(AMROBJ) $(HYDROOBJ) $(PMOBJ) $(POISSONOBJ)
#############################################################################
ramses: $(AMRLIB) ramses.o
	$(F90) $(FFLAGS) $(AMRLIB) ramses.o -o $(EXEC)$(NDIM)d $(LIBS)
#############################################################################
coolinghd.o: coolinghd.c
	mpicc -c -g -w coolinghd.c
#############################################################################
%.o: %.f90
	$(F90) $(FFLAGS) -c $^ -o $@
#############################################################################
clean:
	rm *.o *.$(MOD)
#############################################################################

If you see something wrong, please tell me.
I included the .c part.
Thank you.

Quoting - jpprieto

My guess is that the 'rm' is a failure in the MPI handler, where 'rsh' is a script used to launch one of the processes. Have you successfully run another MPI program using your current MPI build? Make sure that some test programs can run. It may all be an MPI configuration problem.
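
A minimal test program along those lines, built with the same mpif90 wrapper and launched the same way as ramses3d, is enough to tell whether the MPICH2 installation itself is healthy (standard MPI calls only, nothing specific to RAMSES):

program mpi_hello
   implicit none
   include 'mpif.h'
   integer :: ierr, rank, nproc
   call MPI_INIT(ierr)
   call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
   call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)
   write(*,*) 'Hello from rank ', rank, ' of ', nproc
   call MPI_FINALIZE(ierr)
end program mpi_hello

If this fails in the same way, the problem is likely in the MPI setup or the job script rather than in the Fortran/C code.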

Quoting - krahn@niehs.nih.gov

OK.
When I run a test code (unmodified RAMSES), the code runs fine with the following flags:

F90 = mpif90 -O3
FFLAGS = -cpp -DNDIM=$(NDIM) -DNPRE=$(NPRE) -DSOLVER$(SOLVER) -DNOSYSTEM

But with the following flags:

F90 = mpif90
FFLAGS = -O3 -g -traceback -fpe0 -ftrapuv -cpp -DNDIM=$(NDIM) -DNPRE=$(NPRE) -DSOLVER$(SOLVER) -DNOSYSTEM

the program doesn't run.
The error output is

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
ramses3d 0000000000456000 getnborgrids_ 545 nbors_utils.f90
Stack trace terminated abnormally.

And the ramses .pe file again shows

rm: cannot remove `/tmp/70490.1.bigmem.q/rsh': No such file or directory

Thank you.
