MPI_ALLGATHER error parallelizing code

I'm trying to parallelize the following code.

subroutine log_likelihood(y, theta, lli, ll)
 doubleprecision, allocatable, intent(in) :: y(:) 
 doubleprecision, intent(in) :: theta(2)
 doubleprecision, allocatable, intent(out) :: lli(:)
 doubleprecision, intent(out) :: ll
 integer :: i
 ALLOCATE (lli(size(y)))
 lli = 0.0d0
 ll = 0.0d0
 do i = 1, size(y)
 lli(i) = -log(sqrt(theta(2))) - 0.5*log(2.0d0*pi) &
 - (1.0d0/(2.0d0*theta(2)))*((y(i)-theta(1))**2)
 end do
 ll = sum(lli)
 end subroutine log_likelihood

To do this, I'm trying to use MPI_ALLGATHER. This is the code I wrote

subroutine log_likelihood(y, theta, lli, ll)
 doubleprecision, allocatable, intent(in) :: y(:) 
 doubleprecision, intent(in) :: theta(2)
 doubleprecision, allocatable, intent(out) :: lli(:)
 doubleprecision, intent(out) :: ll
 integer :: i, size_y, diff
size_y=size(y)
ALLOCATE (lli(size_y))
!Broadcasting
call MPI_BCAST(theta, 1, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
call MPI_BCAST(y, 1, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
! Determine how many points to handle with each proc
points_per_proc = (size_y + numprocs - 1)/numprocs
! Determine start and end index for this proc's points
istart = proc_num * points_per_proc + 1
iend = min((proc_num + 1)*points_per_proc, size_y)
diff = iend-istart+1
ALLOCATE(proc_contrib(diff))
do i = istart, iend
 proc_contrib(i) = -log(sqrt(theta(2))) - 0.5*log(2.0d0*pi) &
 - (1.0d0/(2.0d0*theta(2)))*((y(i)-theta(1))**2)
end do
call MPI_ALLGATHER(proc_contrib, diff, MPI_DOUBLE_PRECISION, &
 lli, diff, MPI_DOUBLE_PRECISION, &
 MPI_COMM_WORLD, ierr)
ll = sum(lli)
end subroutine log_likelihood

When I try to run my program, I get the following error.

$ mpiexec -n 2 ./mle.X 
Fatal error in PMPI_Allgather: Internal MPI error!, error stack:
PMPI_Allgather(961)......: MPI_Allgather(sbuf=0x7ff2f251b860, scount=1500000, MPI_DOUBLE_PRECISION, rbuf=0x7ff2f2ad5650, rcount=3000000, MPI_DOUBLE_PRECISION, MPI_COMM_WORLD) failed
MPIR_Allgather_impl(807).: 
MPIR_Allgather(766)......: 
MPIR_Allgather_intra(560): 
MPIR_Localcopy(357)......: memcpy arguments alias each other, dst=0x7ff2f2ad5650 src=0x7ff2f251b860 len=12000000
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 1
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================

Can somebody please explain to me what I'm doing wrong?

Thanks!

Hello Ignacio, 

 I have the following to note:

The array proc_contrib has elements indexed from 1 to diff. However, the do loop runs from istart to iend, which is correct only for the first process (where istart = 1); on the other processes it writes outside the bounds of the array. I suggest replacing the do loop with:

do i = istart,iend
proc_contrib(i-istart+1)  = .....
end do

Compiling with -check all should catch this at run time.

Kostas
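
To make the index mapping concrete, here is a minimal serial sketch that walks through each rank's slice in turn. The values size_y = 10 and numprocs = 3 are illustrative assumptions, not taken from the original code; the variable names follow the subroutine above.

```fortran
program block_decomposition
  implicit none
  ! Illustrative sizes only (assumptions for this sketch)
  integer, parameter :: size_y = 10, numprocs = 3
  integer :: proc_num, points_per_proc, istart, iend, diff, i, local

  ! Same ceiling division as in the subroutine above
  points_per_proc = (size_y + numprocs - 1)/numprocs

  do proc_num = 0, numprocs - 1        ! pretend to be each rank in turn
    istart = proc_num*points_per_proc + 1
    iend   = min((proc_num + 1)*points_per_proc, size_y)
    diff   = max(iend - istart + 1, 0)  ! a rank may get no points at all

    print '(a,i0,a,i0,a,i0,a,i0)', 'rank ', proc_num, ': global ', &
          istart, '..', iend, ', local 1..', diff

    do i = istart, iend
      local = i - istart + 1            ! slot of global index i in proc_contrib
      if (local < 1 .or. local > diff) error stop 'local index out of range'
    end do
  end do
end program block_decomposition
```

With these sizes the last rank ends up with fewer points than the others, which matters for the gather step discussed later in the thread.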

Thanks Kostas!

I fixed that but the problem persists :_(

subroutine log_likelihood(y, theta, lli, ll)
 doubleprecision, allocatable, intent(in) :: y(:) 
 doubleprecision, intent(in) :: theta(2)
 doubleprecision, allocatable, intent(out) :: lli(:)
 doubleprecision, intent(out) :: ll
 integer :: i, size_y, diff
size_y=size(y)
ALLOCATE (lli(size_y))
!Broadcasting
call MPI_BCAST(theta, 1, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
call MPI_BCAST(y, 1, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
! Determine how many points to handle with each proc
points_per_proc = (size_y + numprocs - 1)/numprocs
! Determine start and end index for this proc's points
istart = proc_num * points_per_proc + 1
iend = min((proc_num + 1)*points_per_proc, size_y)
diff = iend-istart+1
ALLOCATE(proc_contrib(diff))
do i = istart, iend
 proc_contrib(i-istart+1) = -log(sqrt(theta(2))) - 0.5*log(2.0d0*pi) &
 - (1.0d0/(2.0d0*theta(2)))*((y(i)-theta(1))**2)
end do
call MPI_ALLGATHER(proc_contrib, diff, MPI_DOUBLE_PRECISION, &
 lli, diff, MPI_DOUBLE_PRECISION, &
 MPI_COMM_WORLD, ierr)
ll = sum(lli)
end subroutine log_likelihood

$ mpiexec -n 2 ./mle.X 
Fatal error in PMPI_Allgather: Internal MPI error!, error stack:
PMPI_Allgather(961)......: MPI_Allgather(sbuf=0x7fd7478f2860, scount=1500000, MPI_DOUBLE_PRECISION, rbuf=0x7fd747eac650, rcount=1500000, MPI_DOUBLE_PRECISION, MPI_COMM_WORLD) failed
MPIR_Allgather_impl(807).: 
MPIR_Allgather(766)......: 
MPIR_Allgather_intra(560): 
MPIR_Localcopy(357)......: memcpy arguments alias each other, dst=0x7fd747eac650 src=0x7fd7478f2860 len=12000000
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 1
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================

Hi again,

  I got the code, compiled it and ran it. I also changed it a bit. I don't quite understand what your intention is with the broadcast commands, so I changed them a bit, as you will see in the attached code. I also declared all the variables used. For me it works when compiled with MPICH (although I don't know if the result is correct :-) ). I've attached the code, hope it helps. Let me know if it works.

 

Attachment: log-like.f90 (2.06 KB)

Thanks!

I got my code to run, but I guess I'm misunderstanding something about how to make it run in parallel.

I'm attaching the serial and the parallel versions of my code.

I'm getting different results, and I know the results from the serial version are right. 

Thanks for all the help!

PS: My real code is much more complicated; I'm trying to solve this trivial version first. In my real code, the expression assigned to proc_contrib(i) takes a while to evaluate.

Attachments: serial-code.tar.gz (272.98 KB), mpi-code.tar.gz (343.52 KB)

Well yes, if you want to use it in a real code then in some cases you won't get any results, since MPI_ALLGATHER requires every process to send the same amount of data, equal to the amount received from each process. The code will only give results when mod(size_y,numprocs)=0. To generalize it a bit I propose the following changes (see attached file). I get the same results with the simple example I'm compiling when executing in serial and in parallel. Hope it works for you too! Let me know what happens.

Kostas

Attachment: log-like.f90 (3.55 KB)

Thanks, I'm learning a lot from you today!

My little example compiles, but it crashes at execution.

$ make
mpif90 -g -check all -c mpi_params.f90
mpif90 -g -check all -c maximum_likelihood.f90
mpif90 -g -check all -c main.f90
mpif90 -o mle.X main.o maximum_likelihood.o mpi_params.o
$ mpiexec -n 1 ./mle.X 
forrtl: severe (151): allocatable array is already allocated
Image PC Routine Line Source 
mle.X 00000000004841DE Unknown Unknown Unknown
mle.X 0000000000482C76 Unknown Unknown Unknown
mle.X 0000000000439A72 Unknown Unknown Unknown
mle.X 0000000000422B5B Unknown Unknown Unknown
mle.X 00000000004335C8 Unknown Unknown Unknown
mle.X 000000000040F292 Unknown Unknown Unknown
mle.X 0000000000410E92 Unknown Unknown Unknown
mle.X 0000000000414ADB Unknown Unknown Unknown
mle.X 000000000040762D Unknown Unknown Unknown
mle.X 000000000040755C Unknown Unknown Unknown
libc.so.6 0000003B7C21ECDD Unknown Unknown Unknown
mle.X 0000000000407459 Unknown Unknown Unknown
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 151
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================

Thanks again for all your help!

Attachment: mpi.tar.gz (4.71 KB)

Well, something is already allocated, so you should deallocate it before allocating it again, but which array is it? You can check your code by adding some "debugging marks" such as:

  print *,  1
  ....[code part where you think there is an error]
  print *, 2
  ... [code part where you think there is an error]
  print *,3 

After running the code you can find the portion where the error was generated. Then you can add more prints and continue to bisect the code like that. I think that as soon as you find where the error is generated you will be able to fix it. I believe the MPI part is executed correctly and the error is somewhere else, having to do with an allocatable array (hopefully).

Kostas

Referring to ignacio82's example, I think that the run-time error is caused by the fact that the subroutine

subroutine log_likelihood(y, theta, lli, ll)

tries to allocate the array proc_contrib on each call - therefore the message "allocatable array is already allocated"...

M.
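
A minimal sketch of one way to avoid the repeated allocation, assuming proc_contrib is a module-level allocatable (it is not declared inside the subroutine shown, so it presumably lives in a module and keeps its allocation between calls). The module name work_arrays and the helper ensure_size are hypothetical names introduced for illustration.

```fortran
module work_arrays
  implicit none
  ! Module variable: stays allocated between calls (assumed layout)
  double precision, allocatable :: proc_contrib(:)
contains
  ! Hypothetical helper: make sure proc_contrib has exactly n elements
  subroutine ensure_size(n)
    integer, intent(in) :: n
    if (allocated(proc_contrib)) then
      if (size(proc_contrib) == n) return   ! already the right size, reuse it
      deallocate(proc_contrib)              ! wrong size, release it first
    end if
    allocate(proc_contrib(n))
  end subroutine ensure_size
end module work_arrays

program demo
  use work_arrays
  implicit none
  call ensure_size(5)
  call ensure_size(5)   ! a bare ALLOCATE here would fail on the second call
  call ensure_size(8)   ! size changed, so the array is reallocated
  print *, 'size of proc_contrib:', size(proc_contrib)
end program demo
```

Calling the helper at the top of log_likelihood, or allocating once in an initialization routine, are both ways to keep a second call from hitting the "already allocated" check.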

Moreover, I would like to point out a few other "issues":

  1. Since the allocatable array y seems to be allocated by the subroutine gen_observations and then does not change size during run-time, the "initialization part" currently present in the log_likelihood subroutine should definitely be moved into an initialization subroutine that is called just once, prior to the calculation itself.
  2. Why exactly do you need the MPI_BARRIER/MPI_BCAST calls? When your program is executed, all processes enter MLE and in turn the log_likelihood subroutine, so the call size_y=size(y) is also executed by all processes. An additional broadcast of this information thus seems redundant.
  3. Also, I think that using MPI_ALLGATHERV in all cases (not only for mod(size_y,numprocs) .NE. 0) will make the code much more readable...

M.
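
As a rough illustration of point 3, the sketch below gathers unequal slices with MPI_ALLGATHERV. It is not the code from the attachments: the program name, the size_y = 10 problem size, and the stand-in value real(i, dp) used in place of the per-point likelihood term are all assumptions made for the example.

```fortran
program allgatherv_sketch
  use mpi
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: size_y = 10           ! illustrative problem size
  integer :: ierr, proc_num, numprocs, p, chunk, istart, iend, i
  integer, allocatable :: counts(:), displs(:)
  real(dp), allocatable :: proc_contrib(:), lli(:)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, proc_num, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)

  ! Every rank computes the same table of counts and zero-based offsets
  allocate(counts(numprocs), displs(numprocs), lli(size_y))
  chunk = (size_y + numprocs - 1)/numprocs
  do p = 0, numprocs - 1
    displs(p+1) = min(p*chunk, size_y)
    counts(p+1) = min((p+1)*chunk, size_y) - displs(p+1)
  end do

  ! This rank's slice, in global indices
  istart = displs(proc_num+1) + 1
  iend   = displs(proc_num+1) + counts(proc_num+1)
  allocate(proc_contrib(counts(proc_num+1)))
  do i = istart, iend
    proc_contrib(i - istart + 1) = real(i, dp)  ! stand-in for the real term
  end do

  ! Each rank sends its own count; counts/displs describe where slices land
  call MPI_ALLGATHERV(proc_contrib, counts(proc_num+1), MPI_DOUBLE_PRECISION, &
                      lli, counts, displs, MPI_DOUBLE_PRECISION,              &
                      MPI_COMM_WORLD, ierr)

  if (proc_num == 0) print *, 'sum(lli) =', sum(lli)
  call MPI_FINALIZE(ierr)
end program allgatherv_sketch
```

Because the stand-in values are just 1 through 10, the printed sum should come out as 55 for any number of processes, which makes it easy to check that no slice was lost or duplicated. The same call also covers the evenly divisible case, so no special branch for mod(size_y,numprocs) = 0 is needed.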

Thank you both for your time!

The reason for the MPI_BCAST is that I thought I needed to broadcast theta and y in order for all processes to know their values. The barrier was one of the first suggestions I got when trying to make this code work. Are these unnecessary?

I added the print statements and removed other unnecessary allocatable arrays from the code.

This is what I get when I run my code

$ mpiexec -n 1 ./mle.X 
 about to allocate proc_contrib
 about to allocate proc_contrib
forrtl: severe (151): allocatable array is already allocated
Image PC Routine Line Source 
mle.X 0000000000483C6E Unknown Unknown Unknown
mle.X 0000000000482706 Unknown Unknown Unknown
mle.X 0000000000439502 Unknown Unknown Unknown
mle.X 000000000041BB0B Unknown Unknown Unknown
mle.X 000000000042C578 Unknown Unknown Unknown
mle.X 000000000040EFE9 Unknown Unknown Unknown
mle.X 0000000000410C01 Unknown Unknown Unknown
mle.X 000000000041307E Unknown Unknown Unknown
mle.X 000000000040772D Unknown Unknown Unknown
mle.X 000000000040765C Unknown Unknown Unknown
libc.so.6 0000003B7C21ECDD Unknown Unknown Unknown
mle.X 0000000000407559 Unknown Unknown Unknown
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 151
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================

To try to solve this I created a new subroutine "init_mpi_params" which I call only once at the beginning of the subroutine "MLE".

This is the new error that I'm getting now

$ mpiexec -n 1 ./mle.X 
 about to allocate proc_contrib
 proc_contrib allocated
forrtl: warning (402): fort: (1): In call to INVERSE, an array temporary was created for argument #1
================================================================================
 Param Param Value Gradient
 1 3.1000 NaN
 2 2.1000 NaN
--------------------------------------------------------------------------------
 about to enter the main loop
forrtl: warning (402): fort: (1): In call to INVERSE, an array temporary was created for argument #1
forrtl: severe (193): Run-Time Check Failure. The variable 'maxlikelihood_mp_mle_$RET' is being used without being defined
Image PC Routine Line Source 
mle.X 000000000048406E Unknown Unknown Unknown
mle.X 0000000000482B06 Unknown Unknown Unknown
mle.X 0000000000439902 Unknown Unknown Unknown
mle.X 000000000041BF0B Unknown Unknown Unknown
mle.X 0000000000418889 Unknown Unknown Unknown
mle.X 000000000040772D Unknown Unknown Unknown
mle.X 000000000040765C Unknown Unknown Unknown
libc.so.6 0000003B7C21ECDD Unknown Unknown Unknown
mle.X 0000000000407559 Unknown Unknown Unknown
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 193
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================

I have a couple of questions.

1. How do I fix this?

2. m.sulc I would really like to see how you would change the code using MPI_ALLGATHERV to make it more readable

Thanks again!

Attachment: mpi.tar.gz (4.93 KB)

Yes, I agree: if the arrays are initialized by every process and don't change during execution, you don't need to broadcast them. The barriers are not required either. If log_likelihood is called while its allocatable arrays are already allocated, and you don't deallocate them first, it will produce the "already allocated" error. The error you get now means that the variable RET is used in a calculation before it has been initialized; once you initialize it, it should work.

Kostas
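
If only one process actually held the parameter values, a broadcast would be needed, and its count argument would have to cover every element of the array (2 for theta, not 1). A minimal sketch under that assumption; the program name is hypothetical and the theta values are borrowed from the parameter table printed earlier in the thread.

```fortran
program bcast_sketch
  use mpi
  implicit none
  integer :: ierr, proc_num
  double precision :: theta(2)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, proc_num, ierr)

  ! Assumption for this sketch: only rank 0 knows the parameters
  theta = 0.0d0
  if (proc_num == 0) theta = (/ 3.1d0, 2.1d0 /)

  ! The count is the number of elements, so use size(theta) rather than 1
  call MPI_BCAST(theta, size(theta), MPI_DOUBLE_PRECISION, 0, &
                 MPI_COMM_WORLD, ierr)

  print *, 'rank', proc_num, 'theta =', theta
  call MPI_FINALIZE(ierr)
end program bcast_sketch
```

When every process already computes or reads the same values, as in this thread, the broadcast can simply be dropped.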

I got the code to work! 

Thanks a lot for all your help!

m.sulc I would really like to see how you would change the code using MPI_ALLGATHERV to make it more readable

Thanks again!

Attachment: bhhh.tar.gz (4.82 KB)
