Intel® Fortran Compiler

Stack overflow with the REDUCTION clause in a hybrid MPI/OpenMP application

I have found that there is an essential limitation in the use of the REDUCTION clause in a hybrid MPI/OpenMP program: if the size of an array in the clause exceeds the default thread stack size of 4 MB, a stack overflow occurs. Calling the KMP_SET_STACKSIZE routine to increase the stack size to 100 MB does not resolve the problem.
The source of the Fortran test program test_mpi_openmp is given below:
      PROGRAM TEST_MPI_OPENMP
      USE IFPORT
      IMPLICIT NONE
      INCLUDE 'mpif.h'
#ifdef _OPENMP
      INCLUDE 'omp_lib.h'
#endif
      INTEGER :: I, NTHR, RANK, SIZE, IERR, TID, STACKSIZE, provided
      INTEGER, PARAMETER :: NDIMX=200, NDIMY=200
      REAL(8) :: A0(NDIMX,NDIMY)
      LOGICAL :: SUCCESS
! Initialize MPI with full thread support and report what was granted
      call MPI_INIT_THREAD(MPI_THREAD_MULTIPLE, provided, ierr)
      CALL MPI_COMM_SIZE(MPI_COMM_WORLD,SIZE,IERR)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD,RANK,IERR)
      if ( rank == 0 ) then
      print 1000, size
1000  format(1x,'size=',i3)
      if(provided == MPI_THREAD_MULTIPLE )
     +  print *,'MPI_THREAD_MULTIPLE is provided'
      print 1001,provided
1001  format(1x,'provided=',i5)
      endif
! Request a 100 MB stack for the OpenMP worker threads
#ifdef _OPENMP
      CALL KMP_SET_STACKSIZE(104857600)
#endif
! Query the thread count and the stack size actually in effect
#ifdef _OPENMP
!$omp parallel
      NTHR = OMP_GET_NUM_THREADS()
      SUCCESS=SETENVQQ('I_MPI_PIN_DOMAIN=omp')
      STACKSIZE=KMP_GET_STACKSIZE()
!$omp end parallel
#endif
      IF(RANK.EQ.0) WRITE(*,1002)NTHR
1002  FORMAT(1X,'Number of threads by process=',I2)
      IF(RANK.EQ.0) WRITE(*,1003)STACKSIZE
1003  FORMAT(1X,'The stack size by thread, bytes, =',I10)
      IF(RANK.EQ.0) WRITE(*,1004)NDIMX,NDIMY
1004  FORMAT(1X,'NDIMX=',I5,' NDIMY=',I5)
      A0=0.
! Array reduction: every thread gets its own private copy of A0
!$OMP PARALLEL PRIVATE(TID,NTHR)
      TID=OMP_GET_THREAD_NUM()
      WRITE(*,'("MPI rank=",I3," Thread number=",I3)') RANK, TID
      IF(TID.EQ.0) THEN
      NTHR = OMP_GET_NUM_THREADS()
      WRITE(*,'("MPI rank=",I3," Number of threads by process= ",I3)')
     +  RANK, NTHR
      ENDIF
!$OMP DO SCHEDULE(AUTO)
!$OMP+REDUCTION(+:A0)
!$OMP+PRIVATE(I)
      DO I=1,10
      A0=A0+1.
      ENDDO
!$OMP END DO
!$OMP END PARALLEL
      WRITE(*,'("MPI rank=",I3," SUM(A0)= ",1PE16.8)') RANK, SUM(A0)
      CALL MPI_FINALIZE(IERR)
      END PROGRAM TEST_MPI_OPENMP

The code was compiled under Windows 10 on a PC with an Intel i7-4960X (6 cores, 12 threads), using Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 17.0.0.109, together with Intel(R) MPI Library 2017 for Windows. The following makefile was used:

ALL: test_mpi_openmp.exe
test_mpi_openmp.obj: test_mpi_openmp.for
	mpiifort @fast_Intel.txt test_mpi_openmp.for \
	>test_mpi_openmp.err 2>&1
#
test_mpi_openmp.exe: test_mpi_openmp.obj
	mpiifort test_mpi_openmp.obj /link -I_MPL_LINK=opt_mt \
	/out:test_mpi_openmp.exe >test_mpi_openmp_link.err 2>&1

with the options defined in the file fast_Intel.txt:
/c /O3 /Qprec-div- /QxHost /Qopenmp /fpp /Qopenmp-lib:compat

If the REAL(8) array A0(NDIMX,NDIMY) is declared with NDIMX=200, NDIMY=200 (320000 bytes in total, well below 4 MB), the following listing is generated by the command:
mpiexec -n 4 -localonly test_mpi_openmp.exe > test_mpi_openmp.lst 2>&1

size= 4
MPI_THREAD_MULTIPLE is provided
provided= 3
Number of threads by process= 3
The stack size by thread, bytes, = 104857600
NDIMX= 200 NDIMY= 200
MPI rank= 0 Thread number= 0
MPI rank= 0 Thread number= 2
MPI rank= 0 Thread number= 1
MPI rank= 0 Number of threads by process= 3
MPI rank= 1 Thread number= 2
MPI rank= 1 Thread number= 0
MPI rank= 1 Thread number= 1
MPI rank= 1 Number of threads by process= 3
MPI rank= 3 Thread number= 0
MPI rank= 3 Thread number= 2
MPI rank= 3 Thread number= 1
MPI rank= 3 Number of threads by process= 3
MPI rank= 2 Thread number= 0
MPI rank= 2 Thread number= 2
MPI rank= 2 Thread number= 1
MPI rank= 2 Number of threads by process= 3
MPI rank= 0 SUM(A0)= 4.00000000E+05
MPI rank= 2 SUM(A0)= 4.00000000E+05
MPI rank= 3 SUM(A0)= 4.00000000E+05
MPI rank= 1 SUM(A0)= 4.00000000E+05

If array A0 is declared with NDIMX=300, NDIMY=300 (720000 bytes in total, about 0.7 MB, which is still below 4 MB), a stack overflow occurs.
Please help me to resolve the problem.
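
One possible workaround is sketched below. It assumes that the overflow comes from the private copies of A0 that the REDUCTION clause creates for each thread on the stack; this is an assumption, not something the test above confirms. The sketch replaces the clause with a manual reduction: each thread accumulates into its own ALLOCATABLE array, which lives on the heap, and the partial results are combined inside a CRITICAL section. The program name manual_reduction and the array a_loc are hypothetical, and MPI is left out to keep the example self-contained. It may also be worth noting that KMP_SET_STACKSIZE applies to threads created by the OpenMP runtime, while the stack of the initial thread is fixed at link time (1 MB by default on Windows).

```fortran
program manual_reduction
  ! Illustrative sketch only: manual array reduction with heap-allocated
  ! per-thread buffers instead of REDUCTION(+:A0).
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: ndimx = 300, ndimy = 300
  real(dp), allocatable :: a0(:,:)      ! shared result array
  real(dp), allocatable :: a_loc(:,:)   ! per-thread partial sum (hypothetical name)
  integer :: i

  allocate(a0(ndimx, ndimy))
  a0 = 0.0_dp
  print *, 'bytes per private copy =', size(a0) * (storage_size(a0) / 8)

  ! a_loc is unallocated here, so each private copy starts unallocated
  !$omp parallel private(a_loc, i)
  allocate(a_loc(ndimx, ndimy))         ! heap allocation, one per thread
  a_loc = 0.0_dp

  !$omp do schedule(static)
  do i = 1, 10
     a_loc = a_loc + 1.0_dp
  end do
  !$omp end do

  ! Combine the partial sums one thread at a time
  !$omp critical
  a0 = a0 + a_loc
  !$omp end critical

  deallocate(a_loc)
  !$omp end parallel

  ! Every element received 10 increments in total
  if (abs(sum(a0) - 10.0_dp * ndimx * ndimy) > 1.0e-6_dp) stop 1
  print '("SUM(A0)= ",1PE16.8)', sum(a0)
end program manual_reduction
```

The sketch can be built with the same /Qopenmp option used in fast_Intel.txt. The critical section serializes the final summation, which is usually acceptable when the loop body dominates the run time.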

IMSL installation

I have Intel Parallel Studio 2015 (via a floating license) installed on my desktop, and I recently purchased an IMSL add-on license. However, the IMSL installation failed with the error message "the license file provided is invalid".

Any thoughts on what might have gone wrong?

on a coarray example, mpiexec seems to freeze when run with -localonly

Hello,

I am really new to coarrays. I am trying a simple example:

program ex
    print *, this_image(), num_images()
end program

I compile with ifort /Qcoarray ex.f90. Then, when I simply run ex.exe in the console, nothing happens; the program just hangs. If, however, I run mpiexec -n 1 ex, then it runs and prints 4 lines.

Q about debugging hangup

I have recently been debugging a program that runs into a READ error.

So the breakpoint occurs in a library routine.

The output pane gives a traceback where the READ error occurred, but

I am wondering why I cannot go to the routine and examine the contents of the variables in that routine.

The stack window does not allow me to go to the routine that has the READ statement.

Is there a way around this, other than having to insert print statements there?

Intel Fortran and MKL: How to include lapack.f90 and mkl.fi

I need to use a subroutine from MKL LAPACK in my program. I read that, in order to use that subroutine, I need to include lapack.f90 and mkl.fi in my program and then use the -mkl option when I compile it. But this is not an easy thing to do (for me), because I read that those files are not precompiled and I need to compile them myself. So my question is: how do I compile those files, and how do I use them in my program?
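
As a minimal sketch, assuming the classic FORTRAN 77 style LAPACK interface is sufficient for the routine in question: in that case the routine can be called directly and neither lapack.f90 nor mkl.fi has to be compiled by hand. The example below uses DGESV purely for illustration (the post does not say which routine is needed), and the program name solve_with_lapack is hypothetical.

```fortran
program solve_with_lapack
  ! Illustrative sketch: solve A*x = b with the F77-style LAPACK routine DGESV.
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 2, nrhs = 1
  real(dp) :: a(n, n), b(n, nrhs)
  integer :: ipiv(n), info
  external :: dgesv                     ! resolved at link time by MKL

  ! System: 2x + y = 3, x + 3y = 5 (stored column by column)
  a = reshape([2.0_dp, 1.0_dp, 1.0_dp, 3.0_dp], [n, n])
  b(:, 1) = [3.0_dp, 5.0_dp]

  ! On exit, b holds the solution and a holds the LU factors
  call dgesv(n, nrhs, a, n, ipiv, b, n, info)

  if (info /= 0) then
     print *, 'DGESV failed, info =', info
     stop 1
  end if

  ! Exact solution of this system is x = 0.8, y = 1.4
  if (maxval(abs(b(:, 1) - [0.8_dp, 1.4_dp])) > 1.0e-12_dp) stop 2
  print '("x = ",2F8.4)', b(:, 1)
end program solve_with_lapack
```

Saved under a hypothetical file name such as solve.f90, this would be built with ifort -mkl solve.f90 on Linux or ifort /Qmkl solve.f90 on Windows. The EXTERNAL declaration gives no argument checking; the interface files shipped with MKL exist to add that checking.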

problem with optimization flags

I am encountering a strange problem with the Intel Fortran 17.0.0 compiler (version as reported by "ifort -v") when compiling code that uses arrays with optimization flags.
After a series of tests I created the following very simple code to show what is happening.

 program matrix_define

     implicit none

     integer, parameter :: dimen = 3
     real*8 :: A(2, dimen)
     real*8 :: x(3,dimen)

     x(1,1) = 5.d0
     x(1,2) = 2.d0
     x(1,3) = 1.d0

     x(2,1) = 1.d0
     x(2,2) = 2.d0
     x(2,3) = 3.d0

LoadLibraryA error 193

Hello,

I am trying to run a UMAT, but it gives the following error. I am attaching a screenshot of the error along with the details. I am using the 64-bit version of VS2013 along with Parallel Studio 2017 and Abaqus 6.14.

It says "Unexpected LoadLibraryA error 193". The command prompt also reports that the executable standard.exe exited with error code 1073741819.
