OpenMP 4.0, Offload, and Intel Fortran 14.0.1.106

All,

I'm trying to port some code from using the Intel Fortran MIC directives to OpenMP 4 and it just hasn't been working for me. So, I decided to step back and try out some of the examples from OpenMP themselves. I figure if I can't figure out and run those, I'm stuck anyway. So, from the examples (PDF) I decided to start with Example 49.5f as it comes closest to sort of looking like real-life code.

So, I transcribed it and added a couple extra routines (init and output):

module utils
   implicit none
   contains
   subroutine init(v1, v2, N)
      implicit none
      real, dimension(:) :: v1, v2
      integer :: N, i

      v1 = 2.0
      v2 = 4.0
   end subroutine init

   subroutine output(p, N)
      implicit none
      real, dimension(:) :: p
      integer :: N, i

      write (*,*) "p(1): ", p(1)
   end subroutine output
end module utils

module my_mult
   use utils
   implicit none
   contains
   subroutine foo(p0,v1,v2,N)
      implicit none
      real, dimension(:) :: p0, v1, v2
      integer :: N, i

      call init(v1, v2, N)

      !$omp target data map(to: v1, v2) map(from: p0)
      call vec_mult(p0,v1,v2,N)
      !$omp end target data
      call output(p0, N)
   end subroutine foo

   subroutine vec_mult(p1,v3,v4,N)
      implicit none
      real, dimension(:) :: p1, v3, v4
      integer :: N, i

      !$omp target map(to: v3, v4) map(from: p1)
      !$omp parallel do
      do i = 1, n
         p1(i) = v3(i) * v4(i)
      end do
      !$omp end target
   end subroutine vec_mult
end module my_mult

program main
   use my_mult
   implicit none

   !integer, parameter :: N = 1024*1024*1024
   integer, parameter :: N = 1024*1024
   real, allocatable, dimension(:) :: p, v1, v2

   allocate( p(N), v1(N), v2(N) )
   call foo(p, v1, v2, N)
   deallocate( p, v1, v2 )
end program main

When I compile this without OpenMP, it works fine, but when I add -openmp, the run stalls and I have to Ctrl-C:

(1002) $ ifort -V
Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 14.0.1.106 Build 20131008
Copyright (C) 1985-2013 Intel Corporation.  All rights reserved.

(1003) $ ifort 49.5f.F90
(1004) $ ./a.out
 p(1):    8.000000    
(1005) $ ifort -openmp 49.5f.F90
(1006) $ ./a.out
[Offload] [MIC 0] [File]            49.5f.F90
[Offload] [MIC 0] [Line]            33
[Offload] [MIC 0] [Tag]             Tag 0
[Offload] [HOST]  [Tag 0] [CPU Time]        0.048183(seconds)
[Offload] [MIC 0] [Tag 0] [MIC Time]        0.000270(seconds)

[Offload] [MIC 0] [File]            49.5f.F90
[Offload] [MIC 0] [Line]            44
[Offload] [MIC 0] [Tag]             Tag 1

Now, my guess is that the first set of offload notifications comes from the target data construct. The second would be the target construct... and I guess that one doesn't work?

I suppose my question now is: did I do something wrong? As I've never actually gotten OpenMP 4 + MIC to work, I don't have a baseline to work from.

Thanks,

Matt

Matt Thompson

From a quick first test I am seeing the same behavior. Let me investigate and reply again after learning more.

Glad to hear it's not just me!

And, I suppose, this is fair warning that many an OpenMP 4+MIC question might be incoming. (Preview: is there a way with OpenMP to have a single code that can run on the host or on the MIC given compiler options or preprocessor directives? I'm not sure there is... ETA: Ahhh...the if clause!)

Matt Thompson

As I wait for Kevin, here's another question. I thought I'd try out this simple code (based on Example 55.2f) on a machine that is Westmeres and nothing else. No GPUs, no MICs, just a plain ol' compute node:

program test
   use omp_lib
   implicit none
   logical :: do_offload
   integer :: num_devices

   num_devices = omp_get_num_devices()
   write (*,*) 'num_devices: ', num_devices

   do_offload  = num_devices > 0
   write (*,*) 'do_offload: ', do_offload
end program test

My thought was let's test if I can use something like the OMP if clause to ignore target statements on a non-MIC platform. However:

$ ifort test.F90
/gpfsm/dnb31/tdirs/pbs/slurm.434493.mathomp4/ifortYcM7wm.o: In function `MAIN__':
test.F90:(.text+0x3b): undefined reference to `omp_get_num_devices'
$ ifort -openmp test.F90
ifort: warning #10362: Environment configuration problem encountered.  Please check for proper MPSS installation and environment setup.
x86_64-k1om-linux-ld: No such file or directory

Is this expected behaviour? Perhaps it's due to how Intel 14 is loaded on our cluster (via modules, which set up some MIC-related things in the environment)?

Thanks,

Matt

Matt Thompson

Calling omp_get_num_devices() invokes the offload compilation, so the warning message about MPSS is to be expected.

With the Intel offload feature, __MIC__ is defined when the offload compilation occurs. I do not see an equivalent for OpenMP 4.0 offhand. Let me check on this. I'm also looking into your earlier inquiry about features for having single code for host/co-processor. I'm more familiar with our own offload than the new OpenMP 4.0 so hopefully you can bear with me.

Kevin,

No worries. I'm learning this myself. Your comment gave me a thought:

#ifdef __INTEL_OFFLOAD
num_devices = omp_get_num_devices()
#else
num_devices = 0
#endif
write (*,*) 'num_devices: ', num_devices

Now I can have some control over that section with -no-offload:

(mic node) $ ifort -openmp test.F90
(mic node) $ ./a.out
 num_devices:            1
 do_offload:  T
(no mic node) $ ./a.out
 num_devices:            0
 do_offload:  F

(any node) $ ifort -openmp -no-offload test.F90
(mic node) $ ./a.out
 num_devices:            0
 do_offload:  F
(no mic node) $ ./a.out
 num_devices:            0
 do_offload:  F

It's not perfect, but it's a step in the right direction. You are just required to compile on a MIC-enabled node if you think you'll need offloading. If not, -no-offload could allow for more expansive compiling. Kind of what I have to do for CUDA as well, though it's probably time to overload all these preproc macros to __ACCEL__ or the like.
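For the build without any OpenMP flag, a compiler-neutral variant of the same idea might look like the sketch below. It is only a sketch under stated assumptions: the module and routine names (offload_sketch, device_count, vec_mult_if) are made up for illustration, and it relies on the standard OpenMP conditional-compilation sentinel "!$" rather than on __INTEL_OFFLOAD. It also feeds the result into the if clause on the target construct, which is the run-time switch mentioned earlier in the thread. It does not address the case of compiling with -openmp on a node without MPSS; that still appears to need -no-offload.

```fortran
! Hypothetical sketch: offload_sketch, device_count and vec_mult_if are
! illustrative names, not taken from the thread or the OpenMP examples.
module offload_sketch
   ! A line starting with the "!$" sentinel is compiled only when OpenMP
   ! is enabled, so a build without OpenMP never references omp_lib.
!$ use omp_lib, only: omp_get_num_devices
   implicit none

contains

   ! Number of target devices; stays 0 when built without OpenMP.
   integer function device_count()
      device_count = 0
!$    device_count = omp_get_num_devices()
   end function device_count

   ! Element-wise product, offloaded only if at least one device is present.
   subroutine vec_mult_if(p, v1, v2, n)
      integer, intent(in)  :: n
      real,    intent(in)  :: v1(n), v2(n)
      real,    intent(out) :: p(n)
      logical :: do_offload
      integer :: i

      do_offload = device_count() > 0

      ! When do_offload is false, the target region executes on the host.
      !$omp target if(do_offload) map(to: v1, v2) map(from: p)
      !$omp parallel do
      do i = 1, n
         p(i) = v1(i) * v2(i)
      end do
      !$omp end target
   end subroutine vec_mult_if

end module offload_sketch
```

A driver would simply call vec_mult_if(p, v1, v2, size(p)). Built without OpenMP, the directives are plain comments and the loop runs serially on the host; built with OpenMP, the if clause decides at run time whether to offload.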

Matt Thompson

For what it's worth, I wrote a series of small test cases (without using modules) that use OpenMP 4 for target offload, both with separate target update directives for data transfer and with target map.  Each case produces correct results in at least one of the two versions, but there are some incorrect results, including a case that shouldn't offload at all because the if() clause on the omp target directive is not satisfied.  I expected performance differences between the two approaches, but didn't see them.

This was on relatively new hardware, to which I will soon lose access.  I haven't checked omp target on the older hardware (Westmere, KNC B0) to which I expect to retain access.

I'm also in a learning stage not knowing whether I made mistakes or why it doesn't work as I expected.  I was intending to try C after Fortran; maybe I should go ahead when time permits.

Most discussions relating to MIC are undertaken on the MIC specific forum, but I haven't seen any discussions on this subject there.
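For readers unfamiliar with the two data-transfer styles mentioned in this post, the following is a minimal sketch of each. It is not one of the test cases described above; the module and routine names (transfer_sketch, double_with_map, double_with_update) are hypothetical, and the sketch uses only directives from the OpenMP 4.0 specification (target, target data, target update).

```fortran
! Hypothetical sketch: module and routine names are illustrative only.
module transfer_sketch
   implicit none

contains

   ! Style 1: the map clause on the target construct moves the data.
   subroutine double_with_map(a, n)
      integer, intent(in)    :: n
      real,    intent(inout) :: a(n)
      integer :: i

      !$omp target map(tofrom: a)
      !$omp parallel do
      do i = 1, n
         a(i) = 2.0 * a(i)
      end do
      !$omp end target
   end subroutine double_with_map

   ! Style 2: allocate device storage once with target data, then move
   ! the data explicitly with target update directives.
   subroutine double_with_update(a, n)
      integer, intent(in)    :: n
      real,    intent(inout) :: a(n)
      integer :: i

      !$omp target data map(alloc: a)
      ! Copy host values to the device.
      !$omp target update to(a)
      ! a is already present on the device, so no further copy is implied.
      !$omp target
      !$omp parallel do
      do i = 1, n
         a(i) = 2.0 * a(i)
      end do
      !$omp end target
      ! Copy the results back to the host.
      !$omp target update from(a)
      !$omp end target data
   end subroutine double_with_update

end module transfer_sketch
```

Both routines should leave the same values in a. The second style mainly pays off when several target regions reuse the same device data between updates, which this small sketch does not exercise.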

My test cases with ifort omp target began to work correctly (but not efficiently) with the ifort 15.0 release.  Intel C and C++ are still a problem for me.

I've run lots of tests of OpenMP 4 for host and MIC native, Linux and Windows, Intel and GNU compilers, with examples at

https://github.com/tprince/lcd

and discussion at https://sites.google.com/site/tprincesite/parallel-optimization

>>>My thought was let's test if I can use something like the OMP if clause to ignore target statements on a non-MIC platform.

omp_is_initial_device() should function to have the host execute statements inside !$omp target when there is no attached device.  Unfortunately it's not yet implemented in icc/ifort 15.0.  I tested the following on a non-MIC platform.

 

$ cat get_host_tgt.f90
program hosttarget
use omp_lib, ONLY: omp_is_initial_device
implicit none

!$omp target
   if( omp_is_initial_device() ) then
      print *,'   running on host without attached device'
   else
      print *,'   running on device attached to host'
   endif
!$omp end target

end program hosttarget

$ ifort -V
Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0 Build 20141028
Copyright (C) 1985-2014 Intel Corporation.  All rights reserved.

$ ifort -fopenmp get_host_tgt.f90
get_host_tgt.f90(2): error #6580: Name in only-list does not exist.   [OMP_IS_INITIAL_DEVICE]
use omp_lib, ONLY: omp_is_initial_device
-------------------^
get_host_tgt.f90(6): error #6404: This name does not have a type, and must have an explicit type.   [OMP_IS_INITIAL_DEVICE]
   if( omp_is_initial_device() ) then
-------^
get_host_tgt.f90(6): error #6341: A logical data type is required in this context.   [OMP_IS_INITIAL_DEVICE]
   if( omp_is_initial_device() ) then
-------^
compilation aborted for get_host_tgt.f90 (code 1)
$

 

We need to catch up with gcc/gfortran here, and certainly this is needed for OpenMP 4.0 completeness, so I'll report this to the developers.

$ gfortran --version
GNU Fortran (GCC) 4.9.1

$ gfortran -fopenmp get_host_tgt.f90 && ./a.out
    running on host without attached device
$

 

Patrick

>>> I'll report this to the developers

Internal tracking # DPD200362637

Patrick

Thank you for your note, Tim. I am delinquent in updating the thread regarding Matt's original case. My apologies, Matt. Matt's initial case now works, but only with the newest release, the initial release of IPS XE 2015 (15.0); the underlying issue was fixed only in that release.

omp_is_initial_device() is now implemented in the Composer XE 2015 update 2 compilers, so I am closing this thread now.

The following block will execute on the target device if ONTGT is defined for the compilation; otherwise it executes on the host:

#ifdef __MIC__
       num_thr = omp_get_num_threads()
       whatdev = omp_is_initial_device()
       !$omp single
          print *,' Compiled with OFFLOAD compiler...'
          print *,'    Running on DEVICE with',num_thr,' threads and...'
          print *,'       ...omp_is_initial_device() is ',whatdev
       !$omp end single
#else
       num_thr = omp_get_num_threads()
       whatdev = omp_is_initial_device()
       !$omp single
          print *,' Compiled with OFFLOAD compiler...'
          print *,'    Running on HOST with',num_thr,' threads and...'
          print *,'       ...omp_is_initial_device() is ',whatdev
       !$omp end single
#endif

 

[DPD200362637]$ ifort -V
Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.2.164 Build 20150121
Copyright (C) 1985-2015 Intel Corporation.  All rights reserved.

[DPD200362637]$ ifort -qopenmp -fpp only_host-dcltgt.f90 -o only_host-dcltgt.f90-ifort.x
[DPD200362637]$ ./only_host-dcltgt.f90-ifort.x
  Compiled with OFFLOAD compiler...
     Running on HOST with          32  threads and...
        ...omp_is_initial_device() is  T

[DPD200362637]$ ifort -qopenmp -fpp only_host-dcltgt.f90 -o only_host-dcltgt.f90-ifort.x -DONTGT
[DPD200362637]$ ./only_host-dcltgt.f90-ifort.x                                 

  Compiled with OFFLOAD compiler...
     Running on DEVICE with         224  threads and...
        ...omp_is_initial_device() is  F
[DPD200362637]$

 

Patrick
