Using pointers with “into” in Intel Xeon Phi offload directive

According to the book "Intel Xeon Phi Coprocessor High-Performance Programming", we can move data from one variable to another. I tried to follow the example and found that it worked:

Code:

program example 
real , target :: a(5),b(10)

a(1)=1
a(2)=2
a(3)=3
a(4)=4
a(5)=5

print *,'*************************'
print *,'a:'
print *, a


!dir$ offload begin target (mic:0) in(a(1:5): into(b(1:5)) alloc_if(.true.) free_if(.false.) )
print *, 'b on the phi'
print *, b(1:5)
b=b+10
!dir$ end offload

!dir$offload_transfer target(mic:0) out(b(1:5) : into(a(1:5)) alloc_if(.false.))


print *,'*************************'
print *,'a:'
print *, a
end program example

I have an array A on the host and I copy its elements into an array B on the Xeon Phi. I add 10 to all elements of B and then transfer the elements of B on the Xeon Phi back into A on the host. The result is:

However, if I use pointers, there is an error.

Code 2:

program example 
real , target :: a(5),b(10)
real , pointer :: a_p(:),b_p(:)

a(1)=1
a(2)=2
a(3)=3
a(4)=4
a(5)=5

a_p=>a
b_p=>b
print *,'*************************'
print *,'a:'
print *, a


!dir$ offload begin target (mic:0) in(a_p(1:5): into(b_p(1:5)) alloc_if(.true.) free_if(.false.) )
print *, 'b on the phi'
print *, b_p(1:5)
b_p=b_p+10
!dir$ end offload

!dir$offload_transfer target(mic:0) out(b_p(1:5) : into(a_p(1:5)) alloc_if(.false.))


print *,'*************************'
print *,'a:'
print *, a
end program example

Result 2:

It looks like something goes wrong when I try to copy the data back.

Does the into clause support pointers? We will need pointers to arrays in the real project.

I believe there may be a method to accomplish this. I'm double checking with Development about a possible solution that I created.

Can the arrays be allocatable?

Quote:

Kevin Davis (Intel) wrote:

I believe there may be a method to accomplish this. I'm double checking with Development about a possible solution that I created.

Can the arrays be allocatable?

Yes, I've tried them as allocatable. The allocatable arrays work, but pointers to the allocatable arrays still do not. I also found that you can allocate the allocatable array on the host with a size of one while its counterpart on the Phi can still have whatever size you specify in the into clause.

Thank you.

Here's an example of how to use pointers to allocatable arrays.

program example
real, allocatable, dimension(:), target :: a, b
real, pointer :: a_p(:), b_p(:)

allocate(a(5))
allocate(b(10))

a(1)=1
a(2)=2
a(3)=3
a(4)=4
a(5)=5
b=0

a_p=>a
b_p=>b

print *,'*************************'
print *,'a:'
print *, a

! Allocate pointer and memory on coprocessor
!DIR$ OFFLOAD_transfer target(mic:0) in( b_p : length(10) free_if(.FALSE.) )

! Transfer a_p into (part of) b_p and only modify some values
!DIR$ OFFLOAD begin target(mic:0) in( a_p : length(5) into(b_p(1:5)) free_if(.FALSE.))
   print *, 'b on the phi'
   print *, b_p(1:5)
   !b_p=b_p+10
   ! Update only some uploaded values
   b_p(3:5)=b_p(3:5)+10
!dir$ end offload

! Zero a on CPU to demonstrate transfers above worked
a=0

!DIR$ OFFLOAD_transfer target(mic:0) out( b_p : length(5) into(a_p(1:5)) alloc_if(.false.) free_if(.FALSE.) )

print *,'*************************'
print *,'a:'
print *, a
end program example

 
$ ifort -V
Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 14.0.1.106 Build 20131008

$ ifort example.F90
$ ./a.out
 *************************
 a:
   1.000000       2.000000       3.000000       4.000000       5.000000
 b on the phi
   1.000000       2.000000       3.000000       4.000000       5.000000
 *************************
 a:
   1.000000       2.000000       13.00000       14.00000       15.00000

I apologize for the lousy looking post. I will try to correct it. Once again we've made forum changes and now methods I used before for posting code/text are no longer working.

Thank you! This works!

However, this still does not fully solve my problem. Let me explain why I tried to use "into" in the first place.

I'm working on a project that is mostly finished. What I am supposed to do is write a subroutine that offloads part of our data to the Phi and does some calculation there. Since all the data to be processed has already been allocated at an earlier stage, the best way to do this is to have a derived type with pointers to the data in memory.

Another thing we need to consider is data duplication. We don't want to allocate new memory on the host that does nothing except allow arrays to be allocated on the Phi. After some experiments I came up with an idea:

program into
real,allocatable,target:: a(:), b(:), c(:)
integer :: m,i,j
real,pointer :: a_p(:),b_p(:),c_p(:)
m=10

allocate(a(m))
allocate(b(m))
allocate(c(1))
!a_p => a
!b_p => b
!c_p => c
do i=1, 10
a(i)=i
b(i)=40+i
end do
print *,'***********************************************'
print *,'a:'
print *,a 
print *,'***********************************************'
print *,'b:'
print *,b 
print *,'***********************************************'
print *, ' Start offload'
!dir$ offload_transfer target(mic:0) in(b(1:m): into(c(m+1:2*m) )  alloc_if(.true.) free_if(.false.))
!dir$ offload begin target(mic:0) in(a(1:m): into(c(1:m)) alloc_if(.true.) free_if(.false.))  
print *, 'C on the Phi'
call calc(c(1:m),c(m+1:2*m),m)

print *,'***********************************************'
print *, c(1:20)
print *,'***********************************************'
!dir$ end offload

!dir$ offload_transfer target(mic:0) out(c(1:m) : into(b(1:m)) alloc_if(.false.) free_if(.false.))
!dir$ offload_transfer target(mic:0) out(c(m+1:2*m) : into(a(1:m)) alloc_if(.false.) free_if(.false.))

print *, ' End offload'
print *,'***********************************************'
print *,'a:'
print *,a 
print *,'***********************************************'
print *,'b:'
print *,b
print *,'***********************************************'

contains
!dir$ attributes offload : mic :: calc
subroutine calc(a,b,m)
integer :: m
real :: a(m),b(m)
a=a+10
b=b*10
end subroutine calc
end program into

The result is:

 ***********************************************
 a:
   1.000000       2.000000       3.000000       4.000000       5.000000    
   6.000000       7.000000       8.000000       9.000000       10.00000    
 ***********************************************
 b:
   41.00000       42.00000       43.00000       44.00000       45.00000    
   46.00000       47.00000       48.00000       49.00000       50.00000    
 ***********************************************
  Start offload
 C on the Phi
 ***********************************************
   11.00000       12.00000       13.00000       14.00000       15.00000    
   16.00000       17.00000       18.00000       19.00000       20.00000    
   410.0000       420.0000       430.0000       440.0000       450.0000    
   460.0000       470.0000       480.0000       490.0000       500.0000    
 ***********************************************
  End offload
 ***********************************************
 a:
   410.0000       420.0000       430.0000       440.0000       450.0000    
   460.0000       470.0000       480.0000       490.0000       500.0000    
 ***********************************************
 b:
   11.00000       12.00000       13.00000       14.00000       15.00000    
   16.00000       17.00000       18.00000       19.00000       20.00000    
 ***********************************************

First, the C array on the host doesn't ask for much space, just 1 element. Second, we can offload two arrays into a bigger array on the Phi and do some calculation there. Finally, we can copy the results back. This solves the data duplication problem and gives us a way to have complicated data structures on the Phi. Imagine we have 10 instances of the same problem to calculate: instead of doing 10 offloads, we can offload 10 arrays into one bigger array on the Phi.
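
To make the packing layout concrete, here is a minimal host-only sketch of the index arithmetic, assuming each instance has m elements and slot k of the big array covers elements (k-1)*m+1 through k*m (so slot 2 is m+1:2*m, as in the program above). It contains no offload directives, and the names buf, inst, ninst, lo and hi are hypothetical, chosen only for illustration.

```fortran
program pack_layout
  implicit none
  integer, parameter :: m = 10      ! elements per instance (as in the post)
  integer, parameter :: ninst = 3   ! hypothetical number of instances
  real :: buf(ninst*m)              ! stands in for the large array on the Phi
  real :: inst(m)                   ! one instance worth of data
  integer :: k, i, lo, hi

  do k = 1, ninst
     lo = (k-1)*m + 1               ! first element of slot k
     hi = k*m                       ! last element of slot k
     do i = 1, m
        inst(i) = real(100*k + i)   ! dummy data for instance k
     end do
     ! In the offload version this copy would be the in(... : into(...)) transfer
     buf(lo:hi) = inst
  end do

  ! Show which section of buf each instance occupies
  do k = 1, ninst
     lo = (k-1)*m + 1
     hi = k*m
     print *, 'slot', k, ' occupies', lo, ' to', hi, ' first value', buf(lo)
  end do
end program pack_layout
```

The same lo and hi bounds would be the sections named in the into clauses, one per instance.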

However, if I change the code to pointers:

program into
real,allocatable,target:: a(:), b(:), c(:)
integer :: m,i,j
real,pointer :: a_p(:),b_p(:),c_p(:)
m=10

allocate(a(m))
allocate(b(m))
allocate(c(1))
a_p => a
b_p => b
c_p => c
do i=1, 10
a(i)=i
b(i)=40+i
end do
print *,'***********************************************'
print *,'a:'
print *,a 
print *,'***********************************************'
print *,'b:'
print *,b 
print *,'***********************************************'
print *, ' Start offload'
!dir$ offload_transfer target(mic:0) in(b_p(1:m): into(c_p(m+1:2*m) )  alloc_if(.true.) free_if(.false.))
!dir$ offload begin target(mic:0) in(a_p(1:m): into(c_p(1:m)) alloc_if(.true.) free_if(.false.))  
print *, 'C on the Phi'
call calc(c_p(1:m),c_p(m+1:2*m),m)

print *,'***********************************************'
print *, c_p(1:20)
print *,'***********************************************'
!dir$ end offload

!dir$ offload_transfer target(mic:0) out(c_p(1:m) : into(b_p(1:m)) alloc_if(.false.) free_if(.false.))
!dir$ offload_transfer target(mic:0) out(c_p(m+1:2*m) : into(a_p(1:m)) alloc_if(.false.) free_if(.false.))

print *, ' End offload'
print *,'***********************************************'
print *,'a:'
print *,a 
print *,'***********************************************'
print *,'b:'
print *,b
print *,'***********************************************'

contains
!dir$ attributes offload : mic :: calc
subroutine calc(a,b,m)
integer :: m
real :: a(m),b(m)
a=a+10
b=b*10
end subroutine calc
end program into

The result will be:

 ***********************************************
 a:
   1.000000       2.000000       3.000000       4.000000       5.000000    
   6.000000       7.000000       8.000000       9.000000       10.00000    
 ***********************************************
 b:
   41.00000       42.00000       43.00000       44.00000       45.00000    
   46.00000       47.00000       48.00000       49.00000       50.00000    
 ***********************************************
  Start offload
 C on the Phi
 ***********************************************
   11.00000       12.00000       13.00000       14.00000       15.00000    
   16.00000       17.00000       18.00000       19.00000       20.00000    
   410.0000       420.0000       430.0000       440.0000       450.0000    
   460.0000       470.0000       480.0000       490.0000       500.0000    
 ***********************************************
  End offload
 ***********************************************
 a:
   410.0000       420.0000       430.0000       440.0000       450.0000    
   460.0000       470.0000       480.0000       490.0000       500.0000    
 ***********************************************
 b:
  2.9426954E-38   12.00000       13.00000       14.00000       15.00000    
   16.00000       17.00000       18.00000       19.00000       20.00000    
 ***********************************************

You can see that the random number in c(1) on the host gets copied to b(1).

Any solutions?

Thank you for your patience. I really appreciate your help.

In the pointer version of your program into, change the second-to-last offload_transfer from this:

!dir$ offload_transfer target(mic:0) out(c_p(1:m) : into(b_p(1:m)) alloc_if(.false.) free_if(.false.))

to this:

!dir$ offload_transfer target(mic:0) out(c_p : length(m) into(b_p(1:m)) alloc_if(.false.) free_if(.false.))

I'm still discussing with Development whether your original statement with c_p(1:m) exposes a defect; I think it does.

Also, I believe the user's method posted here which avoids INTO might be useful for your needs too. It avoids the additional CPU allocation for the pointer (array c in your case).

You can disregard my suggested change. Despite the apparently correct results, Development reaffirmed that you currently cannot allocate more on the coprocessor than on the CPU. There is an active feature request to support what you coded for c and c_p; for now, however, depending on the memory layout in a different or larger application, additional offloads will probably produce unpredictable results.

I will keep this thread updated on the status of that request (internal tracking id noted below).

(Internal tracking id: DPD200245090 - Offload "in( a(n) : into b)" clause should not need b to be allocated of size n)
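
Until that request is addressed, one conservative option is to check on the host that every section named in an into clause lies inside the host allocation before offloading. The sketch below is only an illustration of that check, assuming the restriction applies exactly as Development described it. It has no offload directives, and the helpers fits and report are hypothetical names, not part of any Intel API.

```fortran
program check_into_bounds
  implicit none
  integer, parameter :: m = 10       ! slice length, as in the posted program
  real, allocatable :: c(:)

  ! Case from the thread: c has a single element on the host
  allocate(c(1))
  call report(size(c), m+1, 2*m)
  deallocate(c)

  ! Conservative alternative: the host allocation covers both slices
  allocate(c(2*m))
  call report(size(c), m+1, 2*m)
  deallocate(c)

contains

  ! Hypothetical helper: true when section lo:hi lies inside a host
  ! array of host_size elements (lower bound 1 assumed)
  logical function fits(host_size, lo, hi)
    integer, intent(in) :: host_size, lo, hi
    fits = (lo >= 1) .and. (lo <= hi) .and. (hi <= host_size)
  end function fits

  ! Hypothetical helper: print whether the section is safe to target
  subroutine report(host_size, lo, hi)
    integer, intent(in) :: host_size, lo, hi
    if (fits(host_size, lo, hi)) then
       print *, 'host size', host_size, ': section is within the host allocation'
    else
       print *, 'host size', host_size, ': section exceeds the host allocation'
    end if
  end subroutine report

end program check_into_bounds
```

Allocating c at full size on the host gives up the memory saving the packing idea was after, so this is a trade-off rather than a fix.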
