nocopy access to arrays fails with multiple MICs

I am trying to create an array that persists on multiple (2+) MICs and can be accessed by passing nocopy to the pragma. I have this working fine for a single MIC. The example below creates a small array, offloads it to a number of MICs, then tries to access the arrays on the MICs. I access them in reverse order to demonstrate that I can only access the array on the last MIC to which it was transferred. For example, if I offload to mic:0, then to mic:1, I can only access the array on mic:1; a call to mic:0 fails with the error: offload error: process on the device 0 was terminated by signal 11

Again, this works fine for a single MIC. The pastebin code is here: http://pastebin.com/CFDmJdHj

int num_devices = _Offload_number_of_devices();
int NE = 8;
int i, j;

// array to be offloaded to the mics
__declspec(target (mic)) float *offarray = (float*) memalign( 4096, NE * sizeof(float) );
for ( j = 0; j < NE; j++ ) offarray[j] = sqrt(j);

// offload to all (both) mics and retain memory
for ( i = 0; i < num_devices; i++ )
{
    #pragma offload_transfer target(mic:i) \
    in ( offarray : length ( NE ) alloc_if(1) free_if(0) )
}

// access array on mics in reverse order
for ( i = num_devices - 1; i >= 0; i-- )
{
    #pragma offload target(mic:i) nocopy ( offarray )
    {
        int j;
        for ( j = 0; j < NE; j++ ) printf ( "%d(%d) %f\n", j, i, offarray[j] );
    }
}

The code was compiled with:

icc -vec-report=3 -O3 -offload-build -offload-attribute-target=mic phi_test.c -o phi_test

Thanks in advance!

Update:

I have achieved the desired result using:

#pragma offload target(mic:i) in ( offarray : length ( 0 ) alloc_if(0) free_if(0) )

to re-access the memory space. I don't know enough about the MICs to know which is better or why though.

nocopy means do not send anything, including the pointer to the memory that was allocated earlier in a previous offload.

length(0) means do not update the values in the memory, but do send the pointer to the allocated memory so it can be used in the offload.
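
For anyone following along, here is a minimal sketch of the in/length(0) pattern applied across all cards. The helper names (device_count, push_to_all, sum_on) are made up for illustration and are not from the original code. The __INTEL_OFFLOAD guard is there on the assumption that a compiler without offload support ignores the pragmas, in which case everything simply runs on the host.

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef __INTEL_OFFLOAD
#include <offload.h>   /* declares _Offload_number_of_devices() */
#endif

/* Hypothetical helper: number of coprocessors, or 1 for a host-only build. */
int device_count(void)
{
#ifdef __INTEL_OFFLOAD
    int n = _Offload_number_of_devices();
    return n > 0 ? n : 1;
#else
    return 1;
#endif
}

/* Hypothetical helper: allocate arr on every card and keep it resident
 * (alloc_if(1) free_if(0)), as in the first loop of the original post. */
void push_to_all(float *arr, int n)
{
    int i;
    for (i = 0; i < device_count(); i++) {
        #pragma offload_transfer target(mic:i) \
            in(arr : length(n) alloc_if(1) free_if(0))
    }
}

/* Hypothetical helper: sum the resident copy on one card.
 * length(0) transfers no array data; only the pointer is sent so the
 * offloaded code can find the memory allocated earlier. */
float sum_on(int dev, float *arr, int n)
{
    float total = 0.0f;
    #pragma offload target(mic:dev) \
        in(arr : length(0) alloc_if(0) free_if(0)) inout(total)
    {
        int j;
        for (j = 0; j < n; j++) total += arr[j];
    }
    return total;
}
```

Call push_to_all once, then sum_on(i, arr, n) for each card in any order. Swapping the in clause in sum_on for nocopy(arr) gives the variant that crashed in this thread.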

Great, well that is what I am using now. Though it does not quite explain why copy actually did work for one MIC.

Thanks Ravi.

How did you confirm the copy occurred and on which card?

The only confirmation I had (in the code example I gave) was that when it loops over the MICs it prints out the array values from each MIC, and one (the last one copied) appeared to be there.

This failure is reproducible. Nocopy should work. There is no relation to traversing the devices in reverse order in the second loop. Something goes awry with the pointer data allocation on device 0 that affects the offload to that device in the second loop. The same error ( offload error: process on the device 0 was terminated by signal 11 (SIGSEGV) ) occurs when the second loop traverses the devices in increasing order.

Just to confirm, here is the error for the test case you provided:

$ icc -V
Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 13.1.2.183 Build 20130514

$ icc -vec-report=3 -O3 phi_test.c -o phi_test
phi_test.c(14): (col. 9) remark: LOOP WAS VECTORIZED.
phi_test.c(18): (col. 9) remark: loop was not vectorized: unsupported loop structure.
phi_test.c(33): (col. 25) remark: loop was not vectorized: existence of vector dependence.
phi_test.c(24): (col. 9) remark: loop was not vectorized: unsupported loop structure.
phi_test.c(33): (col. 25) remark: *MIC* loop was not vectorized: existence of vector dependence.

$ ./phi_test
offload error: process on the device 0 was terminated by signal 11 (SIGSEGV)
0(1) 0.000000
1(1) 1.000000
2(1) 1.414213
3(1) 1.732051
4(1) 2.000000
5(1) 2.236068
6(1) 2.449490
7(1) 2.645751

I submitted this to our Developers (under the internal tracking id below). For now, please continue using the in clause with length(0) as a workaround. I will update the thread as I learn more regarding a fix.

(Internal tracking id: DPD200245213)
(Resolution Update on 09/08/2014): This defect is fixed in the Intel® Parallel Studio XE 2015 Initial Release (2015.0.090 - Linux)

Thanks for getting on this so quickly Kevin.

Corey

Hi All.

The length(0) workaround works fine for an array that is created and then copied across to the MICs. If instead I just want to create a local pointer and have the memory allocated on the MIC within the offloaded code, as in the linked example below, then I do not see a nice way of doing it without using nocopy.

If I use length(1) in the "in" statement then it allocates memory automatically, which is later ignored when I re-reference the pointer to the memory I have allocated within the offloaded code. And using "in ( data_phi : length (1) alloc_if(0) free_if(0) )" fails because of alloc_if(0). Also using "in ( data_phi : length (0) alloc_if(0) free_if(0) )" leads to a crash.

Code example: http://pastebin.com/kpwFmzCu

Obviously a fix for the nocopy issue would solve this problem, but does anyone have a workaround at the moment?

Also Kevin, could you please let me know how to insert code into these posts rather than posting pastebin links?

Cheers

Corey

Corey, my apologies. I missed your earlier reply and question regarding posting code. I will look into the question regarding a workaround and post on that later.

For posting code, just below the Comment pane where you type a forum reply/comment, there are some instructions on syntax highlighting:

  • To enable syntax highlighting, surround the language with brackets, where language is one of the following languages: bash, csharp, cpp, css, fortran, jscript, java, perl, php, plain, python, r, ruby, sql, xml, html, javascript, s, splus.

I use the "cpp" tag. In Notepad, I add the "open square-bracket"cpp"close square-bracket" <source code> "open square-bracket"/cpp"close square-bracket" tags (one on the first line and one on the last line) and then cut-and-paste into the forum comment pane.

Here is an example with the tags shown using parens to avoid interpretation, followed by the code using the required square brackets.

(cpp)int main()
{
foo();
}(/cpp)

int main()
{
foo();
}

Corey, Pardon the delayed reply. The second variant appears to share the same root cause related to improperly handling persistence of the local object.

The only work around I found for the second variant is to use static (and decorate accordingly) for the pointer as in:

        __attribute__ ((target(mic))) static struct data_type *data_phi;
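
To make the static-pointer workaround concrete, here is a rough sketch under some assumptions. The struct layout, the NVALS size, the MIC_TARGET macro and the helper names (alloc_on, read_on, free_on) are all hypothetical, since the linked code is not reproduced in this thread. The macro only exists so the file also builds with a compiler that has no offload support, where the pragmas are ignored and the code runs on the host.

```c
#include <stdlib.h>

#ifdef __INTEL_OFFLOAD
#define MIC_TARGET __attribute__((target(mic)))
#else
#define MIC_TARGET   /* host-only build: no decoration needed */
#endif

#define NVALS 8

/* Hypothetical layout; the struct in the linked code may differ. */
struct data_type { float values[NVALS]; };

/* Static, target-decorated pointer: each card keeps its own copy. */
MIC_TARGET static struct data_type *data_phi;

/* Allocate and fill the structure inside offloaded code on card dev. */
int alloc_on(int dev)
{
    int ok = 0;
    #pragma offload target(mic:dev) nocopy(data_phi) inout(ok)
    {
        int j;
        data_phi = (struct data_type *) malloc(sizeof *data_phi);
        if (data_phi != NULL) {
            for (j = 0; j < NVALS; j++) data_phi->values[j] = 2.0f * j;
            ok = 1;
        }
    }
    return ok;
}

/* Read one element from the memory allocated earlier on card dev;
 * returns -1 if nothing is allocated or idx is out of range. */
float read_on(int dev, int idx)
{
    float v = -1.0f;
    #pragma offload target(mic:dev) nocopy(data_phi) in(idx) inout(v)
    {
        if (data_phi != NULL && idx >= 0 && idx < NVALS)
            v = data_phi->values[idx];
    }
    return v;
}

/* Release the card-side allocation. */
void free_on(int dev)
{
    #pragma offload target(mic:dev) nocopy(data_phi)
    {
        free(data_phi);
        data_phi = NULL;
    }
}
```

The idea is that the pointer has static storage on the card, so the value assigned inside one offload is still there in the next, and nocopy only has to tell the compiler not to transfer anything.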

I attached the second variant to the earlier noted internal tracking report and will update when I learn more.

I confirmed both variants from this post are fixed in the next major release planned for later this year and will update this post when the release is officially available.
If you are interested in accessing the fix earlier, the Beta program for the next major release is currently under way. If you are interested in participating in our Beta program, please refer to the invitation posted in our User Forum: Invitation to join the Intel® Software Development Tools 2015 Beta program

The new Intel Parallel Studio XE 2015 release (Version 15.0.0.090 Build 20140723) is now available from the Intel Registration Center.
