Restrictions on Offloaded Code for Intel® Graphics Technology

This topic only applies to Intel® 64 and IA-32 architectures targeting Intel® Graphics Technology.

Offloaded code has the following restrictions:

  • You can place #pragma offload target(gfx) only before a perfectly nested loop written explicitly as a _Cilk_for loop, or before an Intel® Cilk™ Plus array notation statement.

  • Do not use the __GFX__ macro inside the statement following a #pragma offload statement. You can, however, use this macro in a subprogram called from the pragma.

  • All macros inside a #pragma offload region must evaluate to code using the same set of variables. For example, the following code leads to undefined behavior, because N, inputArray and outputArray are not used in the CPU version of the code:

    #pragma offload target(gfx) pin(inputArray, outputArray: length(N))
    _Cilk_for (int i = 0; i < N; i++) {
    #ifdef __GFX__
       outputArray[i] = inputArray[i]/N;
    #else
       // nothing
    #endif
    }
    
  • This restriction only applies to versions of Windows* earlier than Windows 8 and Windows Server* 2012. Offload to the target is only possible from a session that has access to the graphics driver. In practice:

    • Offload is possible:

      • from a local desktop session, when the desktop is not locked and no screen saver is running.

      • from a Remote Desktop connection, when the remote desktop client window is open.

    • Offload is not possible from Session 0, where all services run on Windows Vista* OS, Windows* 7 OS, Windows Server* 2008, and later versions.

  • By default, the system does not allow an offload task to execute longer than the system recovery timeout period, which is usually two seconds. In the offload runtime, the task appears to hang and then terminates abnormally after another timeout of 32 seconds. To enable your offload tasks to execute for more than 32 seconds, refer to Microsoft's documentation on the Timeout Detection and Recovery (TDR) registry keys, which describes how to disable recovery on timeout in the system registry.

  • This restriction only applies to versions of Windows* earlier than Windows 8 and Windows Server* 2012. To offload to the target on a machine with a discrete graphics card installed, you need to make the target the primary graphics device.

  • The parallel loops associated with #pragma offload must be perfectly nested and must follow the requirements for _Cilk_for. The loop counter variable of each loop must be of type int or unsigned, and the stride must be known at compile time.

  • If your application executes code on the target and host in parallel, you need to ensure that the host and the target do not modify the same cache lines to avoid the false sharing problem. For example, you can pad variables passed to the pin clause to a multiple of the cache line size, or if the host and the target operate on the same array, correspondingly organize the write access in your parallel offload code. Applying __declspec(avoid_false_share) for a variable ensures it is aligned and padded such that it is not subject to false sharing with any other variable.

  • Recursive calls are not supported.

  • The header file math.h expands single precision functions, such as sinf, to double precision, and the compiler is not always able to convert them back, which may lead to performance problems or even compiler failures. To work around this problem, use mathimf.h instead.

  • The compiler supports only a subset of math functions, which either map directly to the Intel® Graphics Technology instruction set architecture when possible, or are implemented in the SVML library supplied with the compiler. Only the following functions are supported:

    • acos/acosf

    • acosh/acoshf

    • asin/asinf

    • asinh/asinhf

    • atan/atanf

    • atanh/atanhf

    • cbrt/cbrtf

    • sqrt/sqrtf

    • ceil/ceilf

    • cos/cosf

    • erf/erff

    • erfc/erfcf

    • exp/expf

    • exp10/exp10f

    • exp2/exp2f

    • expm1/expm1f

    • fabs/fabsf

    • floor/floorf

    • invsqrt/invsqrtf

    • log/logf

    • log10/log10f

    • log1p/log1pf

    • log2/log2f

    • nearbyint/nearbyintf

    • rint/rintf

    • round/roundf

    • sin/sinf

    • sinh/sinhf

    • tan/tanf

    • tanh/tanhf

    • trunc/truncf

    • copysign/copysignf

    • atan2/atan2f

    • fmax/fmaxf

    • fmin/fminf

    • hypot/hypotf

    • pow/powf

  • Double precision division is also supported and is translated to a call to an SVML function.

  • The compiler does not support Variable Length Array allocation in heterogeneous code. For example, instead of using float myArray[variableSize];, where variableSize is a variable, use float myArray[CONSTANT_SIZE]; where CONSTANT_SIZE is a compile-time constant.

  • long double operations are not supported in target code.

  • longjmp/setjmp are not supported in target code.

  • Indirect control flow is not supported in target code, including:

    • Function pointers, taking the address of a function, and calling a function through a pointer

    • Calls to virtual functions

    Switch statements are supported in target code.

  • Exceptions are not allowed in target code.

  • RTTI is not supported in target code.

  • Functions with variable number of arguments (…) are not supported in target code.
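
Several of the restrictions above can be illustrated with a small sketch. The code below is plain C that any C99 compiler accepts; it only shows the shape of code that avoids function pointers, recursion, and variable length arrays. All names (TILE_SIZE, op_kind, apply_op, factorial_iter, process) are hypothetical, and the loop in process is an ordinary for loop standing in for what would be a _Cilk_for under #pragma offload target(gfx) in real offload code.

```c
#define TILE_SIZE 8   /* compile-time constant used instead of a variable length array */

/* Hypothetical operation selector; it replaces a function pointer. */
enum op_kind { OP_SCALE, OP_SQUARE };

/* Direct dispatch through a switch statement, which target code supports. */
float apply_op(enum op_kind kind, float x)
{
    switch (kind) {
    case OP_SCALE:  return 2.0f * x;
    case OP_SQUARE: return x * x;
    default:        return x;
    }
}

/* Iterative factorial; a recursive formulation is not supported in target code. */
unsigned factorial_iter(unsigned n)
{
    unsigned result = 1u;
    for (unsigned k = 2u; k <= n; ++k)
        result *= k;
    return result;
}

/* Processes n elements tile by tile through a fixed-size scratch buffer.
   Assumption: in real offload code the outer loop would be a _Cilk_for
   placed under #pragma offload target(gfx). */
void process(const float *in, float *out, int n, enum op_kind kind)
{
    for (int base = 0; base < n; base += TILE_SIZE) {
        float tile[TILE_SIZE];   /* fixed size, not float tile[n] */
        int count = (n - base < TILE_SIZE) ? (n - base) : TILE_SIZE;
        for (int j = 0; j < count; ++j)
            tile[j] = apply_op(kind, in[base + j]);
        for (int j = 0; j < count; ++j)
            out[base + j] = tile[j];
    }
}
```

The switch in apply_op keeps every call target visible at compile time, which is the property that function pointers and virtual calls lack.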

Restrictions on Pointers

  • Sharing or copying of pointers between the host and the target is not supported. Pointers have different meanings on the target and the host, so a pointer value valid for the host is meaningless on the target. No automatic translation of pointer values is performed.

  • Offloaded code cannot use arrays of pointer-typed elements, pointers to pointers, or pointer-typed members of structures or classes.

  • Pointer-typed or reference-typed arguments to a target(gfx) vector function must be either linear or uniform, and vector functions cannot return pointer-typed or reference-typed values.

  • Global or static variables cannot be of pointer or reference types.

  • Conversion between pointer types and non-pointer types is not allowed.

  • Taking the address of a pointer or reference is not allowed.
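
One common way to live within these pointer restrictions is to replace pointer-typed structure members and arrays of pointers with integer offsets into a single flat buffer. The following is a minimal sketch of that idea in plain C, not a prescribed method; the names row_span and row_sum are hypothetical.

```c
/* Hypothetical row descriptor: an offset and a length into one flat buffer,
   used instead of a pointer-typed member. */
struct row_span {
    int offset;   /* index of the first element of the row in the flat buffer */
    int length;   /* number of elements in the row */
};

/* Sums one row by indexing the flat buffer. No array of pointers and no
   pointer to pointer is involved. */
float row_sum(const float *flat, struct row_span row)
{
    float total = 0.0f;
    for (int i = 0; i < row.length; ++i)
        total += flat[row.offset + i];
    return total;
}
```

Because row_span holds only integers, its values mean the same thing on the host and on the target, unlike a stored pointer value.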

Restrictions on Offload Language Extensions

  • The following pragmas are not supported:

    • offload_transfer

    • offload_wait

  • The following specifiers of pragma offload are not supported:

    • signal

    • wait

    • mandatory

  • The following modifiers of pragma offload parameters are not supported:

    • alloc_if

    • free_if

    • alloc

    • into

  • The target-number in #pragma offload target (target-name [ :target-number ]) is ignored.

  • Local scalar variables can only be passed in the in clause of #pragma offload.

    Note

    Listing local scalar variables in the in clause is redundant. You can omit them, because the compiler automatically adds variables used in the lexical scope of a #pragma offload statement to the in clause. Scalar local variables are passed by value, and any updates to the variable inside the target code are not visible on the host side after offload.

    For example, the following code prints var = 55, i = 0.

    int var = 55;
    int i = 0;
    #pragma offload target(gfx)
    _Cilk_for (i = 0; i < 1; i++)
    {
       ++var;
    }

    printf("var = %d, i = %d\n", var, i);

    The following code results in a compile-time error, because the local variable var can only be listed in the in clause:

    int var = 55;
    int i = 0;
    #pragma offload target(gfx) inout(var)
    …
    
  • Global or static variables cannot be listed in the pin clause of #pragma offload.
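
Because local scalars reach the target by value, a result computed on the target has to travel back through an array rather than through the scalar itself. The sketch below is a host-only stand-in in plain C that mimics this behavior. The function name increment_on_target is hypothetical, and the assumption that result would be an array shared through the pin clause is modeled on the pin usage shown earlier in this topic, not on a confirmed recipe.

```c
/* var is received by value, mirroring how a local scalar is passed to target
   code through the in clause: the caller's copy is never modified.
   result stands in for an array that real offload code would share with the
   target (assumption: through the pin clause). */
void increment_on_target(int var, int *result)
{
    /* In real offload code this would be a _Cilk_for under
       #pragma offload target(gfx). */
    for (int i = 0; i < 1; i++) {
        ++var;             /* updates only the by-value copy */
        result[0] = var;   /* the array element carries the update back */
    }
}
```

After a call such as increment_on_target(var, result) with var equal to 55, the caller's var is still 55, while result[0] holds 56.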

Restrictions on Using OpenMP*

Processor graphics does not provide OpenMP* run-time library routines. Parallelization happens on the host side, so you cannot call the run-time APIs to change behavior, such as task scheduling, on the target side.
