Compiler hangs up when image is used

Compiler hangs up when image is used

Hi, everybody!

I have found 2 problems with OpenCL kernel compilation on Intel HD4600:

1) Kernel Builder can't build some of my kernels and reports this:

OpenCL Intel(R) Graphics device was found!
Device name: Intel(R) HD Graphics 4600
Device version: OpenCL 1.2 
Device vendor: Intel(R) Corporation
Device profile: FULL_PROFILE
fcl build 1 succeeded.

Build failed!

If i reduce kernel code, then build would finish successfully. But the whole kernel can't be build. And my program also can't do this.

At the same time this kernel can be compiled by Kernel Builder. I can't share my kernel code, because it is proprietary. And while trying to write test kernel to reproduce this problem, I found another one:

2) Kernel Builder hangs up while building following kernel:

__constant sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST;

#define ROTR( x, n )    ( ( (x) >> (n) ) | ( (x) << ( 32 - (n) ) ) )
#define SHR( x, n )     ( (x) >> (n) )

#define Ch( x, y, z )   ( ( (x) & (y) ) ^ ( ~(x) & (z) ) )
#define Maj( x, y, z )  ( ( (x) & (y) ) ^ ( (x) & (z) ) ^ ( (y) & (z) ) )
#define SIGMA0( x )     ( ROTR( (x), 2  ) ^ ROTR( (x), 13 ) ^ ROTR( (x), 22 ) )
#define SIGMA1( x )     ( ROTR( (x), 6  ) ^ ROTR( (x), 11 ) ^ ROTR( (x), 25 ) )
#define sigma0( x )     ( ROTR( (x), 7  ) ^ ROTR( (x), 18 ) ^  SHR( (x),  3 ) )
#define sigma1( x )     ( ROTR( (x), 17 ) ^ ROTR( (x), 19 ) ^  SHR( (x), 10 ) )

#define ROUND( A, B, C, D, E, F, G, H, W, k )     {         \
    (H) += SIGMA1( (E) ) + Ch( (E), (F), (G) ) + (k) + (W); \
    (D) += (H);                                             \
    (H) += SIGMA0( (A) ) + Maj( (A), (B), (C) );            }

void test( __read_only image2d_t image, uint word, uint* digest )
{
    uint4 storage = read_imageui( image, sampler, (int2)( get_local_id( 0 ), get_group_id( 0 ) ) );
    uint a = storage.x;
    uint b = storage.y;
    uint c = storage.z;
    uint d = storage.w;
    
    uint e = 5;
    uint f = 6;
    uint g = 7;
    uint h = 8;

    uint w0 = word;
    uint w1 = 1;
    uint w2 = 0;
    uint w3 = 0;
    uint w4 = 0;
    uint w5 = 0;
    uint w6 = 0;
    uint w7 = 0;
    uint w8 = 0;
    uint w9 = 0;
    uint wA = 0;
    uint wB = 0;
    uint wC = 0;
    uint wD = 0;
    uint wE = 0;
    uint wF = 1;

    ROUND( a, b, c, d, e, f, g, h, w0, 1 );
    ROUND( h, a, b, c, d, e, f, g, w1, 2 );
    ROUND( g, h, a, b, c, d, e, f, w2, 3 );
    ROUND( f, g, h, a, b, c, d, e, w3, 4 );
    ROUND( e, f, g, h, a, b, c, d, w4, 5 );
    ROUND( d, e, f, g, h, a, b, c, w5, 6 );
    ROUND( c, d, e, f, g, h, a, b, w6, 7 );
    ROUND( b, c, d, e, f, g, h, a, w7, 8 );
    ROUND( a, b, c, d, e, f, g, h, w8, 9 );
    ROUND( h, a, b, c, d, e, f, g, w9, 10 );
    ROUND( g, h, a, b, c, d, e, f, wA, 11 );
    ROUND( f, g, h, a, b, c, d, e, wB, 12 );
    ROUND( e, f, g, h, a, b, c, d, wC, 13 );
    ROUND( d, e, f, g, h, a, b, c, wD, 14 );
    ROUND( c, d, e, f, g, h, a, b, wE, 15 );
    ROUND( b, c, d, e, f, g, h, a, wF, 16 );

    digest[ 0 ] = 1 + a;
    digest[ 1 ] = 2 + b;
    digest[ 2 ] = 3 + c;
    digest[ 3 ] = 4 + d;
    digest[ 4 ] = 5 + e;
    digest[ 5 ] = 6 + f;
    digest[ 6 ] = 7 + g;
    digest[ 7 ] = 8 + h;
}

__kernel void hangup( __read_only image2d_t image,
    __constant uint* cdata, __global uint* data )
{
    const uint gid = get_global_id( 0 );
    uint digest[ 8 ];
    
    test( image, data[ gid ], digest );
    for( uint index = 0; index < 8; ++index )
    {
        data[ gid ] ^= digest[ index ];
    }
}

And again, this kernel can be compiled by Kernel Builder. I assume hang occurs on link stage.

If I don't use 2D image, then no problems occur while building.

OpenCL driver version is 10.18.10.3652. Any help will be appreciated.

5 posts / 0 new
Last post
For more complete information about compiler optimizations, see our Optimization Notice.

Thanks for this report.  I've been able to replicate here and the issue has been reported to the development team.  Until a kernel builder fix can be implemented, is it reasonable in your case to switch to buffers?

According to the Optimization Guide:

To improve performance on the Intel Processor Graphics, do the following:

  • Avoid images, except for irregular access patterns. For example, use buffers when processing in memory (in row-major) order.
  • Use buffers for look-up tables
  • Use local memory for explicit caching of buffer values

Regards, Jeff

 

 

Just to clarify, are these issues seen only when using the kernel builder tool or also for regular compilation?

 

Quote:

Jeffrey Mcallister (Intel) wrote:

Until a kernel builder fix can be implemented, is it reasonable in your case to switch to buffers?

To improve performance on the Intel Processor Graphics, do the following:

  • Avoid images, except for irregular access patterns. For example, use buffers when processing in memory (in row-major) order.
  • Use buffers for look-up tables
  • Use local memory for explicit caching of buffer values

Unfortunately, most of our kernels use images to get benefit from texture (image) cache. It is better to use images rather then global buffers on discrete GPUs, such as Nvidia and AMD GPUs.

Is it not true for Intel GPUs, because they have no texture cache? If it so, then we should write kernels for Intel GPUs from scratch...

Also, I have found, that hang up occurs when building for HD4600, which has OpenCL 1.2 version. And no hang up occurs when building the same kernel for HD4000 (OpenCL 1.1).

Quote:

Jeffrey Mcallister (Intel) wrote:

Just to clarify, are these issues seen only when using the kernel builder tool or also for regular compilation?

Both Kernel Builder (32/64 bit) and our code with regular compilation via OpenCL.dll have this issue with hang up.

Leave a Comment

Please sign in to add a comment. Not a member? Join today