nan output values when using FP16

I have the following custom kernel function:

kernel void fooBar
(
    const __global float* input,
    int     num_dims_in,
    const __global int* dims_in,
    const __global int* stride_in,
    float     offset_x,
    __global float* output,
    int     num_dims_out,
    const __global int* dims_out,
    const __global int* stride_out
)
{
  int w = dims_in[0];
  int h = dims_in[1];
  int x = get_global_id(0);
  int y = get_global_id(1);
  int z = get_global_id(2);
  int index = x + y * w + z * w * h;

  output[index] = (input[index] > 0) ? input[index] : input[index] * offset_x ; //Here

  return ;
} 

This program works fine with the optimized FP32 model format.

But when I convert the model to FP16 floating-point format (using MO), all of my final layer's output values are nan.

I think the error comes from the kernel function given above.

If I write

output[index] = (input[index] > 0) ? input[index] : input[index] * 0;

instead of

output[index] = (input[index] > 0) ? input[index] : input[index] * offset_x ;

I get non-nan values.

Can somebody help me with this?

Hi,

I was able to fix this by declaring the buffers as half instead of float and accessing them with vload_half() and vstore_half().

Earlier, the multiplication by offset_x was converting the data into 32-bit floating-point format.

Regards,

Deepak Chembakassery Rajendran

I have modified my function as follows:

kernel void foobar (
    const __global half* input,
    int num_dims_in,
    const __global int* dims_in,
    const __global int* stride_in,
    float     offset_x,
    __global half* output,
    int     num_dims_out,
    const __global int* dims_out,
    const __global int* stride_out
)
{
  int w = dims_in[0];
  int h = dims_in[1];
  int d = dims_in[2];
  int x = get_global_id(0);
  int y = get_global_id(1);
  int z = get_global_id(2);
  int index = x + y * w + z * w * h;
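  // vload_half converts the stored half to float, so the math runs in FP32.
  float v = vload_half(index, input);
  // vstore_half converts the FP32 result back to half on store.
  vstore_half((v > 0) ? v : v * offset_x, index, output);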
  return ;
}

This works fine, and I get the proper output.
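
For anyone hitting the same symptom, here is a minimal host-side C sketch of what changes between the two kernels. It assumes the FP16 model hands the kernel buffers of IEEE 754 binary16 values, which is my reading of the thread rather than something confirmed in it. The helper names half_bits_to_float, read_pair_as_float and leaky_relu are hypothetical and exist only for this illustration; they are not part of OpenCL or the Model Optimizer. The first helper decodes one half value the way a single-element vload_half would, the second shows what a float-typed pointer sees when it is aimed at half data, and the third is the same select the kernel performs.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode one IEEE 754 binary16 bit pattern into a float.
   Hypothetical host-side stand-in for a single-element vload_half. */
static float half_bits_to_float(uint16_t h)
{
    int sign     = (h >> 15) & 0x1;
    int exponent = (h >> 10) & 0x1F;
    int mantissa = h & 0x3FF;
    float value;

    if (exponent == 0) {
        /* Zero or subnormal: mantissa * 2^-24. */
        value = ldexpf((float)mantissa, -24);
    } else if (exponent == 31) {
        /* All-ones exponent: infinity or NaN. */
        value = mantissa ? NAN : INFINITY;
    } else {
        /* Normal: (1024 + mantissa) * 2^(exponent - 25). */
        value = ldexpf((float)(mantissa + 1024), exponent - 25);
    }
    return sign ? -value : value;
}

/* What a kernel that declares the buffer as float* reads at float index i:
   two adjacent half values fused into a single 32-bit pattern. */
static float read_pair_as_float(const uint16_t *half_buffer, size_t i)
{
    float f;
    assert(half_buffer != NULL);
    memcpy(&f, &half_buffer[2 * i], sizeof f);
    return f;
}

/* The select from the kernel, applied to an already decoded value. */
static float leaky_relu(float v, float slope)
{
    return (v > 0) ? v : v * slope;
}
```

With a buffer holding two halves equal to 1.0 (bit pattern 0x3C00 each), half_bits_to_float returns 1.0 for either element, while read_pair_as_float returns a value near 0.0078. The float-typed kernel therefore computes on values unrelated to the real data and writes 32-bit results into a buffer that the next layer reads as 16-bit values, which is one plausible source of the nan outputs.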
