Crash when assigning two structures.

Crash when assigning two structures.

Romain c.'s picture

Hi !

I am having quite a strange problem on the cpu ( I don't have intel gpu to try this, but works on nvidia ).

I have a kernel which makes everything crash at an assignment depending on my structure.

If I uncomment the _pad0 variable the crash does not appear. The sizes are the same on the host and device.

Here is my structure:

typedef struct capsule_s

	{

	    vector2d_t p; // center of mass position

	    vector2d_t v; // center of mass velocity

	    scalar_t    r; // Radius

	    scalar_t    mass_density;

	    scalar_t    color_field;

	    //scalar_t _pad0; // For removing the crash with intel cpu.

	} capsule_t;

vector2d_t is defined as cl_float2 for host and float2 in kernels. scalar_t as cl_float and float.

In the kernel I have something similar to:

capsule_t geom_old = in_capsules[idx];

	/*

	Code that doesn't crash.

	*/
out_capsules[idx] = geom_new;  // Here it crashes

More precisely I found that all of this fail :

out_capsules[idx].v = geom_new.v;  // fails
/----------
float2 tmp =  geom_new.v; // no failure here.
out_capsules[idx].v = tmp; // fails
/----------
out_capsules[idx].v.x = geom_new.v.x;  // fails

And this only happens for this member. I am not sure if this is related to alignement since other members assignments do not cause problems.

Any help is welcome :)

Thanks in advance !

 

11 posts / 0 new
Last post
For more complete information about compiler optimizations, see our Optimization Notice.
Raghu Muthyalampalli (Intel)'s picture

Is it possible to attach the minimal kernel that results in the crash?

Thanks,
Raghu

Romain c.'s picture

Of course ! Thx for the reply :)

Here is the whole kernel if you want the including header and other functions I can send if you want.

__kernel void bacteria_step ( __global capsule_t* in_capsules, 

	                                          __global capsule_t* out_capsules,

	                                          __global int* box_map,

	                                          __global gp_box_t* boxes,

	                                          __global gp_command_t* commands,

	                                           float timestep,

	                                           float boundary_sphere_radius )

	{

	    int idx = get_global_id(0);
    int box_idx = box_map[idx];

	    gp_box_t box = boxes[box_idx];

	    capsule_t geom_old = in_capsules[idx];

	    

	    vector2d_t pressure_force  = (vector2d_t) (.0f, .0f );

	    vector2d_t viscosity_force = (vector2d_t) (.0f, .0f );

	    vector2d_t surface_normal  = (vector2d_t) (.0f, .0f );

	    scalar_t   lagr_ci         = .0f;

	    for(int ii = 0; ii < GP_BOX_POP_SIZE + GP_BOX_GHOST_SIZE; ii++)

	    {

	        int i = (ii < GP_BOX_POP_SIZE) ? box.gp_box_pop[ii] : box.gp_box_ghost[ii-GP_BOX_POP_SIZE];

	        if ( (i != GP_UNDEF_ID) )

	        {

	            capsule_t geom_alt = in_capsules[i];

	            if ( i != idx )

	            {   

	                pressure_force  -= pressure_between_two ( geom_old, geom_alt );

	                viscosity_force += viscosity_between_two ( geom_old, geom_alt );

	            }
            surface_normal +=  ( MASS / geom_alt.mass_density ) * poly6_kernel_nabla ( geom_old.p - geom_alt.p, SMOOTHING_LEN );

	            lagr_ci += ( MASS / geom_alt.mass_density ) * poly6_kernel_nabla_2 ( geom_old.p - geom_alt.p, SMOOTHING_LEN );

	        }

	    }
    viscosity_force *= VISCOSITY;

	    vector2d_t surface_force = -SURF_TENSION * lagr_ci * ( surface_normal / length ( surface_normal ) );

	    capsule_t geom_new = geom_old;

	    

	    /////////////

	    // Leap-frog

	    /////////////

	    

	    vector2d_t tot_force = pressure_force +  REST_DENSITY * GRAVITY + viscosity_force;

	    tot_force += length ( surface_normal ) >= THRESHOLD ? surface_force : .0f;

	    vector2d_t accel     = tot_force / geom_new.mass_density;
    vector2d_t old_vel   = geom_new.v;

	    vector2d_t new_vel   = geom_new.v + accel * timestep; // vel_.5dt = vel_-.5dt + dt*accel.
    geom_new.p += timestep * new_vel;

	    geom_new.v = ( old_vel + new_vel ) * .5f;
   

	    handle_sphere_env_collision ( &geom_new,

	                                  boundary_sphere_radius, 

	                                  timestep );
    out_capsules[idx] = geom_new;

	    /////////////

	    // Update map

	    /////////////

	    

	    

	    int2 box_coord = (int2){ floor(geom_new.p.x / GP_BOX_URADIUS),

	                             floor(geom_new.p.y / GP_BOX_URADIUS) };
    if ( (box_coord.x != box.gp_box_coord.x) || (box_coord.y != box.gp_box_coord.y) )

	    {

	        commands[idx].gp_command_type = GP_COMMAND_MOVE;

	        commands[idx].gp_command_pos = box_coord;

	        commands[idx].gp_command_data = idx;

	    }

	    else

	        commands[idx].gp_command_type = GP_COMMAND_NONE;

	        

	}

Raghu Muthyalampalli (Intel)'s picture

Quote:

Romain c. wrote:

Here is the whole kernel if you want the including header and other functions I can send if you want.

Yes please.

Romain c.'s picture

I have put them in a zip file.

Also I forgot to say if I remove the color_field (and padding) member it works too.

Attachments: 

AttachmentSize
Download kernels_and_headers.zip8.32 KB
Rami Jiossy (Intel)'s picture

I would suspect that the struct layout (size in bytes) on the host compiler is different from the opencl CPU compiler. Can u please try using 'packed' keyword for the struct definition on Host and Kernel as well, see if that helps. (OpenCL 1.2 specification 6.11.1)

Romain c.'s picture

I tried with packed for kernel side, and used the pragmas  push, pack and pop ( not sure about this ) on host side, it makes the program not crash but size of the structure is 28 on kernel but still 32 on host. So data is not correct after...

Thanks for helping :)

Raghu Muthyalampalli (Intel)'s picture

Can you post the host side structure (with the pragmas you added)?

Raghu

Romain c.'s picture

Here it is. Names are a bit different but everything is as before. Thx.

#ifndef __SPH_KERNEL__

	#pragma pack(push)  /* push current alignment to stack */

	#pragma pack(1)     /* set alignment to 1 byte boundary */

	#endif

	typedef struct 

	#ifdef __SPH_KERNEL__

	    __attribute__ ((packed)) 

	#endif

	    capsule_s

	{

	    vector2d_t p; // center of mass position

	    vector2d_t v; // center of mass velocity

	    scalar_t   r; // Radius

	    scalar_t   mass_density;

	    scalar_t   rr;

	} capsule_t;

	#ifndef __SPH_KERNEL__

	#pragma pack(pop) 

	#endif

Raghu Muthyalampalli (Intel)'s picture

With or without packing the size of the above structure will be 28. Are you sure you are not picking up the definition of the structure from some other header?

Try writing a simple test program, declare capsule_s and try to print the size of the structure. What do you get?

Thanks,
Raghu

Romain c.'s picture

I tried what you suggested with a c++ and c compiler only using the struct declaration with cl_types. On c++ compiler the size is indeed 28 but on a C99 compiler it is 32 ( both clang and intel one ).

Moreover in c99 I think that the compiler add padding to be aligned with the largest data in the struct. For instance here, largest size is 8, and 32 is multiple of 8. Am I right?

Thanks for your patience :)

Login to leave a comment.