Counting algorithm

Counting algorithm

Hi,

I have the following situations :

a) an array of 'int' where each int contains a 'flag' (a value defined by a bit)

b) the set is sorted by 'flag'

I use the following kernel to count the number of 'type of flag'. To do this I simply detect where the flag differ and put this value back to a structure.

Sometimes it works, sometimes not !

typedefstruct
{
uintlastId;
uinttasksCount_CreateCameraRays;
uinttasksCount_Trace;
uinttasksCount_Shade;
uinttasksCount_Shade_AO;
uinttasksCount_ImageReconstruction;
}clTasksTypes;

//ALSODEFINEDIN'Types.cl'
#defineTASK_HASH_CREATE_CAMERA_RAY((uint)1<<24)
#defineTASK_HASH_TRACE((uint)2<<24)
#defineTASK_HASH_SHADE((uint)4<<24)
#defineTASK_HASH_SHADE_AO((uint)8<<24)
#defineTASK_HASH_RECONSTRUCT_IMAGE((uint)16<<24)

//#defineIS_FLAG_SET(VAL,FLAG)(VAL&FLAG)
//#defineIS_FLAG_SET(VAL,FLAG)((VAL&FLAG)==FLAG)
#defineIS_FLAG_SET(VAL,FLAG)((VAL&FLAG)!=0)

__kernel
voidkernel__countTasksTypes(
__globaluint*indices,
__globalclTasksTypes*tasksTypes,
uinttasksCount
)
{
size_tgid=get_global_id(0);

if(gid<1)
{
tasksTypes->tasksCount_CreateCameraRays=0;
tasksTypes->tasksCount_Trace=0;
tasksTypes->tasksCount_Shade=0;
tasksTypes->tasksCount_Shade_AO=0;
tasksTypes->tasksCount_ImageReconstruction=0;
barrier(CLK_GLOBAL_MEM_FENCE);
return;
}

barrier(CLK_GLOBAL_MEM_FENCE);

if(gid>=tasksCount)
return;

//Savethelasthash/keyvalue
if(gid==tasksCount-1)
tasksTypes->lastId=indices[(tasksCount-1)*2];

constuinthashKey1=indices[(gid*2)-2];
constuinthashKey2=indices[gid*2];
if(hashKey1!=hashKey2)
{
if(IS_FLAG_SET(hashKey1,TASK_HASH_CREATE_CAMERA_RAY))
tasksTypes->tasksCount_CreateCameraRays=gid;
elseif(IS_FLAG_SET(hashKey1,TASK_HASH_TRACE))
tasksTypes->tasksCount_Trace=gid;
elseif(IS_FLAG_SET(hashKey1,TASK_HASH_SHADE))
tasksTypes->tasksCount_Shade=gid;
elseif(IS_FLAG_SET(hashKey1,TASK_HASH_SHADE_AO))
tasksTypes->tasksCount_Shade_AO=gid;
elseif(IS_FLAG_SET(hashKey1,TASK_HASH_RECONSTRUCT_IMAGE))
tasksTypes->tasksCount_ImageReconstruction=gid;
}
}

There is only one special case, handled on the host side, it is for the last type of flag.

constunsignedinttasksCount=_tasksBatchSize;
_kernel__CountTasksTypes->setArg(0,*_clBuffer_TasksIndices);
_kernel__CountTasksTypes->setArg(1,*_clBuffer_CountTasksTypes);
_kernel__CountTasksTypes->setArg(2,tasksCount);

_queue->enqueueNDRangeKernel(*_kernel__CountTasksTypes,cl::NullRange,cl::NDRange(globalWork),cl::NDRange(_workGroupSize_kernel__CountTasksTypes),0,0);

_queue->enqueueReadBuffer(*_clBuffer_CountTasksTypes,CL_TRUE,0,sizeof(clTasksTypes),&_cpuBuffer_TasksTypes);

//Computethelastcount
unsignedinttotalCount=_cpuBuffer_TasksTypes.tasksCount_CreateCameraRays+_cpuBuffer_TasksTypes.tasksCount_Trace+_cpuBuffer_TasksTypes.tasksCount_Shade+_cpuBuffer_TasksTypes.tasksCount_Shade_AO+_cpuBuffer_TasksTypes.tasksCount_ImageReconstruction;
if(_cpuBuffer_TasksTypes.lastId&TASK_HASH_CREATE_CAMERA_RAY)
_cpuBuffer_TasksTypes.tasksCount_CreateCameraRays=_tasksBatchSize-totalCount;
elseif(_cpuBuffer_TasksTypes.lastId&TASK_HASH_TRACE)
_cpuBuffer_TasksTypes.tasksCount_Trace=_tasksBatchSize-totalCount;
elseif(_cpuBuffer_TasksTypes.lastId&TASK_HASH_SHADE)
_cpuBuffer_TasksTypes.tasksCount_Shade=_tasksBatchSize-totalCount;
elseif(_cpuBuffer_TasksTypes.lastId&TASK_HASH_SHADE_AO)
_cpuBuffer_TasksTypes.tasksCount_Shade_AO=_tasksBatchSize-totalCount;
elseif(_cpuBuffer_TasksTypes.lastId&TASK_HASH_RECONSTRUCT_IMAGE)
_cpuBuffer_TasksTypes.tasksCount_ImageReconstruction=_tasksBatchSize-totalCount;

I can't find why it does not work, it should be simple no ?

Any suggestion is welcomed, thanks

http://spectralbattle.wordpress.com/
6 posts / 0 new
Last post
For more complete information about compiler optimizations, see our Optimization Notice.

Hey,

I don't know whether it relates to your problem, but your code is illegal. OpenCL requires that if any work item in a work group encounters a barrier instruction, all work items in that workgroup encounter the same barrier instruction. The first if statement (gid < 1) violates this assuming your work-group size is larger than one - other work items will still encounter "a" barrier, but not the same one.

Doron Singer

Thanks,

no it is not the problem :P

About the barrier, it is the case because, either gid <1 and I have a barrier and return, either I have a barrier. So, finally all the workitems have a barrier.

Krys

http://spectralbattle.wordpress.com/

My understanding of the spec is that it's not enough that all work items encounter "a barrier", but rather they must all encounter "the barrier", as in the exact same instruction on the exact same line. Please re-visit the spec and let me know if you disagree.

Honnestly,I don't know... I have read the opencl spec. and it is not really clear ! If it is the case, I'm not sure it is doable on some GPU architecture...Anyway it limit the way we build our algorithms ! I think that it is something that MUST be defined in the specification... they should update the spec to gives more details ! Maybe you have some contact with peoples at the Khronos group ?Thx

http://spectralbattle.wordpress.com/

The Khronos group have message boards on their webpage. This looks like a good place to ask:
http://www.khronos.org/message_boards/viewforum.php?f=28

Leave a Comment

Please sign in to add a comment. Not a member? Join today