Problems running scan SDK examples from NVidia

Hi, I have tried running NVidia's OpenCL scan (prefix sum) example on an Intel CPU, but without luck. The program fails with a memory error. If I remove the barrier synchronization lines in the following part of the code, it runs without errors, but the results are of course wrong:

inline uint scan1Inclusive(uint idata, __local uint *l_Data, uint size){
    // Each segment of `size` work-items uses 2 * size entries of l_Data:
    // a zeroed lower half followed by the data in the upper half.
    uint pos = 2 * get_local_id(0) - (get_local_id(0) & (size - 1));
    l_Data[pos] = 0;
    pos += size;
    l_Data[pos] = idata;

    for(uint offset = 1; offset < size; offset <<= 1){
        barrier(CLK_LOCAL_MEM_FENCE); //Fails with Intel openCL
        uint t = l_Data[pos] + l_Data[pos - offset];
        barrier(CLK_LOCAL_MEM_FENCE); //Fails with Intel openCL
        l_Data[pos] = t;
    }

    return l_Data[pos];
}

I have tested other code using barriers that worked fine. The main difference is that this example uses three levels of nested inline functions; the code shown above is the innermost function. The code runs fine on an NVidia graphics card and on the CPU when compiled with the AMD OpenCL compiler.

-Jens

I have just tried inlining the code manually, which made it work!

Hello jrimestad,

Thanks for the report. We're working on reproducing and fixing the problem.
