Problems running scan SDK examples from NVidia

Hi,

I have tried running NVidia's OpenCL scan (prefix sum) example on an Intel CPU, but without luck. The program fails with a memory error. If I remove the barrier synchronization lines in the following part of the code, it runs without errors, but the results are of course wrong:

inline uint scan1Inclusive(uint idata, __local uint *l_Data, uint size){
    // Each group of `size` work-items gets `size` zeros followed by its `size` inputs
    uint pos = 2 * get_local_id(0) - (get_local_id(0) & (size - 1));
    l_Data[pos] = 0;
    pos += size;
    l_Data[pos] = idata;

    for(uint offset = 1; offset < size; offset <<= 1){
        barrier(CLK_LOCAL_MEM_FENCE); //Fails with Intel openCL
        uint t = l_Data[pos] + l_Data[pos - offset];
        barrier(CLK_LOCAL_MEM_FENCE); //Fails with Intel openCL
        l_Data[pos] = t;
    }

    return l_Data[pos];
}
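
For anyone comparing results between devices, here is a minimal host-side sketch in plain C of what the function above appears to compute for one work-group. It is not part of the NVidia sample: the name emulate_scan1_inclusive and its parameters are made up for illustration, and it assumes size is a power of two and the work-group size is a multiple of size. Each barrier is emulated by finishing a loop over all work-items before the next phase starts, which also shows why dropping the barriers gives wrong sums.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not in the SDK): emulates scan1Inclusive for one
 * work-group of n_items work-items. Returns 0 on success, -1 on allocation
 * failure. Assumes size is a power of two and n_items is a multiple of size. */
static int emulate_scan1_inclusive(const uint32_t *idata, uint32_t *odata,
                                   uint32_t n_items, uint32_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    assert(n_items != 0 && n_items % size == 0);

    /* Local memory: 2 slots per work-item (zero padding + data). */
    uint32_t *l_data = calloc(2 * (size_t)n_items, sizeof *l_data);
    uint32_t *pos = malloc(n_items * sizeof *pos);
    uint32_t *t = malloc(n_items * sizeof *t);
    if (!l_data || !pos || !t) {
        free(l_data); free(pos); free(t);
        return -1;
    }

    /* Setup: same index arithmetic as the kernel, one iteration per work-item. */
    for (uint32_t lid = 0; lid < n_items; lid++) {
        pos[lid] = 2 * lid - (lid & (size - 1));
        l_data[pos[lid]] = 0;
        pos[lid] += size;
        l_data[pos[lid]] = idata[lid];
    }

    for (uint32_t offset = 1; offset < size; offset <<= 1) {
        /* Phase after the first barrier: every work-item reads. */
        for (uint32_t lid = 0; lid < n_items; lid++)
            t[lid] = l_data[pos[lid]] + l_data[pos[lid] - offset];
        /* Phase after the second barrier: every work-item writes. */
        for (uint32_t lid = 0; lid < n_items; lid++)
            l_data[pos[lid]] = t[lid];
    }

    for (uint32_t lid = 0; lid < n_items; lid++)
        odata[lid] = l_data[pos[lid]];

    free(l_data); free(pos); free(t);
    return 0;
}
```

With 8 work-items, size = 4 and inputs 1..8, this gives 1, 3, 6, 10, 5, 11, 18, 26, i.e. an inclusive prefix sum restarted every size elements.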

I have tested other code using barriers, and it worked fine. The main difference is that this example uses three levels of nested inline functions; the code shown above is the innermost function. The code runs fine on an NVidia graphics card and on the CPU when compiled with the AMD OpenCL compiler.

-Jens

I have just tried inlining the code manually, which made it work!

Hello jrimestad,

Thanks for the report. We're working on reproducing and fixing the problem.

