Problems running scan SDK examples from NVidia

Hi,

I have tried running NVidia's OpenCL scan (prefix sum) example on an Intel CPU, but without luck. The program fails with a memory error. If I remove the barrier synchronization lines in the following part of the code, it runs without errors, but the results are of course wrong:

inline uint scan1Inclusive(uint idata, __local uint *l_Data, uint size){
    uint pos = 2 * get_local_id(0) - (get_local_id(0) & (size - 1));
    l_Data[pos] = 0;
    pos += size;
    l_Data[pos] = idata;

    for(uint offset = 1; offset < size; offset <<= 1){
        barrier(CLK_LOCAL_MEM_FENCE); //Fails with Intel openCL
        uint t = l_Data[pos] + l_Data[pos - offset];
        barrier(CLK_LOCAL_MEM_FENCE); //Fails with Intel openCL
        l_Data[pos] = t;
    }
    return l_Data[pos];
}

I have tested other code using barriers, and it worked fine. The main difference is that this example uses three levels of nested inline functions. The code shown above is the innermost function. The code runs fine on an NVidia graphics card and on a CPU when compiled with the AMD OpenCL compiler.

-Jens
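
For reference, the effect of the two barriers can be emulated on the host in plain C. This is only a sketch, not the NVidia sample code: the names scan_inclusive_emulated and MAX_SIZE are hypothetical, and it assumes a single segment, that is, a work-group whose size equals size and is a power of two. Each barrier becomes the boundary between two loops over all work-items, so every read of an iteration finishes before any write of that iteration starts.

```c
#include <assert.h>

#define MAX_SIZE 256  /* hypothetical upper bound on the segment size */

/* Sequential emulation of scan1Inclusive for one segment of `size`
   work-items. Assumption: size is a power of two and <= MAX_SIZE. */
void scan_inclusive_emulated(const unsigned *in, unsigned *out, unsigned size)
{
    unsigned l_data[2 * MAX_SIZE];  /* stands in for the __local buffer */
    unsigned t[MAX_SIZE];           /* per-work-item private value t */

    assert(size > 0 && size <= MAX_SIZE && (size & (size - 1)) == 0);

    /* Setup: lower half is zero padding, upper half holds the input. */
    for (unsigned lid = 0; lid < size; ++lid) {
        unsigned pos = 2 * lid - (lid & (size - 1));  /* equals lid here */
        l_data[pos] = 0;
        l_data[pos + size] = in[lid];
    }

    for (unsigned offset = 1; offset < size; offset <<= 1) {
        /* After the first barrier: all earlier writes are visible,
           so every work-item can read its own and its neighbour's value. */
        for (unsigned lid = 0; lid < size; ++lid) {
            unsigned pos = lid + size;
            t[lid] = l_data[pos] + l_data[pos - offset];
        }
        /* After the second barrier: all reads are done, so it is safe
           for every work-item to overwrite its own slot. */
        for (unsigned lid = 0; lid < size; ++lid)
            l_data[lid + size] = t[lid];
    }

    for (unsigned lid = 0; lid < size; ++lid)
        out[lid] = l_data[lid + size];
}
```

Called with the input 3 1 4 1 5 9 2 6 and size 8, this yields the inclusive prefix sums 3 4 8 9 14 23 25 31. Merging the read loop and the write loop into one corresponds to dropping the second barrier: a work-item may then read a neighbour's value that has already been updated in the same iteration, which is one way the results can go wrong.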

I have just tried inlining the code manually, which made it work!

Hello jrimestad,

Thanks for the report. We're working on reproducing and fixing the problem.