Problems running scan SDK examples from NVidia


Hi,

I have tried running NVidia's OpenCL scan (prefix sum) example on an Intel CPU, but without luck: the program fails with a memory error. If I remove the barrier synchronization lines in the following part of the code, it runs without errors, but the results are of course wrong:

inline uint scan1Inclusive(uint idata, __local uint *l_Data, uint size){
    uint pos = 2 * get_local_id(0) - (get_local_id(0) & (size - 1));
    l_Data[pos] = 0;
    pos += size;
    l_Data[pos] = idata;

    for(uint offset = 1; offset < size; offset <<= 1){
        barrier(CLK_LOCAL_MEM_FENCE); //Fails with Intel openCL
        uint t = l_Data[pos] + l_Data[pos - offset];
        barrier(CLK_LOCAL_MEM_FENCE); //Fails with Intel openCL
        l_Data[pos] = t;
    }

    return l_Data[pos];
}

I have tested other code using barriers, and it worked fine. The main difference is that this example uses three levels of nested inlined functions; the code shown above is the innermost function. The code runs fine on an NVidia graphics card, and on the CPU when compiled with the AMD OpenCL compiler.

-Jens
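To compare results across OpenCL implementations, the function above can be emulated on the host in plain C. The following is only a minimal sketch and not part of NVidia's sample: the names emulate_scan1_inclusive and scan_pos, the MAX_LOCAL limit, and the lockstep model of a work-group are assumptions made for illustration. Each barrier is modelled by finishing one loop over all work-items before starting the next, so every read in a step completes before any write in that step.

```c
#include <assert.h>
#include <string.h>

#define MAX_LOCAL 64u /* assumed upper bound on the emulated work-group size */

/* Index into the local buffer for work-item lid, as computed in
 * scan1Inclusive after "pos += size". */
static unsigned scan_pos(unsigned lid, unsigned size)
{
    return 2 * lid - (lid & (size - 1)) + size;
}

/* Host-side emulation of scan1Inclusive for one work-group.
 * in and out hold one value per work-item; size is the power-of-two
 * scan length and must divide local_size. */
void emulate_scan1_inclusive(const unsigned *in, unsigned *out,
                             unsigned local_size, unsigned size)
{
    unsigned l_data[2 * MAX_LOCAL]; /* stands in for __local l_Data */
    unsigned t[MAX_LOCAL];          /* each work-item's private t */
    unsigned lid, offset;

    assert(local_size > 0 && local_size <= MAX_LOCAL);
    assert(size > 0 && (size & (size - 1)) == 0); /* power of two */
    assert(local_size % size == 0);

    memset(l_data, 0, sizeof l_data);

    /* Setup: zero padding below each segment, input data above it. */
    for (lid = 0; lid < local_size; ++lid) {
        l_data[scan_pos(lid, size) - size] = 0;
        l_data[scan_pos(lid, size)] = in[lid];
    }

    for (offset = 1; offset < size; offset <<= 1) {
        /* Between the two barriers: every work-item reads. */
        for (lid = 0; lid < local_size; ++lid) {
            unsigned pos = scan_pos(lid, size);
            t[lid] = l_data[pos] + l_data[pos - offset];
        }
        /* After the second barrier: every work-item writes. */
        for (lid = 0; lid < local_size; ++lid)
            l_data[scan_pos(lid, size)] = t[lid];
    }

    for (lid = 0; lid < local_size; ++lid)
        out[lid] = l_data[scan_pos(lid, size)];
}
```

Calling it with the inputs 1 to 8, a local size of 8 and a scan size of 8 gives the running sums 1, 3, 6, 10, 15, 21, 28, 36. A device result that differs from this for the same inputs points at the synchronization rather than at the scan logic.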


I have just tried inlining the code manually, and that made it work!

Hello jrimestad,

Thanks for the report. We're working on reproducing and fixing the problem.
