vload4 vs 4 individual memory accesses : bank conflicts

What is the advantage of vload4 over 4 single memory accesses?

Suppose I am loading data from local memory. Below are two kernels. The second kernel should exhibit no bank conflicts.

Does the first kernel have bank conflicts? If one vload4 is executed per clock, then I would expect conflicts within a half-wave.

 

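// localBuffer is assumed to be a __local int buffer passed as a kernel argument.
__kernel void kernel1(__local const int *localBuffer) {
     // vload4 scales its offset by 4, so pass the int4 index rather than the int index:
     // this reads localBuffer[4*i .. 4*i+3], the same elements as kernel2.
     int i = get_global_id(0);
     int4 test = vload4(i, localBuffer);
}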

 

 

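// localBuffer is assumed to be a __local int buffer passed as a kernel argument.
__kernel void kernel2(__local const int *localBuffer) {
     int4 test;
     int start = get_global_id(0)*4;   // index of the first of four consecutive ints
     test.x = localBuffer[start];
     test.y = localBuffer[start+1];
     test.z = localBuffer[start+2];
     test.w = localBuffer[start+3];
}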


Sorry for the delayed reply. If you have not seen it already, the optimization guide has some good tips in its "Memory Access Considerations" section. In general, loading vector types is at least as efficient as loading single values, and using vector types for memory transfers makes it easier for the compiler to create efficient code. If the compiler recognizes the pattern in kernel2, it might combine the loads into something close enough to kernel1 that the performance will be practically identical, but kernel2 isn't expected to have any advantage over kernel1.
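To make "a pattern close enough to kernel1" concrete, here is a minimal host-side C sketch of what the two access patterns read. It only emulates the addressing and says nothing about performance or bank behavior. The names `int4_emul`, `vload4_emul`, `scalar_loads` and `same_int4` are hypothetical stand-ins, not OpenCL APIs. The one fact it relies on is that OpenCL's `vload4(offset, p)` reads the four elements starting at `p + 4*offset`.

```c
#include <stddef.h>

/* Hypothetical stand-in for OpenCL's int4 (not a real OpenCL type). */
typedef struct { int x, y, z, w; } int4_emul;

/* Emulates vload4(offset, p): reads the ints p[4*offset] .. p[4*offset+3]. */
static int4_emul vload4_emul(size_t offset, const int *p)
{
    const int *base = p + 4 * offset;
    int4_emul v = { base[0], base[1], base[2], base[3] };
    return v;
}

/* The four scalar loads of kernel2 for work-item index gid. */
static int4_emul scalar_loads(size_t gid, const int *buf)
{
    size_t start = gid * 4;
    int4_emul v;
    v.x = buf[start];
    v.y = buf[start + 1];
    v.z = buf[start + 2];
    v.w = buf[start + 3];
    return v;
}

/* Returns 1 if both values hold the same four components, else 0. */
static int same_int4(int4_emul a, int4_emul b)
{
    return a.x == b.x && a.y == b.y && a.z == b.z && a.w == b.w;
}
```

For any index `gid` within the buffer, `vload4_emul(gid, buf)` and `scalar_loads(gid, buf)` return the same four values. The vector form takes the work-item index as its offset, while the scalar form multiplies that index by 4 itself.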

 

Thanks, Jeffrey. So, vload may exhibit fewer bank conflicts than single loads? Is there ever a situation when vload performs worse than individual loads?

 

 

From asking around, my understanding is that vload is almost always better. The compiler may combine single loads, so in many cases there may not be a big difference, but vload is expected to be the best case. There may be a few corner cases with odd borders where single loads have a minor advantage, but I suspect these are rare. In the future I hope we will be able to publish more memory transfer optimization tutorials with deeper analysis, so the guidelines do not need to be so high level. Please watch for documentation and example improvements as we prioritize them.
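One possible reading of "odd borders" is a buffer whose length is not a multiple of 4, so the last few elements cannot be fetched with a full 4-wide load. That reading is an assumption, not something stated in this thread. Under it, a common pattern is to read full groups of four and finish with single loads. The host-side C sketch below shows the indexing only; `sum_with_tail` is a hypothetical name, and the 4-wide read merely stands in for `vload4`.

```c
#include <stddef.h>

/* Sums n ints. Full groups of four are read together (standing in for
   vload4); the 0 to 3 leftover elements are read one at a time. */
static long sum_with_tail(const int *buf, size_t n)
{
    long total = 0;
    size_t full = n / 4;                 /* number of complete groups of four */

    for (size_t v = 0; v < full; ++v) {
        const int *q = buf + 4 * v;      /* same addressing as vload4(v, buf) */
        total += (long)q[0] + q[1] + q[2] + q[3];
    }
    for (size_t i = 4 * full; i < n; ++i)
        total += buf[i];                 /* scalar loads for the border */
    return total;
}
```

With `n = 10`, the first loop covers elements 0 through 7 and the second loop covers elements 8 and 9.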

 

 

Hi Jeffrey,

Thanks for pursuing this. I am particularly interested in avoiding bank conflicts. Is there some way of confirming that vload does not trigger more bank conflicts than separate loads?

Thanks so much,

Aaron
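Short of measuring on the device, one rough way to reason about the scalar access pattern is a toy bank model. The C sketch below assumes banks are one int wide and assigned round-robin (`bank = word index % number of banks`), and that a group of work-items issues the same load in the same step. The bank count and group size are parameters because the real values depend on the hardware. None of this comes from the thread or from any vendor documentation, and `max_bank_conflict` is a hypothetical helper. The model does not describe how a device services a 128-bit `vload4`, so it cannot settle the question by itself.

```c
enum { MAX_BANKS = 256 };  /* upper bound used by this sketch only */

/* Worst-case number of work-items in one group that hit the same bank in a
   single step, when work-item i reads word index i * stride + component.
   Assumed model: bank = word_index % num_banks.
   Returns -1 for arguments the sketch does not handle. */
static int max_bank_conflict(int num_banks, int group_size,
                             int stride, int component)
{
    int hits[MAX_BANKS] = {0};
    int worst = 0;

    if (num_banks <= 0 || num_banks > MAX_BANKS || stride < 0 || component < 0)
        return -1;

    for (int i = 0; i < group_size; ++i) {
        int bank = (i * stride + component) % num_banks;
        if (++hits[bank] > worst)
            worst = hits[bank];
    }
    return worst;
}
```

Under the assumed values of 32 banks and 16 work-items per step, `max_bank_conflict(32, 16, 4, 0)` returns 2 for the stride-4 pattern of kernel2, while a stride of 1 returns 1. With 64 banks the stride-4 pattern also returns 1, which shows how strongly the answer depends on the assumed hardware parameters.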
