I have a kernel that processes RGB images. Currently I process the channels one at a time, running the same kernel once per channel.
The kernel's input is a global memory buffer: data is copied in chunks from the global buffer into local memory for processing, then written out to another global buffer.
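To make the current scheme concrete, here is a minimal host-side C sketch of it. The per-pixel operation (`scale`, a simple darken) and the chunk size are placeholders I made up; the chunked loop stands in for work-group tiles being staged through local memory, which I've elided.

```c
#include <stddef.h>

/* Hypothetical per-pixel operation: scale one channel value
 * (a placeholder for whatever the real kernel computes). */
static unsigned char scale(unsigned char v) {
    return (unsigned char)((unsigned)v * 3 / 4);  /* e.g. darken by 25% */
}

/* Current scheme: one planar channel at a time, processed in
 * fixed-size chunks (standing in for global -> local -> global
 * staging on the device). */
static void process_channel(const unsigned char *in, unsigned char *out,
                            size_t n_pixels, size_t chunk) {
    for (size_t base = 0; base < n_pixels; base += chunk) {
        size_t end = base + chunk < n_pixels ? base + chunk : n_pixels;
        for (size_t i = base; i < end; i++)
            out[i] = scale(in[i]);  /* local-memory staging elided */
    }
}
```

The host then calls this once per plane: once for R, once for G, once for B.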
I was thinking of refactoring this to pack all three channels into a single interleaved RGBA buffer and operate on all three channels at once using vector operations. I also understand that image objects have better spatial caching than plain buffers.
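Here is a host-side C sketch of the layout I have in mind, again using the same made-up darken operation. On the device, the inner loop body would collapse to a single `uchar4` load, one vector operation, and a `uchar4` store (e.g. via `vload4`/`vstore4` in OpenCL C); alpha is only there as 4-byte alignment padding.

```c
#include <stddef.h>

/* Proposed scheme: pixels stored interleaved as RGBA (alpha unused,
 * kept for 4-byte alignment), all three channels handled per pixel
 * in one pass.  The chunked outer loop again stands in for staging
 * tiles through local memory. */
static void process_rgba(const unsigned char *in, unsigned char *out,
                         size_t n_pixels, size_t chunk) {
    for (size_t base = 0; base < n_pixels; base += chunk) {
        size_t end = base + chunk < n_pixels ? base + chunk : n_pixels;
        for (size_t i = base; i < end; i++) {
            const unsigned char *p = in + 4 * i;
            unsigned char *q = out + 4 * i;
            for (int c = 0; c < 3; c++)  /* R, G, B together */
                q[c] = (unsigned char)((unsigned)p[c] * 3 / 4);
            q[3] = p[3];                 /* pass alpha through */
        }
    }
}
```

One kernel launch over this layout replaces the three per-channel launches.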
Is there any disadvantage to this refactor? I realize that I will have to reduce the number of pixels per chunk, since each pixel now carries three channels' worth of data (four bytes counting the alpha padding) instead of one.