Performance benefits of vector load/store functions

What are the performance benefits of using vload4 instead of loading the data one element at a time if the buffers are not aligned on a float4 boundary? On the other hand, if the buffers are aligned on a float4 boundary, is there a performance penalty for using vload4 instead of dereferencing *float4Ptr?

Thanks in advance

According to the spec, the behavior is undefined if the data you are trying to load with vloadn is not correctly aligned. The vloadn functions take two arguments, a start address and an offset, so the address start + offset*n has to be aligned.

For the second part of your question: if your buffers are aligned (for float4, that means aligned as the float4 type requires), there should be no difference in performance.

Thanks,
Raghu

As per the spec, the start address passed to vloadn for the float data type only has to be 4-byte aligned; it is not required to be 16-byte aligned. Please correct me if I am wrong. I would like to know the performance benefit of using vloadn in that scenario, when the buffer address is aligned on a float boundary but not on a float4 boundary.
Thanks.

Sorry, I misread your original post.

Yes, vloadn requires the data (address + offset*n) to be aligned to sizeof(gentype). If the data is already aligned to 16 bytes, I don't think there is any performance difference between the two approaches. If the data is only aligned to a float boundary, you have to use vload4, since float4 data types require 16-byte alignment.

Thanks,
Raghu
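
For anyone who wants to check which case applies to a given buffer, here is a minimal sketch in plain host-side C (not OpenCL C) of the alignment rules discussed above. The helper names (is_aligned, vload4_address, float4_deref_ok, vload4_ok) are made up for illustration and are not part of any OpenCL API. It assumes sizeof(float) is 4 and that the pointer being tested has the same alignment the kernel will see.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if p is a multiple of `alignment` bytes.
   `alignment` must be a power of two. */
static bool is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* vload4(offset, base) reads four floats starting at base + offset * 4.
   Return that address so its alignment can be inspected.
   (Hypothetical helper, for illustration only.) */
static const float *vload4_address(const float *base, size_t offset)
{
    return base + offset * 4;
}

/* Dereferencing a float4 pointer needs float4 alignment:
   4 * sizeof(float), i.e. 16 bytes when float is 4 bytes. */
static bool float4_deref_ok(const float *base, size_t offset)
{
    return is_aligned(vload4_address(base, offset), 4 * sizeof(float));
}

/* vload4 only needs the alignment of the element type (float). */
static bool vload4_ok(const float *base, size_t offset)
{
    return is_aligned(vload4_address(base, offset), sizeof(float));
}
```

If float4_deref_ok is true, either form is legal, and per the replies above there should be no performance difference. If only vload4_ok is true, vload4 is the only valid choice of the two.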
