Hello,
For fetching data there are always both load and loadu intrinsics. load only accepts aligned addresses, while loadu works in both cases.
But what about performance? According to the Intel Intrinsics Guide, the latency and throughput of both instructions are the same. What happens if loadu is executed on an aligned address? Do I get the same performance as with load? Or is loadu slower regardless of the actual alignment of the given address?
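To make the question concrete, here is a minimal sketch of the two cases I mean. It assumes the SSE float variants (_mm_load_ps and _mm_loadu_ps); the helper names sum4_load and sum4_loadu are made up for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE: _mm_load_ps, _mm_loadu_ps, _mm_storeu_ps */

/* Hypothetical helper: sums four floats using the aligned load.
   p must be 16-byte aligned, otherwise _mm_load_ps may fault. */
static float sum4_load(const float *p) {
    assert(((uintptr_t)p % 16) == 0);
    __m128 v = _mm_load_ps(p);
    float out[4];
    _mm_storeu_ps(out, v);
    return out[0] + out[1] + out[2] + out[3];
}

/* Hypothetical helper: same computation using the unaligned load.
   p may have any alignment. */
static float sum4_loadu(const float *p) {
    __m128 v = _mm_loadu_ps(p);
    float out[4];
    _mm_storeu_ps(out, v);
    return out[0] + out[1] + out[2] + out[3];
}
```

For a 16-byte aligned buffer buf, both sum4_load(buf) and sum4_loadu(buf) are valid and return the same value, while only sum4_loadu(buf + 1) is valid for the shifted address. My question is whether the sum4_loadu(buf) call costs anything extra compared to sum4_load(buf).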
Thanks for any hints!




