I am analyzing some image-processing code and have found that the bottleneck is slow LEA instructions. Can someone help me with methods to fix it?
When an LEA instruction has three operands (base, index, and offset), it has a 3-cycle latency and can be dispatched on only one execution port (port 1 on Sandy Bridge-family cores), so it creates port pressure, especially in deeply nested loops. I don't think it makes sense to modify the (inline?) assembly code directly; I recommend using the Intel(R) C/C++ Compiler with optimization options such as -O2 and -xHost.
At the source-code level, you may want to review the following:
1. Reduce indexed accesses inside the loop, if possible
2. Check data alignment
3. Reduce branching inside the loop
4. Avoid dependencies between loop iterations
5. Anything else I may have missed
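As a rough illustration of item 1, here is a minimal sketch in C. The function names (brighten_indexed, brighten_pointer), the 8-bit single-channel layout, and the stride parameter are all assumptions made up for this example, not taken from your code. The first version recomputes base + y*stride + x for every pixel; the second computes the row address once and then walks a pointer, so the inner loop only needs a simple increment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical baseline: each access computes img + y*stride + x,
 * which is the kind of base + index (+ offset) address that may end up
 * as a three-operand LEA or a complex addressing mode. */
void brighten_indexed(uint8_t *img, size_t width, size_t height,
                      size_t stride, uint8_t delta)
{
    assert(stride >= width);            /* rows may carry padding bytes */
    for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < width; ++x) {
            unsigned v = (unsigned)img[y * stride + x] + delta;
            img[y * stride + x] = (uint8_t)(v > 255u ? 255u : v);
        }
    }
}

/* Hypothetical variant: the row address is hoisted out of the inner loop
 * and the pixels are visited through a single moving pointer. */
void brighten_pointer(uint8_t *img, size_t width, size_t height,
                      size_t stride, uint8_t delta)
{
    assert(stride >= width);
    for (size_t y = 0; y < height; ++y) {
        uint8_t *p = img + y * stride;  /* computed once per row */
        uint8_t *const end = p + width; /* one past the last pixel in the row */
        for (; p != end; ++p) {
            unsigned v = (unsigned)*p + delta;
            *p = (uint8_t)(v > 255u ? 255u : v);  /* saturate at 255 */
        }
    }
}
```

Both functions produce the same result. An optimizing compiler may already perform this transformation on its own at -O2, so it is worth comparing the generated assembly (or re-running the profiler) for both versions before committing to the rewrite.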
Thanks for your response. I will look into your recommendations and see how I can optimize my code.