I have just started using IACA, and it looks to me as though the reported values differ from the documented ones. For example, when I run it on a single vmulpd instruction, like IACA_START vmulpd r1, r2, r3 IACA_END, it reports 4 cycles for both data dependency and performance latency, whereas the documented latency of vmulpd is 5 cycles (Table C-2 in the Architecture Optimization Manual). Checking vaddpd gives the expected value of 3 cycles. How should I interpret the reported 4-cycle latency of the vmulpd instruction?