When I run the following code, which consists of fn1() and fn2(), on my Intel Penryn ULV 1.4 GHz Core 2 Duo:
fn1() is visibly slower than fn2(). Inspecting the .s assembly output from gcc -S, I noticed that fn1() basically loops over a decl instruction ~64 times, while fn2() seems to consist of ~23 instructions, including 2 mul instructions, which need to be repeated 10 times in this example. Despite this, fn1() takes ~3 times longer to execute. (I compiled without -O, because otherwise gcc applies optimizations that alter the nature of fn1().)
Would someone be so kind as to explain why fn1() executes more slowly?