Is there a simulator and/or a general procedure one can follow to predict which instructions will be executed in what order (assuming all data is in the L1 cache)? I'm having a hard time understanding why one instruction sequence executes much faster than another. I suspect it's due to out-of-order execution and register renaming, but I haven't found a concrete explanation yet. Any help would be appreciated.