Hi,

I have found the perf events documented in previous emails very helpful, so thank you for providing information about their behavior. I was looking at the stats on LD behavior and the memory ordering buffer (MOB). I have some questions about the behavior of the hardware and about what the stats described at the link below refer to:

http://redfort-software.intel.com/sites/products/documentation/hpc/amplifierxe/en-us/2011Update/lin/ug_docs/reference/index.htm#pmn/events/about_front_end_performance_tuning_events.html

1) The MOB is for STLF interactions, right?
2) How is the MOB used? Is it just for STLF?
3) I wasn't aware there was a reservation station in SB/IV. Is there one? I thought all results were sent from Sched -> EX -> LD buffer.
4) Does unit mask 0x7 signify all loads executed from the scheduler?
4a) Does unit mask 4 signify all loads performed from the MOB, i.e. loads that get their results from a previous STORE?
4b) Does unit mask 2 signify that the result of the STORE is not in the MOB yet, but that waiting a cycle allows the uop to get it from the MOB?
4b*) Why does 1 cycle make such a difference? What is the average STLF latency of writing the store to the MOB and then loading it back?
4c) Does unit mask 1 signify the general case of loads SC->EX which have no STORE dependency and simply get their data from the L1D?

Thanks for any clarifications. This looks like interesting stuff that can be a real performance eye-opener.

Perfwise