I need some suggestions and have a few questions about performance monitoring on a Core 2 Duo 6400.
I'm trying to collect the source and target addresses of every mispredicted indirect branch (including returns) with minimal performance overhead. I've used the PEBS mechanism (event 0xC5) to record samples and then read the opcode to identify indirect branches. However, PEBS records only the target of the mispredicted branch instruction, so I've tried other ways to collect both source and target information:
1. BTS with USR only: Since the Core 2 Duo does not have a branch status bit in its BTS record format, I used BTS together with PEBS (event 0xC5). It incurs a 5x slowdown on average for the SPEC CPU2006 integer benchmarks.
2. Performance counter with LBR stack: I selected events 0x8E (BR_IND_MISSP_EXEC) and 0x90 (BR_RET_MISSP_EXEC) and set the counter value to 0xffffffffff to generate a PMI on every event. In the PMI service routine, I use 3 rdmsr instructions (to read out the LBR stack) and 2 wrmsr instructions (to reset the counter and re-enable the LBR stack). It takes 1900 cycles per PMI on average on linux-2.6.27, and it slows the SPEC CPU2006 integer benchmarks down by 3x to 5x.
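To make the counter setup in approach 2 concrete, here is a minimal sketch of the arithmetic, assuming a 40-bit general-purpose counter (which is what the 0xffffffffff value corresponds to). The helper names `pmc_preset` and `pmi_overhead_cycles` are hypothetical and only for illustration; they are not kernel APIs, and the overhead estimate simply multiplies the number of PMIs by the measured cost per PMI.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed width of a general-purpose counter on this CPU (40 bits). */
#define PMC_WIDTH 40

/*
 * Value to write into the counter so that it overflows, and raises a PMI,
 * after `period` more events. Hypothetical helper, not a kernel API.
 */
static uint64_t pmc_preset(uint64_t period, unsigned width)
{
    uint64_t mask;

    assert(width > 0 && width < 64);
    mask = (UINT64_C(1) << width) - 1;
    assert(period >= 1 && period <= mask);

    /* The counter counts up; overflow happens on the wrap from mask to 0. */
    return (mask + 1 - period) & mask;
}

/*
 * Rough estimate of the cycles spent in the PMI handler when one PMI is
 * taken every `period` events. Hypothetical helper for illustration.
 */
static uint64_t pmi_overhead_cycles(uint64_t events, uint64_t period,
                                    uint64_t cycles_per_pmi)
{
    assert(period >= 1);
    return (events / period) * cycles_per_pmi;
}
```

With a period of 1, `pmc_preset(1, PMC_WIDTH)` gives 0xffffffffff, the value I write in the PMI routine. At 1900 cycles per PMI, `pmi_overhead_cycles(1000, 1, 1900)` comes to 1,900,000 cycles for 1000 mispredicted branches, which is why taking a PMI on every event is so expensive.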
Here are my questions:
Is there any way to record the source instruction address in PEBS records with event 0xC5?
Is there any way, other than what I've tried, to efficiently collect the source and target addresses of retired mispredicted indirect branches?
Does the 0x8E (BR_IND_MISSP_EXEC) event include 0x94 (BR_IND_CALL_EXEC)?
What is the difference between 0x90 (BR_RET_MISSP_EXEC) and 0x91 (BR_RET_BAC_MISSP_EXEC)? How does the Core 2 Duo predict the target of returns?
Thanks in advance