Weird BTS Performance

I am experimenting with the BTS (Branch Trace Store) feature and am seeing a very large performance penalty. Several sources say this is normal. On the other hand, I have seen academic publications that use this feature and report only a very small overhead.

Therefore, I have run several experiments on different CPUs with different DebugCtl (IA32_DEBUGCTL) settings and different memory caching types.

What confuses me most is that experiments with only the TR flag enabled are *much* slower than those with both the TR and BTS flags enabled. As I understand it, enabling TR+BTS does "more" than TR alone: the branch trace messages (BTMs) are written not only to the system bus but also to the Debug Store (DS) area.

Am I wrong? What is the reason for this "strange" observation?

Thanks a lot for your help,


Branch Trace is designed to help tools profile and diagnose. It can capture a lot of information, and the associated cost (delay) scales with how much and how frequently your tool asks the hardware to capture. Infrequent sampling incurs a smaller overhead. Capturing frequently is like attaching an exhaust-emission analyzer to a car's tailpipe: the car won't drive normally or get its normal gas mileage.

Thank you for your answer.
Unfortunately, I cannot see how it is related to my particular question.


Perhaps you didn't tell us how your question relates to this forum.

Hi Tim,

I am sorry if I have posted in the wrong forum. I searched for similar requests and found some here. Is there a forum better suited to BTM/performance-monitoring questions?

Cheers & Thanks

Can you please post your question to this forum:
