I am playing around with the BTS feature and observe a huge performance penalty. Several sources say that this is normal. On the other hand, I have seen academic publications that use this feature and report only a very small overhead.
Therefore, I have performed several experiments on different CPUs with different DebugCtl settings and different memory caching types.
What confuses me the most is that experiments with only the TR flag enabled are *much* slower than those with both the TR flag *and* the BTS flag enabled. From my understanding, enabling TR+BTS does "more" than TR alone: it writes the BTMs not only to the system bus but also to the DebugStore.
Am I wrong? What is the reason for this "strange" observation?
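For reference, here is a minimal sketch of the two settings being compared. It assumes the IA32_DEBUGCTL layout documented in the Intel SDM (MSR address 0x1D9, TR = bit 6, BTS = bit 7). The helper name `debugctl_value` is hypothetical, and the code only computes the register values; it does not write the MSR.

```c
#include <stdint.h>

/* Assumed from the Intel SDM: IA32_DEBUGCTL is MSR 0x1D9. */
#define MSR_IA32_DEBUGCTL 0x1D9u
#define DEBUGCTL_TR  (1ull << 6)  /* TR: generate branch trace messages (BTMs) */
#define DEBUGCTL_BTS (1ull << 7)  /* BTS: log BTMs to the Debug Store buffer */

/* Hypothetical helper: build the DebugCtl value for one experiment. */
static inline uint64_t debugctl_value(int tr, int bts)
{
    uint64_t v = 0;
    if (tr)
        v |= DEBUGCTL_TR;
    if (bts)
        v |= DEBUGCTL_BTS;
    return v;
}
```

Under these assumptions, the "TR only" experiments correspond to `debugctl_value(1, 0)` (0x40) and the "TR+BTS" experiments to `debugctl_value(1, 1)` (0xC0).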
Thanks a lot for your help,