Branch Trace Store

Hi,

I need some help in enabling BTS.
My CPU is a Core 2 Duo E8400; the OS is Windows XP SP3 32-bit.
I've reread the manual and rechecked everything over 9000 times. Everything seems to be correct.
Here is what I checked:

CPUID.1:EDX[21] = 1
IA32_MISC_ENABLE[7] (Performance Monitoring Available) = 1: Performance monitoring enabled
IA32_MISC_ENABLE[11] (Branch Trace Storage Unavailable) = 0: BTS is supported
IA32_APIC_BASE[11] (APIC global enable/disable flag) = 1: APIC enabled.
Spurious Interrupt Vector Register, bit 8 = 1: APIC Enabled.
Error Status Register = 0: there are no APIC errors.
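
For anyone repeating these checks, the list above can be written as a single predicate over the raw register values. This is only a sketch in C: the `bts_prereq` struct and both function names are made up for illustration, and actually reading the registers (CPUID, RDMSR, APIC MMIO loads) is left to the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Raw register values as read by the driver. Hypothetical container. */
struct bts_prereq {
    uint32_t cpuid1_edx;   /* CPUID.1:EDX */
    uint64_t misc_enable;  /* IA32_MISC_ENABLE */
    uint64_t apic_base;    /* IA32_APIC_BASE */
    uint32_t apic_svr;     /* Spurious Interrupt Vector Register */
    uint32_t apic_esr;     /* Error Status Register */
};

/* Returns bit n of v. */
static bool bit(uint64_t v, unsigned n)
{
    assert(n < 64);
    return ((v >> n) & 1u) != 0;
}

/* True when every item of the checklist holds. */
bool bts_prereqs_ok(const struct bts_prereq *r)
{
    return  bit(r->cpuid1_edx, 21)    /* DS feature flag */
        &&  bit(r->misc_enable, 7)    /* performance monitoring available */
        && !bit(r->misc_enable, 11)   /* "BTS unavailable" flag is clear */
        &&  bit(r->apic_base, 11)     /* APIC global enable */
        &&  bit(r->apic_svr, 8)       /* APIC software enable */
        &&  r->apic_esr == 0;         /* no APIC errors */
}
```

All six conditions passing only shows that the facility is available; it says nothing about whether the DS area itself is laid out correctly.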

The DS area is created; the IA32_DS_AREA MSR, the IA32_DEBUGCTL MSR and the LVT Performance Counter Register are set; and the PMI handler is installed in the IDT. The base address of the APIC registers was also checked. For the DS area I tried both memory reserved in the driver image and memory allocated with ExAllocatePool().

After enabling BTS, the core on which it was enabled slows down, but no records appear in the BTS buffer.
Here is some code in fasm (simplified parts).
DS structure:

BTS_entries_num = 330
reserved_BTS_entries_num = 80

struct Branch_Record
  Branch_From dd ?
  Branch_To dd ?
  Branch_Predicted dd ?
ends

struct DS
  ;-------BTS buffer base
  BTS_buffer_base dd DS.BTS_buffer
  ;-------BTS index
  BTS_index dd DS.BTS_buffer
  ;-------BTS absolute maximum
  BTS_max dd BTS_entries_num *12 +DS.BTS_buffer
  ;-------BTS interrupt threshold
  BTS_int dd (BTS_entries_num - reserved_BTS_entries_num) *12 +DS.BTS_buffer
  ;-------PEBS save area
  dd 6 dup ?
  ld dd ?
  av = 128 - ((DS.ld+4) mod 128)	;bytes to align
  db av dup ?			;align 128
  BTS_buffer Branch_Record.dup BTS_entries_num	;BTS_entries_num of Branch_Record struct
ends

DS initialization:
;----allocating memory
push sizeof.DS
push NonPagedPoolCacheAligned
call [ExAllocatePool]
mov [DS_addr], eax

;----DS initialization
lea ebx, [eax+DS.BTS_buffer]
mov [eax+DS.BTS_buffer_base], ebx
mov [eax+DS.BTS_index], ebx
add ebx, BTS_entries_num *sizeof.Branch_Record
mov [eax+DS.BTS_max], ebx
sub ebx, reserved_BTS_entries_num *sizeof.Branch_Record
mov [eax+DS.BTS_int], ebx

;----clearing memory
mov edi, [DS_addr]
add edi, DS.BTS_buffer
mov ecx, BTS_entries_num*sizeof.Branch_Record
xor eax, eax
cld
rep stosb

Setting LVT and IDT:
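vec_num = 24h
;----LVT Performance Counter Register
;                                    fixed            edge sensitive  not masked
mov dword [0FFFE0340h], vec_num or (000b shl 8) or (0b shl 15) or (0b shl 16)

;----locating the IDT entry
push esi
sidt [esp-2]		;limit lands below esp, base overwrites the pushed dword
pop esi			;esi = IDT base
add esi, vec_num*8

;----filling the gate
mov eax, IntHandler
mov word [esi], ax	;handler offset, bits 15..0
bswap eax
xchg al, ah		;ax = handler offset, bits 31..16
mov word [esi+6], ax
mov ax, cs
mov word [esi+2], ax	;code segment selector
mov byte [esi+4], 0
mov byte [esi+5], 10001111b	;present, DPL 0, 32-bit trap gate
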
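
The byte-by-byte stores above fill in one 8-byte IDT gate. A C sketch of the same encoding may make the layout easier to check. `idt_gate32` and `make_gate` are hypothetical names, not part of any kernel API, and the sketch only builds the value; it does not touch a real IDT.

```c
#include <assert.h>
#include <stdint.h>

/* One 8-byte gate in a 32-bit protected-mode IDT. */
struct idt_gate32 {
    uint16_t offset_lo;  /* handler address bits 15..0   (bytes 0-1) */
    uint16_t selector;   /* code segment selector        (bytes 2-3) */
    uint8_t  zero;       /* must be 0                    (byte 4)    */
    uint8_t  type_attr;  /* P, DPL and gate type         (byte 5)    */
    uint16_t offset_hi;  /* handler address bits 31..16  (bytes 6-7) */
};

enum {
    GATE_INT32  = 0x8E,  /* present, DPL 0, interrupt gate: IF is cleared on entry */
    GATE_TRAP32 = 0x8F   /* present, DPL 0, trap gate: IF is left as it was (10001111b) */
};

/* Builds the gate for a handler at linear address `handler`. */
struct idt_gate32 make_gate(uint32_t handler, uint16_t cs, uint8_t type_attr)
{
    assert(type_attr & 0x80);  /* the present bit has to be set */

    struct idt_gate32 g;
    g.offset_lo = (uint16_t)(handler & 0xFFFFu);
    g.selector  = cs;
    g.zero      = 0;
    g.type_attr = type_attr;
    g.offset_hi = (uint16_t)(handler >> 16);
    return g;
}
```

The code above uses the trap-gate type. Whether an interrupt gate would suit a PMI handler better is a separate question that this sketch does not try to settle.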
BTS enabling:
mov ecx, 600h	;IA32_DS_AREA
rdmsr
mov eax, [DS_addr]
wrmsr

mov ecx, 01D9h	;IA32_DEBUGCTL
rdmsr
;          TR            BTS           BTINT         BTS_OFF_OS    BTS_OFF_USR
mov eax, (1 shl 6) or (1 shl 7) or (1 shl 8) or (1 shl 9) or (0 shl 10)
wrmsr
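
For reference, the IA32_DEBUGCTL value written above can be composed from named bits. This is a minimal C sketch: the macro names follow the comment in the fasm code, while `debugctl_for_bts` and its parameters are invented for illustration. Writing the MSR itself still needs WRMSR in ring 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IA32_DEBUGCTL bit positions used for BTS. */
#define DEBUGCTL_TR          (1u << 6)   /* enable branch trace messages */
#define DEBUGCTL_BTS         (1u << 7)   /* store them in the BTS buffer */
#define DEBUGCTL_BTINT       (1u << 8)   /* interrupt at the threshold */
#define DEBUGCTL_BTS_OFF_OS  (1u << 9)   /* skip branches taken at CPL 0 */
#define DEBUGCTL_BTS_OFF_USR (1u << 10)  /* skip branches taken at CPL > 0 */

/* Composes the low dword of IA32_DEBUGCTL for a BTS session. */
uint32_t debugctl_for_bts(bool want_pmi, bool skip_cpl0, bool skip_user)
{
    /* Filtering out both privilege levels would record nothing. */
    assert(!(skip_cpl0 && skip_user));

    uint32_t v = DEBUGCTL_TR | DEBUGCTL_BTS;
    if (want_pmi)  v |= DEBUGCTL_BTINT;
    if (skip_cpl0) v |= DEBUGCTL_BTS_OFF_OS;
    if (skip_user) v |= DEBUGCTL_BTS_OFF_USR;
    return v;
}
```

`debugctl_for_bts(true, true, false)` gives 0x3C0, the value used above. With BTS_OFF_OS set, branches executed at CPL 0 are not recorded, so only user-mode branches would reach the buffer.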

I also have some questions (the manual doesn't give CLEAR answers to them):
1. Can the DS area be on the same page as code (if triggering self-modifying-code handling doesn't worry me)?

2. (from manual) "The DS save area can be larger than a page, but the pages must be mapped to
contiguous linear addresses."

Does it mean that all 3 DS areas must be in pages that are contiguous in LINEAR space, or does it mean that the pages holding the DS area must be MAPPED to contiguous PHYSICAL addresses? After all, pages are mapped to physical addresses rather than linear ones...

3. (from manual) "In order to prevent generating an interrupt, when working with
circular BTS buffer, SW need to set BTS interrupt threshold to a value
greater than BTS absolute maximum (fields of the DS buffer
management area). It's not enough to clear the BTINT flag itself only."

In other words, BTINT doesn't control PMIs. So, what is the purpose of BTINT?

4. Can APIC registers be accessed only with mov, or are other instructions (and, or, etc.) acceptable?

P.S. Working code in any language (asm is preferred) will be useful.

Thanks,
q1nex

Patrick Fay (Intel)

Hello q1nex,
I'm trying to find someone who can answer your questions.
Pat

Thanks, Pat. I hope you'll find somebody.

One more question about BTS.
Nehalem and newer CPUs have 16 pairs of LBR MSRs, while Core 2 has only 4. Does that mean BTS performance on Nehalem will be almost 4 times higher?

Patrick Fay (Intel)

Hello q1nex,
Here is the reply from our BTS guy (who was on vacation).
Note that the LBR facility is much faster than (and quite different from) the BTS facility.

The problem is that, starting from Merom (family 6, model 15), all BTS structures are 64-bit, even in 32-bit mode. All pointers in the asm control structures should therefore be declared as DQ instead of DD:

struct Branch_Record
  Branch_From dq ?
  Branch_To dq ?
  Branch_Predicted dq ?
ends

struct DS
  ;-------BTS buffer base
  BTS_buffer_base dq DS.BTS_buffer

  ;-------BTS index
  BTS_index dq DS.BTS_buffer

  ;-------BTS absolute maximum (a record is now 3*8 = 24 bytes, not 12)
  BTS_max dq BTS_entries_num *24 +DS.BTS_buffer

  ;-------BTS interrupt threshold
  BTS_int dq (BTS_entries_num - reserved_BTS_entries_num) *24 +DS.BTS_buffer

  ;-------PEBS save area
  dd 6 dup ?
  ld dd ?

  av = 128 - ((DS.ld+4) mod 128)  ;bytes to align
  db av dup ?         ;align 128

  BTS_buffer Branch_Record.dup BTS_entries_num    ;BTS_entries_num of Branch_Record struct
ends
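
The same 64-bit layout can be written down in C, which makes the sizes and offsets easy to verify. This is a sketch under assumptions: the type and function names are hypothetical, only the four BTS management fields are modeled (the PEBS fields are left out), and the threshold arithmetic simply mirrors the fasm initialization earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One BTS record in the 64-bit format: three 8-byte fields, 24 bytes. */
struct branch_record {
    uint64_t from;       /* Branch_From */
    uint64_t to;         /* Branch_To */
    uint64_t predicted;  /* Branch_Predicted */
};

/* The BTS part of the DS buffer management area, 8 bytes per field. */
struct ds_bts_fields {
    uint64_t bts_base;       /* offset 0x00 */
    uint64_t bts_index;      /* offset 0x08 */
    uint64_t bts_max;        /* offset 0x10 */
    uint64_t bts_threshold;  /* offset 0x18 */
};

/* Fills the BTS fields for a buffer of `entries` records at linear
   address `buffer_va`, raising the PMI `reserved` records before the end. */
void ds_init_bts(struct ds_bts_fields *ds, uint64_t buffer_va,
                 uint32_t entries, uint32_t reserved)
{
    assert(reserved < entries);

    const uint64_t rec = sizeof(struct branch_record);
    ds->bts_base      = buffer_va;
    ds->bts_index     = buffer_va;
    ds->bts_max       = buffer_va + (uint64_t)entries * rec;
    ds->bts_threshold = ds->bts_max - (uint64_t)reserved * rec;
}
```

Storing through 64-bit fields also writes the upper dword of each pointer. With DQ fields and plain 32-bit `mov` stores, the upper halves would keep whatever was in the pool allocation, since ExAllocatePool does not zero memory, so they would need to be cleared separately.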

And to the other questions:
1. Can the DS area be on the same page as code (if triggering self-modifying-code handling doesn't worry me)?

I have never checked it, but I can see no problem here.
2. (from manual) "The DS save area can be larger than a page, but the pages must be mapped to contiguous linear addresses."
Does it mean that all 3 DS areas must be in pages that are contiguous in LINEAR space, or does it mean that the pages holding the DS area must be MAPPED to contiguous PHYSICAL addresses? After all, pages are mapped to physical addresses rather than linear ones...

Yes, the pages should be linearly contiguous.
3. (from manual) "In order to prevent generating an interrupt, when working with circular BTS buffer, SW need to set BTS interrupt threshold to a value greater than BTS absolute maximum (fields of the DS buffer management area). It's not enough to clear the BTINT flag itself only."
In other words, BTINT doesn't control PMIs. So, what is the purpose of BTINT?

BTINT controls interrupt generation: if it is 0, no interrupt will be generated. BTINT and the threshold together control how the buffer operates. The buffer is circular if BTINT = 0 and Threshold > max_size; it is non-circular and generates a PMI if Threshold < max_size and BTINT = 1; and it is non-circular and does not generate a PMI if Threshold < max_size and BTINT = 0.
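
The three cases in this answer can be written as a small decision function. The C sketch below encodes only what the reply states; the enum and function names are hypothetical, and the combinations the reply does not mention (BTINT = 1 with Threshold > max_size, and Threshold equal to max_size) are reported as unspecified rather than guessed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum bts_mode {
    BTS_CIRCULAR,       /* buffer wraps around, no PMI */
    BTS_LINEAR_PMI,     /* non-circular, PMI generated */
    BTS_LINEAR_NO_PMI,  /* non-circular, no PMI */
    BTS_UNSPECIFIED     /* combination not covered by the reply */
};

/* Classifies the buffer behaviour from BTINT and the two DS fields. */
enum bts_mode bts_classify(bool btint, uint64_t threshold, uint64_t abs_max)
{
    assert(abs_max != 0);  /* zero would mean the DS area was never filled in */

    if (threshold > abs_max)
        return btint ? BTS_UNSPECIFIED : BTS_CIRCULAR;
    if (threshold < abs_max)
        return btint ? BTS_LINEAR_PMI : BTS_LINEAR_NO_PMI;
    return BTS_UNSPECIFIED;  /* threshold == abs_max is not discussed */
}
```

For the configuration in the original post (BTINT = 1, threshold 80 records below the maximum) this returns `BTS_LINEAR_PMI`.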
4. Can APIC registers be accessed only with mov, or are other instructions (and, or, etc.) acceptable?

APIC registers can be accessed using any instruction, but one has to take various side effects into account. For instance, an AND instruction will emit both load and store uops, whereas mov instructions are more predictable; that's why mov is recommended for use with the APIC.
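
In C, the usual way to follow this advice is to do every APIC access as one explicit 32-bit volatile load or store, and to do any bit manipulation on a local copy in between. A minimal sketch, assuming the APIC page is already mapped at `base`; the helper names are made up, and a compiler is expected, though not strictly guaranteed, to turn each access into a single `mov`.

```c
#include <assert.h>
#include <stdint.h>

enum { APIC_LVT_PERF = 0x340 };  /* LVT Performance Counter Register offset */

/* One 32-bit load from an APIC register. */
static inline uint32_t apic_read(volatile void *base, uint32_t off)
{
    assert((off & 0xFu) == 0);  /* APIC registers sit on 16-byte boundaries */
    return *(volatile uint32_t *)((volatile uint8_t *)base + off);
}

/* One 32-bit store to an APIC register. */
static inline void apic_write(volatile void *base, uint32_t off, uint32_t v)
{
    assert((off & 0xFu) == 0);
    *(volatile uint32_t *)((volatile uint8_t *)base + off) = v;
}

/* Clears the mask bit (bit 16) of an LVT entry: load, modify the local
   copy, then store, instead of an AND with a memory destination. */
void apic_unmask_lvt(volatile void *base, uint32_t off)
{
    uint32_t v = apic_read(base, off);
    v &= ~(1u << 16);
    apic_write(base, off, v);
}
```

The helpers can be exercised against an ordinary array standing in for the APIC page, which is a convenient way to test the bit logic outside the kernel.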
