Why does INST_RETIRED.ANY vary so much within a basic block?

martincmartin

In the following profile, the two columns are CPU_CLK_UNHALTED.THREAD and INST_RETIRED.ANY. There are no branches in the block, so for each execution of it, all instructions are retired exactly once. So why does the INST_RETIRED.ANY column vary so widely?

                          Instruction                 CPU_CLK_UNHALTED.THREAD  INST_RETIRED.ANY
per_thread_sort(void*)+0x285:
4053  4  384  0  0    -1  shl $0x4, %r10                                  101               260
4057  3  386  0  0    -1  movq (%rbx), %rax                                 0                 2
4060  3  386  0  0    -1  xor %r8d, %r8d                                    1                 3
4063  4  384  0  0    -1  lea (%rbx,%r10,1), %r13                           0                 0
4067  3  386  0  0    -1  mov %rbx, %rdi                                  126               285
4070  5  386  0  0    -1  mov $0x2, %edx                                    0                 0
4075  4  386  0  0    -1  lea -0x10(%r13), %r15                             0                 0
4079  6  386  0  0    -1  movlpdq -0x10(%r13), %xmm1                        0                 1
4085  4  386  0  0    -1  movq %rax, -0x10(%r13)                          290               367
4089  4  386  0  0    -1  movq 0x8(%rbx), %rax                             19                 0
4093  4  386  0  0    -1  movl -0x4(%r13), %r9d                         19945              3254
4097  3  386  0  0    -1  mov %r15, %rcx                                   65                42
4100  4  386  0  0    -1  movl -0x8(%r13), %r10d                            8                12
4104  3  386  0  0    -1  sub %rbx, %rcx                                    0                 0
4107  4  386  0  0    -1  sar $0x4, %rcx                                   74               167
4111  4  386  0  0    -1  movq %rax, -0x8(%r13)                            22                19
4115  4  386  0  0    -1  cmp $0x2, %rcx                                    5                16
4119  2  386  0  0  4205  jle 0x40106d                                      0                 0
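
As a rough check on how skewed the profile is, the INST_RETIRED.ANY column in the listing above can be summed and compared with the even split one would naively expect for a branch-free block. This is only a sketch: the variable names are illustrative, and the values are copied from the last column of the listing.

```python
# INST_RETIRED.ANY samples per instruction, copied from the listing (18 instructions).
inst_retired = [260, 2, 3, 0, 285, 0, 0, 1, 367, 0, 3254, 42, 12, 0, 167, 19, 16, 0]

total = sum(inst_retired)
# Naive expectation: every instruction retires equally often, so samples split evenly.
expected = total / len(inst_retired)
hottest = max(inst_retired)

print(f"total samples:         {total}")
print(f"even split would give: {expected:.0f} per instruction")
print(f"hottest instruction:   {hottest} ({hottest / total:.0%} of all samples)")
```

One instruction (the movl at 4093) collects roughly three quarters of the samples, while six instructions collect none at all.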

julia-fedorova (Intel)

The non-uniform distribution of INST_RETIRED.ANY is due to a combination of several effects in the out-of-order pipeline, known as:
1. skid
2. aggregation
3. shadowing

1. Skid: the instruction that caused the counter overflow is not the one reported; instead, the sample is attributed to an instruction that retires a few cycles later.

2. Aggregation: long-latency instructions (those that access memory or cause other stalls) sit at the head of the instruction queue for a longer time, so they have a higher probability of being sampled.

3. Shadowing: instructions that come after those aggregating instructions are in their shadow and have a lower probability of being sampled.
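
The interplay of these three effects can be illustrated with a toy simulation. This is a minimal sketch under strong assumptions, not a model of any real PMU: a single straight-line block runs in a loop, instructions retire in order, and the sample is attributed to the oldest instruction that has not yet retired when the (delayed) interrupt arrives. The function name `simulate_samples` and the latencies are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def simulate_samples(latencies, period, skid_cycles, n_samples):
    """Toy model of sampling on an 'instructions retired' counter.

    Assumes in-order retirement: instruction i retires latencies[i] cycles
    after instruction i-1. All latencies must be positive.
    """
    n = len(latencies)
    retire = list(accumulate(latencies))  # retirement cycle of each instruction in one pass
    block_cycles = retire[-1]
    counts = [0] * n
    for k in range(1, n_samples + 1):
        # The counter overflows when the (k * period)-th instruction retires.
        passes, idx = divmod(k * period - 1, n)
        overflow_cycle = passes * block_cycles + retire[idx]
        # Skid: the sample is taken some cycles after the overflow.
        offset = (overflow_cycle + skid_cycles) % block_cycles
        # Report the oldest instruction that has not retired yet at that cycle.
        head = bisect_right(retire, offset)
        counts[head] += 1
    return counts


block = [1, 1, 1, 50, 1, 1]  # hypothetical latencies; instruction 3 is a slow load
print(simulate_samples(block, period=7, skid_cycles=0, n_samples=600))
print(simulate_samples(block, period=7, skid_cycles=5, n_samples=600))
```

In this model, with no skid the samples spread evenly over the six instructions. With a skid of 5 cycles, every sample lands on the slow instruction (aggregation), and the instructions right after it receive none (shadowing).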

There could be other effects as well, e.g. synchronization of the sampling period with the application code, which can also lead to oversampling of some instructions.
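
The synchronization effect can be sketched with simple modular arithmetic. The example assumes, unrealistically, that the program does nothing but loop over one block of 18 instructions, and it ignores skid; the function name `sampled_positions` and the period values are hypothetical.

```python
from math import gcd


def sampled_positions(block_len, period, n_samples):
    """0-based positions in the block where the counter overflow lands."""
    return sorted({(k * period) % block_len for k in range(1, n_samples + 1)})


block_len = 18  # number of instructions in the looping block (assumed)
for period in (18000, 18003, 18001):  # hypothetical sampling periods
    hit = sampled_positions(block_len, period, n_samples=1000)
    predicted = block_len // gcd(period, block_len)
    print(f"period {period}: {len(hit)} of {block_len} positions sampled "
          f"(predicted {predicted})")
```

When the period is a multiple of the block length, every overflow lands on the same instruction; only a period that shares no common factor with the block length visits all positions.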

So the picture you see is quite usual.

Hope this helps.
j
