In the following profile, the last two columns are CPU_CLK_UNHALTED.THREAD and INST_RETIRED.ANY. The block contains no branches apart from the final jle, so each execution of it retires every instruction exactly once. Why, then, does the INST_RETIRED.ANY column vary so widely from one instruction to the next?
4053 4 384 0 0 -1 per_thread_sort(void*)+0x285 shl $0x4, %r10 101 260
4057 3 386 0 0 -1 movq (%rbx), %rax 0 2
4060 3 386 0 0 -1 xor %r8d, %r8d 1 3
4063 4 384 0 0 -1 lea (%rbx,%r10,1), %r13 0 0
4067 3 386 0 0 -1 mov %rbx, %rdi 126 285
4070 5 386 0 0 -1 mov $0x2, %edx 0 0
4075 4 386 0 0 -1 lea -0x10(%r13), %r15 0 0
4079 6 386 0 0 -1 movlpdq -0x10(%r13), %xmm1 0 1
4085 4 386 0 0 -1 movq %rax, -0x10(%r13) 290 367
4089 4 386 0 0 -1 movq 0x8(%rbx), %rax 19 0
4093 4 386 0 0 -1 movl -0x4(%r13), %r9d 19945 3254
4097 3 386 0 0 -1 mov %r15, %rcx 65 42
4100 4 386 0 0 -1 movl -0x8(%r13), %r10d 8 12
4104 3 386 0 0 -1 sub %rbx, %rcx 0 0
4107 4 386 0 0 -1 sar $0x4, %rcx 74 167
4111 4 386 0 0 -1 movq %rax, -0x8(%r13) 22 19
4115 4 386 0 0 -1 cmp $0x2, %rcx 5 16
4119 2 386 0 0 4205 jle 0x40106d 0 0
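To put a number on "vary so widely", here is a small Python sketch that compares the observed INST_RETIRED.ANY values with what an even spread would look like if every instruction really were credited once per execution. The rows are copied from the profile above. The helper name `summarize_inst_retired` is made up for illustration, and the sketch assumes the trailing "0 0" belongs to the final jle.

```python
# Rows copied from the profile above:
# (instruction, CPU_CLK_UNHALTED.THREAD, INST_RETIRED.ANY).
# Assumption: the trailing "0 0" in the listing belongs to the final jle.
PROFILE = [
    ("shl $0x4, %r10", 101, 260),
    ("movq (%rbx), %rax", 0, 2),
    ("xor %r8d, %r8d", 1, 3),
    ("lea (%rbx,%r10,1), %r13", 0, 0),
    ("mov %rbx, %rdi", 126, 285),
    ("mov $0x2, %edx", 0, 0),
    ("lea -0x10(%r13), %r15", 0, 0),
    ("movlpdq -0x10(%r13), %xmm1", 0, 1),
    ("movq %rax, -0x10(%r13)", 290, 367),
    ("movq 0x8(%rbx), %rax", 19, 0),
    ("movl -0x4(%r13), %r9d", 19945, 3254),
    ("mov %r15, %rcx", 65, 42),
    ("movl -0x8(%r13), %r10d", 8, 12),
    ("sub %rbx, %rcx", 0, 0),
    ("sar $0x4, %rcx", 74, 167),
    ("movq %rax, -0x8(%r13)", 22, 19),
    ("cmp $0x2, %rcx", 5, 16),
    ("jle 0x40106d", 0, 0),
]


def summarize_inst_retired(profile):
    """Hypothetical helper: compare observed counts with an even spread."""
    counts = [inst for _, _, inst in profile]
    total = sum(counts)
    # If each instruction were credited equally, every row would show this value.
    uniform = total / len(profile)
    hottest = max(profile, key=lambda row: row[2])
    return {
        "total": total,
        "uniform": uniform,
        "hottest": hottest[0],
        "hottest_share": hottest[2] / total,
        "zero_rows": sum(1 for c in counts if c == 0),
    }


if __name__ == "__main__":
    for key, value in summarize_inst_retired(PROFILE).items():
        print(f"{key}: {value}")
```

On these numbers the column totals 4428 over 18 instructions, so an even spread would be 246 per row. Instead, the `movl -0x4(%r13), %r9d` row alone holds 3254 (about 73% of the total) and six rows show zero.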


