64k aliasing conflicts performance impact

vtuneuser's picture

The VTune documentation states that the 64K aliasing conflicts event is not a precise event and that it can be counted more than once per conflict. So what value of this event needs attention? I have the following assembly code, which corresponds to a single line of C source (a macro definition):

label+37d: mov ebp, DWORD PTR[ebp]
mov edx, ebp
sub edx, eax
test edx, edx
jnge label+3ac
label+388: mov edx, DWORD PTR[esp+01ch]
lea edx, DWORD PTR[edx-1]
cmp edx, -0x1
je label+3d2
label+394: movzx ecx, BYTE PTR[ebp]
mov BYTE PTR[eax], cl    72561
add edx, -0x1
add ebp, 0x1h
add eax, 0x1h

VTune showed that "mov BYTE PTR[eax], cl" took the longest time. How do I identify where the 64K aliasing conflict occurred?

MrAnderson (Intel)'s picture


So, there are a few things to consider here:


First, because it is not precise, what we call event skid occurs. Thus, even though the events are listed for the instruction you specify, the event was probably generated due to a preceding instruction. How many instructions preceding this one? We can't know exactly. But, my experience has been that it is the immediately preceding instructions. Now, you have to use your noodle (i.e. your head) and if the preceding instruction couldn't have generated that event (e.g., a jmp doesn't necessarily generate 64k aliasing conflicts, unless it were indirect or something like that) you back up another instruction.


Second, while the processor manual states that the event can be counted more than once, this is due to speculative execution and does not necessarily mean that it was counted more than once in your case. It depends on whether your code works well with the processor's hardware prefetch algorithm.


Finally, 64K Aliasing Conflicts is one of those events that is frustrating, at best! While it tells you that the processor is not executing to its potential because of something the code is doing, it doesn't tell you where the problem is! :-( Basically, you need to deduce which pointers are conflicting by examining the algorithm, and then make sure that the allocated buffers do not sit at the same offset modulo 64 KB, for example by offsetting one allocation by 64 bytes. (There is also mention of modifying stack allocation; see the VTune analyzer online help under "Insights and Advice" for the processor events.)
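To make the offsetting advice concrete, here is a minimal sketch in C. It assumes the conflict comes from two heap buffers that land at the same offset within a 64 KB window. The names `skewed_buf` and `alloc_at_offset` are hypothetical helpers made up for illustration, not part of VTune or any Intel library.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper type: keeps the malloc pointer next to the skewed one. */
typedef struct {
    void *raw;   /* pointer returned by malloc; pass this one to free() */
    void *data;  /* usable pointer with the requested offset inside the window */
} skewed_buf;

/* Allocate `size` usable bytes so that (address % span) == offset.
   Over-allocates by `span` bytes to leave room for the adjustment.
   Returns {NULL, NULL} on failure. */
static skewed_buf alloc_at_offset(size_t size, size_t offset, size_t span)
{
    skewed_buf b = { NULL, NULL };
    assert(span != 0 && offset < span);
    if (size > SIZE_MAX - span)
        return b;                        /* size + span would overflow */
    b.raw = malloc(size + span);
    if (b.raw == NULL)
        return b;
    uintptr_t base = (uintptr_t)b.raw;
    uintptr_t cur  = base % span;        /* current offset inside the window */
    uintptr_t adj  = (offset + span - cur) % span;  /* 0 .. span-1 bytes forward */
    b.data = (void *)(base + adj);
    return b;
}
```

Calling `alloc_at_offset(n, 0, 0x10000)` for one buffer and `alloc_at_offset(n, 64, 0x10000)` for the other keeps their starting addresses one cache line apart modulo 64 KB. The extra 64 KB per buffer is wasteful; it only keeps the sketch short.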


By the way, excellent question!


Hope this helps,


Message Edited by DaveA on 10-04-2005 11:24 AM

Regards, MrAnderson
Tim Prince's picture

Recent processor steppings have eliminated the 64K aliasing. The same event in VTune, if it actually occurs, is triggered by 4M aliasing. So, you would look for data which would map to the same cache line, mod 4MB.
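A quick way to test a suspected pair of pointers is to compare their cache-line index modulo the aliasing span. The sketch below assumes a 64-byte cache line. `same_line_mod` and the two span macros are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE 64u                      /* assumed cache-line size in bytes */
#define SPAN_64K   ((uintptr_t)1 << 16)     /* 64 KB aliasing window */
#define SPAN_4M    ((uintptr_t)1 << 22)     /* 4 MB aliasing window */

/* Returns 1 if a and b fall on the same cache-line index modulo `span`,
   i.e. they are candidates for an aliasing conflict; 0 otherwise. */
static int same_line_mod(const void *a, const void *b, uintptr_t span)
{
    assert(span >= CACHE_LINE && span % CACHE_LINE == 0);
    uintptr_t la = ((uintptr_t)a % span) / CACHE_LINE;
    uintptr_t lb = ((uintptr_t)b % span) / CACHE_LINE;
    return la == lb;
}
```

For the copy loop in the original question, you would pass the source and destination pointers (the values in ebp and eax) with `SPAN_64K` or `SPAN_4M`, depending on which event your processor reports.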

vtuneuser's picture

Thanks for the reply. The system I used is Family 15, Model 2, Stepping ID 5. Should it be 64K or 4M aliasing?

vtuneuser's picture

Thanks for the reply. Is it true that the Intel compiler has options that pad structures to achieve better alignment? Or should I ask this question in the compiler forum?

MrAnderson (Intel)'s picture

Yes, the Intel C/C++ compiler supports padding of structures.
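The compiler-specific options are not spelled out in this thread, so as a compiler-independent illustration, here is a minimal sketch of structure padding using standard C11 `alignas`. The struct `padded_record` and the 64-byte line size are assumptions made for the example.

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumed cache-line size in bytes */

/* Aligning the first member to CACHE_LINE raises the alignment of the whole
   struct, and the compiler pads its size up to a multiple of that alignment. */
typedef struct {
    alignas(CACHE_LINE) int counter;
    char tag[3];
} padded_record;

/* Compile-time checks that the padding took effect. */
_Static_assert(alignof(padded_record) == CACHE_LINE, "unexpected alignment");
_Static_assert(sizeof(padded_record) % CACHE_LINE == 0, "size not padded");
```

With this layout, each element of an array of `padded_record` starts on its own cache line, at the cost of unused padding bytes per element.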


Regarding 64K vs 4MB Aliasing, the VTune analyzer will identify your processor and correctly present the event in the Events dialog, so you don't need to worry about which one it is. That is, only one will appear in the list of events for your processor.

Regards, MrAnderson
