I used VTune 3.0 to sample the spin lock activities invoked by the e1000 Gigabit driver and the Linux kernel 2.6.12. I found that the CPI of _spin_lock is almost 27, even though _spin_lock has a 100% L2 cache hit rate.
I checked the assembly code of _spin_lock in Linux, and it uses the LOCK prefix. According to the IA-32 optimization manual, the LOCK prefix does not lock the FSB if the referenced data is found in the L2 cache of the local CPU. However, it also goes on to say that locked instructions are inherently slow, whether or not the data to be locked is found in the L2 cache.
I still do not understand: what causes the CPI of _spin_lock to be so high?
Thanks a lot,
_spin_lock code in Linux:
1:  lock; decb slp    # atomically decrement
    jns 3f            # if the sign bit is clear, jump forward to 3
2:  cmpb $0,slp       # spin: compare to 0
    pause             # spin: wait
    jle 2b            # spin: go back to 2 if <= 0 (locked)
    jmp 1b            # unlocked; go back to 1 to try to lock again
3:                    # we have acquired the lock
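For anyone who finds the control flow easier to follow in C, here is a rough sketch of the same loop using C11 atomics. It is not the kernel's code: the names byte_spinlock, byte_spin_init, byte_spin_lock and byte_spin_unlock are made up for illustration, and the unlock (storing 1 back) is my assumption about the release side, which the listing above does not show. The point is only to show where the single locked read-modify-write sits (label 1) compared with the plain loads in the spin loop (label 2).

```c
#include <stdatomic.h>

/* Hypothetical byte lock: 1 = free, <= 0 = held (same convention as the listing). */
typedef struct {
    atomic_schar slp;
} byte_spinlock;

static void byte_spin_init(byte_spinlock *l)
{
    atomic_init(&l->slp, 1);
}

static void byte_spin_lock(byte_spinlock *l)
{
    for (;;) {
        /* Label 1: atomic decrement. On x86 this becomes a LOCK-prefixed
         * read-modify-write, the only locked operation in the loop. */
        signed char old = atomic_fetch_sub_explicit(&l->slp, 1,
                                                    memory_order_acquire);
        if (old - 1 >= 0)
            return;            /* Label 3: result is non-negative, lock acquired. */

        /* Label 2: spin with plain loads until the value looks free (> 0).
         * The PAUSE hint from the listing would go inside this loop. */
        while (atomic_load_explicit(&l->slp, memory_order_relaxed) <= 0)
            ;
        /* Fall through: go back to label 1 and retry the decrement. */
    }
}

/* Assumed release side: store 1 to mark the lock free again. */
static void byte_spin_unlock(byte_spinlock *l)
{
    atomic_store_explicit(&l->slp, 1, memory_order_release);
}
```

In the uncontended case only the decrement and the branch execute, so a sampled CPI for the function is dominated by the cost of that one locked instruction rather than by the spin loop.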