I'm currently optimizing some algorithms in assembler using software prefetching.
Now I'd like to measure the changes.
I used the performance counters below on my Xeon 5130 with Intel Core Architecture. But while the execution time decreases after optimization the l1 and l2 cache misses seem to increase.
Performance counters I used:
Use Eventnum Umask
L1 Requests 0x40 0x0F
L2 Requests 0x2E 0xFF
L1 Misses 0xCB 0x02
L2 Misses 0x24 0x01
Are these the right performance counters?
Thanks in advance,