Intel ISA Extensions

Instructions Retired Equation


For my experiments, I am trying to formulate an equation that gives the number of instructions retired in a given interval.

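For example, the cycles available in one second (the core frequency f) can be broken down as:

f = IR*CPI_exe + Bus_Trans_Mem*Penalty*Miss_rate + I/O stalls + ROB stalls + branch-misprediction stalls + other stalls

where IR is the number of instructions retired and CPI_exe is the execution CPI.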

Can anybody explain if the above formulation is correct?
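Since the goal is instructions retired, the proposed model can be solved for IR: IR = (f - total stall cycles) / CPI_exe. The sketch below does exactly that rearrangement of the poster's model, without judging whether the model itself is right. It assumes every input is measured over the same one-second interval and expressed in cycles (CPI_exe in cycles per instruction); `instructions_retired` and its parameter names are hypothetical.

```c
/* Hypothetical helper: solves the poster's proposed model
     f = IR*CPI_exe + Bus_Trans_Mem*Penalty*Miss_rate + other stall terms
   for IR. Assumes all inputs cover the same interval and that every stall
   term is already expressed in core cycles. */
double instructions_retired(double f_cycles,        /* core cycles in the interval */
                            double cpi_exe,         /* cycles per instruction, no stalls */
                            double bus_trans_mem,   /* memory bus transactions */
                            double penalty_cycles,  /* cycles per memory transaction */
                            double miss_rate,
                            double io_stalls, double rob_stalls,
                            double branch_stalls, double other_stalls)
{
    double mem_stalls = bus_trans_mem * penalty_cycles * miss_rate;
    double stall_cycles = mem_stalls + io_stalls + rob_stalls
                        + branch_stalls + other_stalls;
    /* Cycles left after stalls, divided by cycles per instruction. */
    return (f_cycles - stall_cycles) / cpi_exe;
}
```

For instance, with f = 3e9 cycles, CPI_exe = 1, Bus_Trans_Mem = 1e6, a 200-cycle penalty, a miss rate of 0.5 and no other stalls, the memory term is 1e8 cycles and the model gives IR = 2.9e9.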

RDTSC() for measuring latency of an operation


I am trying to measure the latency of an operation using rdtsc().
The problem is that the measured latency (the number of cycles the operation takes) stays the same even when I change the core frequency from 3 GHz to 2 GHz. In other words, changing the frequency has no effect on the rdtsc output.
Can anyone tell me why this happens?

Thank you.
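Whether rdtsc counts core clock cycles or ticks at a fixed reference rate depends on the processor. Processors that report an invariant TSC (CPUID leaf 80000007H, EDX bit 8) increment the TSC at a constant rate regardless of core frequency changes, so an rdtsc interval then measures elapsed time in reference ticks rather than core cycles. The sketch below, assuming an x86 target and GCC or Clang (`<cpuid.h>`, `<x86intrin.h>`), checks that bit, times an operation, and converts ticks to approximate core cycles; the helper names and the constant-frequency assumption in the conversion are illustrative, not a library API.

```c
#include <stdint.h>
#include <cpuid.h>      /* __get_cpuid (GCC/Clang, x86 only) */
#include <x86intrin.h>  /* __rdtsc (GCC/Clang, x86 only) */

/* Returns 1 if CPUID.80000007H:EDX bit 8 (invariant TSC) is set, 0 if it is
   clear, and -1 if the processor does not report that CPUID leaf. */
int tsc_is_invariant(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx))
        return -1;
    return (int)((edx >> 8) & 1u);
}

/* TSC ticks elapsed while op() runs. RDTSC is not serializing, so for very
   short operations out-of-order execution can blur the boundaries. */
uint64_t tsc_ticks_for(void (*op)(void))
{
    uint64_t start = __rdtsc();
    op();
    uint64_t end = __rdtsc();
    return end - start;
}

/* If the TSC ticks at a fixed rate (tsc_ghz) while the core runs at
   core_ghz, the core cycles spent are roughly ticks * core_ghz / tsc_ghz,
   assuming the core frequency stays constant over the interval. */
double tsc_ticks_to_core_cycles(uint64_t ticks, double core_ghz, double tsc_ghz)
{
    return (double)ticks * core_ghz / tsc_ghz;
}
```

For example, 3000 ticks of a 3 GHz invariant TSC measured while the core runs at 2 GHz correspond to about 2000 core cycles.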

_mm_lddqu_si128 and _mm_loadu_si128

Hi, I would like to ask how much improvement I can get by replacing _mm_loadu_si128 with _mm_lddqu_si128 on a 64-bit machine. I wrote a simple program to compare these two load instructions, but I could not see any improvement at all. My understanding is that _mm_lddqu_si128 handles unaligned loads better than _mm_loadu_si128. The following is my test code.
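Functionally, both intrinsics load 16 possibly unaligned bytes and return the same value; the Intel SDM describes LDDQU only as possibly improving performance relative to MOVDQU when the source crosses a cache-line boundary, so seeing no difference on other loads would not be surprising. The sketch below, assuming an x86-64 target with GCC or Clang, checks that the two loads agree at an unaligned address; `lddqu_matches_loadu` is a hypothetical name, and the `target("sse3")` attribute avoids compiling the whole file with `-msse3`.

```c
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_cmpeq_epi8, _mm_movemask_epi8 */
#include <pmmintrin.h>  /* SSE3: _mm_lddqu_si128 */

/* Loads 16 bytes from p with both intrinsics and returns 1 if the two
   vectors are identical, 0 otherwise. p does not need to be aligned. */
__attribute__((target("sse3")))
int lddqu_matches_loadu(const unsigned char *p)
{
    __m128i a = _mm_loadu_si128((const __m128i *)p);  /* MOVDQU */
    __m128i b = _mm_lddqu_si128((const __m128i *)p);  /* LDDQU */
    /* All 16 byte lanes equal -> every movemask bit set (0xFFFF). */
    return _mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) == 0xFFFF;
}
```

A timing comparison would wrap the same loop body around each intrinsic; loads that straddle a 64-byte line are the case where a difference, if any, would be expected to show.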

Difference between L2 cache misses and Bus_Trans_Mem


I have a small question about how to find the number of main memory accesses from a performance counter.
Every L2 cache miss accesses memory, but there is also a performance counter called Bus_Trans_Mem.
Can anyone tell me the difference between them, and which one is better for estimating the number of memory accesses?

Thank you,
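For reading a cache-miss counter from inside a program, Linux exposes a generic `PERF_COUNT_HW_CACHE_MISSES` event through `perf_event_open`; the kernel documents it as usually counting last-level-cache misses. This is not Bus_Trans_Mem itself, which would need the processor-specific raw event encoding. The sketch below assumes Linux with GCC or Clang; `counter_start` and `counter_stop` are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Opens and enables a user-space cache-miss counter for the calling thread.
   Returns the file descriptor, or -1 if perf events are unavailable
   (for example, restricted by kernel.perf_event_paranoid or a container). */
int counter_start(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_CACHE_MISSES;
    attr.disabled = 1;        /* start stopped; enabled explicitly below */
    attr.exclude_kernel = 1;  /* count user-space misses only */
    attr.exclude_hv = 1;

    /* glibc provides no wrapper, so invoke the system call directly:
       pid 0 = this thread, cpu -1 = any CPU, no group leader, no flags. */
    int fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
    if (fd == -1)
        return -1;
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    return fd;
}

/* Stops the counter, returns its value (or -1 on a read error), and closes it. */
long long counter_stop(int fd)
{
    long long count = -1;
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    if (read(fd, &count, sizeof count) != (ssize_t)sizeof count)
        count = -1;
    close(fd);
    return count;
}
```

The code under test goes between `counter_start()` and `counter_stop(fd)`; comparing that count with the Bus_Trans_Mem value from a profiler over the same region is one way to see how far the two diverge.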

Cache memory

Hello sir/madam,

I am studying for a B.Tech in Information Technology and need to choose my final-year project soon.
My idea is to organize the cache as a circular queue: instead of keeping the most recently accessed data, the cache would hold the last 10 fetched items, so that every access is served from those 10 most recently fetched items.

Please tell me whether this is possible, and how I can access cache memory.
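The scheme described is what cache literature calls FIFO (first-in, first-out) replacement: entries are evicted in the order they were fetched, regardless of later accesses, whereas LRU evicts the least recently accessed entry. A processor's hardware replacement policy is built in and not normally configurable from software, so a project along these lines would typically be a simulation. The sketch below is a minimal software model with 10 slots; `fifo_cache`, `fifo_access` and the use of `long` keys as block addresses are illustrative choices.

```c
#include <stdbool.h>
#include <stddef.h>

#define CACHE_SLOTS 10  /* "last 10 fetched" from the post */

typedef struct {
    long keys[CACHE_SLOTS];  /* block addresses currently cached */
    size_t count;            /* slots filled so far */
    size_t next;             /* circular index of the next slot to overwrite */
} fifo_cache;

void fifo_init(fifo_cache *c)
{
    c->count = 0;
    c->next = 0;
}

/* Returns true on a hit. On a miss, the fetched key overwrites the oldest
   entry (the circular queue's write position). */
bool fifo_access(fifo_cache *c, long key)
{
    for (size_t i = 0; i < c->count; i++)
        if (c->keys[i] == key)
            return true;  /* hit: FIFO does not reorder entries on access */

    c->keys[c->next] = key;
    c->next = (c->next + 1) % CACHE_SLOTS;
    if (c->count < CACHE_SLOTS)
        c->count++;
    return false;
}
```

After 10 distinct misses the queue is full; the 11th distinct fetch evicts the first one fetched even if it was just accessed, which is the behavioral difference from LRU.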

edx and rdx on 64-bit machine

Hi, I have some questions about using the edx and rdx registers on a 64-bit machine. How do these two registers relate to each other? For example, if I want to do a subtraction:

sub edx, 16

This instruction subtracts 16 from edx, and the value in rdx seems to decrease by 16 as well. Why?

Also, when I use gdb and try to print the value of edx, it always shows "void". Is there a way to see the contents of the edx register on a 64-bit machine? Any comments or answers are highly appreciated. Thank you in advance!
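On x86-64, edx is the low 32 bits of rdx, and an instruction that writes a 32-bit register zero-extends the result into the upper 32 bits of the 64-bit register; 8- and 16-bit writes, by contrast, leave the upper bits untouched. So `sub edx, 16` is a 32-bit subtraction whose result replaces all of rdx. The sketch below, assuming an x86-64 target and GCC/Clang inline assembly (AT&T syntax, so `sub edx, 16` is written `subl $16, %edx`), runs that instruction on a chosen 64-bit value; `sub_edx_16` is a hypothetical name.

```c
#include <stdint.h>

/* Places rdx_in in RDX, executes "sub edx, 16", and returns the full RDX. */
uint64_t sub_edx_16(uint64_t rdx_in)
{
    uint64_t rdx = rdx_in;
    /* "+d" binds the operand to RDX; the instruction names EDX explicitly.
       SUB modifies the flags, hence the "cc" clobber. */
    __asm__("subl $16, %%edx" : "+d"(rdx) : : "cc");
    return rdx;
}
```

Starting from RDX = 0xFFFFFFFF00000020, the result is 0x10: the low half becomes 0x20 - 16 and the upper half is cleared. Starting from 5, the result is 0xFFFFFFF5 rather than a negative 64-bit value, because the wraparound happens in 32 bits. In gdb, `p/x $rdx & 0xffffffff` prints just the low 32 bits.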



Probable mistake in documentation: please check

Intel Software Developer's Manual, Volume 3A, System Programming Guide:
6.14.2 64-Bit Mode Stack Frame
In IA-32e mode, the RSP is aligned to a 16-byte boundary before pushing the stack
frame. The stack frame itself is aligned on a 16-byte boundary when the interrupt
handler is called. The processor can arbitrarily realign the new RSP on interrupts
because the previous (possibly unaligned) RSP is unconditionally saved on the newly
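To make the quoted alignment rule concrete: aligning RSP to a 16-byte boundary means clearing its low four bits before the processor pushes SS, RSP, RFLAGS, CS and RIP (five 8-byte values), plus an 8-byte error code for exceptions that supply one. The sketch below assumes `rsp_before` is the RSP the frame is about to be pushed onto (the interrupted one, or the one loaded for a stack switch); `handler_entry_rsp` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models the 64-bit mode interrupt frame: RSP is rounded down to a 16-byte
   boundary, then SS, RSP, RFLAGS, CS and RIP (5 x 8 bytes) are pushed,
   followed by an 8-byte error code when the exception provides one.
   Returns the RSP value the handler starts with. */
uint64_t handler_entry_rsp(uint64_t rsp_before, bool has_error_code)
{
    uint64_t aligned = rsp_before & ~(uint64_t)0xF;  /* clear bits 3:0 */
    uint64_t frame_bytes = 5 * 8 + (has_error_code ? 8 : 0);
    return aligned - frame_bytes;
}
```

An interrupted RSP of 0x7fff1238 is aligned down to 0x7fff1230, so the handler starts with RSP = 0x7fff1208 without an error code or 0x7fff1200 with one.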
