# Minimal Averaged Delta of Intel RDTSC and RDTSCP instructions

**[ Abstract ]**

**Time-Interval Measurements using TSC**

On Intel CPUs, the Time Stamp Counter ( TSC ) is a special 64-bit register that increments every clock cycle. Two instructions, RDTSC and RDTSCP, read the value of the TSC into General Purpose Registers ( GPR ). Intel doesn't publish latencies for these two instructions; however, throughputs for both are given in the Intel 64 and IA-32 Architectures Optimization Reference Manual.

**[ List of Abbreviations ]**

CPU - Central Processing Unit

ILP - Instruction Level Parallelism

TSC - Time Stamp Counter ( number of clock cycles since the CPU was powered on or reset )

GPR - General Purpose Registers

NIL - Native Internal Latency

OEL - Observed External Latency

MAD - Minimal Averaged Delta ( number of clock cycles between two calls to RDTSC or RDTSCP instructions )

DOU - Degree of Uncertainty ( unknowns related to Superscalar processing with ILP )

ATV - Absolute TSC Value

DTV - Difference TSC Value

UTV - Uncorrected TSC Value

CTV - Corrected TSC Value

**[ Details ]**

Software Engineers and Computer Scientists hold two points of view on whether the latency of the RDTSC and RDTSCP instructions, which is not officially documented, needs to be taken into account when making precise time measurements.

Here is a list of terms that will be used:

- a Native Internal Latency ( NIL ) for RDTSC and RDTSCP instructions

- an Observed External Latency ( OEL ) for RDTSC and RDTSCP instructions

- a Minimal Averaged Delta ( MAD ) for RDTSC and RDTSCP instructions

- a Degree of Uncertainty ( DOU ) of Instruction Level Parallelism of a CPU with a superscalar architecture

The NIL of the RDTSC or RDTSCP instruction is the minimal number of clock cycles needed to move the 64-bit value of the TSC to the EDX:EAX register pair ( in 64-bit mode, the low 32 bits of RDX and RAX ) before the value becomes available to an external program.

OEL is the minimal difference between two TSC values obtained from two consecutive calls of the RDTSC or RDTSCP instruction that are not interrupted by the OS. It is calculated as follows:

OEL = ( TSC2 - TSC1 ) * DOU

where

TSC1 = READ_TSC

TSC2 = READ_TSC

When DOU is set to 1.0, it is assumed that there is no Instruction Level Parallelism and that instructions are executed one after another. Only positive values are valid for DOU; a value of 0.0 is excluded. DOU is a purely empirical number because a CPU can execute some instructions out of order.

MAD is the number of clock cycles it takes to execute one RDTSC or RDTSCP instruction within a series of calls to that instruction. A series of calls of the same instruction needs to be executed in order to fill the CPU pipeline and to let any earlier instructions other than RDTSC or RDTSCP retire. MAD is calculated as follows:

MAD = ( ( TSC2 - TSC1 - SAVE_TSC1_LATENCY ) / NumOfInstructionsToFillPipeline ) * DOU

where

TSC1 = READ_TSC

TSC2 = READ_TSC

SAVE_TSC1_LATENCY is the latency of the MOV instruction that saves the EAX or RAX register

Note: The EDX or RDX register is not saved, in order to improve the accuracy of measurements. The difference is therefore computed from the value in EAX or RAX only, and it is possible that this value overflows ( wraps around ) between the two reads.
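The MAD formula and the wrap-around mentioned in the note can be illustrated with a minimal C sketch. The function name `mad_from_low32` and its parameters are hypothetical and are not taken from the test sources. The sketch assumes that only the low 32 bits of each TSC read were kept and that the counter wrapped at most once between the two reads; under that assumption, unsigned 32-bit subtraction still yields the correct difference.

```c
#include <assert.h>
#include <stdint.h>

/*
 * MAD = ( ( TSC2 - TSC1 - SAVE_TSC1_LATENCY ) / N ) * DOU
 *
 * tsc1_lo, tsc2_lo : low 32 bits ( EAX ) of the two TSC reads
 * save_latency     : latency of the MOV that saves TSC1, in clock cycles
 * n                : number of RDTSC / RDTSCP instructions averaged over
 * dou              : Degree of Uncertainty, must be positive
 */
static double mad_from_low32(uint32_t tsc1_lo, uint32_t tsc2_lo,
                             uint32_t save_latency, unsigned n, double dou)
{
    assert(n > 0 && dou > 0.0);

    /* Unsigned arithmetic is modulo 2^32, so a single wrap of the
       low word between the two reads does not corrupt the result. */
    uint32_t delta = (uint32_t)(tsc2_lo - tsc1_lo);
    assert(delta >= save_latency);

    return ((double)(delta - save_latency) / (double)n) * dou;
}
```

With illustrative inputs, `mad_from_low32( 1000u, 1800u, 1u, 10u, 1.0 )` gives 79.9 clock cycles, and the same value is obtained for `0xFFFFFF00u` and `0x00000220u`, where the low word wraps between the two reads.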

It is very speculative, but the NIL of the RDTSC or RDTSCP instruction could be about 1-2 clock cycles on both 32-bit and 64-bit CPUs. In any case, it is clear that Intel CPU microcode should read the TSC value and move it to the GPRs as fast as possible.

A set of properties of a NIL could be as follows:

- NIL is always a constant for a given CPU architecture

- NIL cannot be estimated externally because it is not clear how many micro-ops a CPU needs to complete an RDTSC or RDTSCP instruction

- NIL cannot be measured externally because it is always hidden and is a part of MAD

An OEL is always higher than NIL for a given CPU and could be equal to MAD when DOU is 1.0.

A series of tests, implemented in C with some portions of the code in inline assembler, was completed and MAD values were calculated.

**[ Pseudo-code of Tests ]**

A pseudo-code of tests to evaluate MAD of RDTSC or RDTSCP instructions is as follows:

SET_PRIORITY_TO_REALTIME

TSC1 = READ_TSC

SAVE_TSC1 ;; Its latency is SAVE_TSC1_LATENCY

;; Fill CPU pipeline

RDTSC() ;; 1

RDTSC() ;; 2

RDTSC() ;; 3

RDTSC() ;; 4

RDTSC() ;; 5

RDTSC() ;; 6

RDTSC() ;; 7

RDTSC() ;; 8

RDTSC() ;; 9

RDTSC() ;; 10

;;

TSC2 = READ_TSC

MAD = ( ( TSC2 - TSC1 - SAVE_TSC1_LATENCY ) / 10 ) * DOU

SET_PRIORITY_TO_NORMAL

where

DOU = 1.0
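The pseudo-code above can be sketched in C with the `__rdtsc` compiler intrinsic. This is only an approximation under stated assumptions, not the original test source: the helper names `mad_single_pass` and `mad_minimum` are hypothetical, the full 64-bit TSC value is kept ( the original tests save EAX or RAX only ), the switch to real-time priority is OS-specific and omitted, and taking the minimum over several passes is an assumption about how the "minimal" value is obtained.

```c
#include <stdint.h>
#ifdef _MSC_VER
#include <intrin.h>      /* __rdtsc on Microsoft C++ */
#else
#include <x86intrin.h>   /* __rdtsc on GCC / Clang ( x86 targets only ) */
#endif

/* One pass: ( ( TSC2 - TSC1 - save_latency ) / 10 ) * dou */
static double mad_single_pass(uint64_t save_latency, double dou)
{
    uint64_t tsc1 = __rdtsc();          /* TSC1 = READ_TSC, SAVE_TSC1 */

    /* Fill the CPU pipeline with ten reads whose results are discarded.
       The compilers named above are expected to keep these calls, but
       the generated code should be checked in a disassembler. */
    (void)__rdtsc();  /*  1 */
    (void)__rdtsc();  /*  2 */
    (void)__rdtsc();  /*  3 */
    (void)__rdtsc();  /*  4 */
    (void)__rdtsc();  /*  5 */
    (void)__rdtsc();  /*  6 */
    (void)__rdtsc();  /*  7 */
    (void)__rdtsc();  /*  8 */
    (void)__rdtsc();  /*  9 */
    (void)__rdtsc();  /* 10 */

    uint64_t tsc2 = __rdtsc();          /* TSC2 = READ_TSC */

    return ((double)(tsc2 - tsc1 - save_latency) / 10.0) * dou;
}

/* Assumption: repeat the pass and keep the smallest value, so that
   passes disturbed by interrupts or cache misses are filtered out. */
static double mad_minimum(unsigned passes, uint64_t save_latency, double dou)
{
    double best = mad_single_pass(save_latency, dou);
    for (unsigned i = 1; i < passes; ++i) {
        double m = mad_single_pass(save_latency, dou);
        if (m < best)
            best = m;
    }
    return best;
}
```

A call such as `mad_minimum( 100u, 1u, 1.0 )` returns a value in clock cycles. Because the intrinsic combines EDX and EAX into one 64-bit result, the numbers may differ slightly from those of an inline-assembler version that saves EAX only.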

**[ Computer Systems used for evaluations ]**

**** Dell Precision Mobile M4700 ****

Intel Core i7-3840QM ( 2.80 GHz )

Ivy Bridge / 4 cores / 8 logical CPUs / ark.intel.com/products/70846

32GB RAM

320GB HDD

NVIDIA Quadro K1000M ( 192 CUDA cores / 2GB memory )

Windows 7 Professional 64-bit SP1

Size of L3 Cache = 8MB ( shared between all cores for data & instructions )

Size of L2 Cache = 1MB ( 256KB per core / shared for data & instructions )

Size of L1 Cache = 256KB ( 32KB per core for data & 32KB per core for instructions )

Display resolution: 1366 x 768

**** Dell Dimension 4400 ****

Intel Pentium 4 ( 1.60 GHz / 1 core )

1GB RAM

Seagate 20GB HDD ( * )

Seagate 3TB HDD ( ** )

EVGA GeForce 6200 Video Card ( 512MB DDR2 / AGP 8x )

Windows XP Professional 32-bit SP3

Size of L2 Cache = 256KB

Size of L1 Cache = 8KB

Display resolution: 1440 x 990

( * ) Seagate Barracuda 20GB IDE Hard Disk Drive

ST320011A

3.5" 7200 Rpm 2MB Cache IDE Ultra ATA100 / ATA-iV/6

Average Rotational Latency : 4.17 ms

Average Seek Times Read : 9.0ms

Average Seek Times Write : 10.0ms

Maximum Internal Transfer Rate : 69.4MB/sec

Average External Transfer Rate : 100MB/sec ( Read and Write )

Maximum External Transfer Rate : 150MB/sec ( Read )

Note: Barracuda ATA IV Family

( ** ) Seagate Barracuda 3TB SATA Hard Disk Drive

ST3000DM001

3.5" 7200 Rpm 64MB Cache SATA III ( 6Gb/sec )

Average Rotational Latency : 4.16 ms

Average Seek Times Read : 8.5ms

Average Seek Times Write : 9.5ms

Maximum Internal Transfer Rate : 268MB/sec

Average External Transfer Rate : 156MB/sec ( Read and Write )

Maximum External Transfer Rate : 210MB/sec ( Read )

**[ List of tests ]**

Four tests were run on every CPU, each built with several different C++ compilers:

**[ Sub-Test002.01.A - RDTSC ]** - pure C language

**[ Sub-Test002.01.B - RDTSC ]** - C language with inline assembler

**[ Sub-Test002.01.C - RDTSCP ]** - pure C language

**[ Sub-Test002.01.D - RDTSCP ]** - C language with inline assembler

Four possible use cases for **__rdtscp** intrinsic function need to be considered. The function is

declared as follows:

...

extern unsigned __int64 __ICL_INTRINCC **__rdtscp**( unsigned int * );

...

Note: Let's denote uiTscValue as **1st value**, and iRetValue as **2nd value**.

**Use Case 1** - 1st value used / 2nd value used:

...

unsigned int iRetValue = 0;

unsigned __int64 uiTscValue = **__rdtscp**( &iRetValue );

...

C++ compiler should generate ordered MOV instructions to save 1st value and 2nd value

at some addresses.

**Use Case 2** - 1st value used / 2nd value not used:

...

unsigned __int64 uiTscValue = **__rdtscp**( NULL );

...

The C++ compiler should not generate MOV instructions to save the 2nd value at the NULL address. Currently, the Intel C++ compiler tries to save the 2nd value to the NULL address, and an Access Violation exception is raised.

**Use Case 3** - 1st value not used / 2nd value used:

...

unsigned int iRetValue = 0;

**__rdtscp**( &iRetValue );

...

C++ compiler should not generate MOV instructions to save 1st value at some address.

**Use Case 4** - 1st value not used / 2nd value not used:

...

**__rdtscp**( NULL );

...

C++ compiler should not generate MOV instructions to save 1st value and 2nd value at some addresses.
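Until a compiler handles a NULL argument as described in Use Cases 2 and 4, one possible workaround is to never pass NULL to the intrinsic. The following is a minimal sketch, assuming a CPU that supports RDTSCP and the GCC, Clang or Microsoft headers; `read_tscp` is a hypothetical helper name and is not part of any compiler's API.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef _MSC_VER
#include <intrin.h>      /* __rdtscp on Microsoft C++ */
#else
#include <x86intrin.h>   /* __rdtscp on GCC / Clang ( x86 targets only ) */
#endif

/*
 * Returns the 1st value ( TSC ). The 2nd value is stored through
 * aux_out only when the caller supplies a non-NULL pointer, so the
 * intrinsic itself always receives a valid address.
 */
static inline uint64_t read_tscp(unsigned int *aux_out)
{
    unsigned int aux = 0;               /* local scratch for the 2nd value */
    uint64_t tsc = __rdtscp(&aux);

    if (aux_out != NULL)
        *aux_out = aux;

    return tsc;
}
```

With this helper, `read_tscp( NULL )` corresponds to Use Cases 2 and 4, and `read_tscp( &iRetValue )` to Use Cases 1 and 3. Whether the compiler then removes the unused stores is still up to its optimizer.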

**[ CPU: Pentium 4 - Microsoft C++ compiler - 32-bit ]**

[ Sub-Test002.01.A - RDTSC ] - Started

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

[ Sub-Test002.01.A - RDTSC ] - Completed

[ Sub-Test002.01.B - RDTSC ] - Started

TSC Minimal Averaged Delta is 79.90 clock cycles

TSC Minimal Averaged Delta is 79.90 clock cycles

TSC Minimal Averaged Delta is 79.90 clock cycles

TSC Minimal Averaged Delta is 79.90 clock cycles

TSC Minimal Averaged Delta is 79.90 clock cycles

TSC Minimal Averaged Delta is 79.90 clock cycles

TSC Minimal Averaged Delta is 79.90 clock cycles

TSC Minimal Averaged Delta is 79.90 clock cycles

TSC Minimal Averaged Delta is 79.90 clock cycles

TSC Minimal Averaged Delta is 79.90 clock cycles

Latency of 'MOV ecx, eax' instruction is 1 clock cycle(s)

[ Sub-Test002.01.B - RDTSC ] - Completed

[ Sub-Test002.01.C - RDTSCP ] - Not Supported

[ Sub-Test002.01.D - RDTSCP ] - Not Supported

**[ CPU: Pentium 4 - Borland C++ compiler - 32-bit ]**

[ Sub-Test002.01.A - RDTSC ] - Started

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.40 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

[ Sub-Test002.01.A - RDTSC ] - Completed

[ Sub-Test002.01.B - RDTSC ] - Not Supported

[ Sub-Test002.01.C - RDTSCP ] - Not Supported

[ Sub-Test002.01.D - RDTSCP ] - Not Supported

**[ CPU: Pentium 4 - Intel C++ compiler - 32-bit ]**

[ Sub-Test002.01.A - RDTSC ] - Started

TSC Minimal Averaged Delta is 81.20 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

[ Sub-Test002.01.A - RDTSC ] - Completed

[ Sub-Test002.01.B - RDTSC ] - Started

TSC Minimal Averaged Delta is 80.30 clock cycles

TSC Minimal Averaged Delta is 79.90 clock cycles

TSC Minimal Averaged Delta is 79.90 clock cycles

TSC Minimal Averaged Delta is 79.90 clock cycles

TSC Minimal Averaged Delta is 79.90 clock cycles

TSC Minimal Averaged Delta is 79.90 clock cycles

TSC Minimal Averaged Delta is 79.90 clock cycles

TSC Minimal Averaged Delta is 79.90 clock cycles

TSC Minimal Averaged Delta is 79.90 clock cycles

TSC Minimal Averaged Delta is 79.90 clock cycles

Latency of 'MOV ecx, eax' instruction is 1 clock cycle(s)

[ Sub-Test002.01.B - RDTSC ] - Completed

[ Sub-Test002.01.C - RDTSCP ] - Not Supported

[ Sub-Test002.01.D - RDTSCP ] - Not Supported

**[ CPU: Pentium 4 - MinGW C++ compiler - 32-bit ]**

[ Sub-Test002.01.A - RDTSC ] - Started

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

[ Sub-Test002.01.A - RDTSC ] - Completed

[ Sub-Test002.01.B - RDTSC ] - Not Supported

[ Sub-Test002.01.C - RDTSCP ] - Not Supported

[ Sub-Test002.01.D - RDTSCP ] - Not Supported

**[ CPU: Pentium 4 - Watcom C++ compiler - 32-bit ]**

[ Sub-Test002.01.A - RDTSC ] - Started

TSC Minimal Averaged Delta is 80.40 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

TSC Minimal Averaged Delta is 80.00 clock cycles

[ Sub-Test002.01.A - RDTSC ] - Completed

[ Sub-Test002.01.B - RDTSC ] - Not Supported

[ Sub-Test002.01.C - RDTSCP ] - Not Supported

[ Sub-Test002.01.D - RDTSCP ] - Not Supported

**[ CPU: Ivy Bridge - Microsoft C++ compiler - 32-bit ]**

[ Sub-Test002.01.A - RDTSC ] - Started

TSC Minimal Averaged Delta is 25.00 clock cycles

TSC Minimal Averaged Delta is 25.80 clock cycles

TSC Minimal Averaged Delta is 26.20 clock cycles

TSC Minimal Averaged Delta is 27.40 clock cycles

TSC Minimal Averaged Delta is 28.20 clock cycles

TSC Minimal Averaged Delta is 26.60 clock cycles

TSC Minimal Averaged Delta is 28.20 clock cycles

TSC Minimal Averaged Delta is 26.60 clock cycles

TSC Minimal Averaged Delta is 28.60 clock cycles

TSC Minimal Averaged Delta is 28.20 clock cycles

[ Sub-Test002.01.A - RDTSC ] - Completed

[ Sub-Test002.01.B - RDTSC ] - Started

TSC Minimal Averaged Delta is 27.10 clock cycles

TSC Minimal Averaged Delta is 26.70 clock cycles

TSC Minimal Averaged Delta is 26.70 clock cycles

TSC Minimal Averaged Delta is 27.10 clock cycles

TSC Minimal Averaged Delta is 26.70 clock cycles

TSC Minimal Averaged Delta is 27.50 clock cycles

TSC Minimal Averaged Delta is 26.70 clock cycles

TSC Minimal Averaged Delta is 27.10 clock cycles

TSC Minimal Averaged Delta is 26.70 clock cycles

TSC Minimal Averaged Delta is 26.70 clock cycles

Latency of 'MOV ecx, eax' instruction is 1 clock cycle(s)

[ Sub-Test002.01.B - RDTSC ] - Completed

[ Sub-Test002.01.C - RDTSCP ] - Not Supported

[ Sub-Test002.01.D - RDTSCP ] - Not Supported

**[ CPU: Ivy Bridge - Microsoft C++ compiler - 64-bit ]**

[ Sub-Test002.01.A - RDTSC ] - Started

TSC Minimal Averaged Delta is 26.20 clock cycles

TSC Minimal Averaged Delta is 26.20 clock cycles

TSC Minimal Averaged Delta is 26.60 clock cycles

TSC Minimal Averaged Delta is 26.20 clock cycles

TSC Minimal Averaged Delta is 25.80 clock cycles

TSC Minimal Averaged Delta is 26.20 clock cycles

TSC Minimal Averaged Delta is 26.20 clock cycles

TSC Minimal Averaged Delta is 26.20 clock cycles

TSC Minimal Averaged Delta is 26.20 clock cycles

TSC Minimal Averaged Delta is 25.80 clock cycles

[ Sub-Test002.01.A - RDTSC ] - Completed

[ Sub-Test002.01.B - RDTSC ] - Not Supported

[ Sub-Test002.01.C - RDTSCP ] - Not Supported

[ Sub-Test002.01.D - RDTSCP ] - Not Supported

**[ CPU: Ivy Bridge - Borland C++ compiler - 32-bit ]**

[ Sub-Test002.01.A - RDTSC ] - Started

TSC Minimal Averaged Delta is 25.80 clock cycles

TSC Minimal Averaged Delta is 28.30 clock cycles

TSC Minimal Averaged Delta is 27.00 clock cycles

TSC Minimal Averaged Delta is 27.00 clock cycles

TSC Minimal Averaged Delta is 25.00 clock cycles

TSC Minimal Averaged Delta is 27.00 clock cycles

TSC Minimal Averaged Delta is 25.00 clock cycles

TSC Minimal Averaged Delta is 27.00 clock cycles

TSC Minimal Averaged Delta is 27.00 clock cycles

TSC Minimal Averaged Delta is 27.00 clock cycles

[ Sub-Test002.01.A - RDTSC ] - Completed

[ Sub-Test002.01.B - RDTSC ] - Not Supported

[ Sub-Test002.01.C - RDTSCP ] - Not Supported

[ Sub-Test002.01.D - RDTSCP ] - Not Supported

**[ CPU: Ivy Bridge - Borland C++ compiler - 64-bit ]**

[ Sub-Test002.01.A - RDTSC ] - Not Supported

[ Sub-Test002.01.B - RDTSC ] - Not Supported

[ Sub-Test002.01.C - RDTSCP ] - Not Supported

[ Sub-Test002.01.D - RDTSCP ] - Not Supported

**[ CPU: Ivy Bridge - Intel C++ compiler - 32-bit ]**

[ Sub-Test002.01.A - RDTSC ] - Started

TSC Minimal Averaged Delta is 29.00 clock cycles

TSC Minimal Averaged Delta is 25.40 clock cycles

TSC Minimal Averaged Delta is 32.60 clock cycles

TSC Minimal Averaged Delta is 29.60 clock cycles

TSC Minimal Averaged Delta is 28.60 clock cycles

TSC Minimal Averaged Delta is 26.20 clock cycles

TSC Minimal Averaged Delta is 37.00 clock cycles

TSC Minimal Averaged Delta is 27.00 clock cycles

TSC Minimal Averaged Delta is 28.20 clock cycles

TSC Minimal Averaged Delta is 25.80 clock cycles

[ Sub-Test002.01.A - RDTSC ] - Completed

[ Sub-Test002.01.B - RDTSC ] - Started

TSC Minimal Averaged Delta is 27.10 clock cycles

TSC Minimal Averaged Delta is 27.10 clock cycles

TSC Minimal Averaged Delta is 27.10 clock cycles

TSC Minimal Averaged Delta is 27.10 clock cycles

TSC Minimal Averaged Delta is 27.10 clock cycles

TSC Minimal Averaged Delta is 27.10 clock cycles

TSC Minimal Averaged Delta is 26.70 clock cycles

TSC Minimal Averaged Delta is 27.10 clock cycles

TSC Minimal Averaged Delta is 27.10 clock cycles

TSC Minimal Averaged Delta is 27.10 clock cycles

Latency of 'MOV ecx, eax' instruction is 1 clock cycle(s)

[ Sub-Test002.01.B - RDTSC ] - Completed

[ Sub-Test002.01.C - RDTSCP ] - Started

TSC Minimal Averaged Delta is 33.40 clock cycles

TSC Minimal Averaged Delta is 34.20 clock cycles

TSC Minimal Averaged Delta is 34.20 clock cycles

TSC Minimal Averaged Delta is 33.40 clock cycles

TSC Minimal Averaged Delta is 33.80 clock cycles

TSC Minimal Averaged Delta is 33.80 clock cycles

TSC Minimal Averaged Delta is 34.20 clock cycles

TSC Minimal Averaged Delta is 33.40 clock cycles

TSC Minimal Averaged Delta is 34.20 clock cycles

TSC Minimal Averaged Delta is 33.40 clock cycles

[ Sub-Test002.01.C - RDTSCP ] - Completed

[ Sub-Test002.01.D - RDTSCP ] - Not Supported

**[ CPU: Ivy Bridge - Intel C++ compiler - 64-bit ]**

[ Sub-Test002.01.A - RDTSC ] - Started

TSC Minimal Averaged Delta is 25.00 clock cycles

TSC Minimal Averaged Delta is 27.00 clock cycles

TSC Minimal Averaged Delta is 27.00 clock cycles

TSC Minimal Averaged Delta is 25.00 clock cycles

TSC Minimal Averaged Delta is 25.00 clock cycles

TSC Minimal Averaged Delta is 25.00 clock cycles

TSC Minimal Averaged Delta is 27.00 clock cycles

TSC Minimal Averaged Delta is 25.00 clock cycles

TSC Minimal Averaged Delta is 25.00 clock cycles

TSC Minimal Averaged Delta is 25.00 clock cycles

[ Sub-Test002.01.A - RDTSC ] - Completed

[ Sub-Test002.01.B - RDTSC ] - Not Supported

[ Sub-Test002.01.C - RDTSCP ] - Started

TSC Minimal Averaged Delta is 33.80 clock cycles

TSC Minimal Averaged Delta is 33.80 clock cycles

TSC Minimal Averaged Delta is 33.80 clock cycles

TSC Minimal Averaged Delta is 33.40 clock cycles

TSC Minimal Averaged Delta is 33.40 clock cycles

TSC Minimal Averaged Delta is 33.80 clock cycles

TSC Minimal Averaged Delta is 33.80 clock cycles

TSC Minimal Averaged Delta is 33.40 clock cycles

TSC Minimal Averaged Delta is 33.40 clock cycles

TSC Minimal Averaged Delta is 33.80 clock cycles

[ Sub-Test002.01.C - RDTSCP ] - Completed

[ Sub-Test002.01.D - RDTSCP ] - Started

TSC Minimal Averaged Delta is 34.70 clock cycles

TSC Minimal Averaged Delta is 34.70 clock cycles

TSC Minimal Averaged Delta is 34.70 clock cycles

TSC Minimal Averaged Delta is 34.70 clock cycles

TSC Minimal Averaged Delta is 34.70 clock cycles

TSC Minimal Averaged Delta is 34.70 clock cycles

TSC Minimal Averaged Delta is 34.70 clock cycles

TSC Minimal Averaged Delta is 34.30 clock cycles

TSC Minimal Averaged Delta is 34.70 clock cycles

TSC Minimal Averaged Delta is 34.30 clock cycles

Latency of 'MOV rcx, rax' instruction is 1 clock cycle(s)

[ Sub-Test002.01.D - RDTSCP ] - Completed

**[ CPU: Ivy Bridge - MinGW C++ compiler - 32-bit ]**

[ Sub-Test002.01.A - RDTSC ] - Not Supported

[ Sub-Test002.01.B - RDTSC ] - Not Supported

[ Sub-Test002.01.C - RDTSCP ] - Not Supported

[ Sub-Test002.01.D - RDTSCP ] - Not Supported

**[ CPU: Ivy Bridge - MinGW C++ compiler - 64-bit ]**

[ Sub-Test002.01.A - RDTSC ] - Started

TSC Minimal Averaged Delta is 28.20 clock cycles

TSC Minimal Averaged Delta is 27.00 clock cycles

TSC Minimal Averaged Delta is 25.00 clock cycles

TSC Minimal Averaged Delta is 25.00 clock cycles

TSC Minimal Averaged Delta is 25.00 clock cycles

TSC Minimal Averaged Delta is 25.00 clock cycles

TSC Minimal Averaged Delta is 27.00 clock cycles

TSC Minimal Averaged Delta is 27.00 clock cycles

TSC Minimal Averaged Delta is 25.00 clock cycles

TSC Minimal Averaged Delta is 25.00 clock cycles

[ Sub-Test002.01.A - RDTSC ] - Completed

[ Sub-Test002.01.B - RDTSC ] - Not Supported

[ Sub-Test002.01.C - RDTSCP ] - Not Supported

[ Sub-Test002.01.D - RDTSCP ] - Not Supported

**[ CPU: Ivy Bridge - Watcom C++ compiler - 32-bit ]**

[ Sub-Test002.01.A - RDTSC ] - Started

TSC Minimal Averaged Delta is 25.00 clock cycles

TSC Minimal Averaged Delta is 25.40 clock cycles

TSC Minimal Averaged Delta is 27.00 clock cycles

TSC Minimal Averaged Delta is 24.60 clock cycles

TSC Minimal Averaged Delta is 27.00 clock cycles

TSC Minimal Averaged Delta is 27.00 clock cycles

TSC Minimal Averaged Delta is 25.40 clock cycles

TSC Minimal Averaged Delta is 25.40 clock cycles

TSC Minimal Averaged Delta is 26.20 clock cycles

TSC Minimal Averaged Delta is 27.00 clock cycles

[ Sub-Test002.01.A - RDTSC ] - Completed

[ Sub-Test002.01.B - RDTSC ] - Not Supported

[ Sub-Test002.01.C - RDTSCP ] - Not Supported

[ Sub-Test002.01.D - RDTSCP ] - Not Supported

**[ CPU: Ivy Bridge - Watcom C++ compiler - 64-bit ]**

[ Sub-Test002.01.A - RDTSC ] - Started

TSC Minimal Averaged Delta is 26.60 clock cycles

TSC Minimal Averaged Delta is 27.00 clock cycles

TSC Minimal Averaged Delta is 25.40 clock cycles

TSC Minimal Averaged Delta is 25.40 clock cycles

TSC Minimal Averaged Delta is 25.40 clock cycles

TSC Minimal Averaged Delta is 25.40 clock cycles

TSC Minimal Averaged Delta is 25.40 clock cycles

TSC Minimal Averaged Delta is 27.00 clock cycles

TSC Minimal Averaged Delta is 25.40 clock cycles

TSC Minimal Averaged Delta is 27.00 clock cycles

[ Sub-Test002.01.A - RDTSC ] - Completed

[ Sub-Test002.01.B - RDTSC ] - Not Supported

[ Sub-Test002.01.C - RDTSCP ] - Not Supported

[ Sub-Test002.01.D - RDTSCP ] - Not Supported

Examples of disassembler codes for RDTSC and RDTSCP instructions will be posted later.

**[ An example of disassembled code for a test with RDTSC instruction - 32-bit ]**

...

0024AA47 rdtsc

0024AA49 mov ecx, eax

0024AA4B rdtsc

0024AA4D rdtsc

0024AA4F rdtsc

0024AA51 rdtsc

0024AA53 rdtsc

0024AA55 rdtsc

0024AA57 rdtsc

0024AA59 rdtsc

0024AA5B rdtsc

0024AA5D rdtsc

0024AA5F sub eax, ecx

...

**[ An example of disassembled code for a test with RDTSCP instruction - 64-bit ]**

...

000000013F652A81 rdtscp

000000013F652A84 mov rbx, rax

000000013F652A87 rdtscp

000000013F652A8A rdtscp

000000013F652A8D rdtscp

000000013F652A90 rdtscp

000000013F652A93 rdtscp

000000013F652A96 rdtscp

000000013F652A99 rdtscp

000000013F652A9C rdtscp

000000013F652A9F rdtscp

000000013F652AA2 rdtscp

000000013F652AA5 sub rax, rbx

...

Interesting results.

Agner Fog's manuals give a different result: his RDTSC throughput figure is a bit higher than your latency results.

Unfortunately, he did not provide any data on how many CPU clock cycles the RDTSC latency might consume.

Why don't you serialize the execution of the RDTSC micro-ops?

AFAIK RDTSC is not a serializing instruction, so in theory several of them can execute at the same time and their pipelined execution can at least partially overlap.

>>...Agner Fog's manuals provide different result for RDTSC throughput a bit higher than your results of latency.

That is possible because it looks like he used a CPU of a different generation. Please post those RDTSC and RDTSCP numbers for review, together with the CPU information.

>>Why do not you serialize uop of RDTSC execution?

>>

>>Afaik RDTSC is not serializing instruction so in theory multiple of them can be executed at the same

>>time and at least partially overlap pipelined execution.

That is why I tried to fill a CPU pipeline with at least **10** RDTSC or RDTSCP instructions.
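For comparison, the serialization asked about in the question is commonly approximated by bracketing RDTSC with LFENCE, which the RDTSC description in Intel's Software Developer's Manual suggests for ordering the read against earlier and later instructions. The following is a minimal sketch, assuming an x86-64 target with SSE2 and the GCC, Clang or Microsoft headers; `read_tsc_fenced` is a hypothetical name and this variant was not part of the tests above.

```c
#include <stdint.h>
#include <emmintrin.h>   /* _mm_lfence ( SSE2 ) */
#ifdef _MSC_VER
#include <intrin.h>      /* __rdtsc on Microsoft C++ */
#else
#include <x86intrin.h>   /* __rdtsc on GCC / Clang ( x86 targets only ) */
#endif

/* Reads the TSC with an LFENCE on each side, so the read is not
   started before earlier instructions have executed and later
   instructions do not start before the read has been performed. */
static inline uint64_t read_tsc_fenced(void)
{
    _mm_lfence();
    uint64_t tsc = __rdtsc();
    _mm_lfence();
    return tsc;
}
```

The fences have a cost of their own, so a delta measured with this helper includes the LFENCE overhead and is not directly comparable with the MAD values reported above.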

Here is the RDTSC reciprocal throughput as stated by Agner Fog.

CPU Arch: Ivy Bridge, RDTSC Reciprocal Throughput: 27 CPU clock cycles.

Reference p. 175

http://www.agner.org/optimize/instruction_tables.pdf

>>>That is why I tried to fill a CPU pipeline with at least 10 RDTSC or RDTSCP instructions.>>>

I am still puzzled by the possibility of at least some hardware-level pipelined execution of those 10 micro-ops. I will try to find some information on Google Patents that may shed some light on the proposed ( patented ) implementation of the RDTSC instruction.

I have found an Intel patent titled "Apparatus for monitoring the performance of a microprocessor", but it contains no clear information about pipelined reads of the TSC.

Link to the aforementioned patent:

https://patents.google.com/patent/US5657253A/en?q=time+stamp+counter&ass...

>>**Have you tried this experiment on v4 or v3 cpus? In particular E5-2699 v3 and E5-2699 v4?**

Here are results of my tests for **Intel Xeon Phi Processor 7210**:

http://ark.intel.com/products/94033/Intel-Xeon-Phi-Processor-7210-16GB-1...

Intel Xeon Phi Processor 7210 ( 16GB, 1.30 GHz, 64 core )

Processor name : Intel(R) Xeon Phi(TM) 7210

Packages (sockets) : 1

Cores : 64

Processors (CPUs) : 256

Cores per package : 64

Threads per core : 4

**[ Output for RDTSC instruction ]**

...

Access Time to TSC: 36.40 clock cycles

Access Time to TSC: 37.70 clock cycles

Access Time to TSC: 36.40 clock cycles

Access Time to TSC: 36.40 clock cycles

Access Time to TSC: 36.40 clock cycles

Access Time to TSC: 36.40 clock cycles

Access Time to TSC: 36.40 clock cycles

Access Time to TSC: 36.40 clock cycles

Access Time to TSC: 36.40 clock cycles

Access Time to TSC: 36.40 clock cycles

Access Time to TSC: 36.40 clock cycles

Access Time to TSC: 37.70 clock cycles

Access Time to TSC: 36.40 clock cycles

Access Time to TSC: 36.40 clock cycles

Access Time to TSC: 36.40 clock cycles

Access Time to TSC: 36.40 clock cycles

...