Invariant TSC support

Invariant TSC support

Bryan Hickman (Intel)'s picture

I was told that on the Xeon CPU line, the X5690 supports invariant TSC and none of the E7 and above do. Is that true?

11 posts / 0 new
Last post
For more complete information about compiler optimizations, see our Optimization Notice.
Maxym Dmytrychenko (Intel)'s picture

Just note that you can always check on such feature during the execution as described below:

17.12.1 Invariant TSC

The time stamp counter in newer processors may support an enhancement, referred

to as invariant TSC. Processors support for invariant TSC is indicated by

CPUID.80000007H:EDX[8].

The invariant TSC will run at a constant rate in all ACPI P-, C-. and T-states. This is

the architectural behavior moving forward. On processors with invariant TSC

support, the OS may use the TSC for wall clock timer services (instead of ACPI or

HPET timers). TSC reads are much more efficient and do not incur the overhead

associated with a ring transition or access to a platform resource.

and more available at: http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-manual-325462-rmver.html

Roman Dementiev (Intel)'s picture

under Linux there is an open source cpuid utility. On my test system withIntel Xeon CPU E7- 4830 I got this output:

   Advanced Power Management Features (0x80000007/edx):
      temperature sensing diode      = false
      frequency ID (FID) control     = false
      voltage ID (VID) control       = false
      thermal trip (TTP)             = false
      thermal monitor            = false
      software thermal control (STC) = false
      100 MHz multiplier control     = false
      hardware P-State control       = false
      TscInvariant                   = true

Best regards, Roman

Martin Dixon (Intel)'s picture
The invariant TSC means that the TSC continues at a fixed rate regardless of the C-state or frequency of the processor (as long as the processor remains in the ACPI S0 state). This is indicated by the CPUID.80000007.EDX[8].

All E7 parts will support the invariant TSC (short of an unexpected erratum). Here's an excerpt from /proc/cpuinfo on a pre-production Westmere EX processor. vendor_id : GenuineIntel cpu family : 6 model : 47 stepping : 2 flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx pdpe1gb rdtscp lm constant_tsc ida nonstop_tsc pni monitor ds_cpl vmx smx est tm2 cx16 xtpr popcnt lahf_lm As you can see this part indicates nonstop_tsc. BTW: TSC invariance does not imply cross-socket synchronization. That requires the platform vendor to distribute RESET synchronously to all sockets.

Chris M.'s picture

Hi, has anyone been able to used a cpuid utility to determine whether invariant TSC is enabled on windows?

Uday J.'s picture

Does Invariant TSC this mean it TSC will be constant across multiple sockets.

My system is 

model name      :       Intel(R) Xeon(R) CPU E7-4890 v2 @ 2.80GHz

But I still see TSC variations across sockets.

Any idea???

iliyapolak's picture

I suppose that TSC reading is pinned to single socket.

Btw read @Martin Dixon answer particularly the last sentence.

Tim Prince's picture

Even though a reset is distributed in an effort to start tsc simultaneously, they may not be fully synchonized. So the application may find an advantage in taking tsc differences only among readings on the same socket.

John D. McCalpin's picture

For newer processors (at least since Nehalem) there is also an RDTSCP instruction that returns the time stamp counter plus the contents of an additional core-specific register.  Linux systems (at least since 2.6.24 or so?) set this additional register to contain the socket number and the core number of the processor core that executed the RDTSCP instruction.  The hardware guarantees that the value read from the TSC and the value read from this additional register are done atomically, so it guarantees that you know exactly which core provided the TSC value.

The RDTSCP instruction reads the same TSC as the RDTSC instruction, so if RDTSC is invariant, then RDTSCP will be as well.

RDTSCP is slightly more ordered than RDTSC.  RDTSC is not ordered at all, which means that it will execute some time in the out-of-order window of the processor, which may be before or after the instruction(s) that you are interested in timing.   RDTSCP will not execute until all prior instructions (in program order) have executed.  So it can't execute "early", but there is no guarantee that the execution won't be delayed until after some subsequent (in program order) instructions have executed.  In practice I have never seen a problem with hardware reordering of either of these instructions -- most processors tend to execute in FIFO order most of the time, and since these instructions have no input dependencies they tend to get executed pretty close to where they sit in program order.   It is hard to tell how long they really require to execute because they are designed to provide monotonically increasing values.  The (invariant) TSC is incremented by the base multiplier once every reference clock, so on my Xeon E5-2680 (Sandy Bridge EP) this is an increment of 27 every 10 ns.   The only way to avoid getting the same result (which would result in a time difference of zero) is to make sure that the instruction takes at least 10 ns to execute.   This is 27 cycles at 2.7 GHz and 31 cycles at the Turbo speed of 3.1 GHz.   In practice it takes a few more cycles for RDTSCP since it returns an extra value, and extra cycles are required to store the results.

RDTSCP is also by far the easiest way to determine which socket and which core a process is running on, since most systems allow user-mode execution of the RDTSCP instruction.   You can stick it in a simple inline assembler macro and get the TSC, the processor number, and the socket number with an overhead of O(50) cycles.

The version I use is:

unsigned long tacc_rdtscp(int *chip, int *core)

	{

	   unsigned long int x;

	   unsigned a, d, c;
   __asm__ volatile("rdtscp" : "=a" (a), "=d" (d), "=c" (c));

	    *chip = (c & 0xFFF000)>>12;

	    *core = c & 0xFFF;
   return ((unsigned long)a) | (((unsigned long)d) << 32);;

	}

John D. McCalpin, PhD "Dr. Bandwidth"
bronxzv's picture

Цитата:

John D. McCalpin wrote:

The version I use is:

unsigned long tacc_rdtscp(int *chip, int *core)

	{

	   unsigned long int x;

	   unsigned a, d, c;
   __asm__ volatile("rdtscp" : "=a" (a), "=d" (d), "=c" (c));

	    *chip = (c & 0xFFF000)>>12;

	    *core = c & 0xFFF;
   return ((unsigned long)a) | (((unsigned long)d) << 32);;

	}

with the Intel compiler one can simply use the __rdtscp intrinsic for this purpose

  Synopsis unsigned __int64 __rdtscp (unsigned int * mem_addr)
#include "immintrin.h"
Instruction: rdtscp
CPUID Flag : RDTSCP Description Copy the current 64-bit value of the processor's time-stamp counter into dst, and store the IA32_TSC_AUX MSR (signature value) into memory at mem_addr. Operation dst[63:0] := TimeStampCounter MEM[mem_addr+31:mem_addr] := IA32_TSC_AUX[31:0]

 

John D. McCalpin's picture

The compiler directive is fine, of course, but I built my own version so that it would decode the socket and core information that Linux puts in the auxiliary register without me needing to remember how the bits are packed.

I have no idea if Windows puts anything in the TSC_AUX MSRs.
 

John D. McCalpin, PhD "Dr. Bandwidth"

Login to leave a comment.