How to use MSRs for Turbo Ratios?

Hello,

My source code ZFreq.c displays the frequencies of the i7 cores.

I'm using the MSR registers to read the core ratios, which I multiply by the current external clock from the SMBIOS.

However, whatever the system load, the MSR IA32_PERF_STATUS never returns the values found in the turbo range given by MSR_TURBO_RATIO_LIMIT.

In short, IA32_PERF_STATUS never goes above MSR_PLATFORM_INFO.MaxNonTurboRatio.
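A minimal sketch of the reading side of this approach, assuming Linux with the `msr` kernel module loaded (it exposes `/dev/cpu/N/msr`, where a `pread` of 8 bytes at offset *reg* returns that MSR) and root privileges. The helper names (`read_msr`, `max_non_turbo_ratio`, `min_ratio`, `ratio_to_mhz`) are hypothetical. It decodes MSR_PLATFORM_INFO (0xCE) bits 15:8 as MaxNonTurboRatio and bits 47:40 as the minimum (maximum efficiency) ratio; the base clock is passed in, since here it comes from the SMBIOS.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define MSR_PLATFORM_INFO 0xCE

/* Read one 64-bit MSR of logical CPU `cpu` through the Linux msr driver.
 * Returns 0 on success, -1 on failure (module not loaded, no permission...). */
static int read_msr(int cpu, uint32_t reg, uint64_t *value)
{
    char path[64];
    snprintf(path, sizeof path, "/dev/cpu/%d/msr", cpu);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pread(fd, value, sizeof *value, reg);
    close(fd);
    return n == (ssize_t)sizeof *value ? 0 : -1;
}

/* MSR_PLATFORM_INFO[15:8]: maximum non-turbo ratio. */
static unsigned max_non_turbo_ratio(uint64_t platform_info)
{
    return (unsigned)((platform_info >> 8) & 0xFF);
}

/* MSR_PLATFORM_INFO[47:40]: minimum (maximum efficiency) ratio. */
static unsigned min_ratio(uint64_t platform_info)
{
    return (unsigned)((platform_info >> 40) & 0xFF);
}

/* Core frequency in MHz = ratio x base clock (BCLK) in MHz. */
static double ratio_to_mhz(unsigned ratio, double bclk_mhz)
{
    return ratio * bclk_mhz;
}
```

With MaxNonTurboRatio 20 and a 133.33 MHz BCLK, `ratio_to_mhz` gives about 2667 MHz, the non-turbo ceiling this post describes.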

Please help me program those MSRs correctly.

 

Thank You

CyrIng

Fr


Source code also available at code.cyring.fr/FTS/Source/C/zfreq.c

-;)


The clock ratio that you obtain depends on the model number of the part, the number of active processors, the processor temperature, the processor power consumption, the current draw, and some other factors that Intel does not describe in much detail.

To look into this further, it would really help to have the exact processor model number. If you look at the Wikipedia page on Core i7 processors, you will see that there are 9 different implementations referred to as "Desktop Core i7" processors. These include Nehalem, Westmere, Sandy Bridge, Ivy Bridge, and Haswell cores, and four of these five have two different options for the uncore. In addition, there are 10 different implementations referred to as "Mobile Core i7" processors. These also include Nehalem, Westmere, Sandy Bridge, Ivy Bridge, and Haswell cores, with three of the five cores associated with different uncore or packaging options.

One reason that this matters is that, even though the processor MSRs to control the frequency ratio request may be the same, different processors have different support for other features that may be helpful in understanding why the processor is behaving as it does.  For example, the Core i7 processors based on the "Sandy Bridge E" (and probably "Ivy Bridge E") should use the same interface to the uncore "Performance Control Unit" as the "Sandy Bridge EP" (Xeon E5-1600/2400/2600/4600) server chips.  For those processors you can query registers in the Performance Control Unit to find out why the requested frequency ratio has not been granted.

 

John D. McCalpin, PhD "Dr. Bandwidth"

Hi, thanks for helping.

So, I have made progress by implementing the performance counter MSRs. A lot of fun!

This second release of the source code is tested on my Bloomfield i7-920 (Nehalem architecture) with the BCLK overclocked to 160 MHz, and it may run on its successors.

The frequency based on unhalted cycles shows some very interesting values: sometimes below the minimum ratio, and rarely above the maximum one.

But that may happen, perhaps meaning that turbo is "furtive" (short-lived). So, to catch it, I'm displaying turbo bumps on a quarter scale.

However, I still don't get a display like the one the Intel Widget for Windows shows.

I guess the key is to optimize the counter readings, for instance with a tiny thread loop that produces no output.

CyrIng

Patrick Fay (Intel)

Hello Cyring,

Sorry for the delay in responding. I've been busy with end-of-year, start-of-year work.

In your program, you are only setting the 'count OS cycles' bit for the fixed counters (unless I'm mistaken). So unhalted reference cycles and unhalted core cycles are only counted while the CPU is running in ring 0... and I kind of doubt that your measurement program runs in ring 0.
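A sketch of the register values involved, assuming the architectural layout in the Intel SDM: in IA32_FIXED_CTR_CTRL (0x38D) each fixed counter i owns a 4-bit field at bits 4i..4i+3, whose bit 0 enables counting at CPL 0 (OS) and bit 1 at CPL > 0 (USR); IA32_PERF_GLOBAL_CTRL (0x38F) enables fixed counter i with bit 32+i. The helper names are hypothetical, and writing these MSRs still requires ring 0 (for example through `/dev/cpu/N/msr`).

```c
#include <stdint.h>

#define IA32_FIXED_CTR_CTRL    0x38D
#define IA32_PERF_GLOBAL_CTRL  0x38F

#define FIXED_EN_OS   0x1u  /* count while CPL == 0 */
#define FIXED_EN_USR  0x2u  /* count while CPL > 0  */

/* IA32_FIXED_CTR_CTRL value enabling fixed counters 0 .. count-1,
 * each with the same enable bits `en`. */
static uint64_t fixed_ctr_ctrl_value(unsigned count, unsigned en)
{
    uint64_t v = 0;
    for (unsigned i = 0; i < count; i++)
        v |= (uint64_t)(en & 0xF) << (4 * i);
    return v;
}

/* IA32_PERF_GLOBAL_CTRL bits that enable fixed counters 0 .. count-1. */
static uint64_t global_ctrl_fixed_bits(unsigned count)
{
    uint64_t v = 0;
    for (unsigned i = 0; i < count; i++)
        v |= 1ULL << (32 + i);
    return v;
}
```

Setting only the OS bit for the three fixed counters gives 0x111; adding USR gives 0x333, so user-mode cycles are counted too. The matching global enable value is 0x700000000.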

But it is nice code though...

Pat

Thanks a lot for your advice ;-)

Meanwhile, I have made progress with the C3 and C6 states, which I have already implemented in a bigger project.
You may check my blog or SourceForge for screenshots and the source code of the Xlib widgets.

However, turbo ratios still don't show up in the relative frequencies based on C-states and TSC.

As you said, working in ring 0 should be the key.

Best Regards

CyrIng

 

Hello,

I have blogged my formula to compute the turbo ratio:

Ratio = OR × { d(URC) ÷ d(TSC) } + TR

It gives some good results, even in ring 3.

Please let me know if you find it correct.

 

CyrIng

Patrick Fay (Intel)

Hello Cyring,

I'm not sure what this is really calculating. It looks like some kind of add-on to the regular turbo ratio. TR is defined as unhalted_core_clks/unhalted_ref_clks, so if you are running at TSC freq then TR = 1.

Your 'Ratio = OR × { d(URC) ÷ d(TSC) } + TR' is then basically Ratio = operating_freq_ratio * %unhalted + TR. So if you are running at TSC freq, say 2.0 GHz with no halting, then Ratio = 2.0*(1) + 1 = 3.0 ... unless I'm doing something wrong.

But a more fundamental issue with this approach is using the current frequency from IA32_PERF_STATUS and trying to say that the instantaneous IA32_PERF_STATUS tells you something about the average frequency.

On Haswell, for instance, going into or out of halt can take 0.32 µs (0.32e-6 s). Going into turbo can take 0.1 µs. See http://www.anandtech.com/show/7744/intel-reveals-new-haswell-details-at-isscc-2014 . So if we took 0.1 µs as the smallest 'window', there could be up to 10,000,000 frequency changes per second, and you are looking at maybe 1 of them (assuming you read IA32_PERF_STATUS once per second). Do you see what I'm trying to say?

Pat

Hello Pat,

Thank you for your reply.

Reading MSR_TURBO_RATIO_LIMIT (0x1ad) returns the following turbo ratio values:

MaxRatio_1C=22 ; MaxRatio_2C=21 ; MaxRatio_3C=21 ; MaxRatio_4C=21

Reading MSR_PLATFORM_INFO (0xce) returns a MinimumRatio of 12 and a MaxNonTurboRatio of 20.
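These values can be decoded with a sketch like the following, assuming the Nehalem layout of MSR_TURBO_RATIO_LIMIT (0x1AD), where byte n (bits 8n+7:8n) holds the maximum ratio with n+1 cores active. `turbo_ratio_limit` is a hypothetical helper, and the raw value used in the test is reconstructed from the ratios quoted above rather than read from hardware.

```c
#include <stdint.h>

#define MSR_TURBO_RATIO_LIMIT 0x1AD

/* Maximum turbo ratio with `active_cores` cores active, i.e. byte
 * (active_cores - 1) of MSR_TURBO_RATIO_LIMIT. Only 1..4 are meaningful
 * on a quad-core Nehalem; returns 0 when out of range. */
static unsigned turbo_ratio_limit(uint64_t msr, unsigned active_cores)
{
    if (active_cores < 1 || active_cores > 8)
        return 0;
    return (unsigned)((msr >> (8 * (active_cores - 1))) & 0xFF);
}
```

Against MaxNonTurboRatio 20, the 1-core ratio of 22 is two bumps and the 2- to 4-core ratio of 21 is one bump.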

My issue is that a Core i7-920 should show 2 bumps when one core (and only one core) is loaded, meaning a ratio of 22.

Looping every second to get the current ratio from IA32_PERF_STATUS returns only two possible values: 12 or 20.

As you noticed, OR × { d(URC) ÷ d(TSC) } remains OR.

So it is never above 20, unless I add the remaining term { d(UCC) ÷ d(URC) }, which gives a ratio in the range [ 21.0 - 22.0 ].

I share your point of view that a smaller sampling interval must be used; this is reserved for a future ring 0 driver.

Meanwhile, I would like to be sure of the correct formula and the associated registers.

CyrIng

Hello

nmi_watchdog is a bad boy: it uses the counters as soon as Linux boots. Thank you, Pat, for this info.
The kernel modules that enable it were blacklisted in /etc/modprobe.d/modprobe.conf:

blacklist iTCO_vendor_support
blacklist iTCO_wdt

and I verified that /proc/sys/kernel/nmi_watchdog reads 0.
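A small check for that last step, assuming the standard Linux procfs file `/proc/sys/kernel/nmi_watchdog`, where 0 means the NMI watchdog is off. `read_int_file` is a hypothetical helper that returns -1 when the file cannot be read.

```c
#include <stdio.h>

/* Read a single integer from a text file such as
 * /proc/sys/kernel/nmi_watchdog. Returns the value, or -1 on error. */
static int read_int_file(const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int v;
    int ok = fscanf(f, "%d", &v) == 1;
    fclose(f);
    return ok ? v : -1;
}
```

If it returns 1, writing 0 to the same file as root (equivalently `sysctl kernel.nmi_watchdog=0`) turns the watchdog off at runtime without blacklisting any module.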

Another thing I have observed when tracing Unhalted Core Clocks is that the counter can go "backward", even though I take care of the 64-bit overflow.
Meanwhile, I found some answers in The accuracy of the performance counter statistics.

As I understand it, those variations of UCC are explained by events such as interrupts, throttling, and instruction serialization.
However, can they also reveal some of the turbo activity?
 

Best regards

CyrIng

Patrick Fay (Intel)

Hello Cyring,

The counter should only 'go backward' if the counter has overflowed or some piece of code has reset the counter.

The counter isn't 64 bits wide but probably 48 bits. You can get the general-purpose counter width from CPUID leaf 0xa, output EAX[23:16]; the fixed counter width is reported in EDX[12:5].
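A sketch of both points, assuming GCC or Clang on x86 (`<cpuid.h>` provides `__get_cpuid`). Per the Intel SDM, CPUID leaf 0xA reports the PMU version in EAX[7:0] and the general-purpose counter width in EAX[23:16], and for PMU version 2 or later the number of fixed counters in EDX[4:0] and their width in EDX[12:5]. Deltas are then taken modulo 2^width, so a wrap of the 48-bit counter does not look like the counter going backward. Helper names are hypothetical.

```c
#include <cpuid.h>
#include <stdint.h>

struct pmu_info {
    unsigned version;      /* EAX[7:0]   architectural PMU version       */
    unsigned gp_width;     /* EAX[23:16] general-purpose counter width   */
    unsigned fixed_count;  /* EDX[4:0]   number of fixed counters (v2+)  */
    unsigned fixed_width;  /* EDX[12:5]  fixed counter width (v2+)       */
};

static struct pmu_info decode_pmu_leaf(uint32_t eax, uint32_t edx)
{
    struct pmu_info p;
    p.version     = eax & 0xFF;
    p.gp_width    = (eax >> 16) & 0xFF;
    p.fixed_count = p.version >= 2 ? (edx & 0x1F) : 0;
    p.fixed_width = p.version >= 2 ? ((edx >> 5) & 0xFF) : 0;
    return p;
}

/* Query CPUID leaf 0xA; returns 0 on success, -1 if the leaf is unavailable. */
static int query_pmu(struct pmu_info *out)
{
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(0xA, &eax, &ebx, &ecx, &edx))
        return -1;
    *out = decode_pmu_leaf(eax, edx);
    return 0;
}

/* Difference between two readings of a `width`-bit counter, tolerating one wrap. */
static uint64_t counter_delta(uint64_t prev, uint64_t curr, unsigned width)
{
    if (width == 0 || width >= 64)
        return curr - prev;
    return (curr - prev) & ((1ULL << width) - 1);
}
```

For example, a 48-bit counter read as 0xFFFFFFFFFFF0 and then 0x10 has advanced by 0x20, not gone backward.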

What are you trying to measure? It should be easy to measure turbo mode frequency. If you are still not seeing turbo mode, there are several possibilities. Your chip may have turbo mode disabled, perhaps in the BIOS, or I think I've seen some low-power chips where turbo is hardwired off (but the chip spec sheet will say so). Or the OS may be disabling turbo mode, usually due to a 'favor power savings over performance' power plan. On some Windows versions, it seems like the 'Balanced' power plan disables turbo.

Requirements for turbo are that:

1) the frequency be allowed to go to the max non-turbo frequency (see /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq)... the requirement might actually be that the frequency be allowed to go above the max non-turbo frequency (see /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies). Setting scaling_max_freq and scaling_min_freq lets you control the CPU frequency (to one of the allowed frequencies).

2) cpuid.input(0x6).output(eax[1]) == 1,

3) MSR 0x1a0 IA32_MISC_ENABLE bit 38 == 0 (this bit is usually controlled by the BIOS),

4) MSR 0x199 IA32_PERF_CTL bit 32 == 0. This bit may be changed by the OS: if the OS doesn't want to allow turbo, it can set this bit.
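Requirements 2) to 4) can be checked from raw register values with a sketch like this one. It assumes the caller has already obtained CPUID.06H:EAX (for example with `__get_cpuid` from `<cpuid.h>`) and the two MSRs (for example through `/dev/cpu/N/msr`). `turbo_allowed` is a hypothetical helper; it does not cover requirement 1) or the power and thermal limits.

```c
#include <stdbool.h>
#include <stdint.h>

#define IA32_PERF_CTL     0x199
#define IA32_MISC_ENABLE  0x1A0

/* Requirements 2) to 4):
 *  2) CPUID.06H:EAX[1] == 1      (turbo reported by CPUID)
 *  3) IA32_MISC_ENABLE[38] == 0  (turbo not disabled, usually by the BIOS)
 *  4) IA32_PERF_CTL[32] == 0     (turbo not disengaged, usually by the OS) */
static bool turbo_allowed(uint32_t cpuid6_eax, uint64_t misc_enable, uint64_t perf_ctl)
{
    bool cpuid_ok = (cpuid6_eax >> 1) & 1;
    bool bios_ok  = !((misc_enable >> 38) & 1);
    bool os_ok    = !((perf_ctl >> 32) & 1);
    return cpuid_ok && bios_ok && os_ok;
}
```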

You can see what the max turbo freq is for 1 core by looking at MSR_TURBO_RATIO_LIMIT[0:7].

Now, even if all of the above permits turbo, you may still not get into turbo mode if the power limit is exceeded or the thermals don't permit it (chip too hot).

But let's say that everything permits turbo. Then you should be able to write a simple single-threaded spinner program (just spin for x seconds), pin it to 1 CPU with the rest of the system idle, and the turbo ratio should show that you are hitting MSR_TURBO_RATIO_LIMIT[0:7].
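A minimal spinner of the kind described above, assuming POSIX `clock_gettime` with `CLOCK_MONOTONIC`. To keep the sketch short, pinning is left to the shell (for example `taskset -c 1 ./spin`, from util-linux); `spin_seconds` is a hypothetical name.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

/* Busy-loop (never halting) for at least `seconds`; returns the elapsed time. */
static double spin_seconds(double seconds)
{
    volatile unsigned long sink = 0;   /* keeps the inner loop from being optimized out */
    double start = now_seconds();
    double elapsed;
    do {
        for (int i = 0; i < 1000; i++)
            sink++;
        elapsed = now_seconds() - start;
    } while (elapsed < seconds);
    return elapsed;
}
```

While it runs on an otherwise idle system, delta(UCC)/delta(URC) on that CPU should approach 22/20 = 1.1 on the i7-920 discussed here, assuming the reference clock ticks at the TSC (non-turbo maximum) rate.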

Pat

Thanks Pat for these instructions.

I have all the requirements for turbo in place. Some Windows tools (such as T-Monitor) show that turbo is working fine.

My code is made for Linux, and I have blacklisted the cpufreq module; however, the turbo feature is enabled in CPUID and activated in MISC_PROC_FEATURES[38], as shown below.

Measuring cycle deltas at idle and then under high load on 1 core gives me the following values
(columns are, in order, UCC:URC C3 C6 / TSC):

[Screenshots: IDLE, HIGH]

I don't understand which MSRs can give me a ratio hitting MSR_TURBO_RATIO_LIMIT[0:7], which, by the way, is 0010110 (22) in my screenshot.

Patrick Fay (Intel)

So the max turbo freq for 1 core is 2.2 GHz and you seem to be running at 2.6 GHz. I guess you are overclocking the CPU?

I'm not sure how turbo mode behaves when you overclock. I'm guessing that the CPU sees the frequency is already above 2.2 GHz and doesn't try to turbo boost.

I would not advise blacklisting kernel modules unless you really, really know what you are doing. I've always found that writing to the nmi_watchdog file was enough to control the watchdog.

Pat

CPU overclocking is not enabled (apart from the 3 Corsair DDR memory modules pushed to 1600 MHz).

In the BIOS, BCLK is set to 133 and the ratio to auto (so between 12 and 20), so the system runs at 20 × 133 MHz.

How did you compute a max freq of 2.2 GHz for 1 core ?

Patrick Fay (Intel)

Sorry, I assumed that the bus freq (bclk) was 100 MHz.
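The arithmetic behind the mix-up, using the ratios reported earlier in the thread (1-core turbo ratio 22, MaxNonTurboRatio 20) and the BCLK of 133 MHz set in the BIOS (nominally 133.33 MHz on the i7-920):

```latex
\begin{aligned}
22 \times 100\ \text{MHz}    &= 2.20\ \text{GHz}      && \text{(BCLK assumed to be 100 MHz)}\\
20 \times 133.33\ \text{MHz} &\approx 2.67\ \text{GHz} && \text{(non-turbo maximum, the observed} \approx 2.6\ \text{GHz)}\\
22 \times 133.33\ \text{MHz} &\approx 2.93\ \text{GHz} && \text{(1-core turbo)}
\end{aligned}
```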

Pat

Indeed, the monitoring counters are 48 bits wide.

Thanks for this.

CyrIng

Hello,

Is this formula correct for displaying, per logical core, its non-halted activity including turbo?

            DisplayRatio = TurboRatio × State(C0) × MaxNonTurboRatio
              where
                  TurboRatio = Delta(UCC) / Delta(URC)
              and State(C0) = Delta(URC) / Delta(TSC)
              and MaxNonTurboRatio = MSR_PLATFORM_INFO[15:8]
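Substituting the definitions above shows that Delta(URC) cancels, so this display ratio is MaxNonTurboRatio scaled by the core clocks counted per TSC tick:

```latex
\text{DisplayRatio}
  = \frac{\Delta UCC}{\Delta URC}\cdot\frac{\Delta URC}{\Delta TSC}\cdot \text{MaxNonTurboRatio}
  = \text{MaxNonTurboRatio}\cdot\frac{\Delta UCC}{\Delta TSC}
```

As a consequence, frequency and activity are merged into a single number: a core at ratio 20 that is unhalted 60% of the time displays 12, the same as a fully busy core at the minimum ratio 12.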

 

Patrick Fay (Intel), Best Reply

Hello Cyring,

It depends on what you mean by 'per logical core its non halted activity'.

Usually I look at the 2 fields separately.

1) average non-halted frequency over the interval = TSC_frequency * delta(CPU_CLK_UNHALTED.THREAD) / delta(CPU_CLK_UNHALTED.REF)

2) %of time cpu is unhalted = 100 * delta(CPU_CLK_UNHALTED.REF)/delta(TSC)

Item 1) tells me "when the cpu was running (not halted), what was the average frequency". Item 2) tells me "what % of time was the cpu running".
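Both metrics as a sketch, taking counter deltas already corrected for wrap-around. CPU_CLK_UNHALTED.THREAD and CPU_CLK_UNHALTED.REF are what fixed counters 1 and 2 count (the UCC and URC values used earlier in the thread). The TSC frequency is a parameter, assumed on the i7-920 here to be MaxNonTurboRatio × BCLK; the function names are hypothetical.

```c
#include <stdint.h>

/* 1) Average frequency while unhalted, in the unit of tsc_freq (e.g. MHz). */
static double avg_unhalted_freq(double tsc_freq, uint64_t d_unhalted_thread,
                                uint64_t d_unhalted_ref)
{
    if (d_unhalted_ref == 0)
        return 0.0;                    /* halted for the whole interval */
    return tsc_freq * (double)d_unhalted_thread / (double)d_unhalted_ref;
}

/* 2) Percentage of the interval during which the CPU was unhalted. */
static double pct_unhalted(uint64_t d_unhalted_ref, uint64_t d_tsc)
{
    if (d_tsc == 0)
        return 0.0;
    return 100.0 * (double)d_unhalted_ref / (double)d_tsc;
}
```

Keeping the two results separate means a short turbo burst still shows up in 1) even when 2) is low.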

There is an article http://software.intel.com/en-us/articles/measuring-the-average-unhalted-frequency.

Pat

Hello,

Thanks a lot for your help.

Now it works as I wish: turbo gives 2 bumps.

To test it, I have made a demo Linux live CD, including the source code and the development packages (Code::Blocks IDE).

 

CyrIng
 

 

Good day,

I'm making my program backward-compatible with all 64-bit Core 2 architectures. It is split into 3 algorithms:

  1. Nehalem and later architectures, based on the fixed performance counters
    step a- Initialize the counters: write the MSRs IA32_PERF_GLOBAL_CTRL(0x38f) and IA32_FIXED_CTR_CTRL(0x38d)
    step b- Read the MSRs IA32_FIXED_CTR1(0x30a), IA32_FIXED_CTR2(0x30b), IA32_TIME_STAMP_COUNTER(0x10), MSR_CORE_C3_RESIDENCY(0x3fc) and MSR_CORE_C6_RESIDENCY(0x3fd)
    step c- Compute and display the C0, C3, C6 states
    step d- Loop to step b

  2. Core 2 algorithm, similar to the previous one, except that there are no MSR_CORE_C3_RESIDENCY and MSR_CORE_C6_RESIDENCY.
    step a- Initialize the counters
    step b- Read the IA32_FIXED_CTR MSRs
    step c- Only C0 states are taken into account
    step d- Loop to step b

  3. A fallback algorithm for Genuine architectures:
    step a- Read the MSRs IA32_APERF(0xe8), IA32_MPERF(0xe7) and IA32_TIME_STAMP_COUNTER(0x10)
    step b- Compute and display C0 states only, from the values read in step a
    step c- Loop to step a
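Step c of the first algorithm could look like the sketch below. It assumes, as the Intel SDM documents and tools such as turbostat rely on, that MSR_CORE_C3_RESIDENCY and MSR_CORE_C6_RESIDENCY count at the TSC rate, so each state's share is its delta divided by the TSC delta; C1 is taken as the remainder and clamped at zero (per-thread URC and per-core residencies can overlap with Hyper-Threading). The struct and function names are hypothetical.

```c
#include <stdint.h>

struct cstates {
    double c0, c1, c3, c6;   /* fractions of the interval, 0.0 .. 1.0 */
};

/* C0 from unhalted reference cycles, C3/C6 from the core residency
 * counters, C1 as whatever remains (clamped at zero). */
static struct cstates compute_cstates(uint64_t d_urc, uint64_t d_c3,
                                      uint64_t d_c6, uint64_t d_tsc)
{
    struct cstates s = {0.0, 0.0, 0.0, 0.0};
    if (d_tsc == 0)
        return s;
    s.c0 = (double)d_urc / (double)d_tsc;
    s.c3 = (double)d_c3 / (double)d_tsc;
    s.c6 = (double)d_c6 / (double)d_tsc;
    s.c1 = 1.0 - s.c0 - s.c3 - s.c6;
    if (s.c1 < 0.0)
        s.c1 = 0.0;
    return s;
}
```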
     

When the program starts and the processor signature is detected from CPUID, one of the 3 algorithms is selected and then launched.
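One way to make this selection depend on reported capabilities rather than on the CPUID signature alone, sketched under assumptions: require that CPUID leaf 0xA reports an architectural PMU of version 2 or later with at least 3 fixed counters (EDX[4:0]) before choosing algorithm 1 or 2, and that CPUID.06H:ECX[0] reports APERF/MPERF before using the fallback. Whether the C3/C6 residency MSRs exist is model-specific, so it is passed in as a flag. The enum and function are hypothetical.

```c
#include <stdbool.h>

enum algo { ALGO_NONE, ALGO_NEHALEM, ALGO_CORE2, ALGO_GENUINE };

/* pmu_version, fixed_count: CPUID.0AH EAX[7:0] and EDX[4:0].
 * has_aperf_mperf:          CPUID.06H:ECX[0].
 * has_core_residency:       model-specific knowledge (e.g. a table of models). */
static enum algo select_algorithm(unsigned pmu_version, unsigned fixed_count,
                                  bool has_aperf_mperf, bool has_core_residency)
{
    bool fixed_ok = pmu_version >= 2 && fixed_count >= 3;
    if (fixed_ok && has_core_residency)
        return ALGO_NEHALEM;
    if (fixed_ok)
        return ALGO_CORE2;
    if (has_aperf_mperf)
        return ALGO_GENUINE;
    return ALGO_NONE;
}
```

On a part such as the Pentium Dual Core 5700 discussed in this post, this would fall back to the Genuine algorithm if leaf 0xA reports fewer than three fixed counters; whether it actually does is not established here.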

So far, the test results are as follows:

+------------------------+--------------------------+--------+-----------+
| Intel Processors       | System [Desktop/Laptop]  | Status | Algorithm |
+------------------------+--------------------------+--------+-----------+
| Core i7-920            | Asus Rampage II Gene [D] |   OK   | Nehalem   |
+------------------------+--------------------------+--------+-----------+
| Core 2 Duo T5500       | Acer Aspire 5633 [L]     |   OK   | Core 2    |
+------------------------+--------------------------+--------+-----------+
| Core 2 Quad Q8200      | Unknown [L]              |   OK   | Genuine   |
+------------------------+--------------------------+--------+-----------+
| Pentium Dual Core 5700 | Acer Desktop [D]         |   KO   | Core 2    |
+------------------------+--------------------------+--------+-----------+

The Pentium Dual Core 5700 is detected with a CPUID 'Core2 Yorkfield' signature, but the MSRs IA32_FIXED_CTR1(0x30a) and IA32_FIXED_CTR2(0x30b) return zero.

Are there really no such fixed counters in this processor?

Regards

CyrIng
