(update) C-states, C-states and even more C-states

As I said before, a C-state is an idle state. The processor isn't doing anything useful, so why not shut some things off? Think of it in terms of your house. If you're not at home, why keep the lights, radio, and those six televisions going? Modern processors have several different C-states representing increasing amounts of "stuff" shut down. C0 is the operational state, meaning that the CPU is doing useful work. C1 is the first idle state. The clock running to the processor is gated, i.e. the clock is prevented from reaching the core, effectively shutting it down in an operational sense. C2 is the second idle state. The external I/O Controller Hub blocks interrupts to the processor. And so on with C3, C4, etc. I'll discuss this further down in this paper. By the way, there is nothing preventing the OS from busy-waiting in its idle state, as older operating systems did, and thus keeping the processor in C0. From the OS's standpoint, the processor is idling; in reality it's just chewing up energy for no useful reason other than being an ineffectual heater.

So what's this thing about "C-states, C-states and even more C-states"? It turns out that there are different kinds of C-states depending upon what part of your system you are talking about. There are core C-states, processor C-states, and OS C-states. All are similar and are idle states (I'm excluding C0, of course.) They are also different in some substantial ways.

A core C-state is a hardware C-state. There are several core idle states, e.g. CC1 and CC3. As we know, a modern state-of-the-art processor has multiple cores, such as the recently released Core Duo T5000/T7000 mobile processors, known as Penryn in some circles. What we used to think of as a CPU / processor actually has multiple general-purpose CPUs inside of it. The Intel Core Duo has 2 cores in the processor chip. The Intel Core 2 Quad has 4 such cores per processor chip. Each of these cores has its own idle state. This makes sense, as one core might be idle while another is hard at work on a thread. So a core C-state is the idle state of one of those cores.

A processor C-state is related to a core C-state. At some point, cores share resources, e.g. the L2 cache or the clock generators. When one idle core, say core 0, is ready to enter CC3 but the other, say core 1, is still in C0, we don't want the fact that core 0 is ready to descend into CC3 to prevent core 1 from executing because we just happened to shut down the clock generators. Thus we have the processor / package C-state, or PC-state. The processor can only enter a PC-state, say PC3, if both cores are ready to enter the corresponding CC-state, e.g. both cores are ready to step into CC3. I'll talk more about this in a subsequent section.

A logical C-state: The last C-state is the OS's view of the processors' C-states. In Windows, a processor's C-state is pretty much equivalent to a core C-state. In fact, the OS's lower level power management software determines when and if a given core enters a given CC-state using the MWAIT instruction. There is one important difference. When an application, such as Intel's PowerInformer, thinks it's interrogating a processor core CC-state, what is returned is the C-state of what is called a "logical core". (A logical core is technically not the same as a physical core. In my experience, a logical core is almost always the same as a physical core, but it doesn't have to be.) Logical cores don't have to worry about little things such as the hardware the OS is running on. For example, the C-state of a logical core doesn't worry about the barriers imposed by shared resources, such as the clock generators, I talked about earlier. Logical Core 0 can be in C3 while Logical Core 1 is in C0.

This seems a little confusing, doesn't it? So how do logical core C-states, core C-states and processor C-states relate to each other? Take the situation above: From the OS perspective, logical core 0 is in C3 and logical core 1 is in C0. Since C3, from the hardware perspective, actually shuts down a shared resource, the clock generators, (physical) core 0 must be held at CC2 since core 1 is in C0 and using the clock generators. The processor, in a global sense, is not idle since core 1 is in C0, so the processor's C-state is C0. To use a little bit of that intimidating mathematics,

Processor C-state = Min(core C-states)


Core C-state = Minimum barrier(set of all logical C-states)


Logical C-state = anything the OS wants
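
For readers who prefer code to formulas, here is a minimal Python sketch of the three relations above. It is a toy model, not how any real processor or OS implements this. The threshold FIRST_SHARED_STATE and the function names are my own assumptions for illustration; the only thing taken from the text is that C3 touches a shared resource (the clock generators) while shallower states do not.

```python
# Toy model of the three C-state views described above.
# Assumption (based only on the example in the text): C3 and deeper
# states shut down resources shared by all cores, C0..C2 do not.
FIRST_SHARED_STATE = 3  # hypothetical threshold, not from any datasheet


def core_c_states(logical_states):
    """Clamp each OS-requested (logical) C-state to what the hardware allows."""
    shallowest = min(logical_states)
    # A core may only go as deep into the shared states as every core has
    # requested; otherwise it is held just above the shared threshold.
    barrier = max(shallowest, FIRST_SHARED_STATE - 1)
    return [min(requested, barrier) for requested in logical_states]


def processor_c_state(core_states):
    """Processor C-state = Min(core C-states)."""
    return min(core_states)


# The example from the text: logical core 0 in C3, logical core 1 in C0.
logical = [3, 0]
cores = core_c_states(logical)
print(cores)                     # [2, 0]
print(processor_c_state(cores))  # 0
```

With both logical cores in C3, the same functions give core states of [3, 3] and a processor C-state of 3, which matches the rule that the processor can only enter PC3 when both cores are ready for CC3.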



Next: There has got to be a catch

8 comments

anonymous's picture

Hi Taylor,

I have observed the C-states via the MS Windows Performance Monitor application, and sometimes it shows C1 time above 100%. Is that correct?

I would like to download some documents from Intel. Can you tell me the download address?

I would appreciate it if you could help me.

anonymous's picture

I'm just interested in how Intel PowerInformer computes its C3 residency. Its help files, I believe, state C3 time divided by the number of C3 transitions. If that's the case, I imagine the formula will be C3 Residency = (CPU time in C3 / C3 transitions) x 100. As you call it, logical C3.

One more thing. Performance Monitor can take C3 time for multiple instances (multiple cores). Are you implying this reading of total instances is wrong?

anonymous's picture

Hi Terence,

If you haven't read it, I suggest you look at the next blog entry, "There's got to be a catch." (http://software.intel.com/en-us/blogs/2008/04/29/theres-got-to-be-a-catch/)

I'm not a HW architect or expert, but I have a general idea of how SW (OS) controls the HW (C-states).

It's a typical interrupt driven affair. I suspect that there is at least one interrupt routine for processing C-state events. Here's my guess at how this stuff works. After some minimal threshold of idle time, the OS PM engine is invoked by the OS scheduling engine. This PM routine uses an MWAIT(Cx) instruction (via a driver) to force the HW into the Cx state. At some point, the HW detects an event that needs to be evaluated by SW. The HW invokes an interrupt routine. This interrupt routine interrogates why it was invoked and passes the information to the OS PM engine. This engine evaluates whether or not to go back to sleep or continue to process the event. If it determines that the event doesn't need to be processed, it issues another MWAIT(Cx) instruction, forcing the processor to enter the Cx state.

I'm 100% sure that MWAIT() is a privileged instruction, meaning that it isn't directly accessible to user space applications. This is for good reason as I'm sure you can guess. ;-)

If you do a search on "MWAIT", you'll find a lot of Linux threads discussing how to implement kernel-level PM code to meet ACPI requirements.

--Taylor

anonymous's picture

It would be great if you explained how to adjust C-states under Linux, ideally via a user-space utility.

In some cases the human user or an ordinary application wants to manually adjust C-states on some cores/processors.

I have been trying to figure out how to do this but have failed thus far.

Thanks!

anonymous's picture

I'm not familiar with the C-state residency monitor utility. I recommend that you talk to your friendly neighborhood Intel agent.

You can't add C5 to perfmon. Perfmon is a general MS Windows utility that operates on many processors. I don't believe that ACPI defines anything above C4. (Please correct me if I'm wrong.) Higher C-states are processor specific, e.g. the Intel Penryn line.

This means that MS Windows doesn't understand anything above C4 since it is a general OS supported on a variety of platforms. Even at that, there are some additional details about how MS Windows sees C states. C-states are defined in the ACPI spec in a relatively general way. This makes sense as different processor architectures are going to have their own solutions to power management.

As I discussed above, MS Windows only defines "logical C-states". This means perfmon only recognizes C4 and below. If you want access to C-states higher than C4, you're going to have to make arrangements with Intel. You know, sign all those annoying non-disclosure and other such agreements. (Before someone asks me how to obtain such an agreement, I have no idea. You'll have to contact some technical Intel rep.)

--Taylor

anonymous's picture

Do you know why I cannot find any executable files after I installed the "Intel C state Residency monitor utility"?

Also, I cannot add the C5 state in perfmon.msc.

Could anyone tell me why?

anonymous's picture

Hello, This is a very nice article!

I need your advice on another thing. As I remember, there is a tool from Intel called "Intel C state Residency monitor utility" which can monitor C-states via perfmon.msc on Vista.

Is it freeware? If yes, where can I download it? I cannot find it with the Intel search tools...

I would appreciate it if you could help me.

murphyjim87's picture

Nice. Thanks for the helpful explanation.