I would like to know about a specific MSR. I have been reading MSR 0x19c to see whether throttling is occurring. Running the rdmsr utility on Linux, e.g. "rdmsr -c -p0 0x19c", returns a value similar to "0x881f0008". A bitwise AND of that result with 1 gives the thermally throttled state (1=throttled, 0=normal). In the Linux kernel source code, bit 0 is labeled THERM_STATUS_PROCHOT.
We have had other throttling issues and want to know what bit 2 (i.e. 1<<2) corresponds to. Empirically, when this bit is set the processor is indeed throttled, and a run of Linpack xhpl produces a lower benchmark number. Am I off-base in thinking that this bit represents some other form of throttling? It seems to be an extremely reliable indicator of "slower" performing servers.
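To make the bit tests above concrete, here is a minimal Python sketch of how I am decoding the rdmsr output. The function name and dictionary keys are just my own labels, and the meaning of bit 2 is exactly what I am asking about, so the code only reports whether it is set.

```python
def decode_therm_status(raw):
    """Decode the two bits of interest from an rdmsr hex string for MSR 0x19c."""
    value = int(raw, 16)  # accepts strings such as "0x881f0008"
    return {
        "bit0": bool(value & (1 << 0)),  # labeled THERM_STATUS_PROCHOT in the kernel source
        "bit2": bool(value & (1 << 2)),  # meaning unknown to me; the subject of this question
    }

print(decode_therm_status("0x881f0008"))
```

For the sample value both bits come back False, since the low nibble 0x8 only has bit 3 set; on the slow servers bit 2 comes back True.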
Also, I know that a change in the thermal throttling state results in an interrupt. Does this kind of throttling (bit 2) also result in an interrupt that we can catch and log?
Based on my observations, it seems it may be triggered by something in the Dell M1000e chassis that we have. We have many of these, and whenever we do a firmware upgrade on the chassis, the CPUs show as throttled for a minute or so. There are also occasions where processors get throttled for no good reason at all, as far as we can tell.
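Until I know whether an interrupt exists for this, polling is the fallback I have in mind. The sketch below is only that: it assumes the Linux msr kernel module is loaded so that /dev/cpu/N/msr exists, that it runs as root, and the function names and one-second interval are arbitrary choices of mine.

```python
import os
import struct
import time

MSR_THERM_STATUS = 0x19C  # the MSR address discussed above
BIT2_MASK = 1 << 2        # the bit in question

def read_msr(cpu, reg=MSR_THERM_STATUS):
    """Read one 64-bit MSR through /dev/cpu/<cpu>/msr (root and msr module assumed)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The file offset selects the register; each register is 8 bytes.
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)

def sample_forever(cpu, interval):
    """Yield the raw MSR value every `interval` seconds."""
    while True:
        yield read_msr(cpu)
        time.sleep(interval)

def transitions(samples, mask=BIT2_MASK):
    """Yield (index, new_state) whenever the masked bit differs from the previous sample."""
    prev = None
    for i, value in enumerate(samples):
        state = bool(value & mask)
        if prev is not None and state != prev:
            yield i, state
        prev = state

def poll_bit2(cpu=0, interval=1.0):
    """Log every change of bit 2 on one CPU; runs until interrupted."""
    for _, state in transitions(sample_forever(cpu, interval)):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(f"{stamp} cpu{cpu} bit 2 -> {int(state)}")
```

Calling poll_bit2(0) as root would log each change on CPU 0, but a short excursion between samples would be missed, which is why I would prefer an interrupt if one exists.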
I could describe it in much greater detail here, but I wrote a very lengthy article about it at http://tech.ryancox.net/2010/11/diagnosing-throttled-or-slow-systems.html