Possible rdtsc bug

martlau1978's picture

I'm running the following program on a Dell Inspiron Mini 10 (Atom Z520).

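#include <stdio.h>
#include <windows.h>

/* Built as a 32-bit MSVC program; inline _asm is not available for x64. */
int main(void)
{
    while (1)
    {
        unsigned __int64 time;
        /* rdtsc returns the time-stamp counter in EDX:EAX (high:low). */
        _asm rdtsc
        _asm mov dword ptr [ time + 0 ], eax
        _asm mov dword ptr [ time + 4 ], edx
        printf("\n%I64X", time);
        Sleep(1000);
    }
}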

stdout:
=======
674EE7002254
674F2EB961BE
674F2BE098F4 *
674F46D3B240
674F7557DCCC
674F7100E0CE *
674F8B98AC14
674F9FC81058
674FB05DBC10
674FBE4559B4
674FDE1093EE
674FF73CB000
675022167C70
67504F43C644

* Time is going backwards.
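
One way to spot such regressions without reading the hex by eye is to compare each sample with the one before it. The following is a minimal sketch, not part of the original program: the names `count_backward_steps`, `posted_tsc` and `posted_tsc_count` are made up for illustration, and the array simply holds the values printed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The fourteen samples from the stdout listing above. */
const uint64_t posted_tsc[] = {
    0x674EE7002254ULL, 0x674F2EB961BEULL, 0x674F2BE098F4ULL,
    0x674F46D3B240ULL, 0x674F7557DCCCULL, 0x674F7100E0CEULL,
    0x674F8B98AC14ULL, 0x674F9FC81058ULL, 0x674FB05DBC10ULL,
    0x674FBE4559B4ULL, 0x674FDE1093EEULL, 0x674FF73CB000ULL,
    0x675022167C70ULL, 0x67504F43C644ULL
};
const size_t posted_tsc_count = sizeof posted_tsc / sizeof posted_tsc[0];

/* Count how many samples are smaller than their immediate predecessor. */
size_t count_backward_steps(const uint64_t *tsc, size_t n)
{
    size_t i, count = 0;

    assert(tsc != NULL || n == 0);
    for (i = 1; i < n; i++) {
        if (tsc[i] < tsc[i - 1])
            count++;
    }
    return count;
}
```

Calling `count_backward_steps(posted_tsc, posted_tsc_count)` counts the two starred lines in the listing.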

martlau1978's picture
The speed of the TSC appears to depend on CPU workload. I replaced the Sleep(1000) in the loop with work that keeps CPU usage at 100% (an AES-256 encryption function, not that this should matter). The left column is the rdtsc value (hex). The right column is the difference from the previous sample (decimal). The long-term rate of increase is constant and seems valid, but an error of ~0x1B300000 appears and disappears over time.

68A0A3168292 1328875720
68A0F25BF742 1329951920
68A141A16990 1329951310
68A175B39492 873605890
68A1C4FA2B9C 1330026250
68A22F74D828 1786424460
68A27EBCEABA 1330123410
68A2B2D12956 873741980
68A31D49DA30 1786294490
68A36C9278AE 1330159230
68A3BBD50F80 1329764050
68A40B1BA9E6 1330027110
68A45A61EE20 1330005050
68A4A9A884B2 1330026130
68A4F8EEDEFE 1330010700
68A52D017404 873633030
68A5977D3552 1786495310
68A5E6C497C2 1330078320
68A6360BDAF2 1330070320
68A66A1B18BC 873414090
68A6B9642454 1330187160
68A708A8473E 1329865450
68A77323005E 1786427680

robert-mueller-albrecht (Intel)'s picture
Hi Mart,

since your report seems to point at a possible assembly or even microcode issue, I assume that you are not reporting a problem with the software development tool suites for the Intel Atom Processor.

I forwarded your sighting to some colleagues in the hardware performance teams. I'll get back to you as soon as I find out more.

Thanks, Rob

martlau1978's picture
Hi Rob,
I was directed here by Sergio from online chat support. I'm hoping to be redirected to the right forum/place... thanks!

Martin

robert-mueller-albrecht (Intel)'s picture

Hi Martin,

I had a few more exchanges with our Intel Atom Processor core performance team. They are treating this as a possible hardware sighting, are tracking it, and are trying to reproduce it.

Could you try to provide a bit more input on the following:

Which exact CPU power state does the sleep setting on your system relate to? Would this be C4?
Is it possible for you to provide us with the exact CPU chip ID? Which microcode version or updates and which BIOS version are running on the system?

We were not able to reproduce your sighting on a standard Z510 with a gcc compiled binary running for several hours (with and without the 1 second delay).

Knowing the exact nature of the power mode switch and the underlying hardware details may be key.

I know you provided us with the basic code snippet to reproduce - since it doesn't show up on our verification systems - do you think you could provide us with the binary you use for testing?

Thanks, Rob

martlau1978's picture

Hi Robert,
I use the BIOS provided by Dell. They provided this file to install it: TigerA03.exe .
This is the first page of the BIOS Utility (F2 when booting):

Dell Inc. Phoenix SecureCore Setup utility
BIOS Version: A03
CPU Type: Intel Atom CPU Z520
CPU Speed: 1330 MHz
CPU Cache Size: 512 KB
CPU ID: 106C2
Product Name: Inspiron 1010

I'm not sure how to change the power mode. In the BIOS, there is a setting called "Intel SpeedStep Technology". I tried both the Enabled and Disabled modes and got the same behavior. If this is not what you meant, please direct me to the appropriate place.

When I run the cpuid instruction with parameter 1, I get:
eax = 0x00106C2
ebx = 0x0020800

Unfortunately, I only have one Inspiron Mini 1010 here, so I can't check if this is a manufacturing defect or a design flaw. I've ordered two more, but the delivery is scheduled for the second week of June. These newer Inspiron 1011 come with a different processor: the Atom N270 1.6 GHz.

How can I send you my application?

Thanks,
Martin

robert-mueller-albrecht (Intel)'s picture
Hi Martin,

the feedback I got from our hardware team by now is that this is an "old" known issue that is fixed in patch 20A.

Let me check whether they have some insight for you where you can get that patch as well.

Rob

robert-mueller-albrecht (Intel)'s picture
Hi Martin,

our hardware and firmware folks confirm that what you most likely need is a BIOS update patch to version 20A or above. Dell should hopefully be able to provide this patch to you.

If you would like I can email you a little check utility that verifies the BIOS version on your system. I assume the email address in your profile would work?

Thanks, Rob

martlau1978's picture

OK. I'll contact Dell. My email address should work fine. Please send me this utility.

Thank you very much for your help.
Martin

martlau1978's picture
Hi Robert,
I contacted Dell, and they're going to "escalate" this forum thread to their BIOS group. They can't give me a timeline for the update, so I'll tag this thread as resolved, with many thanks to you. When (if?) the BIOS group delivers a new version, I'll tell you if this fixed my problem.

Thanks again,
Martin

martlau1978's picture

My Dell BIOS does not support disabling Hyper-Threading for some reason, so I could not try your suggestion directly. However, using the APIC ID field from the cpuid instruction, I was able to see what is going on. The third value listed is the ebx returned by cpuid(1), which is run right before rdtsc; its top byte is the initial APIC ID of the logical CPU. It's now obvious that Windows is scheduling the thread on two logical CPUs (for some reason, it does this after some boots and not after others).

54891B575A 1321055690 01020800
54D860301A 1329912000 01020800
55432A36CA 1791624880 00020800
5592708AEA 1330009120 00020800
55E1B6DE56 1330008940 00020800
5630FD4D9C 1330016070 00020800
5680438B3C 1330003360 00020800
56CF8AD7C2 1330072710 00020800
571ED04B00 1329951550 00020800
5752935E0A 868422410 01020800
57BD5D0A92 1791601800 00020800
57F11FDCD8 868405830 01020800
5840664192 1330013370 01020800
588FAC893C 1330005930 01020800
58DEF3C1EA 1330067630 01020800

The TSC counters of both hyper-threaded logical CPUs are increasing in lock step. However, the TSC of CPU 1 lags behind that of CPU 0 by ~461552170 clocks. This behavior does not occur on Pentium 4 processors with Hyper-Threading, so profiling works there even with Hyper-Threading enabled and Windows scheduling the thread on different processors over time. I looked at the cycle difference between the processors on the Pentium 4 and it is very, very small. Perhaps there is only one counter(?).
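
The lag can be estimated from a log like the one above. The following is a rough sketch under stated assumptions, not the method actually used here: the names `tsc_sample` and `estimate_tsc_lag` are hypothetical, the samples are assumed to be taken at a roughly constant interval, and only two logical CPUs with APIC IDs 0 and 1 are assumed. Same-CPU neighbours give the nominal interval; each CPU switch then deviates from it by the lag, with the sign depending on the direction of the switch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t tsc;      /* raw rdtsc value */
    unsigned apic_id;  /* ebx >> 24 from cpuid(1), taken next to rdtsc */
} tsc_sample;

/* Estimates how far the TSC of CPU 1 lags behind CPU 0.
 * Returns 1 and stores the estimate in *lag_out, or returns 0 if the log
 * has no same-CPU pair or no CPU switch to work from. */
int estimate_tsc_lag(const tsc_sample *s, size_t n, int64_t *lag_out)
{
    int64_t same_sum = 0, cross_sum = 0, nominal;
    size_t same_n = 0, cross_n = 0, i;

    assert(s != NULL && lag_out != NULL);

    /* Pass 1: average interval between samples taken on the same CPU. */
    for (i = 1; i < n; i++) {
        if (s[i].apic_id == s[i - 1].apic_id) {
            same_sum += (int64_t)(s[i].tsc - s[i - 1].tsc);
            same_n++;
        }
    }
    if (same_n == 0)
        return 0;
    nominal = same_sum / (int64_t)same_n;

    /* Pass 2: a 1 -> 0 switch is too long by the lag, 0 -> 1 too short. */
    for (i = 1; i < n; i++) {
        if (s[i].apic_id != s[i - 1].apic_id) {
            int64_t delta = (int64_t)(s[i].tsc - s[i - 1].tsc);
            cross_sum += (s[i].apic_id == 0) ? delta - nominal
                                             : nominal - delta;
            cross_n++;
        }
    }
    if (cross_n == 0)
        return 0;
    *lag_out = cross_sum / (int64_t)cross_n;
    return 1;
}
```

Averaging over several switches smooths out scheduling jitter in the sampling interval, which is why the estimate is only approximate.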

There are several workarounds that I can use to get around this problem with the Atom.

The easiest is to restrict Windows to a single processor. This can be done from the msconfig application, or manually in the boot.ini. This will reduce performance somewhat. I tried it, and this fixes the problem, as one would expect.

I could also measure the difference between the two processors' timestamps (it seems to be constant after boot), and apply the constant correction of ~461552170 clocks to CPU 1 (identified using cpuid). There are a few serialization issues here which are not fun, but they can be addressed either in a statistical fashion (i.e. checking cpuid before and after rdtsc) or with proper serialization in a driver.

If the second workaround (the constant correction) could be implemented in the rdtsc microcode, this would be the simplest solution for customers, as all serialization issues would be solved, and programs that work on Pentium 4 HT would work on Atom HT without modification.

robert-mueller-albrecht (Intel)'s picture
Hi Martin,

this really is great news.
Is it possible for you to use the C function clock() instead of rdtsc? I think that should do almost exactly what you want, assuming it's not too slow.
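
As a minimal sketch of that suggestion, timing a piece of work with clock() could look like this. The helper names `clock_ticks_to_seconds` and `time_call` are made up for illustration. Note that the resolution is only CLOCKS_PER_SEC ticks per second, far coarser than the TSC, and that clock() reports processor time on POSIX systems but elapsed wall-clock time in the Microsoft C runtime.

```c
#include <assert.h>
#include <time.h>

/* Converts a pair of clock() readings into seconds. */
double clock_ticks_to_seconds(clock_t start, clock_t end)
{
    assert(end >= start);
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Times one call of work() using clock(). */
double time_call(void (*work)(void))
{
    clock_t start, end;

    assert(work != NULL);
    start = clock();
    work();
    end = clock();
    return clock_ticks_to_seconds(start, end);
}
```

Because the value does not come from a per-CPU counter read directly by the program, it should not show the offset between the two logical CPUs, though that has not been verified on the system in this thread.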

We'll do some more poking around here to see if this is a hardware issue or not. However, please note that we believe right now that it is allowed for one thread to write the TSC (by using wrmsr), and hence it is allowed for the two TSCs to be offset from each other. The CPU may not be able to prevent that. We'll check some more.

Rob
