TSC, DVFS and standard sleeping function calls

Hello,

We are developing a runtime system that uses DVFS, the TSC counter, and the regular sleep functions (usleep/nanosleep/select). We have run into several issues with these; let me explain what we observed.

The TSC counter / DVFS issue:

According to the Intel Developer Guides, there are two behaviors of the TSC counter:

  • For some processor families, the time-stamp counter increments with every internal processor clock cycle, which clearly means it is sensitive to frequency scaling. (For Pentium M processors (family [06H], models [09H, 0DH]); for Pentium 4 processors, Intel Xeon processors (family [0FH], models [00H, 01H, or 02H]); and for P6 family processors)

  • For some other processor families, the time-stamp counter increments at a constant rate, which is traditionally the maximum frequency allowed by DVFS. (For Pentium 4 processors, Intel Xeon processors (family [0FH], models [03H and higher]); for Intel Core Solo and Intel Core Duo processors (family [06H], model [0EH]); for the Intel Xeon processor 5100 series and Intel Core 2 Duo processors (family [06H], model [0FH]); for Intel Core 2 and Intel Xeon processors (family [06H], DisplayModel [17H]); for Intel Atom processors (family [06H], DisplayModel [1CH]))

We have measured that Xeon Phi cores behave as if they belonged to the first family, whereas the regular architectures we have tested so far (mainly Sandy Bridge and Ivy Bridge processors) belong to the second family. The documentation also states that "this is the architectural behavior moving forward". My first question is, then:

Why doesn't Xeon Phi behave this way, and do you plan to introduce constant-rate TSC behavior in the next iterations of the card?
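
For readers who want to check a given core from software rather than from the family/model lists above, here is a minimal sketch. It assumes GCC or Clang on x86 (the `<cpuid.h>` helper) and tests the invariant-TSC bit, CPUID leaf 0x80000007, EDX bit 8. Note that "invariant" is a stronger property than the constant-rate behavior described above, so a clear bit does not by itself prove that the TSC follows the core frequency. The function name `has_invariant_tsc` is illustrative, and the sketch has not been tried on a Xeon Phi card.

```c
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* Returns true if CPUID advertises an invariant TSC
 * (leaf 0x80000007, EDX bit 8). Returns false if the bit is clear,
 * the leaf is unsupported, or the build target is not x86. */
static bool has_invariant_tsc(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* __get_cpuid returns 0 when the requested leaf is not supported. */
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx))
        return false;

    return ((edx >> 8) & 1u) != 0;
#else
    return false;
#endif
}
```

Calling `has_invariant_tsc()` once at startup would let a runtime decide whether raw rdtsc deltas can be used directly or need a frequency-dependent correction.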

The sleeping issue:

While trying to understand the TSC behavior, we came across another issue involving the regular usleep/select functions. As with rdtsc, the resulting sleep time is sensitive to frequency transitions (the lower the frequency, the longer the sleep). This means that when the ondemand DVFS policy is active, there is no way to control the sleep time.

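#include <stdio.h>
#include <time.h>
#include <unistd.h>
struct timespec start, end;   /* clock_gettime fills a struct timespec, not a uint64_t */
for (unsigned int i = 0; i < 20; i++) {
   clock_gettime (CLOCK_REALTIME, &start);
   usleep(100000);   /* request a 100 ms sleep */
   clock_gettime (CLOCK_REALTIME, &end);
   long us = (end.tv_sec - start.tv_sec) * 1000000L + (end.tv_nsec - start.tv_nsec) / 1000;   /* elapsed microseconds */
   printf("%u: %ld us\n", i, us);
}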

I know the usleep/select man pages indicate that the sleep time is an approximation and depends on system granularity, but here it is worse: the frequency transitions themselves change the sleep time, not only system noise. Because we rely on the relative accuracy of usleep in an environment that periodically changes a processor's frequency, we are struggling to find a function that allows us to sleep for a defined time that does not depend on frequency.

Do you know of any function call that would allow us to sleep for a controlled time, insensitive to frequency transitions?
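
One candidate worth testing is the POSIX call `clock_nanosleep` with an absolute deadline on `CLOCK_MONOTONIC`. The sketch below is a minimal illustration, not a confirmed fix: if the kernel's clock source is itself the frequency-sensitive TSC, the deadline will drift in the same way. What it does avoid is accumulating error across iterations, because each wake-up time is computed from the previous deadline rather than from "now". The helper names `timespec_add_ns` and `sleep_until` are illustrative.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose clock_nanosleep in strict ISO C mode */
#endif
#include <errno.h>
#include <time.h>

/* Advance *t by ns nanoseconds (ns >= 0), keeping tv_nsec in [0, 1e9). */
static void timespec_add_ns(struct timespec *t, long ns)
{
    t->tv_sec  += ns / 1000000000L;
    t->tv_nsec += ns % 1000000000L;
    if (t->tv_nsec >= 1000000000L) {
        t->tv_sec  += 1;
        t->tv_nsec -= 1000000000L;
    }
}

/* Sleep until the absolute CLOCK_MONOTONIC time *deadline, retrying if a
 * signal interrupts the sleep. Returns 0 on success or an errno value. */
static int sleep_until(const struct timespec *deadline)
{
    int rc;
    do {
        rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, deadline, NULL);
    } while (rc == EINTR);
    return rc;
}
```

A periodic loop would read `CLOCK_MONOTONIC` once, then repeatedly call `timespec_add_ns(&deadline, period_ns)` followed by `sleep_until(&deadline)`.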

Thank you for your time,

Jp

Presumably the Xeon Phi is plugged into a host motherboard with a processor of the second class. Depending on your timing requirements, the host could provide a consistent time "tick" via shared memory variable.

for(;;) {
sharedVariable = _rdtsc();
_mm_pause();
}

Alternately you could install a PCIe device with a precision clock.

Jim Dempsey

www.quickthreadprogramming.com

Dear Jim,

Thank you for your answer! Your solution is very interesting and could indeed work. However, the purpose of our tool is to save energy, so we cannot spend host processor power "only" on computing a reference tick. Moreover, our runtime system was designed from the start to be easy to install on a card, and adding hardware would make that process considerably harder.

That being said, I appreciate your help!

Jp

When the default clock source was changed from ETC to TSC, software was added to the MPSS to correct for the effect of power management on the TSC clocksource. I had thought this would compensate for frequency changes. Can you confirm you are using one of these later MPSS releases? 

I think there is a potential problem with Jim's solution, namely that shared variables are not updated except at the beginning and end of an offload section. If you wanted to try a clock on the coprocessor itself that is unaffected by frequency and power states, you could use the ETC. This won't give you the consistency you want for the sleep function, however, because the time required to access ETC varies depending on the location of the core and the contention for the clock.

Dear Frances,

Thank you for your answer! We have updated our Xeon Phi card to the latest version of MPSS (mpss_gold_update_3-2.1.6720-19, released September 10, 2013) and it does not seem to fix the TSC issue.

The part about Jim's solution is also interesting. I should add that our program is not offloaded; it runs on the coprocessor as a stand-alone program. I am looking forward to trying the ETC instead, and if I find a way to use it I will let you know whether it fixes our problem.

Jp

For your information, I found an interesting blog post pointing out the lack of constant-rate TSC support on Intel Xeon Phi. Does anybody know whether it will be supported in future iterations of the card?

http://software.intel.com/en-us/blogs/2013/06/20/eliminate-the-dreaded-c...

By the way, I couldn't find proper documentation on how to read the ETC counter from software either, so if you have any related information, please let me know! :)

Jp

Frances,

The shared variable would live in a memory-mapped region used by both systems. Though I have not seen the technical spec on the interface's communication capability, I would assume that some portion of the bus address space is mappable by both systems.

Jim Dempsey

www.quickthreadprogramming.com

Jp - I don't have any new information but I was wondering how your program was going - did you find a workaround of any kind?

Hello Frances,

I have indeed been working on some workarounds for these issues. Our tool has a small offline phase which computes some metrics related to power consumption. We take this opportunity to compute a TSC ratio for each clock rate, so that we can derive a constant-rate TSC clock when actually measuring TSC values. It's a little tricky, and we hope the next iterations of the Xeon Phi will provide this natively.
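
To illustrate the idea only (this is not the tool's actual code), a per-frequency correction could look like the sketch below. It assumes the offline phase produced a table of TSC ticks per microsecond for each clock rate, and that the core frequency is known and stays constant over the measured interval. The names `tsc_calib` and `tsc_delta_to_us` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One calibration entry from the (assumed) offline phase: at core
 * frequency freq_khz, the TSC advances ticks_per_us per microsecond. */
struct tsc_calib {
    uint32_t freq_khz;
    double   ticks_per_us;
};

/* Convert a raw TSC delta measured while running at freq_khz into
 * microseconds. Returns a negative value if freq_khz has no usable
 * entry in the table. */
static double tsc_delta_to_us(const struct tsc_calib *table, size_t n,
                              uint32_t freq_khz, uint64_t tsc_delta)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].freq_khz == freq_khz && table[i].ticks_per_us > 0.0)
            return (double)tsc_delta / table[i].ticks_per_us;
    }
    return -1.0;
}
```

An interval that spans a frequency transition would have to be split at the transition and each part converted with its own ratio, which is where the approach gets tricky.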

For the sleep call, however, we had to live with it, as it is part of the MPSS software, which is not open source as far as I know. So we try to cope with coarse granularity. If anyone can think of a better solution, I'd be glad to hear it! :)

Jp

Quote:

Jean-Philippe H. wrote:

For the sleep call however, we had to deal with it, as it is part of the MPSS software and it is not open source as far as I know. 

MPSS sources are available from http://software.intel.com/en-us/articles/intel-manycore-platform-software-stack-mpss (they may not be up to date with the latest release, but will doubtless catch up over time).