Low overhead counters on Linux?

Does anyone know of any timers within Linux that have low overhead? That is, low enough to use within individual Pthreads, where they would be called many times, without drastically affecting overall execution time.

--clay

ClayB wrote:

Does anyone know of any timers within Linux that have low overhead? That is, low enough to use within individual Pthreads, where they would be called many times, without drastically affecting overall execution time.

--clay

Counters or timers? For the latter, you're probably limited to gettimeofday or clock_gettime. There's a bit of overhead in turning the TSC, which can vary in frequency from moment to moment and from CPU to CPU, into something useful as a time base. If you're on one CPU and just want to count cycles, you can read the TSC directly.
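As a rough illustration of the clock_gettime route, here is a minimal sketch in C. The helper names (clock_ns, wall_ns, thread_cpu_ns) are made up for this example. It assumes a POSIX system where CLOCK_MONOTONIC and CLOCK_THREAD_CPUTIME_ID are available, which is the case on current Linux with glibc.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose clock_gettime and the clock IDs */
#endif
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: read the given clock and return it in nanoseconds.
 * Returns 0 if the clock cannot be read. */
static uint64_t clock_ns(clockid_t id)
{
    struct timespec ts;
    if (clock_gettime(id, &ts) != 0)
        return 0;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Elapsed wall-clock time since an unspecified starting point. */
static uint64_t wall_ns(void)
{
    return clock_ns(CLOCK_MONOTONIC);
}

/* CPU time consumed by the calling thread only. */
static uint64_t thread_cpu_ns(void)
{
    return clock_ns(CLOCK_THREAD_CPUTIME_ID);
}
```

Inside a Pthread you would call wall_ns() or thread_cpu_ns() before and after the region of interest and subtract the two values. The cost of each call depends on the kernel and clock source, so it is worth measuring on the target machine before relying on it.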

Usually for timing things like instruction sequences, I just execute them in a loop about a million times and divide the overall time by the loop count.
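The loop approach can be sketched as follows. This is only an illustration: the names avg_ns_per_call, work and sink are invented for the example, and it assumes clock_gettime with CLOCK_MONOTONIC is available.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose clock_gettime */
#endif
#include <stdint.h>
#include <time.h>

/* volatile so the compiler cannot optimize the timed work away */
static volatile uint64_t sink;

/* Hypothetical stand-in for the instruction sequence being timed. */
static void work(void)
{
    sink += 1;
}

/* Run fn() `iterations` times and return the average cost per call in
 * nanoseconds. The two clock reads are amortized over the whole loop,
 * so their overhead barely affects the result. */
static double avg_ns_per_call(void (*fn)(void), unsigned iterations)
{
    struct timespec start, end;

    if (iterations == 0)
        return 0.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (unsigned i = 0; i < iterations; i++)
        fn();
    clock_gettime(CLOCK_MONOTONIC, &end);

    double total_ns = (double)(end.tv_sec - start.tv_sec) * 1e9
                    + (double)(end.tv_nsec - start.tv_nsec);
    return total_ns / (double)iterations;
}
```

Calling avg_ns_per_call(work, 1000000u) gives the per-call average over a million iterations. Note that the figure includes the loop and function-call overhead, which matters when the timed sequence is very short.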

For counters, you'd have to give more information, such as whether you mean shared counters. I know Terje Mathisen was working on a way to atomically read a 64-bit counter on IA-32, the TOD (time-of-day) clock being a 64-bit counter. I have a slight modification of that technique that lets it work with multiple writers and also removes the need for an SFENCE, in the unlikely event that one were needed on IA-32. I also did a 63-bit atomic counter for IA-32. Code for that, along with some explanation of the 64-bit technique, was recently posted to comp.lang.asm.x86.
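If a shared counter is what you want, here is a minimal sketch. To be clear, this is not the code posted to comp.lang.asm.x86 and not Terje Mathisen's technique. It swaps in C11 <stdatomic.h> and leaves the choice of instructions to the compiler. The names shared_counter, counter_increment and counter_read are invented for the example.

```c
#include <stdatomic.h>
#include <stdint.h>

/* One 64-bit counter shared by all threads; static storage starts at zero. */
static _Atomic uint64_t shared_counter;

/* Atomically add one and return the new value. Relaxed ordering is enough
 * when the counter is only a statistic and does not guard other data. */
static uint64_t counter_increment(void)
{
    return atomic_fetch_add_explicit(&shared_counter, 1,
                                     memory_order_relaxed) + 1;
}

/* Atomically read the current value; a reader never sees a torn
 * (half-updated) 64-bit value. */
static uint64_t counter_read(void)
{
    return atomic_load_explicit(&shared_counter, memory_order_relaxed);
}
```

Any number of threads can call counter_increment() concurrently. If the counter becomes a contention point, per-thread counters that are summed at the end are usually cheaper.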

Got this reply in my email. (Posted with permission of the author.)

Dear Clay,

I had some problems with my Web browser and couldn't write my answer to the forum, therefore I am sending an e-mail directly to you.

I wasn't sure whether you are looking for a standard mechanism that is already included in the Linux kernel. I am not aware of anything like that. If patching your Linux kernel is not a problem, then there is another solution. As far as I know, the most lightweight mechanism for timing something is the Time Stamp Counter (TSC), available on Pentium and newer processors. Reading it takes just one assembly instruction, and you get nanosecond resolution.
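For illustration, reading the TSC from C might look like the sketch below. It assumes an x86 processor and GCC or Clang, whose <x86intrin.h> header provides the __rdtsc() intrinsic. The names read_tsc and tsc_delta are made up for this example.

```c
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc(), GCC/Clang on x86 only */

/* Read the Time Stamp Counter: a single RDTSC instruction. */
static inline uint64_t read_tsc(void)
{
    return __rdtsc();
}

/* TSC ticks between two readings. The result is in ticks, not seconds;
 * converting it requires knowing the TSC frequency of the machine. */
static inline uint64_t tsc_delta(uint64_t start, uint64_t end)
{
    return end - start;
}
```

A thread would call read_tsc() before and after the code being measured and pass both values to tsc_delta(). The difference is only meaningful if both readings come from the same CPU, or from CPUs whose counters are synchronized.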

But there is of course a catch. The TSC doesn't know anything about threads, as it just counts cycles since the machine booted up. Fortunately, there is a solution to this problem. You could have a look at the perfctr library by Mikael Pettersson, which is available here:

http://user.it.uu.se/~mikpe/linux/perfctr/

(The 2.6 branch is the latest stable one and 2.6.11 is the latest version)

This library contains a patch for the Linux kernel, which creates a "virtual" TSC per kernel-thread. Actually, it just saves and restores the TSC when a kernel-thread performs a context-switch. It also contains a user-level library to read the TSC. To my knowledge, the overhead of the patch and the library is really low.

If you need a higher-level interface, you could use the PAPI library (available from http://icl.cs.utk.edu/papi, latest version is 3.0.7), which uses the perfctr library internally.

I hope that you found something useful in this message :-)

Best regards,

Ioannis E. Venetis
