Timing functions for threaded programs

Problem : A threaded program does not appear to scale when timed with the Fortran standard intrinsic CPU_TIME.

Environment : Windows, Linux, Mac OS X

Root Cause : The CPU_TIME intrinsic subroutine returns the CPU time summed over all active threads. Thus the time returned when an application is run in multithreaded mode may be as large as, or larger than, the time returned when the same application is run as a single thread, even when the elapsed wall clock time is much less.

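Resolution : Use the Fortran standard intrinsic subroutine SYSTEM_CLOCK instead; alternatively, use the Intel portability function DCLOCK. Both measure elapsed (wall clock) time from a single clock rather than summing CPU time across threads, so they may be used to estimate the performance or scaling of multithreaded applications. With SYSTEM_CLOCK, the elapsed time in seconds is the difference between two COUNT values divided by COUNT_RATE.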
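
As an illustration of the difference, the following is a minimal sketch that times the same loop with both CPU_TIME and SYSTEM_CLOCK. The names wall_timer, elapsed_seconds, compare_timers and the loop size n are hypothetical and chosen only for this example. The OpenMP directive is also an assumption, since the threading model is not specified above; if OpenMP is not enabled in the compiler, the directive is treated as a comment and the loop runs on one thread.

```fortran
module wall_timer
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
contains
  ! Convert two SYSTEM_CLOCK counts into elapsed wall clock seconds.
  ! (Hypothetical helper for this example; not part of any library.)
  pure function elapsed_seconds(start_count, end_count, count_rate) result(seconds)
    integer(int64), intent(in) :: start_count, end_count, count_rate
    real(real64) :: seconds
    seconds = real(end_count - start_count, real64) / real(count_rate, real64)
  end function elapsed_seconds
end module wall_timer

program compare_timers
  use, intrinsic :: iso_fortran_env, only: int64, real64
  use wall_timer, only: elapsed_seconds
  implicit none
  integer, parameter :: n = 50000000   ! arbitrary amount of work
  integer(int64) :: start_count, end_count, count_rate
  real(real64) :: cpu_start, cpu_end, total
  integer :: i

  ! COUNT_RATE is the number of clock counts per second.
  call system_clock(count_rate=count_rate)
  if (count_rate <= 0) error stop 'no system clock available'

  call system_clock(count=start_count)
  call cpu_time(cpu_start)

  ! Assumed threading model: OpenMP. Without OpenMP this runs serially.
  total = 0.0_real64
  !$omp parallel do reduction(+:total)
  do i = 1, n
     total = total + sqrt(real(i, real64))
  end do
  !$omp end parallel do

  call cpu_time(cpu_end)
  call system_clock(count=end_count)

  ! Printing the sum keeps the compiler from optimizing the loop away.
  print '(a, es12.4)', 'checksum:                  ', total
  print '(a, f10.3)',  'CPU_TIME difference (s):   ', cpu_end - cpu_start
  print '(a, f10.3)',  'SYSTEM_CLOCK elapsed (s):  ', &
       elapsed_seconds(start_count, end_count, count_rate)
end program compare_timers
```

When the loop runs on several threads, the CPU_TIME difference would be expected to stay roughly constant or grow, while the SYSTEM_CLOCK value should shrink as threads are added. 64-bit integers are used for the counts so that the counter is unlikely to wrap during the measurement.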
