PinPlay* FAQ

Published: 02/11/2015, Last Updated: 02/10/2015

I. How long does record/replay take?

Record/replay overhead is a function of the number of memory accesses and the amount of sharing in the test program.

1. Time for recording/replaying a 'region': 

Source: CGO2014 paper on DrDebug

2. Slow-down for whole-program recording.

Source: measured with PinPlay kit 2.0. (We are continuously looking to improve these numbers.)

    Benchmark/Input         How recorded/replayed               Average slowdown (x native)
                            (pin -t pinplay-driver.so ...)      Logger      Replayer

    SPEC2006/'ref'          -log:mt 0 / -replay:addr_trans 0    98x         11x
    PARSEC/'native' >=4T    -log:mt 1 / -replay:addr_trans 0    197x        37x
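To get a feel for what these factors mean in wall-clock time, the short sketch below multiplies a native run time by the average slowdowns from the table. The function name and dictionary layout are illustrative only, and because the factors are averages over a benchmark suite, any single program may be noticeably faster or slower.

```python
# Average slowdown factors (x native) copied from the table above.
SLOWDOWN = {
    ("SPEC2006", "logger"): 98,
    ("SPEC2006", "replayer"): 11,
    ("PARSEC", "logger"): 197,
    ("PARSEC", "replayer"): 37,
}


def estimated_seconds(native_seconds, suite, phase):
    """Rough estimate of run time under PinPlay for a given native run time."""
    return native_seconds * SLOWDOWN[(suite, phase)]


# A hypothetical multi-threaded run that takes one minute natively:
record_time = estimated_seconds(60, "PARSEC", "logger")    # 11820 seconds
replay_time = estimated_seconds(60, "PARSEC", "replayer")  # 2220 seconds
```

In this hypothetical case, a one-minute native run would take roughly 3.3 hours to record and about 37 minutes to replay.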

II. Why does PinPlay have a high overhead (especially for recording)?

The design goals of PinPlay were:

  • No special HW requirement (including no reliance on HW performance counters).
  • No special operating system requirement (including no virtual machine and no modified kernel).
  • Complete and faithful reproduction of multi-threaded schedules.
  • Portability (small size, OS-independence) of recording ("pinball").
  • No program source needed. No re-compilation/re-linking required.

As a result, PinPlay works on multiple operating systems 'out of the box' and provides the guarantee that a bug once captured will not escape. However, that comes with a high overhead, especially during recording.

There are two major sources of slow-down in PinPlay (we are continuously looking to improve these):

1. System call side-effect analysis.

A shadow memory is maintained during recording. Every memory write the program performs is replicated in the shadow memory. On each memory read, the 'real' memory value is compared with the 'shadow' memory value; a mismatch or a missing shadow value means the location was changed by something other than the program's own writes (such as a system call), so an injection is emitted in the *.sel file. At replay time, all memory reads are monitored and the recorded memory values are injected where present. The details are described in our SIGMETRICS 2006 paper "Automatic Logging of Operating System Effects to Guide Application-Level Architecture Simulation".

The overhead of this technique is proportional to the number of memory accesses in the program.
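The idea can be illustrated with a toy model. The sketch below is not PinPlay code: the operation tuples, the function names, and the in-memory injection list (standing in for the *.sel file) are all assumptions made for illustration. A "w" operation is a program write the tool observes, a "k" operation is a kernel write it does not observe, and an "r" operation is a program read.

```python
_MISSING = object()  # marks an address the shadow memory has never seen


def record(ops):
    """Return injection records (read_index, address, value) for a trace."""
    real, shadow, injections = {}, {}, []
    reads = 0
    for op in ops:
        kind, addr = op[0], op[1]
        if kind == "w":      # program write: mirrored into the shadow memory
            real[addr] = shadow[addr] = op[2]
        elif kind == "k":    # kernel write: changes real memory only
            real[addr] = op[2]
        else:                # "r": compare real value against shadow value
            value = real[addr]
            if shadow.get(addr, _MISSING) != value:
                injections.append((reads, addr, value))
                shadow[addr] = value
            reads += 1
    return injections


def replay(ops, injections):
    """Re-run the trace without the kernel, injecting logged values."""
    memory = {}
    pending = {index: (addr, value) for index, addr, value in injections}
    observed, reads = [], 0
    for op in ops:
        kind, addr = op[0], op[1]
        if kind == "w":
            memory[addr] = op[2]
        elif kind == "r":
            if reads in pending:          # restore the logged side effect
                inj_addr, inj_value = pending[reads]
                memory[inj_addr] = inj_value
            observed.append(memory[addr])
            reads += 1
        # "k" operations are skipped: their effects come from the log
    return observed


OPS = [("w", 0x10, 1), ("r", 0x10),
       ("k", 0x20, 99), ("r", 0x20),
       ("k", 0x10, 7), ("r", 0x10)]
```

For this trace, `record(OPS)` yields two injections, one for the never-written address 0x20 and one for the overwritten address 0x10, and `replay(OPS, record(OPS))` observes the same values as the original run even though the kernel writes are never executed. Both functions do work on every read and write, which is why the cost grows with the number of memory accesses.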

2. Shared memory access order analysis.

During recording, all memory accesses are monitored and a cache coherency protocol is simulated, including tracking of the last reader/writer for each shared memory access. A subset of the detected read-after-write, write-after-read, and write-after-write dependences is recorded in the *.race file. During replay, all memory accesses are monitored and a thread is delayed if it tries to access a shared memory location out of order.

The overhead of this technique is proportional to the number of shared memory accesses in the program.
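A much simplified sketch of this bookkeeping follows. It is an illustration under stated assumptions rather than PinPlay's implementation: it does not simulate a cache coherency protocol, it records every cross-thread dependence instead of a reduced subset, and the class name, edge format, and `may_proceed` helper are hypothetical.

```python
class OrderLogger:
    """Tracks last writer and recent readers per address; logs dependences."""

    def __init__(self):
        self.last_writer = {}  # addr -> (thread, access_count) of last write
        self.readers = {}      # addr -> {thread: access_count} since that write
        self.counts = {}       # thread -> number of accesses performed so far
        self.edges = []        # (kind, source_access, dependent_access)

    def access(self, thread, addr, is_write):
        count = self.counts.get(thread, 0) + 1
        self.counts[thread] = count
        me = (thread, count)
        writer = self.last_writer.get(addr)
        if is_write:
            for r_thread, r_count in self.readers.get(addr, {}).items():
                if r_thread != thread:
                    self.edges.append(("WAR", (r_thread, r_count), me))
            if writer is not None and writer[0] != thread:
                self.edges.append(("WAW", writer, me))
            self.last_writer[addr] = me
            self.readers[addr] = {}
        else:
            if writer is not None and writer[0] != thread:
                self.edges.append(("RAW", writer, me))
            self.readers.setdefault(addr, {})[thread] = count


def may_proceed(edges, done, thread, count):
    """Replay check: has every access this one depends on already happened?

    `done` maps each thread to the number of accesses it has completed.
    """
    return all(done.get(src[0], 0) >= src[1]
               for _kind, src, dst in edges if dst == (thread, count))


log = OrderLogger()
log.access(0, "X", True)    # thread 0 writes X
log.access(1, "X", False)   # thread 1 reads X   (read-after-write)
log.access(0, "X", True)    # thread 0 writes X  (write-after-read)
log.access(1, "X", True)    # thread 1 writes X  (write-after-write)
```

After these four accesses, `log.edges` holds one dependence of each kind. During replay, a scheduler built on `may_proceed` would hold thread 1 back from its first access until thread 0 has completed its first write. Only accesses to locations touched by more than one thread ever create edges or delays, which matches the observation that the cost scales with the number of shared memory accesses.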

