Pin 4.2 User Guide



Introduction


Pin is a tool for the instrumentation of programs. It supports the Linux(R) and Microsoft Windows operating systems and executables for the IA-32 and x86-64 instruction-set architectures.

Pin allows a tool to insert arbitrary code (written in C or C++) in arbitrary places in the executable. The code is added dynamically while the executable is running. This also makes it possible to attach Pin to an already running process.

Pin provides a rich API that abstracts away the underlying instruction set idiosyncrasies and allows context information such as register contents to be passed to the injected code as parameters. Pin automatically saves and restores the registers that are overwritten by the injected code so the application continues to work. Limited access to symbol and debug information is available as well.

Pin includes the source code for a large number of example instrumentation tools like basic block profilers, cache simulators, instruction trace generators, etc. It is easy to derive new tools using the examples as a template.




What's New in Pin 4.x


Pin 4.x is mostly backward compatible with Pin 3.x. For a list of breaking changes please refer to the README.

Starting with Pin 4.0 the following new capabilities are available to Pintool writers:


How to Instrument with Pin



Pin

The best way to think about Pin is as a "just in time" (JIT) compiler. The input to this compiler is not bytecode, however, but a regular executable. Pin intercepts the execution of the first instruction of the executable and generates ("compiles") new code for the straight line code sequence starting at this instruction. It then transfers control to the generated sequence. The generated code sequence is almost identical to the original one, but Pin ensures that it regains control when a branch exits the sequence. After regaining control, Pin generates more code for the branch target and continues execution. Pin makes this efficient by keeping all of the generated code in memory, so that it can be reused, and by branching directly from one sequence to another.

In JIT mode, the only code ever executed is the generated code. The original code is only used for reference. When generating code, Pin gives the user an opportunity to inject their own code (instrumentation).
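The JIT model can be illustrated with a toy simulation in standard C++. This is not Pin code: CodeCache, CompiledBlock and execute are hypothetical names, and a "block" is just a start address standing in for a generated code sequence. The sketch shows the two properties described above: a sequence is generated (and the instrumentation hook runs) only the first time its address is reached, and every later visit executes the cached copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Toy model of a JIT code cache (hypothetical names, not Pin code).
struct CompiledBlock {
    std::uint64_t start; // original address the block was generated from
    int timesExecuted;   // how often the generated copy has run
};

class CodeCache {
public:
    // onGenerate plays the role of an instrumentation callback.
    explicit CodeCache(std::function<void(std::uint64_t)> onGenerate)
        : onGenerate_(std::move(onGenerate)) {}

    // Executes the block starting at addr, generating it on the first visit only.
    void execute(std::uint64_t addr) {
        auto it = cache_.find(addr);
        if (it == cache_.end()) {
            onGenerate_(addr); // instrumentation happens once per generated block
            it = cache_.emplace(addr, CompiledBlock{addr, 0}).first;
        }
        ++it->second.timesExecuted; // only the cached copy is ever executed
    }

    std::size_t blocksGenerated() const { return cache_.size(); }

    int executions(std::uint64_t addr) const {
        auto it = cache_.find(addr);
        return it == cache_.end() ? 0 : it->second.timesExecuted;
    }

private:
    std::function<void(std::uint64_t)> onGenerate_;
    std::unordered_map<std::uint64_t, CompiledBlock> cache_;
};
```

For example, running the path 0x1000, 0x1040, 0x1000, 0x1040, 0x1000 generates two blocks but executes five. This is also why analysis code, which runs on every execution, matters more for performance than instrumentation code, which runs on every generation.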

Pin instruments all instructions that are actually executed. It does not matter in what section they reside. Although there are some exceptions for conditional branches, generally speaking, if an instruction is never executed then it will not be instrumented.

Pintools

Conceptually, instrumentation consists of two components:

  • A mechanism that decides where and what code is inserted
  • The code to execute at insertion points

These two components are the instrumentation code and the analysis code, respectively. Both components live in a single executable, a Pintool. Pintools can be thought of as plugins that can modify the code generation process inside Pin.

The Pintool registers instrumentation callback routines with Pin that are called from Pin whenever new code needs to be generated. This instrumentation callback routine represents the instrumentation component. It inspects the code to be generated, investigates its static properties, and decides if and where to inject calls to analysis functions.

The analysis function gathers data about the application. Pin makes sure that the integer and floating point register state is saved and restored as necessary, and allows arguments to be passed to the functions.

The Pintool can also register notification callback routines for events such as thread creation or forking. These callbacks are generally used to gather data or to perform tool initialization or cleanup.

Observations

Since a Pintool works like a plugin, it must run in the same address space as Pin and the executable to be instrumented. Hence the Pintool has access to all of the executable's data. It also shares file descriptors and other process information with the executable.

Pin and the Pintool control a program starting with the very first instruction. For executables compiled with shared libraries this implies that the execution of the dynamic loader and all shared libraries will be visible to the Pintool.

When writing tools, it is more important to tune the analysis code than the instrumentation code. This is because the instrumentation is executed once, but analysis code is called many times.

Instrumentation Granularity

As described above, Pin's instrumentation is "just in time" (JIT). Instrumentation occurs immediately before a code sequence is executed for the first time. We call this mode of operation trace instrumentation.

Trace instrumentation lets the Pintool inspect and instrument an executable one trace at a time. Traces usually begin at the target of a taken branch and end with an unconditional branch, including calls and returns. Pin guarantees that a trace is only entered at the top, but it may contain multiple exits. If a branch joins the middle of a trace, Pin constructs a new trace that begins with the branch target. Pin breaks the trace into basic blocks (BBLs). A BBL is a single entrance, single exit sequence of instructions. Branches to the middle of a BBL begin a new trace and hence a new BBL. It is often possible to insert a single analysis call for a BBL, instead of one analysis call for every instruction. Reducing the number of analysis calls makes instrumentation more efficient. Trace instrumentation utilizes the TRACE_AddInstrumentFunction API call.
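The benefit of BBL-level calls can be seen in a small standard C++ model. It is a hypothetical sketch, not the Pin API: each element of executedBblSizes is the instruction count of one executed BBL. Both strategies compute the same instruction total, but the per-BBL version runs the analysis routine once per BBL instead of once per instruction. In an actual Pintool the per-BBL variant is typically built in a trace callback with BBL_InsertCall and BBL_NumIns (check the API reference for their exact signatures).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model, not the Pin API.
struct CountResult {
    std::uint64_t instructions;  // total instructions counted
    std::uint64_t analysisCalls; // how many times the analysis routine ran
};

// One analysis call per instruction, each adding 1 (like a call before every INS).
CountResult countPerInstruction(const std::vector<std::uint32_t>& executedBblSizes) {
    CountResult r{0, 0};
    for (std::uint32_t size : executedBblSizes) {
        for (std::uint32_t i = 0; i < size; ++i) {
            ++r.instructions;
            ++r.analysisCalls;
        }
    }
    return r;
}

// One analysis call per BBL that adds the BBL's size; the size is a constant the
// instrumentation routine can compute once when the trace is generated.
CountResult countPerBbl(const std::vector<std::uint32_t>& executedBblSizes) {
    CountResult r{0, 0};
    for (std::uint32_t size : executedBblSizes) {
        r.instructions += size;
        ++r.analysisCalls;
    }
    return r;
}
```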

Note, though, that since Pin is discovering the control flow of the program dynamically as it executes, Pin's BBL can be different from the classical definition of a BBL that you will find in a compiler textbook. For instance, consider the code generated for the body of a switch statement like this:

switch(i)
{
case 4: total++;
case 3: total++;
case 2: total++;
case 1: total++;
case 0:
default: break;
}

This generates instructions similar to the following (for the IA-32 architecture):

.L7:
addl $1, -4(%ebp)
.L6:
addl $1, -4(%ebp)
.L5:
addl $1, -4(%ebp)
.L4:
addl $1, -4(%ebp)

In terms of classical basic blocks, each addl instruction is in a single instruction basic block. However, as the different switch cases are executed, Pin will generate BBLs which contain all four instructions (when the .L7 case is entered), three instructions (when the .L6 case is entered), and so on. This means that counting Pin BBLs is unlikely to give the count you would expect if you thought that Pin BBLs were the same as the basic blocks in the textbook. Here, for instance, if the code branches to .L7 you will count one Pin BBL, but four classical basic blocks are executed.
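The difference between the two counts can be worked through with a small, hypothetical C++ model of the fall-through code above. It simulates the counting and does not use Pin. Entry index 0 stands for a branch to .L7 and 3 for a branch to .L4. Each jump executes one Pin BBL running to the end of the fall-through sequence, and in this model a Pin BBL is generated, and its instructions instrumented, once per distinct entry point.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical model of the fall-through switch body: four one-instruction
// classical basic blocks (.L7, .L6, .L5, .L4) laid out back to back.
constexpr std::size_t kCaseBlocks = 4;

struct SwitchStats {
    std::size_t pinBblExecutions = 0;         // Pin BBLs executed
    std::size_t pinBblsGenerated = 0;         // distinct Pin BBLs generated
    std::size_t classicalBlocks = 0;          // textbook basic blocks executed
    std::size_t instructionsInstrumented = 0; // instructions in all generated BBLs
};

// entries[i] is the label jumped to on the i-th execution of the switch:
// 0 = .L7 (case 4), 1 = .L6, 2 = .L5, 3 = .L4 (case 1).
SwitchStats simulate(const std::vector<std::size_t>& entries) {
    SwitchStats s;
    std::set<std::size_t> generated; // entry points Pin has already seen
    for (std::size_t e : entries) {
        assert(e < kCaseBlocks);
        std::size_t len = kCaseBlocks - e; // addl instructions from the entry to the end
        ++s.pinBblExecutions;              // one single-entry Pin BBL per jump
        s.classicalBlocks += len;          // one classical block per addl
        if (generated.insert(e).second) {  // first jump to this label
            ++s.pinBblsGenerated;
            s.instructionsInstrumented += len;
        }
    }
    return s;
}
```

Entering at .L7, then .L6, then .L7 again gives 3 Pin BBL executions but 11 classical basic blocks, and the three instructions from .L6 onward are instrumented twice because they belong to two different Pin BBLs.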

Pin also breaks BBLs on some instructions where you might not expect it. For instance, cpuid, popf and REP prefixed instructions all end traces and therefore BBLs. Since REP prefixed instructions are treated as implicit loops, if a REP prefixed instruction iterates more than once, iterations after the first will cause a single instruction BBL to be generated, so in this case you would see more basic blocks executed than you might expect.

As a convenience for Pintool writers, Pin also offers an instruction instrumentation mode which lets the tool inspect and instrument an executable a single instruction at a time. This is essentially identical to trace instrumentation, except that the Pintool writer is freed from the responsibility of iterating over the instructions inside a trace. As described under trace instrumentation, certain BBLs and the instructions inside of them may be generated (and hence instrumented) multiple times. Instruction instrumentation utilizes the INS_AddInstrumentFunction API call.

Sometimes, however, it can be useful to look at a different granularity than a trace. For this purpose Pin offers two additional modes: image and routine instrumentation. These modes are implemented by "caching" instrumentation requests and hence incur a space overhead. They are also referred to as ahead-of-time instrumentation.

Image instrumentation lets the Pintool inspect and instrument an entire image, IMG: Image Object, when it is first loaded. A Pintool can walk the sections, SEC: Section Object, of the image, the routines, RTN: Routine Object, of a section, and the instructions, INS, of a routine. Instrumentation can be inserted so that it is executed before or after a routine is executed, or before or after an instruction is executed. Image instrumentation utilizes the IMG_AddInstrumentFunction API call. Image instrumentation depends on symbol information to determine routine boundaries, hence PIN_InitSymbols must be called before PIN_Init.

Routine instrumentation lets the Pintool inspect and instrument an entire routine when the image it is contained in is first loaded. A Pintool can walk the instructions of a routine. There is not enough information available to break the instructions into BBLs. Instrumentation can be inserted so that it is executed before or after a routine is executed, or before or after an instruction is executed. Routine instrumentation is provided as a convenience for Pintool writers, as an alternative to walking the sections and routines of the image during the Image instrumentation, as described in the previous paragraph.

Routine instrumentation utilizes the RTN_AddInstrumentFunction API call. Instrumentation of routine exits does not work reliably in the presence of tail calls or when return instructions cannot reliably be detected.

Note that in both Image and Routine instrumentation, it is not possible to know whether or not a routine will actually be executed (since these instrumentations are done at image load time). It is possible to walk the instructions of only those routines that are executed, in the Trace or Instruction instrumentation routines, by identifying instructions that are the start of routines. See the tool Tests/parse_executed_rtns.cpp.
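The idea behind that approach can be sketched in standard C++ as a set lookup. This is a hypothetical model, not the code of parse_executed_rtns.cpp: routineStarts stands for routine start addresses obtained from symbol information, and instrumentedAddrs for the addresses of instructions seen by a trace or instruction instrumentation callback. Because Pin generally instruments only code that is about to execute (subject to the conditional-branch exceptions mentioned earlier), a routine whose first instruction appears in that stream has almost certainly been entered.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical sketch, not Pin code. routineStarts: routine start addresses from
// symbol information. instrumentedAddrs: instruction addresses in the order a trace
// or instruction instrumentation callback saw them.
std::set<std::uint64_t> executedRoutines(const std::set<std::uint64_t>& routineStarts,
                                         const std::vector<std::uint64_t>& instrumentedAddrs) {
    std::set<std::uint64_t> reached;
    for (std::uint64_t addr : instrumentedAddrs) {
        // An instrumented instruction that is a routine start marks the routine as executed.
        if (routineStarts.count(addr) != 0) reached.insert(addr);
    }
    return reached;
}
```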

Managed platforms support

Pin supports all executables, including managed binaries. From Pin's point of view, a managed binary is one more kind of self-modifying program. Pin can be made to differentiate just-in-time compiled code (Jitted code) from all other dynamically generated code and to associate Jitted code with the appropriate managed functions. To get this functionality, the just-in-time compiler (Jitter) of the running managed platform must support the Jit Profiling API.

The following capabilities are supported:

The following conditions must be satisfied to get managed platform support:

  • Set the LD_LIBRARY_PATH environment variable to include the locations of the pinjitprofiling dynamic library and the Pin CRT libraries.

  • Add the knob support_jit_api to the Pin command line as a Pintool option:

    <Pin executable> <Pin options> -t <Pintool> -support_jit_api <Other Pintool options> -- <Test application> <Test application options>

Symbols

Pin provides access to function names using the symbol object (SYM). Symbol objects only provide information about the function symbols in the application. Information about other types of symbols (e.g. data symbols) must be obtained independently by the tool.

On Windows, you can use dbghelp.dll for this.
Note that using dbghelp.dll in an instrumented process is not safe and can cause deadlocks in some cases. A possible solution is to find symbols using a different, non-instrumented process.

On Linux, you can use libdwarf.so that is provided as part of the Pin kit to access DWARF information.

libdwarf

The libdwarf.so library in the Pin kit is based on the open source libdwarf project (https://github.com/davea42/libdwarf-code) and is linked with Pin CRT.
The libdwarf header files are located at ./extras/libdwarf/libdwarf-2.3.1/src/lib/libdwarf under the Pin root directory.
The libdwarf.so libraries are located together with the other Pin libraries at intel64/lib/ and ia32/lib/.
To use the library, add the libdwarf include directory to the pintool include path, and link with libdwarf.so (add -ldwarf to the link command).
The full documentation of the libdwarf API can be found on the open source libdwarf project page: https://www.prevanders.net/libdwarfdoc/index.html
The repository includes examples of how to use the API, such as the dwarfdump application and several examples under dwarfexample.
The Pin kit includes one pintool that uses the libdwarf library: DebugInfo/libdwarf_client.cpp
The Pin kit includes, in addition to the libdwarf.so library, the sources that were used to build it.
The sources are provided at ./extras/libdwarf under the Pin root directory.
The README file includes instructions on how to build the library from those sources.

PIN_InitSymbols must be called to access functions by name. See Symbols for more information.

Floating Point Support in Analysis Routines

Pin takes care of maintaining the application's floating point state across analysis routines.

IARG_REG_VALUE cannot be used to pass floating point register values as arguments to analysis routines.

Instrumenting Multi-threaded Applications

Instrumenting a multi-threaded program requires that the tool be thread safe: access to global storage must be coordinated with other threads. Starting with Pin 4.0, Pin provides pthread and C++11 threads support, so Pintool writers are free to select any of the standard mechanisms to create and synchronize threads. Pin also provides its own locking and thread management APIs, which the Pintool can use (see LOCK: Locking Primitives and Pin Thread API). When selecting a synchronization mechanism, the Pintool writer should consider the performance implications of the mechanism, as well as how locks will be handled across forks. The Pin Lock APIs (LOCK: Locking Primitives) are fork aware and automatically reinitialize locks in the child process after fork. A Pintool that uses a different synchronization mechanism should take care to reinitialize its locks in the child process after fork. To that end, Pin provides the PIN_AddForkFunction and PIN_AddForkFunctionProbed APIs, which register a callback that is called in the child process after fork (use FPOINT_AFTER_IN_CHILD). Another option is to use the pthread_atfork API to register a callback to be called in the child process after fork.

Note
Please note that the pthread_atfork method should not be used in Probe mode, since we cannot guarantee that the pthread_atfork handlers will be called.
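In JIT mode, the pthread_atfork option for a Pintool that owns a plain pthread mutex can be sketched as follows. This is ordinary POSIX code, not a Pin API, and toolLock and the handler names are hypothetical. Rather than re-initializing the mutex in the child, it uses the common pthread_atfork pattern: the prepare handler acquires the lock before fork() so that no other thread can hold it at the moment of the fork, and the parent and child handlers release it afterwards, leaving both processes with an unlocked mutex.

```cpp
#include <cassert>
#include <pthread.h>
#include <sys/wait.h>
#include <unistd.h>

// Plain POSIX sketch (not a Pin API). toolLock is a hypothetical tool-owned mutex.
pthread_mutex_t toolLock = PTHREAD_MUTEX_INITIALIZER;

// Parent, before fork(): take the lock so no other thread holds it during the fork.
void prepareFork() { pthread_mutex_lock(&toolLock); }
// Parent, after fork(): release the lock taken in prepareFork().
void afterForkParent() { pthread_mutex_unlock(&toolLock); }
// Child, after fork(): the child's copy is held by the forking thread, so release it.
void afterForkChild() { pthread_mutex_unlock(&toolLock); }

// Registers the three handlers; returns true on success.
bool registerForkHandlers() {
    return pthread_atfork(prepareFork, afterForkParent, afterForkChild) == 0;
}
```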

Pintools do not need to add explicit locking to instrumentation routines because Pin calls these routines while holding an internal lock called the VM lock. However, Pin does execute analysis and replacement functions in parallel in the context of the native application thread, so Pintools may need to add locking to these routines if they access global data.
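As an illustration of locking in analysis code, here is a standard C++ sketch (Pin 4.x supports C++11 threads, as noted above). analysisRoutine and globalCount are hypothetical stand-ins for a Pintool's analysis function and global data, and runThreads merely simulates several application threads executing instrumented code. std::lock_guard also guarantees that the lock is released before the routine returns, which the guidelines in Avoiding Deadlocks in Multi-threaded Applications require.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-ins for a Pintool's global data and its analysis routine.
std::mutex countLock;
std::uint64_t globalCount = 0;

// May run concurrently in many application threads, so the update is guarded.
void analysisRoutine() {
    std::lock_guard<std::mutex> guard(countLock); // released automatically on return
    ++globalCount;
}

// Simulates several application threads each executing instrumented code.
void runThreads(int threads, int callsPerThread) {
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([callsPerThread] {
            for (int i = 0; i < callsPerThread; ++i) analysisRoutine();
        });
    }
    for (std::thread& th : pool) th.join();
}
```

For a single counter, std::atomic or per-thread counters indexed by Pin's thread ID would usually be cheaper than a mutex; Pin's own LOCK primitives (e.g. PIN_GetLock()) are another option.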

Pin provides callbacks when each thread starts and ends (see PIN_AddThreadStartFunction and PIN_AddThreadFiniFunction). These provide a convenient place for a Pintool to allocate and manipulate thread local data and store it in the thread's local storage.

Understanding Thread Ids in Pin

Pin aims to isolate itself and Pintools from the native application so as to minimize unintentional modifications to program behavior. To that end, Pin maintains an O/S abstraction layer dubbed PINOS. All thread management functions go through PINOS.

PINOS maintains its own thread ID that is mapped to a native thread ID. The PINOS thread ID is returned when calling gettid or PIN_GetTid. The native O/S thread ID can be retrieved by calling PIN_GetNativeTid, PIN_GetNativeTidFromSysTid or PIN_GetNativeTidFromThreadId.

Pin also provides an internal thread ID that can be used to identify a thread and is available as an analysis routine argument (IARG_THREAD_ID) or by calling PIN_ThreadId(). This ID is different from both the O/S system thread ID and the PINOS thread ID. It is a number starting at 0, which can be used as an index to an array of thread data or as the locking value to Pin user locks. See the example Instrumenting Threaded Applications for more information. Using Pin's internal thread ID is the most efficient way to identify a thread for Thread Local Storage (TLS) or thread specific storage access. Pin supports a theoretical limit of 64K - 1 concurrent threads in a process. The thread ID provided by Pin increases monotonically until it reaches 64K - 1 and then wraps to 0. Pin knows how to skip IDs that are still in use by active threads. However, if the thread ID is used as an index to some data structure holding thread information, then this ID can only be considered unique if fewer than 64K threads are created throughout the lifetime of the instrumented application. PINOS thread IDs are also recycled using the same logic as Pin's thread IDs. Native O/S thread IDs are recycled according to the specific O/S logic. For both Linux and Windows it may be safe to assume that native O/S thread IDs are unique throughout the lifetime of a process.
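The wrap-around behavior can be modeled in a few lines of standard C++. This is a hypothetical model of the policy described above, not Pin's implementation: ThreadIdAllocator hands out IDs in the range [0, idCount), wraps to 0, and skips IDs that still belong to live threads. In Pin, idCount would be roughly 64K; the exact boundary is an assumption here, and the example uses 4 so the wrap is easy to see.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical model of the ID policy above, not Pin's implementation.
class ThreadIdAllocator {
public:
    explicit ThreadIdAllocator(std::uint32_t idCount) : idCount_(idCount) {}

    // Hands out the next ID that no live thread is using, wrapping to 0 at idCount.
    std::uint32_t allocate() {
        assert(active_.size() < idCount_); // otherwise no free ID exists
        while (active_.count(next_) != 0) advance(); // skip IDs of live threads
        std::uint32_t id = next_;
        active_.insert(id);
        advance();
        return id;
    }

    // Called when a thread ends; its ID may be handed out again after a wrap.
    void release(std::uint32_t id) { active_.erase(id); }

private:
    void advance() { next_ = (next_ + 1 == idCount_) ? 0 : next_ + 1; }

    std::uint32_t idCount_;
    std::uint32_t next_ = 0;
    std::set<std::uint32_t> active_;
};
```

With idCount = 4, once threads 0 and 2 end, the next two threads receive IDs 0 and 2: the counter wraps and skips ID 1, which is still live. Any per-thread array entry indexed by 0 therefore has to be reset when the old thread 0 ends, which is the uniqueness caveat above.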

Thread Local Storage

Pin provides an efficient thread local storage (TLS), with the option to allocate a new TLS key and associate it with a given data destruction function. Any thread of the process can store and retrieve values in its own slot, referenced by the allocated key. The initial value associated with the key in all threads is NULL. This TLS is indexed by both the TLS key and Pin's internal thread ID for fast access. See PIN_CreateThreadDataKey, PIN_DeleteThreadDataKey, PIN_SetThreadData, PIN_GetThreadData.
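The key-and-thread-ID indexing can be pictured with a small standard C++ model. It is a hypothetical sketch, not Pin's implementation or its API (the real calls are PIN_CreateThreadDataKey, PIN_SetThreadData and PIN_GetThreadData). Values live in a table indexed by key and thread ID, and every slot starts out as NULL. In this model the key's destruction function runs on a thread's non-NULL value when threadExit is called for that thread; the exact point at which Pin invokes it is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of key-based thread data, not Pin's implementation or API.
using Destructor = void (*)(void*);

class ThreadDataTable {
public:
    explicit ThreadDataTable(std::size_t maxThreads) : maxThreads_(maxThreads) {}

    // Allocates a new key; every thread's slot for it starts out as NULL.
    std::size_t createKey(Destructor dtor) {
        keys_.push_back(Key{dtor, std::vector<void*>(maxThreads_, nullptr)});
        return keys_.size() - 1;
    }

    // Slots are indexed directly by key and thread ID, so lookups are O(1).
    void set(std::size_t key, std::size_t tid, void* value) { keys_.at(key).slots.at(tid) = value; }
    void* get(std::size_t key, std::size_t tid) const { return keys_.at(key).slots.at(tid); }

    // Runs each key's destructor on the thread's non-NULL values and clears the slots.
    void threadExit(std::size_t tid) {
        for (Key& k : keys_) {
            void*& slot = k.slots.at(tid);
            if (slot != nullptr && k.dtor != nullptr) k.dtor(slot);
            slot = nullptr;
        }
    }

private:
    struct Key {
        Destructor dtor;
        std::vector<void*> slots;
    };
    std::size_t maxThreads_;
    std::vector<Key> keys_;
};
```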

It is possible to access Pin's fast TLS implementation from an application thread and also from internal threads created using PIN_SpawnInternalThread without any initialization. To access the TLS from threads created using pthread_create or std::thread, the thread procedure should call PIN_InitializeInternalThread before trying to access Pin's TLS. To get the thread ID from an internal thread Pin provides PIN_ThreadId. An analysis function running in an application thread can just use IARG_THREAD_ID to efficiently get Pin's thread ID. See the example Using TLS for more information.

Starting with Pin 4.0, Pin fully supports pthread TLS. pthread TLS is slower than Pin's internal TLS. However, pthread TLS can be used from threads created using pthread_create and std::thread without first calling PIN_InitializeInternalThread.

Note
In Probe mode Pin's fast TLS cannot be used from application threads. It can still be used from internal threads.

False sharing occurs when multiple threads access different parts of the same cache line and at least one of them is a write. To maintain memory coherency, the computer must copy the memory from one CPU's cache to another, even though data is not truly shared. False sharing can usually be avoided by padding critical data structures to the size of a cache line, or by rearranging the data layout of structures. See the example Using TLS for more information.
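Padding per-thread data to a cache line can be sketched in standard C++17 (C++17 is assumed so that std::vector honors the over-alignment). The 64-byte line size is an assumption, typical of current x86 processors, and should be checked for the target machine. PaddedCounter and countInParallel are hypothetical names; in a Pintool, the slot index would typically be Pin's internal thread ID.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumption: a 64-byte cache line.
constexpr std::size_t kCacheLine = 64;

// Each per-thread counter occupies its own cache line, so one thread's writes
// do not invalidate the line holding another thread's counter.
struct alignas(kCacheLine) PaddedCounter {
    std::uint64_t value = 0;
};

static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per cache line");

// Each thread increments only its own slot, so no locking is needed; the totals
// are combined after all threads have joined.
std::uint64_t countInParallel(std::size_t threads, std::uint64_t perThread) {
    std::vector<PaddedCounter> counters(threads);
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < threads; ++t) {
        pool.emplace_back([&counters, t, perThread] {
            for (std::uint64_t i = 0; i < perThread; ++i) ++counters[t].value;
        });
    }
    for (std::thread& th : pool) th.join();
    std::uint64_t total = 0;
    for (const PaddedCounter& c : counters) total += c.value;
    return total;
}
```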

Note
Pin supports neither the C++11 thread_local keyword nor any other compiler specific TLS extension.

Avoiding Deadlocks in Multi-threaded Applications

Since Pin, the tool, and the application may each acquire and release locks, Pintool developers must take care to avoid deadlocks with either the application or Pin. Deadlocks generally occur when two threads acquire the same locks in a different order. For example, thread A acquires lock L1 and then acquires lock L2, while thread B acquires lock L2 and then acquires lock L1. This will lead to a deadlock if thread A holds lock L1 and waits for L2 while thread B holds lock L2 and waits for L1. To avoid such deadlocks, Pin imposes a hierarchy on the order in which locks must be acquired. Pin generally acquires its own internal locks before the tool acquires any lock (e.g. via PIN_GetLock() or standard locking APIs). Additionally, we assume that the application may acquire locks at the top of this hierarchy (i.e. before Pin acquires its internal locks). The following diagram illustrates the hierarchy:

Application locks -> Pin internal locks -> Tool locks

Pintool developers should design their Pintools such that they never break this lock hierarchy, and they can do so by following these basic guidelines:

  • If the tool acquires any locks from within a Pin callback, it must release those locks before returning from that callback. Holding a lock across Pin callbacks violates the hierarchy with respect to the Pin internal locks.
  • If the tool acquires any locks from within an analysis routine, it must release those locks before returning from the analysis routine. Holding a lock across Pin analysis routines violates the hierarchy with respect to Pin internal locks and other locks used by the instrumented application itself.
  • If the tool calls a Pin API from within a Pin callback or analysis routine, it should not hold any tool locks when calling the API. Some of the Pin APIs use the internal Pin locks so holding a tool lock before invoking these APIs violates the hierarchy with respect to the Pin internal locks.
  • If the tool calls a Pin API from within an analysis routine, it may need to acquire the Pin client lock first by calling PIN_LockClient(). This depends on the API, so check the documentation for the specific API for more information. Note that the tool should not hold any other locks when calling PIN_LockClient(), as described in the previous item.
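One way to check a design against the hierarchy is to record, per thread, which levels of lock are held. The C++ sketch below is a hypothetical debugging aid, not a Pin facility: HierarchyChecker refuses an acquisition that would go right-to-left in the diagram, that is, taking a lock closer to the top of the hierarchy than one already held. It does not order locks within a single level; that ordering remains the tool's own responsibility.

```cpp
#include <cassert>
#include <vector>

// Levels from the diagram, in the order locks may be acquired (hypothetical names).
enum class LockLevel { Application = 0, PinInternal = 1, Tool = 2 };

// Tracks the locks held by one thread; an illustration, not a Pin facility.
class HierarchyChecker {
public:
    // Returns false if 'level' is closer to the top of the hierarchy than a lock
    // already held. Ordering within a single level is not checked.
    bool acquire(LockLevel level) {
        if (!held_.empty() && level < held_.back()) return false;
        held_.push_back(level);
        return true;
    }

    // Releases the most recently acquired lock (locks are assumed to nest).
    void release() {
        assert(!held_.empty());
        held_.pop_back();
    }

    bool holdsNothing() const { return held_.empty(); }

private:
    std::vector<LockLevel> held_; // non-decreasing, so back() is the lowest level held
};
```

Holding a tool lock and then calling a Pin API that takes a Pin internal lock, the mistake described in the third guideline, is rejected by acquire().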

While these guidelines are sufficient in most cases, they may turn out to be too restrictive for certain use-cases. The next set of guidelines explains the conditions in which it is safe to relax the basic guidelines above:

  • In JIT mode, the tool may acquire locks from within an analysis routine and not release them, providing it releases these locks before leaving the trace that contains the analysis routine. The tool must expect that the trace may exit "early" if an application instruction raises an exception. Any lock L, which the tool might hold when the application raises an exception, must obey the following sub-rules:
    • The tool must establish a callback that executes when the application raises an exception and this callback must release lock L if it was acquired at the time the exception occurred. Tools can use PIN_AddContextChangeFunction() to establish this callback.
    • The tool must not acquire lock L from within any Pin callback, to avoid violating the hierarchy with respect to the Pin internal locks.
  • If the tool calls a Pin API from an analysis routine, it may acquire and hold a lock L while calling the API providing that:
    • Lock L is not being acquired from any Pin callback. This avoids the hierarchy violation with respect to the Pin internal locks.
    • The Pin API being invoked does not cause application code to execute (e.g., PIN_CallApplicationFunction()). This avoids the hierarchy violation with respect to the locks used by the application itself.



Examples



To illustrate how to write Pintools, we present some simple examples. In the web based version of the manual, you can click on a function in the Pin API to see its documentation.

All the examples presented in the manual can be found in the source/tools/ManualExamples directory.

Building the Example Tools

To build all examples in a directory for the ia32 architecture:

$ cd source/tools/ManualExamples
$ make all TARGET=ia32

To build all examples in a directory for the intel64 architecture:

$ cd source/tools/ManualExamples
$ make all TARGET=intel64

To build and run a specific example (e.g., inscount0):

$ cd source/tools/ManualExamples
$ make inscount0.test TARGET=intel64

To build a specific example without running it (e.g., inscount0):

$ cd source/tools/ManualExamples
$ make obj-intel64/inscount0.so TARGET=intel64

The above applies to the Intel(R) 64 architecture. For the IA-32 architecture, use TARGET=ia32 instead.

$ cd source/tools/ManualExamples
$ make obj-ia32/inscount0.so TARGET=ia32

Notes for Building Tools for Windows

Since the tools are built using make, be sure to first install Cygwin with make, or a MinGW based environment with make (Pin provides Cygwin and MinGW compatible makefiles). For more information see Building Your Own Tool.

Open the Visual Studio Command Prompt corresponding to your target architecture, i.e. x86 or x64, and follow the steps in the Building the Example Tools section.

Simple Instruction Count (Instruction Instrumentation)

The example below instruments a program to count the total number of instructions executed. It inserts a call to docount before every instruction. When the program exits, it saves the count in the file inscount.out.

Here is how to run it and display its output (note that the file list is the ls output, so it may be different on your machine; similarly, the instruction count will depend on the implementation of ls):

$ ../../../pin -t obj-intel64/inscount0.so -- /bin/ls
Makefile          atrace.o     imageload.out  itrace      proccount
Makefile.example  imageload    inscount0      itrace.o    proccount.o
atrace            imageload.o  inscount0.o    itrace.out
$ cat inscount.out
Count 422838
$

The KNOB exhibited in the example below overrides the default name for the output file. To use this feature, add "-o <file_name>" to the command line. Tool command line options should be inserted between the tool name and the double dash ("--"). For more information on how to add command line options to your tool, please see KNOB: Commandline Option Handling.

$ ../../../pin -t obj-intel64/inscount0.so -o inscount0.log -- /bin/ls

The example can be found in source/tools/ManualExamples/inscount0.cpp

/*
* Copyright (C) 2004-2021 Intel Corporation.
* SPDX-License-Identifier: MIT
*/
#include <iostream>
#include <fstream>
#include "pin.H"

std::ofstream OutFile;

// The running count of instructions is kept here
// make it static to help the compiler optimize docount
static UINT64 icount = 0;

// This function is called before every instruction is executed
VOID docount() { icount++; }

// Pin calls this function every time a new instruction is encountered
VOID Instruction(INS ins, VOID* v)
{
    // Insert a call to docount before every instruction, no arguments are passed
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_END);
}

KNOB< std::string > KnobOutputFile(KNOB_MODE_WRITEONCE, "pintool", "o", "inscount.out", "specify output file name");

// This function is called when the application exits
VOID Fini(INT32 code, VOID* v)
{
    // Write to a file since std::cout and std::cerr may be closed by the application
    OutFile.setf(std::ios::showbase);
    OutFile << "Count " << icount << std::endl;
    OutFile.close();
}

/* ===================================================================== */
/* Print Help Message */
/* ===================================================================== */

INT32 Usage()
{
    std::cerr << "This tool counts the number of dynamic instructions executed" << std::endl;
    std::cerr << std::endl << KNOB_BASE::StringKnobSummary() << std::endl;
    return -1;
}

/* ===================================================================== */
/* Main */
/* ===================================================================== */
/* argc, argv are the entire command line: pin -t <toolname> -- ... */
/* ===================================================================== */

int main(int argc, char* argv[])
{
    // Initialize pin
    if (PIN_Init(argc, argv)) return Usage();

    OutFile.open(KnobOutputFile.Value().c_str());

    // Register Instruction to be called to instrument instructions
    INS_AddInstrumentFunction(Instruction, 0);

    // Register Fini to be called when the application exits
    PIN_AddFiniFunction(Fini, 0);

    // Start the program, never returns
    PIN_StartProgram();

    return 0;
}

Instruction Address Trace (Instruction Instrumentation)

In the previous example, we did not pass any arguments to docount, the analysis procedure. In this example, we show how to pass arguments. When calling an analysis procedure, Pin allows you to pass the instruction pointer, current value of registers, effective address of memory operations, constants, etc. For a complete list, see IARG_TYPE.

With a small change, we can turn the instruction counting example into a Pintool that prints the address of every instruction that is executed. This tool is useful for understanding the control flow of a program for debugging, or in processor design when simulating an instruction cache.

We change the arguments to INS_InsertCall to pass the address of the instruction about to be executed. We replace docount with printip, which prints the instruction address. It writes its output to the file itrace.out.

This is how to run it and look at the output:

$ ../../../pin -t obj-intel64/itrace.so -- /bin/ls
Makefile          atrace.o     imageload.out  itrace      proccount
Makefile.example  imageload    inscount0      itrace.o    proccount.o
atrace            imageload.o  inscount0.o    itrace.out
$ head itrace.out
0x40001e90
0x40001e91
0x40001ee4
0x40001ee5
0x40001ee7
0x40001ee8
0x40001ee9
0x40001eea
0x40001ef0
0x40001ee0
$

The example can be found in source/tools/ManualExamples/itrace.cpp

/*
* Copyright (C) 2004-2021 Intel Corporation.
* SPDX-License-Identifier: MIT
*/
#include <stdio.h>
#include "pin.H"

FILE* trace;

// This function is called before every instruction is executed
// and prints the IP
VOID printip(VOID* ip) { fprintf(trace, "%p\n", ip); }

// Pin calls this function every time a new instruction is encountered
VOID Instruction(INS ins, VOID* v)
{
    // Insert a call to printip before every instruction, and pass it the IP
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)printip, IARG_INST_PTR, IARG_END);
}

// This function is called when the application exits
VOID Fini(INT32 code, VOID* v)
{
    fprintf(trace, "#eof\n");
    fclose(trace);
}

/* ===================================================================== */
/* Print Help Message */
/* ===================================================================== */

INT32 Usage()
{
    PIN_ERROR("This Pintool prints the IPs of every instruction executed\n" + KNOB_BASE::StringKnobSummary() + "\n");
    return -1;
}

/* ===================================================================== */
/* Main */
/* ===================================================================== */

int main(int argc, char* argv[])
{
    trace = fopen("itrace.out", "w");

    // Initialize pin
    if (PIN_Init(argc, argv)) return Usage();

    // Register Instruction to be called to instrument instructions
    INS_AddInstrumentFunction(Instruction, 0);

    // Register Fini to be called when the application exits
    PIN_AddFiniFunction(Fini, 0);

    // Start the program, never returns
    PIN_StartProgram();

    return 0;
}

Memory Reference Trace (Instruction Instrumentation)

The previous example instruments all instructions. Sometimes a tool may only want to instrument a class of instructions, like memory operations or branch instructions. A tool can do this by using the Pin API which includes functions that classify and examine instructions. The basic API is common to all instruction sets and is described here. In addition, there is an instruction set specific API for the IA-32 ISA.

In this example, we show how to do more selective instrumentation by examining the instructions. This tool generates a trace of all memory addresses referenced by a program. This is also useful for debugging and for simulating a data cache in a processor.

We only instrument instructions that read or write memory. We also use INS_InsertPredicatedCall instead of INS_InsertCall to avoid generating references to instructions that are predicated when the predicate is false. On IA-32 and Intel(R) 64 architectures CMOVcc, FCMOVcc and REP prefixed string operations are treated as being predicated. For CMOVcc and FCMOVcc the predicate is the condition test implied by "cc", for REP prefixed string ops it is that the count register is non-zero.

Since an instrumentation routine is called only once per instruction (when Pin first compiles it), while an analysis routine is called every time that instruction executes, it is much faster to instrument only the memory operations than to instrument every instruction, as the previous instruction trace example does.

Here is how to run it and the sample output:

$ ../../../pin -t obj-intel64/pinatrace.so -- /bin/ls
Makefile          atrace.o    imageload.o    inscount0.o  itrace.out
Makefile.example  atrace.out  imageload.out  itrace       proccount
atrace            imageload   inscount0      itrace.o     proccount.o
$ head pinatrace.out
0x40001ee0: R 0xbfffe798
0x40001efd: W 0xbfffe7d4
0x40001f09: W 0xbfffe7d8
0x40001f20: W 0xbfffe864
0x40001f20: W 0xbfffe868
0x40001f20: W 0xbfffe86c
0x40001f20: W 0xbfffe870
0x40001f20: W 0xbfffe874
0x40001f20: W 0xbfffe878
0x40001f20: W 0xbfffe87c
$

The example can be found in source/tools/ManualExamples/pinatrace.cpp

/*
* Copyright (C) 2004-2021 Intel Corporation.
* SPDX-License-Identifier: MIT
*/
/*
* This file contains an ISA-portable PIN tool for tracing memory accesses.
*/
#include <stdio.h>
#include "pin.H"
FILE* trace;
// Print a memory read record
VOID RecordMemRead(VOID* ip, VOID* addr) { fprintf(trace, "%p: R %p\n", ip, addr); }
// Print a memory write record
VOID RecordMemWrite(VOID* ip, VOID* addr) { fprintf(trace, "%p: W %p\n", ip, addr); }
// Is called for every instruction and instruments reads and writes
VOID Instruction(INS ins, VOID* v)
{
// Instruments memory accesses using a predicated call, i.e.
// the instrumentation is called iff the instruction will actually be executed.
//
// On the IA-32 and Intel(R) 64 architectures conditional moves and REP
// prefixed instructions appear as predicated instructions in Pin.
UINT32 memOperands = INS_MemoryOperandCount(ins);
// Iterate over each memory operand of the instruction.
for (UINT32 memOp = 0; memOp < memOperands; memOp++)
{
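if (INS_MemoryOperandIsRead(ins, memOp))
{
INS_InsertPredicatedCall(ins, IPOINT_BEFORE, (AFUNPTR)RecordMemRead, IARG_INST_PTR, IARG_MEMORYOP_EA, memOp,
IARG_END);
}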
// Note that in some architectures a single memory operand can be
// both read and written (for instance incl (%eax) on IA-32)
// In that case we instrument it once for read and once for write.
if (INS_MemoryOperandIsWritten(ins, memOp))
{
INS_InsertPredicatedCall(ins, IPOINT_BEFORE, (AFUNPTR)RecordMemWrite, IARG_INST_PTR, IARG_MEMORYOP_EA, memOp,
IARG_END);
}
}
}
VOID Fini(INT32 code, VOID* v)
{
fprintf(trace, "#eof\n");
fclose(trace);
}
/* ===================================================================== */
/* Print Help Message */
/* ===================================================================== */
INT32 Usage()
{
PIN_ERROR("This Pintool prints a trace of memory addresses\n" + KNOB_BASE::StringKnobSummary() + "\n");
return -1;
}
/* ===================================================================== */
/* Main */
/* ===================================================================== */
int main(int argc, char* argv[])
{
if (PIN_Init(argc, argv)) return Usage();
trace = fopen("pinatrace.out", "w");
INS_AddInstrumentFunction(Instruction, 0);
PIN_AddFiniFunction(Fini, 0);
// Never returns
PIN_StartProgram();
return 0;
}

Detecting the Loading and Unloading of Images (Image Instrumentation)

The example below prints a message to a trace file every time an image is loaded or unloaded. It really abuses the image instrumentation mode, as the Pintool neither inspects the image nor adds instrumentation code.

If you invoke it on ls, you would see this output:

$ ../../../pin -t obj-intel64/imageload.so -- /bin/ls
Makefile          atrace.o    imageload.o    inscount0.o  proccount
Makefile.example  atrace.out  imageload.out  itrace       proccount.o
atrace            imageload   inscount0      itrace.o     trace.out
$ cat imageload.out
Loading /bin/ls
Loading /lib/ld-linux.so.2
Loading /lib/libtermcap.so.2
Loading /lib/i686/libc.so.6
Unloading /bin/ls
Unloading /lib/ld-linux.so.2
Unloading /lib/libtermcap.so.2
Unloading /lib/i686/libc.so.6
$

The example can be found in source/tools/ManualExamples/imageload.cpp

/*
* Copyright (C) 2004-2021 Intel Corporation.
* SPDX-License-Identifier: MIT
*/
//
// This tool prints a trace of image load and unload events
//
#include "pin.H"
#include <iostream>
#include <fstream>
#include <stdlib.h>
KNOB< std::string > KnobOutputFile(KNOB_MODE_WRITEONCE, "pintool", "o", "imageload.out", "specify file name");
std::ofstream TraceFile;
// Pin calls this function every time a new img is loaded
// It can instrument the image, but this example does not
// Note that imgs (including shared libraries) are loaded lazily
VOID ImageLoad(IMG img, VOID* v) { TraceFile << "Loading " << IMG_Name(img) << ", Image id = " << IMG_Id(img) << std::endl; }
// Pin calls this function every time a new img is unloaded
// You can't instrument an image that is about to be unloaded
VOID ImageUnload(IMG img, VOID* v) { TraceFile << "Unloading " << IMG_Name(img) << std::endl; }
// This function is called when the application exits
// It closes the output file.
VOID Fini(INT32 code, VOID* v)
{
if (TraceFile.is_open())
{
TraceFile.close();
}
}
/* ===================================================================== */
/* Print Help Message */
/* ===================================================================== */
INT32 Usage()
{
PIN_ERROR("This tool prints a log of image load and unload events\n" + KNOB_BASE::StringKnobSummary() + "\n");
return -1;
}
/* ===================================================================== */
/* Main */
/* ===================================================================== */
int main(int argc, char* argv[])
{
// Initialize symbol processing
PIN_InitSymbols();
// Initialize pin
if (PIN_Init(argc, argv)) return Usage();
TraceFile.open(KnobOutputFile.Value().c_str());
// Register ImageLoad to be called when an image is loaded
IMG_AddInstrumentFunction(ImageLoad, 0);
// Register ImageUnload to be called when an image is unloaded
IMG_AddUnloadFunction(ImageUnload, 0);
// Register Fini to be called when the application exits
PIN_AddFiniFunction(Fini, 0);
// Start the program, never returns
PIN_StartProgram();
return 0;
}

More Efficient Instruction Counting (Trace Instrumentation)

The example Simple Instruction Count (Instruction Instrumentation) computed the number of executed instructions by inserting a call before every instruction. In this example, we make it more efficient by counting the number of instructions in each BBL (a single-entrance, single-exit sequence of instructions) at instrumentation time, and incrementing the counter once per BBL instead of once per instruction.

The example can be found in source/tools/ManualExamples/inscount1.cpp

/*
* Copyright (C) 2004-2021 Intel Corporation.
* SPDX-License-Identifier: MIT
*/
#include <iostream>
#include <fstream>
#include "pin.H"
std::ofstream OutFile;
// The running count of instructions is kept here
// make it static to help the compiler optimize docount
static UINT64 icount = 0;
// This function is called before every block
VOID docount(UINT32 c) { icount += c; }
// Pin calls this function every time a new basic block is encountered
// It inserts a call to docount
VOID Trace(TRACE trace, VOID* v)
{
// Visit every basic block in the trace
for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl))
{
// Insert a call to docount before every bbl, passing the number of instructions
BBL_InsertCall(bbl, IPOINT_BEFORE, (AFUNPTR)docount, IARG_UINT32, BBL_NumIns(bbl), IARG_END);
}
}
KNOB< std::string > KnobOutputFile(KNOB_MODE_WRITEONCE, "pintool", "o", "inscount.out", "specify output file name");
// This function is called when the application exits
VOID Fini(INT32 code, VOID* v)
{
// Write to a file since std::cout and std::cerr may be closed by the application
OutFile.setf(std::ios::showbase);
OutFile << "Count " << icount << std::endl;
OutFile.close();
}
/* ===================================================================== */
/* Print Help Message */
/* ===================================================================== */
INT32 Usage()
{
std::cerr << "This tool counts the number of dynamic instructions executed" << std::endl;
std::cerr << std::endl << KNOB_BASE::StringKnobSummary() << std::endl;
return -1;
}
/* ===================================================================== */
/* Main */
/* ===================================================================== */
int main(int argc, char* argv[])
{
// Initialize pin
if (PIN_Init(argc, argv)) return Usage();
OutFile.open(KnobOutputFile.Value().c_str());
// Register Trace to be called to instrument traces
TRACE_AddInstrumentFunction(Trace, 0);
// Register Fini to be called when the application exits
PIN_AddFiniFunction(Fini, 0);
// Start the program, never returns
PIN_StartProgram();
return 0;
}

Procedure Instruction Count (Routine Instrumentation)

The example below instruments a program to count the number of times a procedure is called, and the total number of instructions executed in each procedure. When it finishes, it prints a profile to proccount.out.

Executing the tool and sample output:

$ ../../../pin -t obj-intel64/proccount.so -- /bin/grep proccount.cpp Makefile
proccount_SOURCES = proccount.cpp
$ head proccount.out
              Procedure           Image            Address        Calls Instructions
                  _fini       libc.so.6         0x40144d00            1           21
__deregister_frame_info       libc.so.6         0x40143f60            2           70
  __register_frame_info       libc.so.6         0x40143df0            2           62
              fde_merge       libc.so.6         0x40143870            0            8
            __init_misc       libc.so.6         0x40115824            1           85
            __getclktck       libc.so.6         0x401157f4            0            2
                 munmap       libc.so.6         0x40112ca0            1            9
                   mmap       libc.so.6         0x40112bb0            1           23
            getpagesize       libc.so.6         0x4010f934            2           26
$

The example can be found in source/tools/ManualExamples/proccount.cpp

/*
* Copyright (C) 2004-2021 Intel Corporation.
* SPDX-License-Identifier: MIT
*/
//
// This tool counts the number of times a routine is executed and
// the number of instructions executed in a routine
//
#include <fstream>
#include <iomanip>
#include <iostream>
#include <string.h>
#include "pin.H"
std::ofstream outFile;
// Holds instruction count for a single procedure
typedef struct RtnCount
{
std::string _name;
std::string _image;
ADDRINT _address;
RTN _rtn;
UINT64 _rtnCount;
UINT64 _icount;
struct RtnCount* _next;
} RTN_COUNT;
// Linked list of instruction counts for each routine
RTN_COUNT* RtnList = 0;
// This function is called before every instruction is executed
VOID docount(UINT64* counter) { (*counter)++; }
const char* StripPath(const char* path)
{
const char* file = strrchr(path, '/');
if (file)
return file + 1;
else
return path;
}
// Pin calls this function every time a new rtn is executed
VOID Routine(RTN rtn, VOID* v)
{
// Allocate a counter for this routine
RTN_COUNT* rc = new RTN_COUNT;
// The RTN goes away when the image is unloaded, so save it now
// because we need it in the fini
rc->_name = RTN_Name(rtn);
rc->_image = StripPath(IMG_Name(SEC_Img(RTN_Sec(rtn))).c_str());
rc->_address = RTN_Address(rtn);
rc->_icount = 0;
rc->_rtnCount = 0;
// Add to list of routines
rc->_next = RtnList;
RtnList = rc;
RTN_Open(rtn);
// Insert a call at the entry point of a routine to increment the call count
RTN_InsertCall(rtn, IPOINT_BEFORE, (AFUNPTR)docount, IARG_PTR, &(rc->_rtnCount), IARG_END);
// For each instruction of the routine
for (INS ins = RTN_InsHead(rtn); INS_Valid(ins); ins = INS_Next(ins))
{
// Insert a call to docount to increment the instruction counter for this rtn
INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_PTR, &(rc->_icount), IARG_END);
}
RTN_Close(rtn);
}
// This function is called when the application exits
// It prints the name and count for each procedure
VOID Fini(INT32 code, VOID* v)
{
outFile << std::setw(23) << "Procedure"
<< " " << std::setw(15) << "Image"
<< " " << std::setw(18) << "Address"
<< " " << std::setw(12) << "Calls"
<< " " << std::setw(12) << "Instructions" << std::endl;
for (RTN_COUNT* rc = RtnList; rc; rc = rc->_next)
{
if (rc->_icount > 0)
outFile << std::setw(23) << rc->_name << " " << std::setw(15) << rc->_image << " " << std::setw(18) << std::hex << rc->_address << std::dec
<< " " << std::setw(12) << rc->_rtnCount << " " << std::setw(12) << rc->_icount << std::endl;
}
}
/* ===================================================================== */
/* Print Help Message */
/* ===================================================================== */
INT32 Usage()
{
std::cerr << "This Pintool counts the number of times a routine is executed" << std::endl;
std::cerr << "and the number of instructions executed in a routine" << std::endl;
std::cerr << std::endl << KNOB_BASE::StringKnobSummary() << std::endl;
return -1;
}
/* ===================================================================== */
/* Main */
/* ===================================================================== */
int main(int argc, char* argv[])
{
// Initialize symbol table code, needed for rtn instrumentation
PIN_InitSymbols();
outFile.open("proccount.out");
// Initialize pin
if (PIN_Init(argc, argv)) return Usage();
// Register Routine to be called to instrument rtn
RTN_AddInstrumentFunction(Routine, 0);
// Register Fini to be called when the application exits
PIN_AddFiniFunction(Fini, 0);
// Start the program, never returns
PIN_StartProgram();
return 0;
}

Using PIN_SafeCopy()

PIN_SafeCopy is used to copy the specified number of bytes from a source memory region to a destination memory region. This function guarantees safe return to the caller even if the source or destination regions are inaccessible (entirely or partially).

Use of this function also guarantees that the tool reads or writes the values used by the application. For example, on Windows, Pin replaces certain TEB fields when running a tool's analysis code. If the tool accessed these fields directly, it would see the modified values rather than the original ones. Using PIN_SafeCopy() allows the tool to read or write the application's values for these fields.

We recommend using this API any time a tool reads or writes application memory.

$ ../../../pin -t obj-ia32/safecopy.so -- /bin/cp makefile obj-ia32/safecopy.so.makefile.copy
$ head safecopy.out
Emulate loading from addr 0xbff0057c to ebx
Emulate loading from addr 0x64ffd4 to eax
Emulate loading from addr 0xbff00598 to esi
Emulate loading from addr 0x6501c8 to edi
Emulate loading from addr 0x64ff14 to edx
Emulate loading from addr 0x64ff1c to edx
Emulate loading from addr 0x64ff24 to edx
Emulate loading from addr 0x64ff2c to edx
Emulate loading from addr 0x64ff34 to edx
Emulate loading from addr 0x64ff3c to edx

The example can be found in source/tools/ManualExamples/safecopy.cpp.

/*
* Copyright (C) 2005-2025 Intel Corporation.
* SPDX-License-Identifier: MIT
*/
#include <stdio.h>
#include "pin.H"
#include <iostream>
#include <fstream>
std::ofstream* out = 0;
//=======================================================
// Analysis routines
//=======================================================
// Move from memory to register
ADDRINT DoLoad(REG reg, ADDRINT* addr)
{
*out << "Emulate loading from addr " << addr << " to " << REG_StringShort(reg) << std::endl;
ADDRINT value;
PIN_SafeCopy(&value, addr, sizeof(ADDRINT));
return value;
}
//=======================================================
// Instrumentation routines
//=======================================================
VOID EmulateLoad(INS ins, VOID* v)
{
// Find the instructions that move a value from memory to a register
if (INS_Opcode(ins) == XED_ICLASS_MOV && INS_IsMemoryRead(ins) && INS_OperandIsReg(ins, 0) && INS_OperandIsMemory(ins, 1))
{
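// op0 <- *op1
INS_InsertCall(ins, IPOINT_BEFORE, AFUNPTR(DoLoad), IARG_UINT32, REG(INS_OperandReg(ins, 0)), IARG_MEMORYREAD_EA,
IARG_RETURN_REGS, INS_OperandReg(ins, 0), IARG_END);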
// Delete the instruction
INS_Delete(ins);
}
}
/* ===================================================================== */
/* Print Help Message */
/* ===================================================================== */
INT32 Usage()
{
std::cerr << "This tool demonstrates the use of SafeCopy" << std::endl;
std::cerr << std::endl << KNOB_BASE::StringKnobSummary() << std::endl;
return -1;
}
/* ===================================================================== */
/* Main */
/* ===================================================================== */
int main(int argc, char* argv[])
{
// Write to a file since std::cout and std::cerr may be closed by the application
out = new std::ofstream("safecopy.out");
// Initialize pin & symbol manager
PIN_InitSymbols();
if (PIN_Init(argc, argv)) return Usage();
// Register EmulateLoad to be called to instrument instructions
INS_AddInstrumentFunction(EmulateLoad, 0);
// Never returns
PIN_StartProgram();
return 0;
}

Order of Instrumentation

Pin provides tools with multiple ways to control the execution order of analysis calls. The execution order depends mainly on the insertion action (IPOINT) and call order (CALL_ORDER). The example below illustrates this behavior by instrumenting all return instructions in three different ways. Additional examples can be found in source/tools/InstrumentationOrderAndVersion.

$ ../../../pin -t obj-ia32/invocation.so -- obj-ia32/little_malloc
$ head invocation.out
After: IP = 0x64bc5e
Before: IP = 0x64bc5e
Taken: IP = 0x63a12e
After: IP = 0x64bc5e
Before: IP = 0x64bc5e
Taken: IP = 0x641c76
After: IP = 0x641ca6
After: IP = 0x64bc5e
Before: IP = 0x64bc5e
Taken: IP = 0x648b02

The example can be found in source/tools/ManualExamples/invocation.cpp.

/*
* Copyright (C) 2009-2021 Intel Corporation.
* SPDX-License-Identifier: MIT
*/
#include "pin.H"
#include <iostream>
#include <fstream>
KNOB< std::string > KnobOutputFile(KNOB_MODE_WRITEONCE, "pintool", "o", "invocation.out", "specify output file name");
std::ofstream OutFile;
/*
* Analysis routines
*/
VOID Taken(const CONTEXT* ctxt)
{
ADDRINT TakenIP = (ADDRINT)PIN_GetContextReg(ctxt, REG_INST_PTR);
OutFile << "Taken: IP = " << std::hex << TakenIP << std::dec << std::endl;
}
VOID Before(CONTEXT* ctxt)
{
ADDRINT BeforeIP = (ADDRINT)PIN_GetContextReg(ctxt, REG_INST_PTR);
OutFile << "Before: IP = " << std::hex << BeforeIP << std::dec << std::endl;
}
VOID After(CONTEXT* ctxt)
{
ADDRINT AfterIP = (ADDRINT)PIN_GetContextReg(ctxt, REG_INST_PTR);
OutFile << "After: IP = " << std::hex << AfterIP << std::dec << std::endl;
}
/*
* Instrumentation routines
*/
VOID ImageLoad(IMG img, VOID* v)
{
for (SEC sec = IMG_SecHead(img); SEC_Valid(sec); sec = SEC_Next(sec))
{
// RTN_InsertCall() and INS_InsertCall() are executed in order of
// appearance. In the code sequence below, the IPOINT_AFTER is
// executed before the IPOINT_BEFORE.
for (RTN rtn = SEC_RtnHead(sec); RTN_Valid(rtn); rtn = RTN_Next(rtn))
{
// Open the RTN.
RTN_Open(rtn);
// IPOINT_AFTER is implemented by instrumenting each return
// instruction in a routine. Pin tries to find all return
// instructions, but success is not guaranteed.
RTN_InsertCall(rtn, IPOINT_AFTER, (AFUNPTR)After, IARG_CONTEXT, IARG_END);
// Examine each instruction in the routine.
for (INS ins = RTN_InsHead(rtn); INS_Valid(ins); ins = INS_Next(ins))
{
if (INS_IsRet(ins))
{
// instrument each return instruction.
// IPOINT_TAKEN_BRANCH always occurs last.
INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)Before, IARG_CONTEXT, IARG_END);
INS_InsertCall(ins, IPOINT_TAKEN_BRANCH, (AFUNPTR)Taken, IARG_CONTEXT, IARG_END);
}
}
// Close the RTN.
RTN_Close(rtn);
}
}
}
VOID Fini(INT32 code, VOID* v) { OutFile.close(); }
/* ===================================================================== */
/* Print Help Message */
/* ===================================================================== */
INT32 Usage()
{
std::cerr << "This is the invocation pintool" << std::endl;
std::cerr << std::endl << KNOB_BASE::StringKnobSummary() << std::endl;
return -1;
}
/* ===================================================================== */
/* Main */
/* ===================================================================== */
int main(int argc, char* argv[])
{
// Initialize pin & symbol manager
PIN_InitSymbols();
if (PIN_Init(argc, argv)) return Usage();
// Register ImageLoad to be called to instrument the routines of each loaded image
IMG_AddInstrumentFunction(ImageLoad, 0);
PIN_AddFiniFunction(Fini, 0);
// Write to a file since std::cout and std::cerr may be closed by the application
OutFile.open(KnobOutputFile.Value().c_str());
OutFile.setf(std::ios::showbase);
// Start the program, never returns
PIN_StartProgram();
return 0;
}
/* ===================================================================== */

Finding the Value of Function Arguments

Often one needs to know the value of an argument passed into a function, or the function's return value. You can use Pin to find this information: with the RTN_InsertCall() function, you can specify the arguments of interest.

The example below prints the input argument for malloc() and free(), and the return value from malloc().

$ ../../../pin -t obj-ia32/malloctrace.so -- /bin/cp makefile obj-ia32/malloctrace.so.makefile.copy
$ head malloctrace.out
malloc(0x24d)
  returns 0x6504f8
malloc(0x57)
  returns 0x650748
malloc(0xc)
  returns 0x6507a0
malloc(0x3c0)
  returns 0x6507b0
malloc(0xc)
  returns 0x650b70

The example can be found in source/tools/ManualExamples/malloctrace.cpp.

/*
* Copyright (C) 2004-2023 Intel Corporation.
* SPDX-License-Identifier: MIT
*/
#include "pin.H"
#include <iostream>
#include <fstream>
/* ===================================================================== */
/* Names of malloc and free */
/* ===================================================================== */
#define MALLOC "malloc"
#define FREE "free"
/* ===================================================================== */
/* Global Variables */
/* ===================================================================== */
std::ofstream TraceFile;
/* ===================================================================== */
/* Commandline Switches */
/* ===================================================================== */
KNOB< std::string > KnobOutputFile(KNOB_MODE_WRITEONCE, "pintool", "o", "malloctrace.out", "specify trace file name");
/* ===================================================================== */
/* ===================================================================== */
/* Analysis routines */
/* ===================================================================== */
VOID Arg1Before(CHAR* name, ADDRINT size) { TraceFile << name << "(" << size << ")" << std::endl; }
VOID MallocAfter(ADDRINT ret) { TraceFile << " returns " << ret << std::endl; }
/* ===================================================================== */
/* Instrumentation routines */
/* ===================================================================== */
VOID Image(IMG img, VOID* v)
{
// Instrument the malloc() and free() functions. Print the input argument
// of each malloc() or free(), and the return value of malloc().
//
// Find the malloc() function.
RTN mallocRtn = RTN_FindByName(img, MALLOC);
if (RTN_Valid(mallocRtn))
{
RTN_Open(mallocRtn);
// Instrument malloc() to print the input argument value and the return value.
RTN_InsertCall(mallocRtn, IPOINT_BEFORE, (AFUNPTR)Arg1Before, IARG_ADDRINT, MALLOC, IARG_FUNCARG_ENTRYPOINT_VALUE, 0,
IARG_END);
RTN_InsertCall(mallocRtn, IPOINT_AFTER, (AFUNPTR)MallocAfter, IARG_FUNCRET_EXITPOINT_VALUE, IARG_END);
RTN_Close(mallocRtn);
}
// Find the free() function.
RTN freeRtn = RTN_FindByName(img, FREE);
if (RTN_Valid(freeRtn))
{
RTN_Open(freeRtn);
// Instrument free() to print the input argument value.
RTN_InsertCall(freeRtn, IPOINT_BEFORE, (AFUNPTR)Arg1Before, IARG_ADDRINT, FREE, IARG_FUNCARG_ENTRYPOINT_VALUE, 0,
IARG_END);
RTN_Close(freeRtn);
}
}
/* ===================================================================== */
VOID Fini(INT32 code, VOID* v) { TraceFile.close(); }
/* ===================================================================== */
/* Print Help Message */
/* ===================================================================== */
INT32 Usage()
{
std::cerr << "This tool produces a trace of calls to malloc." << std::endl;
std::cerr << std::endl << KNOB_BASE::StringKnobSummary() << std::endl;
return -1;
}
/* ===================================================================== */
/* Main */
/* ===================================================================== */
int main(int argc, char* argv[])
{
// Initialize pin & symbol manager
PIN_InitSymbols();
if (PIN_Init(argc, argv))
{
return Usage();
}
// Write to a file since std::cout and std::cerr may be closed by the application
TraceFile.open(KnobOutputFile.Value().c_str());
TraceFile << std::hex;
TraceFile.setf(std::ios::showbase);
// Register Image to be called to instrument functions.
IMG_AddInstrumentFunction(Image, 0);
PIN_AddFiniFunction(Fini, 0);
// Never returns
PIN_StartProgram();
return 0;
}
/* ===================================================================== */
/* eof */
/* ===================================================================== */

Finding Functions By Name on Windows

Finding functions by name on Windows requires a different methodology. Several symbols could resolve to the same function address. It is important to check all symbol names.

The following example finds the function name in the symbol table, and uses the symbol address to find the appropriate RTN.

$ ..\..\..\pin -t obj-ia32\w_malloctrace.dll -- ..\Tests\obj-ia32\cp-pin.exe makefile w_malloctrace.makefile.copy
$ head *.out
Before: RtlAllocateHeap(00150000, 0, 0x94)
After: RtlAllocateHeap  returns 0x153440
After: RtlAllocateHeap  returns 0x153440
Before: RtlAllocateHeap(00150000, 0, 0x20)
After: RtlAllocateHeap  returns 0
After: RtlAllocateHeap  returns 0x1567c0
Before: RtlAllocateHeap(019E0000, 0x8, 0x1800)
After: RtlAllocateHeap  returns 0x19e0688
Before: RtlAllocateHeap(00150000, 0, 0x1a)thread begin 0

After: RtlAllocateHeap  returns 0

The example can be found in source/tools/ManualExamples/w_malloctrace.cpp.

/*
* Copyright (C) 2004-2024 Intel Corporation.
* SPDX-License-Identifier: MIT
*/
/* ===================================================================== */
/* This example demonstrates finding a function by name on Windows. */
/* ===================================================================== */
#include "pin.H"
#include <windows/pinrt_windows.h>
#include <iostream>
#include <fstream>
/* ===================================================================== */
/* Global Variables */
/* ===================================================================== */
std::ofstream TraceFile;
/* ===================================================================== */
/* Commandline Switches */
/* ===================================================================== */
KNOB< std::string > KnobOutputFile(KNOB_MODE_WRITEONCE, "pintool", "o", "w_malloctrace.out", "specify trace file name");
/* ===================================================================== */
/* Print Help Message */
/* ===================================================================== */
INT32 Usage()
{
std::cerr << "This tool produces a trace of calls to RtlAllocateHeap.";
std::cerr << std::endl << std::endl;
std::cerr << KNOB_BASE::StringKnobSummary() << std::endl;
return -1;
}
/* ===================================================================== */
/* Analysis routines */
/* ===================================================================== */
VOID Before(CHAR* name, WINDOWS::HANDLE hHeap, WINDOWS::DWORD dwFlags, WINDOWS::DWORD dwBytes)
{
TraceFile << "Before: " << name << "(" << std::hex << hHeap << ", " << dwFlags << ", " << dwBytes << ")" << std::dec << std::endl;
}
VOID After(CHAR* name, ADDRINT ret) { TraceFile << "After: " << name << " returns " << std::hex << ret << std::dec << std::endl; }
/* ===================================================================== */
/* Instrumentation routines */
/* ===================================================================== */
VOID Image(IMG img, VOID* v)
{
// Walk through the symbols in the symbol table.
//
for (SYM sym = IMG_RegsymHead(img); SYM_Valid(sym); sym = SYM_Next(sym))
{
std::string undFuncName = PIN_UndecorateSymbolName(SYM_Name(sym), UNDECORATION_NAME_ONLY);
// Find the RtlAllocateHeap() function.
if (undFuncName == "RtlAllocateHeap")
{
RTN allocRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));
if (RTN_Valid(allocRtn))
{
// Instrument to print the input argument value and the return value.
RTN_Open(allocRtn);
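// Before the call: pass the first three arguments (hHeap, dwFlags, dwBytes).
RTN_InsertCall(allocRtn, IPOINT_BEFORE, (AFUNPTR)Before, IARG_ADDRINT, "RtlAllocateHeap",
IARG_FUNCARG_ENTRYPOINT_VALUE, 0, IARG_FUNCARG_ENTRYPOINT_VALUE, 1,
IARG_FUNCARG_ENTRYPOINT_VALUE, 2, IARG_END);
// After the call: pass the return value.
RTN_InsertCall(allocRtn, IPOINT_AFTER, (AFUNPTR)After, IARG_ADDRINT, "RtlAllocateHeap",
IARG_FUNCRET_EXITPOINT_VALUE, IARG_END);
RTN_Close(allocRtn);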
}
}
}
}
/* ===================================================================== */
VOID Fini(INT32 code, VOID* v) { TraceFile.close(); }
/* ===================================================================== */
/* Main */
/* ===================================================================== */
int main(int argc, char* argv[])
{
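// Initialize pin & symbol manager
PIN_InitSymbols();
if (PIN_Init(argc, argv))
{
return Usage();
}
// Write to a file since std::cout and std::cerr may be closed by the application
TraceFile.open(KnobOutputFile.Value().c_str());
TraceFile << std::hex;
TraceFile.setf(std::ios::showbase);
// Register Image to be called to instrument functions.
IMG_AddInstrumentFunction(Image, 0);
// Register Fini to be called when the application exits.
PIN_AddFiniFunction(Fini, 0);
// Start the program, never returns
PIN_StartProgram();
return 0;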
}
/* ===================================================================== */
/* eof */
/* ===================================================================== */

Instrumenting Threaded Applications

The following example demonstrates the ThreadStart() and ThreadFini() notification callbacks. Although ThreadStart() and ThreadFini() are executed under the VM and client locks, they can still contend for resources that are shared with other analysis routines. Acquiring a lock with PIN_GetLock() and releasing it with PIN_ReleaseLock() prevents this.

Note that there is a known isolation issue when using Pin on Windows: a deadlock can occur if a tool opens a file in a callback while running a multi-threaded application. To work around this problem, open one file in main() and tag the data with the thread ID. See source/tools/ManualExamples/buffer_windows.cpp for an example. This problem does not exist on Linux.

$ ../../../pin -t obj-ia32/malloc_mt.so -- obj-ia32/thread_lin
$ head malloc_mt.out
thread begin 0
thread 0 entered malloc(24d)
thread 0 entered malloc(57)
thread 0 entered malloc(c)
thread 0 entered malloc(3c0)
thread 0 entered malloc(c)
thread 0 entered malloc(58)
thread 0 entered malloc(56)
thread 0 entered malloc(19)
thread 0 entered malloc(25c)

The example can be found in source/tools/ManualExamples/malloc_mt.cpp

/*
* Copyright (C) 2009-2025 Intel Corporation.
* SPDX-License-Identifier: MIT
*/
#include <stdio.h>
#include "pin.H"
KNOB< std::string > KnobOutputFile(KNOB_MODE_WRITEONCE, "pintool", "o", "malloc_mt.out", "specify output file name");
//==============================================================
// Analysis Routines
//==============================================================
// Note: threadid+1 is used as an argument to the PIN_GetLock()
// routine as a debugging aid. This is the value that
// the lock is set to, so it must be non-zero.
// The lock serializes access to the output file.
FILE* out;
PIN_LOCK pinLock;
// Note that opening a file in a callback is only supported on Linux systems.
// See buffer_windows.cpp for how to work around this issue on Windows.
//
// This routine is executed every time a thread is created.
VOID ThreadStart(THREADID threadid, CONTEXT* ctxt, INT32 flags, VOID* v)
{
PIN_GetLock(&pinLock, threadid + 1);
fprintf(out, "thread begin %d\n", threadid);
fflush(out);
PIN_ReleaseLock(&pinLock);
}
// This routine is executed every time a thread is destroyed.
VOID ThreadFini(THREADID threadid, const CONTEXT* ctxt, INT32 code, VOID* v)
{
PIN_GetLock(&pinLock, threadid + 1);
fprintf(out, "thread end %d code %d\n", threadid, code);
fflush(out);
PIN_ReleaseLock(&pinLock);
}
// This routine is executed each time malloc is called.
VOID BeforeMalloc(int size, THREADID threadid)
{
PIN_GetLock(&pinLock, threadid + 1);
fprintf(out, "thread %d entered malloc(%d)\n", threadid, size);
fflush(out);
PIN_ReleaseLock(&pinLock);
}
//====================================================================
// Instrumentation Routines
//====================================================================
// This routine is executed for each image.
VOID ImageLoad(IMG img, VOID*)
{
RTN rtn = RTN_FindByName(img, "malloc");
if (RTN_Valid(rtn))
{
RTN_Open(rtn);
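// Pass malloc's size argument and the calling thread's ID.
RTN_InsertCall(rtn, IPOINT_BEFORE, AFUNPTR(BeforeMalloc), IARG_FUNCARG_ENTRYPOINT_VALUE, 0, IARG_THREAD_ID,
IARG_END);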
RTN_Close(rtn);
}
}
// This routine is executed once at the end.
VOID Fini(INT32 code, VOID* v) { fclose(out); }
/* ===================================================================== */
/* Print Help Message */
/* ===================================================================== */
INT32 Usage()
{
PIN_ERROR("This Pintool prints a trace of malloc calls in the guest application\n" + KNOB_BASE::StringKnobSummary() + "\n");
return -1;
}
/* ===================================================================== */
/* Main */
/* ===================================================================== */
int main(INT32 argc, CHAR** argv)
{
// Initialize the pin lock
PIN_InitLock(&pinLock);
// Initialize pin
if (PIN_Init(argc, argv)) return Usage();
// Initialize symbol processing so that RTN_FindByName() can locate malloc
PIN_InitSymbols();
out = fopen(KnobOutputFile.Value().c_str(), "w");
// Register ImageLoad to be called when each image is loaded.
IMG_AddInstrumentFunction(ImageLoad, 0);
// Register Analysis routines to be called when a thread begins/ends
PIN_AddThreadStartFunction(ThreadStart, 0);
PIN_AddThreadFiniFunction(ThreadFini, 0);
// Register Fini to be called when the application exits
PIN_AddFiniFunction(Fini, 0);
// Start the program, never returns
PIN_StartProgram();
return 0;
}

Using TLS

Pin provides efficient thread local storage (TLS) APIs. These APIs allow a tool to create thread-specific data. The example below demonstrates how to use these APIs.

$ ../../../pin -t obj-ia32/inscount_tls.so -- obj-ia32/thread_lin
$ head
Count[0]= 237993
Count[1]= 213296
Count[2]= 209223
Count[3]= 209223
Count[4]= 209223
Count[5]= 209223
Count[6]= 209223
Count[7]= 209223
Count[8]= 209223
Count[9]= 209223

The example can be found in source/tools/ManualExamples/inscount_tls.cpp

/*
* Copyright (C) 2004-2021 Intel Corporation.
* SPDX-License-Identifier: MIT
*/
#include <iostream>
#include <fstream>
#include "pin.H"
KNOB< std::string > KnobOutputFile(KNOB_MODE_WRITEONCE, "pintool", "o", "", "specify output file name");
INT32 numThreads = 0;
std::ostream* OutFile = NULL;
// Force each thread's data to be in its own data cache line so that
// multiple threads do not contend for the same data cache line.
// This avoids the false sharing problem.
#define PADSIZE 56 // 64 byte line size: 64-8
// a running count of the instructions
class thread_data_t
{
public:
thread_data_t() : _count(0) {}
UINT64 _count;
UINT8 _pad[PADSIZE];
};
// key for accessing TLS storage in the threads. initialized once in main()
static TLS_KEY tls_key = INVALID_TLS_KEY;
// This function is called before every block
VOID PIN_FAST_ANALYSIS_CALL docount(UINT32 c, THREADID threadid)
{
thread_data_t* tdata = static_cast< thread_data_t* >(PIN_GetThreadData(tls_key, threadid));
tdata->_count += c;
}
VOID ThreadStart(THREADID threadid, CONTEXT* ctxt, INT32 flags, VOID* v)
{
numThreads++;
thread_data_t* tdata = new thread_data_t;
if (PIN_SetThreadData(tls_key, tdata, threadid) == FALSE)
{
std::cerr << "PIN_SetThreadData failed" << std::endl;
PIN_ExitProcess(1);
}
}
// Pin calls this function every time a new trace is compiled.
// It inserts a call to docount in every basic block of the trace.
VOID Trace(TRACE trace, VOID* v)
{
// Visit every basic block in the trace
for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl))
{
// Insert a call to docount for every bbl, passing the number of instructions.
BBL_InsertCall(bbl, IPOINT_ANYWHERE, (AFUNPTR)docount, IARG_FAST_ANALYSIS_CALL, IARG_UINT32, BBL_NumIns(bbl),
IARG_THREAD_ID, IARG_END);
}
}
// This function is called when the thread exits
VOID ThreadFini(THREADID threadIndex, const CONTEXT* ctxt, INT32 code, VOID* v)
{
thread_data_t* tdata = static_cast< thread_data_t* >(PIN_GetThreadData(tls_key, threadIndex));
*OutFile << "Count[" << decstr(threadIndex) << "] = " << tdata->_count << std::endl;
delete tdata;
}
// This function is called when the application exits
VOID Fini(INT32 code, VOID* v) { *OutFile << "Total number of threads = " << numThreads << std::endl; }
/* ===================================================================== */
/* Print Help Message */
/* ===================================================================== */
INT32 Usage()
{
std::cerr << "This tool counts the number of dynamic instructions executed" << std::endl;
std::cerr << std::endl << KNOB_BASE::StringKnobSummary() << std::endl;
return 1;
}
/* ===================================================================== */
/* Main */
/* ===================================================================== */
int main(int argc, char* argv[])
{
// Initialize pin
if (PIN_Init(argc, argv)) return Usage();
OutFile = KnobOutputFile.Value().empty() ? &std::cout : new std::ofstream(KnobOutputFile.Value().c_str());
// Obtain a key for TLS storage.
tls_key = PIN_CreateThreadDataKey(NULL);
if (tls_key == INVALID_TLS_KEY)
{
std::cerr << "number of already allocated keys reached the MAX_CLIENT_TLS_KEYS limit" << std::endl;
PIN_ExitProcess(1);
}
// Register ThreadStart to be called when a thread starts.
PIN_AddThreadStartFunction(ThreadStart, NULL);
// Register ThreadFini to be called when a thread exits.
PIN_AddThreadFiniFunction(ThreadFini, NULL);
// Register Fini to be called when the application exits.
PIN_AddFiniFunction(Fini, NULL);
// Register Trace to be called to instrument traces.
TRACE_AddInstrumentFunction(Trace, NULL);
// Start the program, never returns
PIN_StartProgram();
return 1;
}

Using the Fast Buffering APIs

Pin provides support for buffering data for later processing. If all your analysis routine does is store its arguments into a buffer, you should be able to use the buffering API instead, with some performance benefit. PIN_DefineTraceBuffer() defines the buffer to be used. A buffer is allocated for each thread when it starts and deallocated when the thread exits. INS_InsertFillBuffer() writes the requested data directly into the given buffer. The callback specified in the PIN_DefineTraceBuffer() call processes the buffer when it is nearly full and when the thread exits. Pin does not serialize calls to this callback, so it is the tool writer's responsibility to make sure the callback is thread safe.

This example records the PC of every instruction that accesses memory, together with the effective address that the instruction accesses. Note that IARG_REG_REFERENCE, IARG_REG_CONST_REFERENCE, IARG_CONTEXT, IARG_CONST_CONTEXT and IARG_PARTIAL_CONTEXT cannot be used with the Fast Buffering APIs.

$ ../../../pin -t obj-ia32/buffer_linux.so -- obj-ia32/thread_lin
$ tail buffer.out.*.*
3263df   330108
3263df   330108
3263f1   a92f43fc
3263f7   a92f4d7d
326404   a92f43fc
32640a   a92f4bf8
32640a   a92f4bf8
32640f   a92f4d94
32641b   a92f43fc
326421   a92f4bf8

The example can be found in source/tools/ManualExamples/buffer_linux.cpp. This example is appropriate for Linux tools. If you are writing a tool for Windows, please see source/tools/ManualExamples/buffer_windows.cpp

/*
* Copyright (C) 2009-2021 Intel Corporation.
* SPDX-License-Identifier: MIT
*/
/*
* Sample buffering tool
*
* This tool collects an address trace of instructions that access memory
* by filling a buffer. When the buffer overflows, the callback writes all
* of the collected records to a file.
*
*/
#include <iostream>
#include <fstream>
#include <cstdlib>
#include <cstddef>
#include <unistd.h>
#include "pin.H"
/*
* Name of the output file
*/
KNOB< std::string > KnobOutputFile(KNOB_MODE_WRITEONCE, "pintool", "o", "buffer.out", "output file");
/*
* The ID of the buffer
*/
BUFFER_ID bufId;
/*
* Thread specific data
*/
TLS_KEY mlog_key;
/*
* Number of OS pages for the buffer
*/
#define NUM_BUF_PAGES 1024
/*
* Record of memory references. Rather than having two separate
* buffers for reads and writes, we just use one struct that includes a
* flag for type.
*/
struct MEMREF
{
ADDRINT pc;
ADDRINT ea;
UINT32 size;
BOOL read;
};
/*
* MLOG - thread specific data that is not handled by the buffering API.
*/
class MLOG
{
public:
MLOG(THREADID tid);
~MLOG();
VOID DumpBufferToFile(struct MEMREF* reference, UINT64 numElements, THREADID tid);
private:
std::ofstream _ofile;
};
MLOG::MLOG(THREADID tid)
{
const std::string filename = KnobOutputFile.Value() + "." + decstr(getpid()) + "." + decstr(tid);
_ofile.open(filename.c_str());
if (!_ofile)
{
std::cerr << "Error: could not open output file." << std::endl;
exit(1);
}
_ofile << std::hex;
}
MLOG::~MLOG() { _ofile.close(); }
VOID MLOG::DumpBufferToFile(struct MEMREF* reference, UINT64 numElements, THREADID tid)
{
for (UINT64 i = 0; i < numElements; i++, reference++)
{
if (reference->ea != 0) _ofile << reference->pc << " " << reference->ea << std::endl;
}
}
/**************************************************************************
*
* Instrumentation routines
*
**************************************************************************/
/*
* Insert code to write data to a thread-specific buffer for instructions
* that access memory.
*/
VOID Trace(TRACE trace, VOID* v)
{
for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl))
{
for (INS ins = BBL_InsHead(bbl); INS_Valid(ins); ins = INS_Next(ins))
{
if (!INS_IsStandardMemop(ins) && !INS_HasMemoryVector(ins))
{
// We don't know how to treat these instructions
continue;
}
UINT32 memoryOperands = INS_MemoryOperandCount(ins);
for (UINT32 memOp = 0; memOp < memoryOperands; memOp++)
{
UINT32 refSize = INS_MemoryOperandSize(ins, memOp);
// Note that if the operand is both read and written we log it once
// for each.
if (INS_MemoryOperandIsRead(ins, memOp))
{
INS_InsertFillBuffer(ins, IPOINT_BEFORE, bufId, IARG_INST_PTR, offsetof(struct MEMREF, pc), IARG_MEMORYOP_EA,
memOp, offsetof(struct MEMREF, ea), IARG_UINT32, refSize, offsetof(struct MEMREF, size),
IARG_BOOL, TRUE, offsetof(struct MEMREF, read), IARG_END);
}
if (INS_MemoryOperandIsWritten(ins, memOp))
{
INS_InsertFillBuffer(ins, IPOINT_BEFORE, bufId, IARG_INST_PTR, offsetof(struct MEMREF, pc), IARG_MEMORYOP_EA,
memOp, offsetof(struct MEMREF, ea), IARG_UINT32, refSize, offsetof(struct MEMREF, size),
IARG_BOOL, FALSE, offsetof(struct MEMREF, read), IARG_END);
}
}
}
}
}
/**************************************************************************
*
* Callback Routines
*
**************************************************************************/
VOID* BufferFull(BUFFER_ID id, THREADID tid, const CONTEXT* ctxt, VOID* buf, UINT64 numElements, VOID* v)
{
struct MEMREF* reference = (struct MEMREF*)buf;
MLOG* mlog = static_cast< MLOG* >(PIN_GetThreadData(mlog_key, tid));
mlog->DumpBufferToFile(reference, numElements, tid);
return buf;
}
/*
* Note that opening a file in a callback is only supported on Linux systems.
* See buffer_windows.cpp for how to work around this issue on Windows.
*/
VOID ThreadStart(THREADID tid, CONTEXT* ctxt, INT32 flags, VOID* v)
{
// There is a new MLOG for every thread. Opens the output file.
MLOG* mlog = new MLOG(tid);
// A thread will need to look up its MLOG, so save pointer in TLS
PIN_SetThreadData(mlog_key, mlog, tid);
}
VOID ThreadFini(THREADID tid, const CONTEXT* ctxt, INT32 code, VOID* v)
{
MLOG* mlog = static_cast< MLOG* >(PIN_GetThreadData(mlog_key, tid));
delete mlog;
PIN_SetThreadData(mlog_key, 0, tid);
}
/* ===================================================================== */
/* Print Help Message */
/* ===================================================================== */
INT32 Usage()
{
std::cerr << "This tool demonstrates the basic use of the buffering API." << std::endl;
std::cerr << std::endl << KNOB_BASE::StringKnobSummary() << std::endl;
return -1;
}
/* ===================================================================== */
/* Main */
/* ===================================================================== */
int main(int argc, char* argv[])
{
// Initialize PIN library. Print help message if -h(elp) is specified
// in the command line or the command line is invalid
if (PIN_Init(argc, argv))
{
return Usage();
}
// Initialize the memory reference buffer;
// set up the callback to process the buffer.
//
bufId = PIN_DefineTraceBuffer(sizeof(struct MEMREF), NUM_BUF_PAGES, BufferFull, 0);
if (bufId == BUFFER_ID_INVALID)
{
std::cerr << "Error: could not allocate initial buffer" << std::endl;
return 1;
}
// Initialize thread-specific data not handled by buffering api.
mlog_key = PIN_CreateThreadDataKey(0);
// add an instrumentation function
TRACE_AddInstrumentFunction(Trace, 0);
// add callbacks
PIN_AddThreadStartFunction(ThreadStart, 0);
PIN_AddThreadFiniFunction(ThreadFini, 0);
// Start the program, never returns
PIN_StartProgram();
return 0;
}

Finding the Static Properties of an Image

It is also possible to use Pin to examine binaries without instrumenting them. This is useful when you need to know static properties of an image. The sample tool below counts the number of instructions in an image, but does not insert any instrumentation.

The example can be found in source/tools/ManualExamples/staticcount.cpp

/*
* Copyright (C) 2004-2021 Intel Corporation.
* SPDX-License-Identifier: MIT
*/
//
// This tool prints the number of static instructions in each image as it is loaded
//
#include <stdio.h>
#include <iostream>
#include "pin.H"
// Pin calls this function every time a new img is loaded
// It can instrument the image, but this example merely
// counts the number of static instructions in the image
VOID ImageLoad(IMG img, VOID* v)
{
UINT32 count = 0;
for (SEC sec = IMG_SecHead(img); SEC_Valid(sec); sec = SEC_Next(sec))
{
for (RTN rtn = SEC_RtnHead(sec); RTN_Valid(rtn); rtn = RTN_Next(rtn))
{
// Prepare for processing of RTN, an RTN is not broken up into BBLs,
// it is merely a sequence of INSs
RTN_Open(rtn);
for (INS ins = RTN_InsHead(rtn); INS_Valid(ins); ins = INS_Next(ins))
{
count++;
}
// to preserve space, release data associated with RTN after we have processed it
RTN_Close(rtn);
}
}
fprintf(stderr, "Image %s has %u instructions\n", IMG_Name(img).c_str(), count);
}
/* ===================================================================== */
/* Print Help Message */
/* ===================================================================== */
INT32 Usage()
{
std::cerr << "This tool prints a log of image load and unload events" << std::endl;
std::cerr << " along with static instruction counts for each image." << std::endl;
std::cerr << std::endl << KNOB_BASE::StringKnobSummary() << std::endl;
return -1;
}
/* ===================================================================== */
/* Main */
/* ===================================================================== */
int main(int argc, char* argv[])
{
// prepare for image instrumentation mode
PIN_InitSymbols();
// Initialize pin
if (PIN_Init(argc, argv)) return Usage();
// Register ImageLoad to be called when an image is loaded
IMG_AddInstrumentFunction(ImageLoad, 0);
// Start the program, never returns
PIN_StartProgram();
return 0;
}

Detaching Pin from the Application

Pin can relinquish control of the application at any time when the tool calls PIN_Detach(). Control returns to the original, uninstrumented code, and the application runs at native speed. From then on, no instrumented code is executed.

The example can be found in source/tools/ManualExamples/detach.cpp

/*
* Copyright (C) 2004-2021 Intel Corporation.
* SPDX-License-Identifier: MIT
*/
#include <stdio.h>
#include "pin.H"
#include <iostream>
// This tool shows how to detach Pin from an
// application that is under Pin's control.
UINT64 icount = 0;
#define N 10000
VOID docount()
{
icount++;
// Release control of the application once 10000
// instructions have been executed
if ((icount % N) == 0)
{
PIN_Detach();
}
}
VOID Instruction(INS ins, VOID* v) { INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_END); }
VOID ByeWorld(VOID* v) { std::cerr << std::endl << "Detached at icount = " << N << std::endl; }
/* ===================================================================== */
/* Print Help Message */
/* ===================================================================== */
INT32 Usage()
{
std::cerr << "This tool demonstrates how to detach Pin from an " << std::endl;
std::cerr << "application that is under Pin's control" << std::endl;
std::cerr << std::endl << KNOB_BASE::StringKnobSummary() << std::endl;
return -1;
}
/* ===================================================================== */
/* Main */
/* ===================================================================== */
int main(int argc, char* argv[])
{
if (PIN_Init(argc, argv)) return Usage();
// Instrumentation callback: inserts a call to docount
// before every instruction
INS_AddInstrumentFunction(Instruction, 0);
// Callback functions to invoke before
// Pin releases control of the application
PIN_AddDetachFunction(ByeWorld, 0);
// Start the program, never returns
PIN_StartProgram();
return 0;
}

Replacing a Routine in Probe Mode

Probe mode is a method of using Pin to insert probes at the start of specified routines. A probe is a jump instruction that is placed at the start of the specified routine. The probe redirects the flow of control to the replacement function. Before the probe is inserted, the first few instructions of the specified routine are relocated. It is not uncommon for the replacement function to call the replaced routine. Pin provides the relocated address to facilitate this. See the example below.

In probe mode, the application and the replacement routine are run natively. This improves performance, but it puts more responsibility on the tool writer. Probes can only be placed on RTN boundaries.

Many of the Pin APIs that are available in JIT mode are not applicable in Probe mode. In particular, the Pin thread APIs are not supported in Probe mode, because Pin has no information about the threads when the application runs natively. For more information, see the RTN API documentation.

The tool writer must guarantee that no branch in the application targets the instructions overwritten by the probe. A probe may be up to 14 bytes long.

Also, it is the tool writer's responsibility to ensure that no thread is currently executing the code where a probe is inserted. Tool writers are encouraged to insert probes when an image is loaded to avoid this problem. Pin will automatically remove the probes when an image is unloaded.

When using probes, Pin must be started with the PIN_StartProgramProbed() API.

The example can be found in source/tools/ManualExamples/replacesigprobed.cpp. To build this test, execute:

$ make replacesigprobed.test
/*
* Copyright (C) 2006-2021 Intel Corporation.
* SPDX-License-Identifier: MIT
*/
// Replace an original function with a custom function defined in the tool using
// probes. The replacement function has a different signature from that of the
// original replaced function.
#include "pin.H"
#include <iostream>
typedef VOID* (*FP_MALLOC)(size_t);
// This is the replacement routine.
//
VOID* NewMalloc(FP_MALLOC orgFuncptr, UINT32 arg0, ADDRINT returnIp)
{
// Normally one would do something more interesting with this data.
//
std::cout << "NewMalloc (" << std::hex << ADDRINT(orgFuncptr) << ", " << std::dec << arg0 << ", " << std::hex << returnIp << ")" << std::endl << std::flush;
// Call the relocated entry point of the original (replaced) routine.
//
VOID* v = orgFuncptr(arg0);
return v;
}
// Pin calls this function every time a new img is loaded.
// It is best to do probe replacement when the image is loaded,
// because only one thread knows about the image at this time.
//
VOID ImageLoad(IMG img, VOID* v)
{
// See if malloc() is present in the image. If so, replace it.
//
RTN rtn = RTN_FindByName(img, "malloc");
if (RTN_Valid(rtn))
{
if (RTN_IsSafeForProbedReplacement(rtn))
{
std::cout << "Replacing malloc in " << IMG_Name(img) << std::endl;
// Define a function prototype that describes the application routine
// that will be replaced.
//
PROTO proto_malloc = PROTO_Allocate(PIN_PARG(void*), CALLINGSTD_DEFAULT, "malloc", PIN_PARG(int), PIN_PARG_END());
// Replace the application routine with the replacement function.
// Additional arguments have been added to the replacement routine.
//
RTN_ReplaceSignatureProbed(rtn, AFUNPTR(NewMalloc), IARG_PROTOTYPE, proto_malloc, IARG_ORIG_FUNCPTR,
IARG_FUNCARG_ENTRYPOINT_VALUE, 0, IARG_RETURN_IP, IARG_END);
// Free the function prototype.
//
PROTO_Free(proto_malloc);
}
else
{
std::cout << "Skip replacing malloc in " << IMG_Name(img) << " since it is not safe." << std::endl;
}
}
}
/* ===================================================================== */
/* Print Help Message */
/* ===================================================================== */
INT32 Usage()
{
std::cerr << "This tool demonstrates how to replace an original" << std::endl;
std::cerr << " function with a custom function defined in the tool " << std::endl;
std::cerr << " using probes. The replacement function has a different " << std::endl;
std::cerr << " signature from that of the original replaced function." << std::endl;
std::cerr << std::endl << KNOB_BASE::StringKnobSummary() << std::endl;
return -1;
}
/* ===================================================================== */
/* Main: Initialize and start Pin in Probe mode. */
/* ===================================================================== */
int main(INT32 argc, CHAR* argv[])
{
// Initialize symbol processing
//
PIN_InitSymbols();
// Initialize pin
//
if (PIN_Init(argc, argv)) return Usage();
// Register ImageLoad to be called when an image is loaded
//
IMG_AddInstrumentFunction(ImageLoad, 0);
// Start the program in probe mode, never returns
//
PIN_StartProgramProbed();
return 0;
}

Instrumenting Child Processes

PIN_AddFollowChildProcessFunction() lets you register a function to execute before an execv'd child process starts. Use the -follow_execv command-line option to instrument child processes, like this:

$ ../../../pin -follow_execv -t obj-intel64/follow_child_tool.so -- obj-intel64/follow_child_app1 obj-intel64/follow_child_app2

The example can be found in source/tools/ManualExamples/follow_child_tool.cpp. To build this test, execute:

$ make follow_child_tool.test
/*
* Copyright (C) 2009-2021 Intel Corporation.
* SPDX-License-Identifier: MIT
*/
#include "pin.H"
#include <iostream>
#include <stdio.h>
#include <unistd.h>
/* ===================================================================== */
/* Command line Switches */
/* ===================================================================== */
BOOL FollowChild(CHILD_PROCESS cProcess, VOID* userData)
{
fprintf(stdout, "before child:%u\n", getpid());
return TRUE;
}
/* ===================================================================== */
int main(INT32 argc, CHAR** argv)
{
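PIN_Init(argc, argv);
// Register FollowChild to be called before an execv'd child process starts
PIN_AddFollowChildProcessFunction(FollowChild, 0);
// Start the program, never returns
PIN_StartProgram();
return 0;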
}

Instrumenting Before and After Forks

Pin allows Pintools to register for notification callbacks around forks. The PIN_AddForkFunction() and PIN_AddForkFunctionProbed() callbacks allow you to define the function you want to execute at one of these FPOINTs:

    FPOINT_BEFORE            Call-back in parent, just before fork.
    FPOINT_AFTER_IN_PARENT   Call-back in parent, immediately after fork.
    FPOINT_AFTER_IN_CHILD    Call-back in child, immediately after fork.

Note that PIN_AddForkFunction() is used for JIT mode and PIN_AddForkFunctionProbed() is used for Probed mode. If the fork() fails, the FPOINT_AFTER_IN_PARENT callback, if it is defined, will execute anyway.

The example can be found in source/tools/ManualExamples/fork_jit_tool.cpp. To build this test, execute:

$ make fork_jit_tool.test
/*
 * Copyright (C) 2009-2021 Intel Corporation.
 * SPDX-License-Identifier: MIT
 */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdlib.h>
#include "pin.H"
#include <iostream>
#include <fstream>
INT32 Usage()
{
    std::cerr << "This pin tool registers callbacks around fork().\n"
                 "\n";
    std::cerr << std::endl;
    return -1;
}
pid_t parent_pid;
PIN_LOCK pinLock;
// FPOINT_BEFORE: runs in the parent just before fork(); remember the parent's PID
VOID BeforeFork(THREADID threadid, const CONTEXT* ctxt, VOID* arg)
{
    PIN_GetLock(&pinLock, threadid + 1);
    std::cerr << "TOOL: Before fork." << std::endl;
    PIN_ReleaseLock(&pinLock);
    parent_pid = PIN_GetPid();
}
// FPOINT_AFTER_IN_PARENT: the PID must still be the parent's PID
VOID AfterForkInParent(THREADID threadid, const CONTEXT* ctxt, VOID* arg)
{
    PIN_GetLock(&pinLock, threadid + 1);
    std::cerr << "TOOL: After fork in parent." << std::endl;
    PIN_ReleaseLock(&pinLock);
    if (PIN_GetPid() != parent_pid)
    {
        std::cerr << "PIN_GetPid() fails in parent process" << std::endl;
        exit(-1);
    }
}
// FPOINT_AFTER_IN_CHILD: the child has a new PID and its parent is the original process
VOID AfterForkInChild(THREADID threadid, const CONTEXT* ctxt, VOID* arg)
{
    PIN_GetLock(&pinLock, threadid + 1);
    std::cerr << "TOOL: After fork in child." << std::endl;
    PIN_ReleaseLock(&pinLock);
    if ((PIN_GetPid() == parent_pid) || (getppid() != parent_pid))
    {
        std::cerr << "PIN_GetPid() fails in child process" << std::endl;
        exit(-1);
    }
}
int main(INT32 argc, CHAR** argv)
{
    if (PIN_Init(argc, argv))
    {
        return Usage();
    }
    // Initialize the pin lock
    PIN_InitLock(&pinLock);
    // Register a notification handler that is called when the application
    // forks a new process.
    PIN_AddForkFunction(FPOINT_BEFORE, BeforeFork, 0);
    PIN_AddForkFunction(FPOINT_AFTER_IN_CHILD, AfterForkInChild, 0);
    PIN_AddForkFunction(FPOINT_AFTER_IN_PARENT, AfterForkInParent, 0);
    // Never returns
    PIN_StartProgram();
    return 0;
}

Managed platforms support

Pin allows Pintools to identify dynamically created code using the RTN_IsDynamic() API. Only the code of functions reported through the JIT Profiling API is identified as dynamic. The following example demonstrates the use of RTN_IsDynamic(). It instruments a program to count the total number of instructions discovered and executed. The instructions are divided into three categories: native instructions, dynamic instructions and instructions without any known routine.

Here is how to run it and display its output with a 32-bit OpenCL sample on Windows:

$ set CL_CONFIG_USE_VTUNE=True
$ set INTEL_JIT_PROFILER32=ia32\bin\pinjitprofiling.dll
$ ia32\bin\pin.exe -t source\tools\JitProfilingApiTests\obj-ia32\DynamicInsCount.dll -support_jit_api -o DynamicInsCount.out -- ..\OpenCL\Win32\Debug\BitonicSort.exe
No command line arguments specified, using default values.
Initializing OpenCL runtime...
Trying to run on a CPU
OpenCL data alignment is 128 bytes.
Reading file 'BitonicSort.cl' (size 3435 bytes)
Sort order is ascending
Input size is 1048576 items
Executing OpenCL kernel...
Executing reference...
Performing verification...
Verification succeeded.
NDRange perf. counter time 12994.272962 ms.
Releasing resources...
$ type DynamicInsCount.out
===============================================
Number of executed native instructions: 7631596649
Number of executed jitted instructions: 438983207
Number of executed instructions without any known routine: 12246
===============================================
Number of discovered native instructions: 870531
Number of discovered jitted instructions: 223
Number of discovered instructions without any known routine: 36
===============================================

$

The example can be found in source\tools\JitProfilingApiTests\DynamicInsCount.cpp

#include "pin.H"
#include <iostream>
#include <fstream>
using std::cerr;
using std::endl;
using std::string;
// ==================================================================
// Global variables
// ==================================================================
UINT64 insNativeDiscoveredCount = 0;  // number of discovered native instructions
UINT64 insDynamicDiscoveredCount = 0; // number of discovered dynamic instructions
UINT64 insNoRtnDiscoveredCount = 0;   // number of discovered instructions without any known routine
UINT64 insNativeExecutedCount = 0;    // number of executed native instructions
UINT64 insDynamicExecutedCount = 0;   // number of executed dynamic instructions
UINT64 insNoRtnExecutedCount = 0;     // number of executed instructions without any known routine
std::ostream* out = &cerr;
// =====================================================================
// Command line switches
// =====================================================================
KNOB<string> KnobOutputFile(KNOB_MODE_WRITEONCE, "pintool", "o", "", "specify file name for output");
// =====================================================================
// Utilities
// =====================================================================
// Print out help message.
INT32 Usage()
{
    cerr << "This tool prints out the number of native and dynamic instructions" << endl;
    cerr << KNOB_BASE::StringKnobSummary() << endl;
    return -1;
}
// =====================================================================
// Analysis routines
// =====================================================================
// This function is called before every native instruction is executed
VOID InsNativeCount()
{
    ++insNativeExecutedCount;
}
// This function is called before every dynamic instruction is executed
VOID InsDynamicCount()
{
    ++insDynamicExecutedCount;
}
// This function is called before every instruction without any known routine is executed
VOID InsNoRtnCount()
{
    ++insNoRtnExecutedCount;
}
// =====================================================================
// Instrumentation callbacks
// =====================================================================
// Pin calls this function every time a new instruction is encountered
VOID Instruction(INS ins, VOID* v)
{
    RTN rtn = INS_Rtn(ins);
    if (!RTN_Valid(rtn))
    {
        ++insNoRtnDiscoveredCount;
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)InsNoRtnCount, IARG_END);
    }
    else if (RTN_IsDynamic(rtn))
    {
        ++insDynamicDiscoveredCount;
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)InsDynamicCount, IARG_END);
    }
    else
    {
        ++insNativeDiscoveredCount;
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)InsNativeCount, IARG_END);
    }
}
// Print out analysis results.
// This function is called when the application exits.
// @param[in] code exit code of the application
// @param[in] v value specified by the tool in the
// PIN_AddFiniFunction function call
VOID Fini(INT32 code, VOID* v)
{
    *out << "===============================================" << endl;
    *out << "Number of executed native instructions: " << insNativeExecutedCount << endl;
    *out << "Number of executed dynamic instructions: " << insDynamicExecutedCount << endl;
    *out << "Number of executed instructions without any known routine: " << insNoRtnExecutedCount << endl;
    *out << "===============================================" << endl;
    *out << "Number of discovered native instructions: " << insNativeDiscoveredCount << endl;
    *out << "Number of discovered dynamic instructions: " << insDynamicDiscoveredCount << endl;
    *out << "Number of discovered instructions without any known routine: " << insNoRtnDiscoveredCount << endl;
    *out << "===============================================" << endl;
    string fileName = KnobOutputFile.Value();
    if (!fileName.empty())
    {
        delete out;
    }
}
// The main procedure of the tool.
// This function is called when the application image is loaded but not yet started.
// @param[in] argc total number of elements in the argv array
// @param[in] argv array of command line arguments,
// including pin -t <toolname> -- ...
int main(int argc, char* argv[])
{
    // Initialize symbol processing
    PIN_InitSymbols();
    // Initialize PIN library. Print help message if -h(elp) is specified
    // in the command line or the command line is invalid
    if (PIN_Init(argc, argv))
    {
        return Usage();
    }
    string fileName = KnobOutputFile.Value();
    if (!fileName.empty())
    {
        out = new std::ofstream(fileName.c_str());
    }
    // Register Instruction to be called to instrument instructions
    INS_AddInstrumentFunction(Instruction, NULL);
    // Register function to be called when the application exits
    PIN_AddFiniFunction(Fini, NULL);
    // Start the program, never returns
    PIN_StartProgram();
    return 0;
}

Pin allows Pintools to instrument just-in-time compiled functions using the RTN_AddInstrumentFunction() API. The following example instruments a program to log the JIT compilation and execution of dynamic functions reported through the JIT Profiling API.

Here is how to run it with a 64-bit OpenCL sample on Linux:

$ setenv CL_CONFIG_USE_VTUNE True
$ setenv INTEL_JIT_PROFILER64 intel64/lib/libpinjitprofiling.so
$ ./pin -t source/tools/JitProfilingApiTests/obj-intel64/DynamicFuncInstrument.so -support_jit_api -o DynamicFuncInstrument.out -- ..\OpenCL\Win32\Debug\BitonicSort.exe
No command line arguments specified, using default values.
Initializing OpenCL runtime...
Trying to run on a CPU
OpenCL data alignment is 128 bytes.
Reading file 'BitonicSort.cl' (size 3435 bytes)
Sort order is ascending
Input size is 1048576 items
Executing OpenCL kernel...
Executing reference...
Performing verification...
Verification succeeded.
NDRange perf. counter time 12994.272962 ms.
Releasing resources...
$

The example can be found in source\tools\JitProfilingApiTests\DynamicFuncInstrument.cpp

#include "pin.H"
#include <iostream>
#include <fstream>
using std::cerr;
using std::endl;
using std::string;
// =====================================================================
// Global variables
// =====================================================================
std::ostream* out = &cerr;
// =====================================================================
// Command line switches
// =====================================================================
KNOB<string> KnobOutputFile(KNOB_MODE_WRITEONCE, "pintool", "o", "", "specify file name for output");
// =====================================================================
// Utilities
// =====================================================================
// Print out help message.
INT32 Usage()
{
    cerr << "This tool logs the JIT compilation and execution of dynamically created functions" << endl;
    cerr << KNOB_BASE::StringKnobSummary() << endl;
    return -1;
}
// =====================================================================
// Analysis routines
// =====================================================================
// Called each time an instrumented dynamic routine is entered
VOID RtnCallPrint(CHAR* rtnName)
{
    *out << "Before run " << rtnName << endl;
}
// =====================================================================
// Instrumentation callbacks
// =====================================================================
// Pin calls this function every time a new routine is discovered
VOID Routine(RTN rtn, VOID* v)
{
    if (!RTN_IsDynamic(rtn))
    {
        return;
    }
    *out << "Just discovered " << RTN_Name(rtn) << endl;
    RTN_Open(rtn);
    // Insert a call at the entry point of the routine that prints its name
    RTN_InsertCall(rtn, IPOINT_BEFORE, (AFUNPTR)RtnCallPrint, IARG_PTR, RTN_Name(rtn).c_str(), IARG_END);
    RTN_Close(rtn);
}
// This function is called when the application exits.
// @param[in] code exit code of the application
// @param[in] v value specified by the tool in the
// PIN_AddFiniFunction function call
VOID Fini(INT32 code, VOID* v)
{
    const string fileName = KnobOutputFile.Value();
    if (!fileName.empty())
    {
        delete out;
    }
}
// The main procedure of the tool.
// This function is called when the application image is loaded but not yet started.
// @param[in] argc total number of elements in the argv array
// @param[in] argv array of command line arguments,
// including pin -t <toolname> -- ...
int main(int argc, char* argv[])
{
    // Initialize symbol processing
    PIN_InitSymbols();
    // Initialize PIN library. Print help message if -h(elp) is specified
    // in the command line or the command line is invalid
    if (PIN_Init(argc, argv))
    {
        return Usage();
    }
    const string fileName = KnobOutputFile.Value();
    if (!fileName.empty())
    {
        out = new std::ofstream(fileName.c_str());
    }
    // Register Routine to be called to instrument rtn
    RTN_AddInstrumentFunction(Routine, NULL);
    // Register function to be called when the application exits
    PIN_AddFiniFunction(Fini, NULL);
    // Start the program, never returns
    PIN_StartProgram();
    return 0;
}



Callbacks


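The examples in the previous sections have introduced a number of ways to register callback functions via the Pin API, such as:

  • INS_AddInstrumentFunction(fun, val)
  • TRACE_AddInstrumentFunction(fun, val)
  • RTN_AddInstrumentFunction(fun, val)
  • IMG_AddInstrumentFunction(fun, val)
  • PIN_AddFiniFunction(fun, val)
  • PIN_AddForkFunction(point, fun, val)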

The extra parameter val (shared by all the registration functions) is passed to fun as its last argument whenever it is "called back". For the instrumentation and Fini callbacks, this is the second argument. This is a standard mechanism used in GUI programming with callbacks.

If this feature is not needed, it is safe to pass 0 for val when registering a callback. The expected use of val is to pass a pointer to an instance of a class. Since val is a generic pointer, fun must cast it back to the original pointer type before dereferencing it.
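Here is a minimal standalone sketch of the val pattern with no Pin dependency. A hypothetical registry stores each (fun, val) pair and later calls fun with its own val as the last argument. The callback then casts val back to the class it was registered with. AddFiniLikeFunction, RunFiniLikeCallbacks and Stats are illustrative names, not Pin APIs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical callback type modeled on a Fini callback: 'val' comes back as the last argument.
typedef void (*FINI_LIKE_CALLBACK)(int code, void* val);

struct Registration {
    FINI_LIKE_CALLBACK fun;
    void* val;
};

static std::vector<Registration> g_registry;

// Hypothetical registration function shaped like PIN_AddFiniFunction(fun, val).
void AddFiniLikeFunction(FINI_LIKE_CALLBACK fun, void* val) {
    g_registry.push_back(Registration{fun, val});
}

// Hypothetical dispatcher: calls each callback with the 'val' it was registered with.
void RunFiniLikeCallbacks(int code) {
    for (const Registration& r : g_registry) r.fun(code, r.val);
}

// Tool state whose address is passed as 'val'.
class Stats {
public:
    unsigned long finiCalls = 0;
    int lastCode = -1;
};

void OnFini(int code, void* val) {
    // 'val' is a generic pointer: cast it back to the type that was registered.
    Stats* stats = static_cast<Stats*>(val);
    stats->finiCalls++;
    stats->lastCode = code;
}
```

In a Pintool the same shape would be a call such as PIN_AddFiniFunction(Fini, &stats), paired with a Fini(INT32 code, VOID* v) that casts v back to the tool's own class.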

Note that all callback registration functions return a PIN_CALLBACK object. You can later use it to manipulate the properties of the registered callback, for example to change the order in which Pin executes callback functions of the same type. This is done by calling the API functions that manipulate the PIN_CALLBACK object (see PIN callbacks).



Modifying Application Instructions


Although Pin is most commonly used for instrumenting applications, it is also possible to change the application's instructions. The simplest way to do this is to insert an analysis routine to emulate an instruction, and then use INS_Delete() to remove the original instruction. It is also possible to insert direct or indirect branches (using INS_InsertDirectJump and INS_InsertIndirectJump), which makes it easier to emulate instructions that change the control flow.

Using INS_RewriteMemoryOperand, the memory address accessed by an instruction can be redirected to an address calculated by an analysis routine.
For instructions whose memory operand has scattered access (vscatter/vgather), use INS_RewriteScatteredMemoryOperand.

Note that in all of the cases where an instruction is modified, the modification is only made after all of the instrumentation routines have been executed. Therefore all of the instrumentation routines see the original, un-modified instruction.



Instrumenting multi element instruction operands


Multi-element operands are operands of vector and tile instructions where the operand is a vector or matrix of elements and the instruction operates on each element separately. Examples include operands of instructions from the SSE, AVX, AVX2, AVX512 and AMX extensions.
Pin supports the inspection and instrumentation of the operand elements.
For examples specific to AMX, see Instrumenting AMX instructions.

The following functions allow inspecting the static attributes of multi-element operands:

  • INS_OperandElementSize
  • INS_OperandElementCount
  • INS_MemoryOperandElementSize
  • INS_MemoryOperandElementCount
  • INS_OperandHasElements

The following IARG and interface allow inspecting the static and runtime attributes of multi-element operands:

  • IARG_MULTI_ELEMENT_OPERAND
  • IMULTI_ELEMENT_OPERAND

The code below demonstrates how to instrument memory operands and pass the effective address of the operand or operand elements to the analysis routine.

static VOID rtnMulti(IMULTI_ELEMENT_OPERAND* multiElemMemOp)
{
    // We instrumented a memory operand
    ASSERTX(multiElemMemOp->IsMemory());
    for (UINT32 i = 0; i < multiElemMemOp->NumOfElements(); i++)
    {
        cout << "Element " << dec << i << " effective address " << hex << multiElemMemOp->ElementAddress(i) << endl;
    }
}
static VOID rtnStandard(ADDRINT memOpEffectiveAddress)
{
    cout << "Operand effective address " << hex << memOpEffectiveAddress << endl;
}
// In instrumentation callback
...
// Verify this instruction can be used with IARG_MULTI_ELEMENT_OPERAND
if (INS_IsValidForIarg(ins, IARG_MULTI_ELEMENT_OPERAND))
{
    for (UINT32 memOp = 0; memOp < INS_MemoryOperandCount(ins); memOp++)
    {
        UINT32 op = INS_MemoryOperandIndexToOperandIndex(ins, memOp);
        if (INS_OperandHasElements(ins, op))
        {
            // Pass the multi element interface of operand 'op'
            INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)rtnMulti,
                           IARG_MULTI_ELEMENT_OPERAND, op,
                           IARG_END);
        }
        else
        {
            // Pass the effective address of memory operand 'memOp'
            INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)rtnStandard,
                           IARG_MEMORYOP_EA, memOp,
                           IARG_END);
        }
    }
}

When to use IARG_MULTI_ELEMENT_OPERAND

The IMULTI_ELEMENT_OPERAND interface is applicable to all vector instructions whose operands have elements.
Some of the operand attributes covered by IMULTI_ELEMENT_OPERAND are known at instrumentation time, for example the number of elements and the size of an element.
The attributes that are only known at runtime are the effective addresses and the mask values.
For some usages, IARG_MULTI_ELEMENT_OPERAND has alternatives, which are discussed in the sub-sections below.
Note that IARG_MULTI_ELEMENT_OPERAND is typically slower than those alternatives.

Reading effective addresses

For reading effective addresses, IARG_MULTI_ELEMENT_OPERAND is recommended for instructions whose memory operand addresses non-contiguous memory
(where INS_HasScatteredMemoryAccess returns TRUE), for example vscatter/vgather.
The other option is to calculate the addresses manually by passing the values of the base register, index register, scale, etc.
For other vector instructions, whose elements are contiguous in memory, the alternative to IARG_MULTI_ELEMENT_OPERAND is to use IARG_MEMORYOP_EA and compute the element addresses manually.
The code below demonstrates how to read effective addresses both ways.

static VOID printElements_1(IMULTI_ELEMENT_OPERAND* memOpInfo)
{
    for (UINT32 i = 0; i < memOpInfo->NumOfElements(); i++)
    {
        cout << "Element " << dec << i << " ; size = " << memOpInfo->ElementSize(i);
        if (memOpInfo->IsMemory())
        {
            cout << " ; address = " << hex << memOpInfo->ElementAddress(i);
        }
        cout << endl;
    }
}
static VOID printElements_2(ADDRINT addr, UINT32 elementCount, UINT32 elementSize)
{
    // Contiguous elements: element i starts at addr + i * elementSize
    for (UINT32 i = 0; i < elementCount; i++)
    {
        UINT8* elementAddress = (UINT8*)addr + i * elementSize;
        cout << "Element " << dec << i << " ; size = " << elementSize << " ; address = " << hex << (VOID*)elementAddress << endl;
    }
}
// In instrumentation callback
...
// In this example we only instrument instructions that are good
// for both IARG_MULTI_ELEMENT_OPERAND and the alternative
if (INS_IsValidForIarg(ins, IARG_MULTI_ELEMENT_OPERAND) && !INS_HasScatteredMemoryAccess(ins))
{
    for (UINT32 memOp = 0; memOp < INS_MemoryOperandCount(ins); memOp++)
    {
        UINT32 op = INS_MemoryOperandIndexToOperandIndex(ins, memOp);
        if (INS_OperandElementCount(ins, op) > 1) // Operand must have elements
        {
            // Instrument two analysis routines.
            // Both will print the element addresses but will use different IARGs
            INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)printElements_1,
                           IARG_MULTI_ELEMENT_OPERAND, op,
                           IARG_END);
            INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)printElements_2,
                           IARG_MEMORYOP_EA, memOp,
                           IARG_UINT32, INS_OperandElementCount(ins, op),
                           IARG_UINT32, (UINT32)INS_OperandElementSize(ins, op),
                           IARG_END);
        }
    }
}
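For vgather/vscatter, the manual alternative mentioned above means recomputing each element's address from the VSIB operand: address[i] = base + index[i] * scale + displacement. Here index[i] is the i-th (sign-extended) element of the index vector register. The standalone sketch below shows that arithmetic next to the contiguous case handled by printElements_2. It omits how a tool would obtain the base, index elements, scale and displacement, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// VSIB (gather/scatter) operand: addr[i] = base + index[i] * scale + disp.
// 'indices' holds the index vector elements, already sign-extended to 64 bits.
std::vector<uint64_t> GatherElementAddresses(uint64_t base, const std::vector<int64_t>& indices,
                                             uint32_t scale, int64_t disp) {
    std::vector<uint64_t> addrs;
    addrs.reserve(indices.size());
    for (int64_t idx : indices) {
        // Unsigned addition wraps modulo 2^64, matching address arithmetic for negative offsets.
        addrs.push_back(base + static_cast<uint64_t>(idx * static_cast<int64_t>(scale) + disp));
    }
    return addrs;
}

// Contiguous operand (as in printElements_2): element i starts at ea + i * elementSize.
std::vector<uint64_t> ContiguousElementAddresses(uint64_t ea, uint32_t count, uint32_t elementSize) {
    std::vector<uint64_t> addrs;
    for (uint32_t i = 0; i < count; i++) addrs.push_back(ea + static_cast<uint64_t>(i) * elementSize);
    return addrs;
}
```

For example, with base 0x1000, scale 4, displacement 8 and indices {0, 3, -1}, the element addresses are 0x1008, 0x1014 and 0x1004.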

Reading mask values

For reading mask values, an alternative to IARG_MULTI_ELEMENT_OPERAND is to use IARG_REG_CONST_REFERENCE and extract the mask values manually.
When extracting them manually, the Pintool must know where each element's mask bit is located in the mask register.

The code below demonstrates how to read mask values both ways.

static VOID printMask_1(UINT8* maskReg, UINT32 elementCount, UINT32 elementSize)
{
    // For AVX2 - the mask bit is the high bit of the dword/qword N-th element in the bitmask array.
    // AVX512 mask bits are extracted differently.
    for (UINT32 i = 0; i < elementCount; i++)
    {
        BOOL maskSet = FALSE;
        switch (elementSize)
        {
            case 4: maskSet = (((UINT32*)maskReg)[i] & 0x80000000U) != 0; break;
            case 8: maskSet = (((UINT64*)maskReg)[i] & 0x8000000000000000ULL) != 0; break;
            default: cerr << "Illegal element size" << endl;
        }
        cout << "Element " << dec << i << " ; mask = " << maskSet << endl;
    }
}
static VOID printMask_2(IMULTI_ELEMENT_OPERAND* opInfo)
{
    for (UINT32 i = 0; i < opInfo->NumOfElements(); i++)
    {
        cout << "Element " << dec << i << " ; mask = " << opInfo->ElementMaskValue(i) << endl;
    }
}
// In instrumentation callback
...
REG maskReg = INS_MaskRegister(ins);
if (REG_valid(maskReg)) // This instruction uses a mask
{
    for (UINT32 op = 0; op < INS_OperandCount(ins); op++)
    {
        if (INS_OperandIsMemory(ins, op) && INS_OperandElementCount(ins, op) > 1)
        {
            // Instrument with IARG_MULTI_ELEMENT_OPERAND that also includes the mask
            INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)printMask_2,
                           IARG_MULTI_ELEMENT_OPERAND, op,
                           IARG_END);
            // Instrument with IARG_REG_CONST_REFERENCE that will pass the full mask register value
            // (printMask_1 decodes AVX2-style vector masks only)
            INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)printMask_1,
                           IARG_REG_CONST_REFERENCE, maskReg,
                           IARG_UINT32, INS_OperandElementCount(ins, op),
                           IARG_UINT32, (UINT32)INS_OperandElementSize(ins, op),
                           IARG_END);
        }
    }
}
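The comment in printMask_1 notes that AVX512 mask bits are extracted differently. The standalone sketch below decodes both layouts. With an AVX2-style vector mask, element i is active when the most significant bit of the i-th dword or qword is set. With an AVX512 opmask (k) register, element i is active when bit i is set. The sketch assumes a little-endian register image, as on x86, and its function names are hypothetical. It uses memcpy instead of pointer casts to avoid alignment and aliasing problems.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// AVX2-style vector mask: element i is active if the MSB of the i-th dword/qword is set.
bool Avx2MaskBit(const uint8_t* maskReg, uint32_t i, uint32_t elementSize) {
    if (elementSize == 4) {
        uint32_t v;
        std::memcpy(&v, maskReg + i * 4, sizeof v);
        return (v & 0x80000000u) != 0;
    }
    if (elementSize == 8) {
        uint64_t v;
        std::memcpy(&v, maskReg + i * 8, sizeof v);
        return (v & 0x8000000000000000ull) != 0;
    }
    return false; // unsupported element size
}

// AVX512-style opmask (k register): element i is active if bit i is set.
bool Avx512MaskBit(uint64_t kReg, uint32_t i) {
    return ((kReg >> i) & 1u) != 0;
}
```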

Instrumenting AMX instructions


This section describes how to read the AMX state and the tile configuration. It also describes how to instrument AMX instruction operands, whether they are memory operands or TMM registers.

PIN_IsAmxActive returns the current AMX state.
Instrumentation and analysis happen at different phases of the application flow. The analysis routine must therefore check the current AMX state before analyzing the rest of the data, in order to know whether that data is valid.

The following functions allow inspecting the dimensions of the matrix:

  • TileCfg_GetTileBytesPerRow
  • TileCfg_GetTileRows

These functions take the contents of the virtual register that holds the tile configuration (REG_TILECONFIG) and the TMM register whose dimensions should be retrieved.
To use these functions in an analysis routine, we must first inspect the instruction operands to identify the relevant TMM register, as shown in the example below.
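TileCfg_GetTileRows and TileCfg_GetTileBytesPerRow decode the tile configuration for you. To show what they report, here is a standalone sketch that decodes a 64-byte tile configuration directly. It assumes the register image uses the palette-1 layout that the Intel SDM documents for the LDTILECFG memory operand: palette_id at byte 0, 16-bit bytes-per-row fields at bytes 16 to 31, and 8-bit row counts at bytes 48 to 55. It also assumes a little-endian host. TileDims and ReadTileDims are hypothetical names; this is not Pin's implementation, and a Pintool should keep using the Pin functions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumed palette-1 tile configuration layout (64 bytes):
//   byte 0: palette_id, byte 1: start_row,
//   bytes 16..31: colsb[0..7] (uint16, bytes per row), bytes 48..55: rows[0..7] (uint8).
struct TileDims {
    uint32_t rows;
    uint32_t bytesPerRow;
};

// Hypothetical decoder for the dimensions of tile 'tmmIndex' (0..7).
TileDims ReadTileDims(const uint8_t cfg[64], unsigned tmmIndex) {
    uint16_t colsb;
    std::memcpy(&colsb, cfg + 16 + 2 * tmmIndex, sizeof colsb); // little-endian 16-bit field
    return TileDims{cfg[48 + tmmIndex], colsb};
}
```

With 4-byte elements (as for TILELOADD/TILESTORED), the number of columns is bytesPerRow / 4, which is the computation the analysis routine in this section performs.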

AMX and Multi Elements

AMX tiles are multi-element operands.
The difference between AMX tile operands and non-AMX multi-element operands is when the number of elements becomes known. For a tile, it is not known until after the LDTILECFG instruction executes. For non-AMX operands, it is a static attribute of the instruction.
This means that APIs such as INS_OperandElementCount or INS_MemoryOperandElementCount return 0 for AMX operands.
Reading the contents of a memory tile at analysis time requires IARG_MULTI_ELEMENT_OPERAND. It provides the IMULTI_ELEMENT_OPERAND interface, through which the addresses of the matrix cells can be retrieved.
Reading the contents of a TMM register at analysis time requires two arguments. IARG_REG_REFERENCE or IARG_REG_CONST_REFERENCE provides the full contents of the tile,
and IARG_MULTI_ELEMENT_OPERAND provides the IMULTI_ELEMENT_OPERAND interface, through which the offsets of the cells within the tile can be retrieved.
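Because the element count is only known at runtime, it helps to see how a cell index relates to rows, columns and memory. The analysis routine later in this section walks the cells in row-major order, so cell i = row * cols + col. The standalone sketch here shows that mapping together with the memory address of a cell. It assumes that TILELOADD/TILESTORED place row r at base + r * stride, where the stride comes from the instruction's (scaled) index register, and that each row is packed contiguously. In a Pintool, IMULTI_ELEMENT_OPERAND::ElementAddress returns these addresses directly. Cell, CellOfElement and TileCellAddress are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Row-major cell numbering, as used by the analysis routine: cell i = row * cols + col.
struct Cell {
    uint32_t row;
    uint32_t col;
};

Cell CellOfElement(uint32_t i, uint32_t cols) {
    return Cell{i / cols, i % cols};
}

// Assumed memory layout: row r starts at base + r * stride; cells within a row
// are packed contiguously, 'elementSize' bytes each.
uint64_t TileCellAddress(uint64_t base, uint64_t stride, uint32_t elementSize,
                         uint32_t i, uint32_t cols) {
    Cell c = CellOfElement(i, cols);
    return base + c.row * stride + static_cast<uint64_t>(c.col) * elementSize;
}
```

For example, with 16 columns of 4-byte elements and a stride of 0x100 bytes, cell 17 is row 1, column 1, at base + 0x104.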

Below is a code example of the instrumentation callback, where we configure the instrumentation.
In this example we instrument TILELOADD and TILESTORED so that the analysis routine can read the runtime values of both the memory matrix and the tile register matrix.

//
// Instrumentation callback
//
VOID Trace(TRACE trace, VOID* v)
{
    for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl))
    {
        for (INS ins = BBL_InsHead(bbl); INS_Valid(ins); ins = INS_Next(ins))
        {
            if (INS_IsAmx(ins))
            {
                xed_iclass_enum_t iclass = xed_decoded_inst_get_iclass(INS_XedDec(ins));
                // This example is instrumenting TILELOADD and TILESTORED
                if ((iclass == XED_ICLASS_TILELOADD) || (iclass == XED_ICLASS_TILESTORED))
                {
                    // TILELOADD and TILESTORED have two operands - memory and TMM register.
                    // Find the index of each operand.
                    UINT32 opTMM = 0;
                    UINT32 opMemory = 0;
                    REG tmmReg = REG_INVALID();
                    BOOL foundTmmOperand = FALSE;
                    BOOL foundMemOperand = FALSE;
                    for (UINT32 i = 0; i < INS_OperandCount(ins); i++)
                    {
                        if (INS_OperandIsMemory(ins, i))
                        {
                            opMemory = i;
                            foundMemOperand = TRUE;
                        }
                        else if (INS_OperandIsReg(ins, i))
                        {
                            REG opReg = INS_OperandReg(ins, i);
                            if (REG_is_tmm(opReg))
                            {
                                tmmReg = opReg;
                                opTMM = i;
                                foundTmmOperand = TRUE;
                            }
                        }
                    }
                    // Make sure we found valid memory and TMM operands
                    ASSERTX(foundTmmOperand && foundMemOperand);
                    ASSERTX(REG_valid(tmmReg));
                    // Make sure the operands are valid for multi element iarg
                    ASSERTX(INS_OperandHasElements(ins, opMemory));
                    ASSERTX(INS_OperandHasElements(ins, opTMM));
                    // The argument order matches the BeforeInstruction() parameters
                    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)BeforeInstruction,
                                   IARG_REG_CONST_REFERENCE, REG_TILECONFIG, // tileCfgReg
                                   IARG_UINT32, tmmReg,                      // tmmEnum
                                   IARG_REG_CONST_REFERENCE, tmmReg,         // tmmReg
                                   IARG_UINT32, REG_Size(tmmReg),            // tmmRegSize
                                   IARG_MULTI_ELEMENT_OPERAND, opMemory,     // opInfoMem
                                   IARG_MULTI_ELEMENT_OPERAND, opTMM,        // opInfoReg
                                   IARG_END);
                }
            }
        }
    }
}

Below is a code example of the analysis routine, where we analyze the runtime values of the previously configured operands. In this example we print the cell values of the memory matrix and of the tile register matrix.

//
// Analysis routine
//
static VOID BeforeInstruction(UINT8* tileCfgReg,
                              UINT32 tmmEnum,
                              UINT8* tmmReg,
                              UINT32 tmmRegSize,
                              IMULTI_ELEMENT_OPERAND* opInfoMem,
                              IMULTI_ELEMENT_OPERAND* opInfoReg)
{
    if (PIN_IsAmxActive(PIN_ThreadId()) == FALSE)
    {
        return; // AMX is in init state for this thread - return
    }
    // Read element size (element size is the same for the memory and TMM operands)
    ASSERTX(opInfoMem->NumOfElements() > 0);
    UINT32 elementSize = opInfoMem->ElementSize(0);
    // Read the number of rows and number of bytes per row from the Tile Config
    UINT32 bytesPerRow = TileCfg_GetTileBytesPerRow(tileCfgReg, (REG)tmmEnum);
    UINT32 rows = TileCfg_GetTileRows(tileCfgReg, (REG)tmmEnum);
    UINT32 cols = bytesPerRow / elementSize;
    cout << "Tile has " << dec << rows << " Rows * " << cols << " Columns ; total " << rows * cols << " cells" << endl;
    // Make sure we don't exceed the number of elements in opInfoMem, opInfoReg
    ASSERTX(opInfoMem->NumOfElements() == (rows * cols));
    ASSERTX(opInfoReg->NumOfElements() == (rows * cols));
    // Print memory matrix
    cout << "Memory" << endl;
    UINT32 i = 0;
    for (UINT32 row = 0; row < rows; row++)
    {
        for (UINT32 col = 0; col < cols; col++)
        {
            // Get the address of the element
            ADDRINT addr = opInfoMem->ElementAddress(i);
            // Print the value of the element
            if (elementSize == sizeof(UINT32)) // TILELOADD and TILESTORED have 4-byte elements, otherwise use different cast
            {
                UINT32 memCellValue = *(reinterpret_cast<UINT32*>(addr));
                cout << dec << setw(5) << memCellValue << " ";
            }
            i++;
        }
        cout << endl;
    }
    // Print TMM matrix
    cout << REG_StringShort((REG)tmmEnum) << endl;
    i = 0;
    for (UINT32 row = 0; row < rows; row++)
    {
        for (UINT32 col = 0; col < cols; col++)
        {
            // Get the offset in bytes within the tile to the element
            UINT32 offset = opInfoReg->ElementOffset(i);
            // Print the value of the element
            if (elementSize == sizeof(UINT32)) // TILELOADD and TILESTORED have 4-byte elements, otherwise use different cast
            {
                // Make sure we do not exceed the register value area
                ASSERTX(offset + sizeof(UINT32) <= tmmRegSize);
                UINT32 tmmCellValue = *(UINT32*)(&(tmmReg[offset]));
                cout << dec << setw(5) << tmmCellValue << " ";
            }
            i++;
        }
        cout << endl;
    }
}

Instrumenting IFUNC functions in Linux


GNU indirect function (IFUNC) is a feature that allows a developer to create multiple implementations of a given function and to select among them at runtime using a resolver function. It is used mainly in glibc (e.g. memcpy, memset, strcpy).
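To make the resolver/implementation relationship concrete, here is a minimal standalone sketch of an IFUNC, assuming a Linux/ELF toolchain (GCC or Clang) that supports the ifunc function attribute. The function names are hypothetical. The resolver here picks an implementation with a trivial constant test, whereas glibc's resolvers typically select by CPU features.

```cpp
#include <cassert>

extern "C" {

// Two candidate implementations (hypothetical names).
int my_add_generic(int a, int b) { return a + b; }
int my_add_wide(int a, int b) { return b + a; }

typedef int (*my_add_fn)(int, int);

// Resolver: the dynamic loader calls it, typically at load time while processing
// relocations, and binds my_add to the pointer it returns. It may run before
// constructors, so it should not depend on other global state.
my_add_fn resolve_my_add(void) {
    return sizeof(void*) == 8 ? my_add_wide : my_add_generic;
}

// my_add is the IFUNC symbol; its symbol value is the address of resolve_my_add.
int my_add(int a, int b) __attribute__((ifunc("resolve_my_add")));

} // extern "C"
```

Callers simply call my_add(2, 3). Under Pin, my_add would correspond to the IFUNC (resolver) routine, and my_add_generic or my_add_wide to the implementation routines.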

Pin supports instrumentation of both IFUNC resolver functions and their implementation (actual) functions.
Note: because the IFUNC symbol's value is the address of the resolver, instrumenting the IFUNC function is the same as instrumenting its resolver function, and vice versa.

To instrument IFUNC functions, the Pintool's main function must call PIN_InitSymbolsAlt(IFUNC_SYMBOLS). Otherwise, IFUNC functions are not visible to the Pintool; only the implementation functions are (e.g. for memcmp: __memcmp_sse2, __memcmp_ssse3, ...).

Usage in a Pintool:

  • To check whether a given rtn is an IFUNC function, use: BOOL isResolver = SYM_IFuncResolver(RTN_Sym(rtn));
  • To get the implementation (actual) function, use: RTN impl = RTN_IFuncImplementation(rtn);
    See the first example below, which demonstrates this usage.
  • RTN_FindByName() returns the implementation (actual) function when an IFUNC function name is passed as an argument.
    To get the resolver, use: RTN resolver = RTN_IFuncResolver(rtn), where rtn is the implementation (actual) function.
    See the second example below, which demonstrates this usage.

The following example demonstrates instrumenting both IFUNC implementation and resolver using RTN_Name(), SYM_IFuncResolver() and RTN_IFuncImplementation():

VOID ImageLoad(IMG img, VOID* v)
{
    for (SEC sec = IMG_SecHead(img); SEC_Valid(sec); sec = SEC_Next(sec))
    {
        for (RTN rtn = SEC_RtnHead(sec); RTN_Valid(rtn); rtn = RTN_Next(rtn))
        {
            if (RTN_Name(rtn).compare("memcmp") == 0)
            {
                if (!SYM_IFuncResolver(RTN_Sym(rtn))) continue;
                cout << "Found " << RTN_Name(rtn).c_str() << " in " << IMG_Name(img);
                RTN resolver = rtn;
                RTN impl = RTN_IFuncImplementation(rtn);
                cout << "... Replacing" << endl;
                ASSERTX(RTN_Valid(resolver));
                ASSERTX(RTN_Valid(impl));
                // Instrumenting the implementation function
                RTN_Open(impl);
                RTN_InsertCall(impl, IPOINT_BEFORE, MAKE_AFUNPTR(BeforeMemcmp),..., IARG_END);
                RTN_Close(impl);
                // Instrumenting the resolver function, should be called once
                RTN_Open(resolver);
                RTN_InsertCall(resolver, IPOINT_BEFORE, MAKE_AFUNPTR(BeforeResolverFunction), IARG_PTR,..., IARG_END);
                RTN_Close(resolver);
            }
        }
    }
}

The following example demonstrates instrumenting both IFUNC implementation and resolver using RTN_FindByName():

VOID ImageLoad(IMG img, VOID* v)
{
    RTN rtn = RTN_FindByName(img, "memcmp");
    if (RTN_Valid(rtn))
    {
        // Instrumenting the implementation function returned by RTN_FindByName()
        RTN_Open(rtn);
        RTN_InsertCall(rtn, IPOINT_BEFORE, MAKE_AFUNPTR(BeforeMemcmp),..., IARG_END);
        RTN_Close(rtn);
        // Instrumenting the resolver function, should be called once
        RTN resolver = RTN_IFuncResolver(rtn);
        ASSERTX(RTN_Valid(resolver));
        RTN_Open(resolver);
        RTN_InsertCall(resolver, IPOINT_BEFORE, MAKE_AFUNPTR(BeforeResolverFunction),..., IARG_END);
        RTN_Close(resolver);
    }
    else
    {
        cout << "No ifunc on this computer" << endl;
    }
}

The Pin Advanced Debugging Extensions


Pin's advanced debugging extensions allow you to debug an application even while it runs under Pin in JIT mode. Moreover, your Pintool can add support for new debugger commands without any changes to GDB, LLDB or Visual Studio. This allows you to interactively control your Pintool from within a live debugger session. Finally, Pintools can add powerful new debugger features that are enabled via instrumentation. For example, a Pintool can use instrumentation to look for an interesting condition (like a memory buffer overwrite) and then stop in a live debugger session when that condition occurs.

This section illustrates these three concepts:

  • Enabling all the traditional debugger features even while running an application under Pin in JIT mode.
  • Recognizing new debugger commands in your Pintool to allow interactive control of the tool from a live debugger session.
  • Adding support for new debugger features by writing a Pintool.

These features are available on Linux (using GDB or LLDB) and Windows (using Visual Studio). The Pin APIs are the same in all cases, but their usage from within the debugger may differ because each debugger has a different UI. The following tutorial is divided into two sections: one that is Linux-centric and another that is Windows-centric. They both describe the same example, so you can continue by reading either section.

Finally, note that these advanced debugging extensions are not at all related to debugging your Pintool. If you have a bug in your tool and need to debug it, see the section Tips for Debugging a Pintool instead.

Advanced Debugging Extensions on Linux

Pin's debugging extensions on Linux work with nearly all modern versions of GDB and LLDB, so you can probably use whichever version is already installed on your system. Pin uses GDB's remote debugging protocol, so it should work with any version of GDB or LLDB that supports that protocol (LLDB also supports GDB's remote debugging protocol).

Throughout this section, we demonstrate the debugging extensions in Pin with the example tool "stack-debugger.cpp", which is available in the directory "source/tools/ManualExamples". You may want to compile that tool and follow along:

$ cd source/tools/ManualExamples
$ make DEBUG=1 stack-debugger.test

The tool and its associated test application, "fibonacci", are built in a directory named "obj-ia32", "obj-intel64", etc., depending on your machine type.

To enable the debugging extensions, run Pin with the -appdebug command line switch. This causes Pin to start the application and stop immediately before the first instruction. Pin then prints a message telling you to start the debugger.

Linux:

$ ../../../pin -appdebug -t obj-intel64/stack-debugger.so -- obj-intel64/fibonacci.exe 1000
Application stopped until continued from debugger.
Start GDB, then issue this command at the prompt:
  target remote :33030

In another window, start the debugger and enter the command that Pin printed:

Linux:

$ gdb fibonacci
(gdb) target remote :33030

At this point, the debugger is attached to the application that is running under Pin. You can set breakpoints, continue execution, print out variables, disassemble code, etc.

Linux:

(gdb) break main
Breakpoint 1 at 0x401194: file fibonacci.cpp, line 12.
(gdb) cont
Continuing.

Breakpoint 1, main (argc=2, argv=0x7fbffff3c8) at fibonacci.cpp:12
12          if (argc > 2)
(gdb) print argc
$1 = 2
(gdb) x/4i $pc
0x401194 <main+27>:     cmpl   $0x2,0xfffffffffffffe5c(%rbp)
0x40119b <main+34>:     je     0x4011c8 <main+79>
0x40119d <main+36>:     mov    $0x402080,%esi
0x4011a2 <main+41>:     mov    $0x603300,%edi

Of course, any information you observe in the debugger shows the application's "pure" state. The details of Pin and the tool's instrumentation are hidden. For example, the disassembly you see above shows only the application's instructions, not any of the instructions inserted by the tool. However, when you use commands like "cont" or "step" to advance execution of the application, your tool's instrumentation runs as it normally would under Pin.

Note
After connecting the debugger, you should NOT use the "run" command. The application is already running and stopped at the first instruction. Instead, use the "cont" command to continue execution.

Adding New Debugger Commands

The previous section illustrated how you can enable the normal debugger features while running an application under Pin. Now, let's see how your Pintool can add new custom debugger commands, even without changing the debugger itself. Custom debugger commands are useful because they allow you to control your Pintool interactively from within a live debugger session. For example, you can ask your Pintool to print out information that it has collected, or you can interactively enable instrumentation only for certain phases of the application.

To illustrate, see the call to PIN_AddDebugInterpreter() in the stack-debugger tool. That API sets up the following callback function:

static BOOL DebugInterpreter(THREADID tid, CONTEXT *ctxt, const std::string &cmd, std::string *result, VOID *)
{
    // Look up the per-thread data for the debugger's focus thread.
    TINFO_MAP::iterator it = ThreadInfos.find(tid);
    if (it == ThreadInfos.end())
        return FALSE;
    TINFO *tinfo = it->second;

    std::string line = TrimWhitespace(cmd);
    *result = "";

    // [...]

    if (line == "stats")
    {
        // REG_STACK_PTR is esp on IA-32 and rsp on Intel(R) 64.
        ADDRINT sp = PIN_GetContextReg(ctxt, REG_STACK_PTR);
        tinfo->_os.str("");
        if (sp <= tinfo->_stackBase)
            tinfo->_os << "Current stack usage: " << std::dec << (tinfo->_stackBase - sp) << " bytes.\n";
        else
            tinfo->_os << "Current stack usage: -" << std::dec << (sp - tinfo->_stackBase) << " bytes.\n";
        tinfo->_os << "Maximum stack usage: " << tinfo->_max << " bytes.\n";
        *result = tinfo->_os.str();
        return TRUE;
    }
    else if (line == "stacktrace on")
    {
        if (!EnableInstrumentation)
        {
            // Discard existing instrumentation so that code is re-instrumented
            // with the stack checks when execution resumes.
            PIN_RemoveInstrumentation();
            EnableInstrumentation = true;
            *result = "Stack tracing enabled.\n";
        }
        return TRUE;
    }

    // [...]

    return FALSE; // Unknown command
}

The PIN_AddDebugInterpreter() API allows a Pintool to establish a handler for extended debugger commands. For example, the code snippet above implements the new commands "stats" and "stacktrace on". You can execute these commands in the debugger by using the "monitor" command:

Linux:

(gdb) monitor stats
Current stack usage: 688 bytes.
Maximum stack usage: 0 bytes.

A Pintool can do various things when the user types an extended debugger command. For example, the "stats" command prints out some information that the tool has collected. Any text that the tool writes to the "result" parameter is printed to the debugger console. Note that the CONTEXT parameter has the register state for the debugger's "focus" thread, so the tool can easily display information about this focus thread.

You can also use an extended debugger command to interactively enable or disable instrumentation in your Pintool, as demonstrated by the "stacktrace on" command. For example, if you wanted to quickly run your Pintool over the application's initial start-up phase, you could run with your Pintool's instrumentation disabled until a breakpoint is triggered. Then, you could use an extended command to enable instrumentation only during the interesting part of the application. In the stack-debugger example above, the call to PIN_RemoveInstrumentation() causes Pin to discard any previous instrumentation, so the tool re-instruments the code when the debugger continues execution of the application. As we will see later, the tool's global variable "EnableInstrumentation" adjusts the instrumentation that it inserts.
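The command handling above can be exercised outside Pin. The following is a minimal standalone sketch, assuming hypothetical names (ToolState, Trim, HandleCommand) that are not part of Pin or of stack-debugger.cpp. It models both "stacktrace on" and "stacktrace off" (the "off" reply text is an assumption) and counts the points where a real tool would call PIN_RemoveInstrumentation(). Flushing on both transitions matters because the flag only affects code that is instrumented after it changes; previously instrumented code keeps its analysis calls until it is discarded.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the tool's global state.
struct ToolState
{
    bool enableInstrumentation = false; // mirrors the EnableInstrumentation global
    int  flushRequests = 0;             // where a real tool would call PIN_RemoveInstrumentation()
};

// Strip leading/trailing whitespace (a stand-in for the tool's TrimWhitespace()).
static std::string Trim(const std::string &s)
{
    const char *ws = " \t\r\n";
    std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos)
        return "";
    std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Returns true if the command was recognized; any reply text goes to *result.
static bool HandleCommand(ToolState &st, const std::string &cmd, std::string *result)
{
    std::string line = Trim(cmd);
    *result = "";
    if (line == "stacktrace on")
    {
        if (!st.enableInstrumentation)
        {
            st.flushRequests++; // real tool: PIN_RemoveInstrumentation()
            st.enableInstrumentation = true;
            *result = "Stack tracing enabled.\n";
        }
        return true;
    }
    if (line == "stacktrace off")
    {
        if (st.enableInstrumentation)
        {
            st.flushRequests++; // drop code that still contains the stack checks
            st.enableInstrumentation = false;
            *result = "Stack tracing disabled.\n"; // hypothetical reply text
        }
        return true;
    }
    return false; // unknown command
}
```

Repeating "stacktrace on" while tracing is already enabled is accepted but does nothing, so no redundant flush occurs.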

Semantic Breakpoints

The last major feature of the advanced debugging extensions is the ability to stop execution at a breakpoint by calling an API from your tool's analysis code. This may sound simple, but it is very powerful. Your Pintool can use instrumentation to look for a complex condition and then stop at a breakpoint when that condition occurs.

The "stack-debugger" tool illustrates this by using instrumentation to observe all the instructions that allocate stack space, and then it stops at a breakpoint whenever the application's stack usage reaches some threshold. In effect, this adds a new feature to the debugger that could not be practically implemented using traditional debugger technology, because a traditional debugger cannot reasonably find all the instructions that allocate stack space. A Pintool, however, can do this quite easily via instrumentation.

The example code below from the "stack-debugger" tool uses Pin instrumentation to identify all the instructions that allocate stack space.

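static VOID Instruction(INS ins, VOID *)
{
    if (!EnableInstrumentation)
        return;
    // Only instructions that write the stack pointer can change stack usage.
    if (INS_RegWContain(ins, REG_STACK_PTR))
    {
        // Check the new stack pointer just after the instruction executes;
        // skip instructions that have no valid "after" point (no fall-through).
        IPOINT where = IPOINT_AFTER;
        if (!INS_IsValidForIpointAfter(ins))
            return;
        INS_InsertIfCall(ins, where, (AFUNPTR)OnStackChangeIf, IARG_REG_VALUE, REG_STACK_PTR,
                         IARG_REG_VALUE, RegTinfo, IARG_END);
        INS_InsertThenCall(ins, where, (AFUNPTR)DoBreakpoint, IARG_CONST_CONTEXT, IARG_THREAD_ID, IARG_END);
    }
}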

The call to INS_RegWContain() tests whether an instruction modifies the stack pointer. If it does, we insert an analysis call immediately after the instruction, which checks to see if the application's stack usage exceeds a threshold.

Also notice that all the instrumentation is gated by the global flag "EnableInstrumentation", which we saw earlier in the "stacktrace on" command. Thus, the user can disable instrumentation (with "stacktrace off") in order to execute quickly through uninteresting parts of the application, and then re-enable it (with "stacktrace on") for the interesting parts.

The analysis routine OnStackChangeIf() returns non-zero if the application's stack usage has reached the threshold. When this happens, Pin calls the DoBreakpoint() analysis routine, which stops at a debugger breakpoint. We use if/then instrumentation here because DoBreakpoint() requires a "CONTEXT *" parameter, which is slow to provide. Splitting the work into a cheap "if" routine that runs after every stack change and an expensive "then" routine means the cost of the CONTEXT is paid only when a breakpoint is actually needed.

static ADDRINT OnStackChangeIf(ADDRINT sp, ADDRINT addrInfo)
{
    TINFO *tinfo = reinterpret_cast<TINFO *>(addrInfo);

    // The stack pointer may go above the base slightly. (For example, the application's dynamic
    // loader does this briefly during start-up.)
    //
    if (sp > tinfo->_stackBase)
        return 0;

    // Keep track of the maximum stack usage.
    //
    size_t size = tinfo->_stackBase - sp;
    if (size > tinfo->_max)
        tinfo->_max = size;

    // See if we need to trigger a breakpoint.
    //
    if (BreakOnNewMax && size > tinfo->_maxReported)
        return 1;
    if (BreakOnSize && size >= BreakOnSize)
        return 1;
    return 0;
}

static VOID DoBreakpoint(const CONTEXT *ctxt, THREADID tid)
{
    TINFO *tinfo = reinterpret_cast<TINFO *>(PIN_GetContextReg(ctxt, RegTinfo));

    // Keep track of the maximum reported stack usage for "stackbreak newmax".
    //
    size_t size = tinfo->_stackBase - PIN_GetContextReg(ctxt, REG_STACK_PTR);
    if (size > tinfo->_maxReported)
        tinfo->_maxReported = size;

    ConnectDebugger(); // Ask the user to connect a debugger, if it is not already connected.

    // Construct a string that the debugger will print when it stops. If a debugger is
    // not connected, no breakpoint is triggered and execution resumes immediately.
    //
    tinfo->_os.str("");
    tinfo->_os << "Thread " << std::dec << tid << " uses " << size << " bytes of stack.";
    PIN_ApplicationBreakpoint(ctxt, tid, FALSE, tinfo->_os.str());
}

The analysis routine OnStackChangeIf() keeps track of some metrics on stack usage and tests whether the threshold has been reached. If the threshold is crossed, it returns non-zero, and Pin executes the DoBreakpoint() analysis routine.
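To see how the if/then split behaves, the sketch below reimplements the decision logic of OnStackChangeIf() in plain C++ and replays a sequence of stack-pointer values. It is an illustration under stated assumptions, not Pin code: StackInfo, BreakConfig, ShouldBreak() and CountThenCalls() are hypothetical stand-ins for TINFO, the tool's knobs and its analysis routines, and the loop plays the role of Pin invoking the "then" routine only when the "if" routine returns non-zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical per-thread record mirroring the TINFO fields used by the tool.
struct StackInfo
{
    uintptr_t stackBase = 0;   // stack base recorded for the thread
    size_t    max = 0;         // deepest usage observed so far
    size_t    maxReported = 0; // deepest usage already reported at a breakpoint
};

// Hypothetical knobs; breakOnSize == 0 disables the size threshold.
struct BreakConfig
{
    bool   breakOnNewMax = false;
    size_t breakOnSize = 0;
};

// The cheap "if" part: update the metrics and decide whether the
// expensive "then" part should run. Same logic as OnStackChangeIf().
static bool ShouldBreak(StackInfo &ti, const BreakConfig &cfg, uintptr_t sp)
{
    if (sp > ti.stackBase) // briefly above the base: ignore
        return false;
    size_t size = ti.stackBase - sp;
    if (size > ti.max)
        ti.max = size;
    if (cfg.breakOnNewMax && size > ti.maxReported)
        return true;
    if (cfg.breakOnSize && size >= cfg.breakOnSize)
        return true;
    return false;
}

// Replays stack-pointer values; returns how often the "then" routine would run.
static int CountThenCalls(StackInfo &ti, const BreakConfig &cfg,
                          const uintptr_t *sps, size_t n)
{
    int thenCalls = 0;
    for (size_t i = 0; i < n; i++)
    {
        if (ShouldBreak(ti, cfg, sps[i]))
        {
            thenCalls++;
            // Like DoBreakpoint(): remember the largest size already reported.
            size_t size = ti.stackBase - sps[i];
            if (size > ti.maxReported)
                ti.maxReported = size;
        }
    }
    return thenCalls;
}
```

Because DoBreakpoint() raises _maxReported, the "newmax" mode stops only when the stack grows deeper than at the previous stop, not at every stack change.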

The interesting part of DoBreakpoint() is at the very end, where it calls PIN_ApplicationBreakpoint(). This API causes Pin to stop the execution of all threads and trigger a breakpoint in the debugger. PIN_ApplicationBreakpoint() also takes a string parameter, which the debugger prints at the console when the breakpoint triggers. A Pintool can use this string to tell the user why the breakpoint triggered. In our example tool, this string says something like "Thread 10 uses 4000 bytes of stack".
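The breakpoint message is built with the same reset-and-reuse pattern that DoBreakpoint() applies to tinfo->_os: str("") empties the per-thread stream before each new message. A minimal sketch, assuming a hypothetical BreakMessage() helper and a plain unsigned in place of THREADID:

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Builds the text a tool could pass to PIN_ApplicationBreakpoint().
// The stream is reused across calls, so str("") discards the previous message.
static std::string BreakMessage(std::ostringstream &os, unsigned tid, size_t size)
{
    os.str("");
    os << "Thread " << std::dec << tid << " uses " << size << " bytes of stack.";
    return os.str();
}
```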

Please refer to the documentation of PIN_ApplicationBreakpoint() and read the note about avoiding an infinite loop of calls to the analysis function.

We can see the breakpoint feature in action in our example tool by using the "stackbreak 4000" command like this:

Linux:

(gdb) monitor stackbreak 4000
Will break when thread uses more than 4000 bytes of stack.
(gdb) c
Continuing.
Thread 0 uses 4000 bytes of stack.
Program received signal SIGTRAP, Trace/breakpoint trap.
0x0000000000400e27 in Fibonacci (num=0) at fibonacci.cpp:34
(gdb)

When you are done, you can either continue the application and let it terminate, or you can quit from the debugger:

Linux:

(gdb) quit
The program is running.  Exit anyway? (y or n) y

Connecting the Debugger Later

In the previous example, we used the Pin switch -appdebug to stop the application and debug it from the first instruction. You can also enable Pin's debugging extensions without stopping at the first instruction. The following example shows how you can use the stack-debugger tool to start the application and attach with the debugger only after it triggers a stack limit breakpoint.

Linux:

$ ../../../pin -appdebug_enable -appdebug_silent -t obj-intel64/stack-debugger.so -stackbreak 4000 -- obj-intel64/fibonacci 1000

The -appdebug_enable switch tells Pin to enable application debugging without stopping at the first instruction. The -appdebug_silent switch disables the message that tells how to connect with the debugger. As we will see later, the Pintool can print a custom message instead. Finally, the "-stackbreak 4000" switch tells the stack-debugger tool to trigger a breakpoint when the stack grows to 4000 bytes. When the tool does trigger a breakpoint, it prints a message like this:

Linux:

Triggered stack-limit breakpoint.
Start GDB and enter this command:
  target remote :45462

You can now connect with the debugger as you did before, except now the debugger stops the application at the point where the stack-debugger tool triggered the stack-limit breakpoint.

Linux:

$ gdb fibonacci
(gdb) target remote :45462
0x0000000000400e27 in Fibonacci (num=0) at fibonacci.cpp:37
(gdb)

Now let's look at the code in the tool that connects to the debugger.

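static void ConnectDebugger()
{
    // Nothing to do if a debugger is already connected (or debugging is not enabled).
    if (PIN_GetDebugStatus() != DEBUG_STATUS_UNCONNECTED)
        return;

    // Only handle the case where Pin opens a TCP port and waits for a debugger.
    DEBUG_CONNECTION_INFO info;
    if (!PIN_GetDebugConnectionInfo(&info) || info._type != DEBUG_CONNECTION_TYPE_TCP_SERVER)
        return;

    *Output << "Triggered stack-limit breakpoint.\n";
    *Output << "Start GDB and enter this command:\n";
    *Output << "  target remote :" << std::dec << info._tcpServer._tcpPort << "\n";
    *Output << std::flush;

    // Wait up to KnobTimeout seconds (the API takes milliseconds).
    if (PIN_WaitForDebuggerToConnect(1000*KnobTimeout.Value()))
        return;

    *Output << "No debugger attached after " << KnobTimeout.Value() << " seconds.\n";
    *Output << "Resuming application without stopping.\n";
    *Output << std::flush;
}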

The ConnectDebugger() function is called each time the tool wants to stop at a breakpoint. It first calls PIN_GetDebugStatus() to see whether a debugger is already connected. If not, it uses PIN_GetDebugConnectionInfo() to get the TCP port number that the debugger needs in order to connect to Pin; this is the number (for example, 45462) that the user types in the "target remote" command. After asking the user to start the debugger, the tool calls PIN_WaitForDebuggerToConnect() to wait for the connection. If the user doesn't start the debugger within the timeout period, the tool prints a message and continues executing the application.
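The decision ConnectDebugger() makes before it starts waiting can be modeled without Pin. In this sketch, Status, ConnType, ConnectPlan and PlanConnect() are hypothetical, simplified stand-ins for Pin's DEBUG_STATUS and DEBUG_CONNECTION_TYPE values and for the tool's logic; only the prompt is produced here, and the real tool would then call PIN_WaitForDebuggerToConnect() with a timeout in milliseconds.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Simplified, hypothetical stand-ins for Pin's status and connection-type enums.
enum class Status { Disabled, Unconnected, Connected };
enum class ConnType { TcpServer, TcpClient };

// Either do nothing, or print a prompt that contains the port number.
struct ConnectPlan
{
    bool prompt = false;
    std::string message;
};

static ConnectPlan PlanConnect(Status status, ConnType type, int tcpPort)
{
    ConnectPlan plan;
    if (status != Status::Unconnected) // already connected, or debugging disabled
        return plan;
    if (type != ConnType::TcpServer)   // only "Pin listens on a TCP port" is handled
        return plan;
    std::ostringstream os;
    os << "Triggered stack-limit breakpoint.\n"
       << "Start GDB and enter this command:\n"
       << "  target remote :" << tcpPort << "\n";
    plan.prompt = true;
    plan.message = os.str();
    return plan;
}
```

Checking the status first is what keeps the tool from prompting again on every later breakpoint once the debugger is attached.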

As before, you can either continue the application and let it terminate, or you can quit from the debugger:

Linux:

(gdb) quit
The program is running.  Exit anyway? (y or n) y

Advanced Debugging Extensions on Windows

On Windows, the advanced debugging extensions work with Microsoft Visual Studio 2012 or later. Earlier versions of Visual Studio are not supported, so make sure you have a supported version installed. Also, the Express edition of Visual Studio doesn't support IDE extensions, so it does not work with the Pin debugger extensions; you must install the Professional edition (or higher). If you are a student, you may be able to get the Professional edition for free. Check the Microsoft web site or ask your school's IT department for details.

After you have installed Visual Studio, you must also install the Pin extension for Visual Studio. Look for an installer named "pinadx-vsextension-X.Y.bat" in the root of the Pin kit. Run it as administrator.

The remainder of this section assumes that you are able to build the "stack-debugger" tool, so if you want to follow along, you must have the following software installed:

  • Visual Studio 2012, Professional edition (or greater).
  • The Pin debugger extension for Visual Studio 2012 or greater (pinadx-vsextension-X.Y.bat).

To follow this tutorial, you will probably want to build the example tool "stack-debugger.cpp", which is available in the directory "source\tools\ManualExamples". To do this, open a Visual Studio command shell and type the following commands. (To build a 64-bit version of the tool, use "TARGET=intel64" and "obj-intel64" instead.)

C:\> cd source\tools\ManualExamples
C:\> make TARGET=ia32 obj-ia32/stack-debugger.dll

After you have done this, start Visual Studio and open the sample solution file at "source\tools\ManualExamples\stack-debugger-tutorial.sln". Then build the sample application "fibonacci" by pressing F7. Make sure you can run the application natively by pressing CTRL-F5.

Now let's try running the "fibonacci" application under Pin with the "stack-debugger" tool. To do this, you must first set the "Pin Kit Directory" from TOOLS->Options->Pin Debugger.

Then you have to adjust the "fibonacci" project properties in Visual Studio: right-click on the "fibonacci" project in the Solution Explorer, choose Properties, and then click on Debugging. Change the drop-down titled "Debugger to launch" to "Pin Debugger" as shown in the figure below.

Then, set the "Pin Tool Path" property by browsing to the "stack-debugger.dll". Press OK when you are done.

Visual Studio is now configured to run the "fibonacci" application under your Pintool. However, before you continue, set a breakpoint in "main()" so that execution stops in the debugger. Then press F5 to start debugging.

You should now see a normal-looking debugger session, although your application is really running under control of Pin. All of the debugger features still work as you would expect. You can set breakpoints, continue execution, display the values of variables, and even view the disassembled code. All of the information that you observe in the debugger shows the application's "pure" state. The details of Pin and the tool's instrumentation are hidden. For example, the disassembly view shows only the application's instructions, not any of the instructions inserted by the tool. However, when you continue execution (e.g. with F5 or F10), the application executes along with your tool's instrumentation code.

Now, let's see an alternative way to debug the "fibonacci" application under Pin with the "stack-debugger" tool in Visual Studio. After you have built the "stack-debugger" tool, open a command shell and start the application with the debugging extensions enabled. This will cause Pin to stop immediately before the first instruction.

C:\> cd source\tools\ManualExamples
C:\> ..\..\..\pin -appdebug -t obj-ia32\stack-debugger.dll -- debug\fibonacci.exe 1000
Application stopped until continued from debugger.
Pin ready to accept debugger connection on port 30840

Open source\tools\ManualExamples\fibonacci.cpp in Visual Studio and set a breakpoint where you want execution to stop in the debugger. To attach Visual Studio to the process that is running under Pin, select "Attach to Pin Process" from the DEBUG menu. Select the "fibonacci" process from the Available Processes table, enter the port number that Pin printed, and click Attach.

Adding New Debugger Commands

The previous section illustrated how you can enable the normal debugger features while running an application under Pin. Now, let's see how your Pintool can add new custom debugger commands, even without changing Visual Studio. Custom debugger commands are useful because they allow you to control your Pintool interactively from within a live debugger session. For example, you can ask your Pintool to print out information that it has collected, or you can interactively enable instrumentation only for certain phases of the application.

To illustrate, see the call to PIN_AddDebugInterpreter() in the stack-debugger tool. That API sets up the following callback function:

static BOOL DebugInterpreter(THREADID tid, CONTEXT *ctxt, const std::string &cmd, std::string *result, VOID *)
{
    // Look up the per-thread data for the debugger's focus thread.
    TINFO_MAP::iterator it = ThreadInfos.find(tid);
    if (it == ThreadInfos.end())
        return FALSE;
    TINFO *tinfo = it->second;

    std::string line = TrimWhitespace(cmd);
    *result = "";

    // [...]

    if (line == "stats")
    {
        // REG_STACK_PTR is esp on IA-32 and rsp on Intel(R) 64.
        ADDRINT sp = PIN_GetContextReg(ctxt, REG_STACK_PTR);
        tinfo->_os.str("");
        if (sp <= tinfo->_stackBase)
            tinfo->_os << "Current stack usage: " << std::dec << (tinfo->_stackBase - sp) << " bytes.\n";
        else
            tinfo->_os << "Current stack usage: -" << std::dec << (sp - tinfo->_stackBase) << " bytes.\n";
        tinfo->_os << "Maximum stack usage: " << tinfo->_max << " bytes.\n";
        *result = tinfo->_os.str();
        return TRUE;
    }
    else if (line == "stacktrace on")
    {
        if (!EnableInstrumentation)
        {
            // Discard existing instrumentation so that code is re-instrumented
            // with the stack checks when execution resumes.
            PIN_RemoveInstrumentation();
            EnableInstrumentation = true;
            *result = "Stack tracing enabled.\n";
        }
        return TRUE;
    }

    // [...]

    return FALSE; // Unknown command
}

The PIN_AddDebugInterpreter() API allows a Pintool to establish a handler for extended debugger commands. For example, the code snippet above implements the new commands "stats" and "stacktrace on". You can execute these commands in Visual Studio by opening "DEBUG->Windows->Pin Console" in the IDE.

A Pintool can do various things when the user types an extended debugger command. For example, the "stats" command prints out some information that the tool has collected. Any text that the tool writes to the "result" parameter is printed to the Visual Studio Pin Console window. Note that the CONTEXT parameter has the register state for the debugger's "focus" thread, so the tool can easily display information about this focus thread.

You can also use an extended debugger command to interactively enable or disable instrumentation in your Pintool, as demonstrated by the "stacktrace on" command. For example, if you wanted to quickly run your Pintool over the application's initial start-up phase, you could run with your Pintool's instrumentation disabled until a breakpoint is triggered. Then, you could use an extended command to enable instrumentation only during the interesting part of the application. In the stack-debugger example above, the call to PIN_RemoveInstrumentation() causes Pin to discard any previous instrumentation, so the tool re-instruments the code when the debugger continues execution of the application. As we will see later, the tool's global variable "EnableInstrumentation" adjusts the instrumentation that it inserts.

Semantic Breakpoints

The last major feature of the advanced debugging extensions is the ability to stop execution at a breakpoint by calling an API from your tool's analysis code. This may sound simple, but it is very powerful. Your Pintool can use instrumentation to look for a complex condition and then stop at a breakpoint when that condition occurs.

The "stack-debugger" tool illustrates this by using instrumentation to observe all the instructions that allocate stack space, and then it stops at a breakpoint whenever the application's stack usage reaches some threshold. In effect, this adds a new feature to the debugger that could not be practically implemented using traditional debugger technology, because a traditional debugger cannot reasonably find all the instructions that allocate stack space. A Pintool, however, can do this quite easily via instrumentation.

The example code below from the "stack-debugger" tool uses Pin instrumentation to identify all the instructions that allocate stack space.

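static VOID Instruction(INS ins, VOID *)
{
    if (!EnableInstrumentation)
        return;
    // Only instructions that write the stack pointer can change stack usage.
    if (INS_RegWContain(ins, REG_STACK_PTR))
    {
        // Check the new stack pointer just after the instruction executes;
        // skip instructions that have no valid "after" point (no fall-through).
        IPOINT where = IPOINT_AFTER;
        if (!INS_IsValidForIpointAfter(ins))
            return;
        INS_InsertIfCall(ins, where, (AFUNPTR)OnStackChangeIf, IARG_REG_VALUE, REG_STACK_PTR,
                         IARG_REG_VALUE, RegTinfo, IARG_END);
        INS_InsertThenCall(ins, where, (AFUNPTR)DoBreakpoint, IARG_CONST_CONTEXT, IARG_THREAD_ID, IARG_END);
    }
}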

The call to INS_RegWContain() tests whether an instruction modifies the stack pointer. If it does, we insert an analysis call immediately after the instruction, which checks to see if the application's stack usage exceeds a threshold.

Also notice that all the instrumentation is gated by the global flag "EnableInstrumentation", which we saw earlier in the "stacktrace on" command. Thus, the user can disable instrumentation (with "stacktrace off") in order to execute quickly through uninteresting parts of the application, and then re-enable it (with "stacktrace on") for the interesting parts.

The analysis routine OnStackChangeIf() returns non-zero if the application's stack usage has reached the threshold. When this happens, Pin calls the DoBreakpoint() analysis routine, which stops at a debugger breakpoint. We use if/then instrumentation here because DoBreakpoint() requires a "CONTEXT *" parameter, which is slow to provide. Splitting the work into a cheap "if" routine that runs after every stack change and an expensive "then" routine means the cost of the CONTEXT is paid only when a breakpoint is actually needed.

static ADDRINT OnStackChangeIf(ADDRINT sp, ADDRINT addrInfo)
{
    TINFO *tinfo = reinterpret_cast<TINFO *>(addrInfo);

    // The stack pointer may go above the base slightly. (For example, the application's dynamic
    // loader does this briefly during start-up.)
    //
    if (sp > tinfo->_stackBase)
        return 0;

    // Keep track of the maximum stack usage.
    //
    size_t size = tinfo->_stackBase - sp;
    if (size > tinfo->_max)
        tinfo->_max = size;

    // See if we need to trigger a breakpoint.
    //
    if (BreakOnNewMax && size > tinfo->_maxReported)
        return 1;
    if (BreakOnSize && size >= BreakOnSize)
        return 1;
    return 0;
}

static VOID DoBreakpoint(const CONTEXT *ctxt, THREADID tid)
{
    TINFO *tinfo = reinterpret_cast<TINFO *>(PIN_GetContextReg(ctxt, RegTinfo));

    // Keep track of the maximum reported stack usage for "stackbreak newmax".
    //
    size_t size = tinfo->_stackBase - PIN_GetContextReg(ctxt, REG_STACK_PTR);
    if (size > tinfo->_maxReported)
        tinfo->_maxReported = size;

    ConnectDebugger(); // Ask the user to connect a debugger, if it is not already connected.

    // Construct a string that the debugger will print when it stops. If a debugger is
    // not connected, no breakpoint is triggered and execution resumes immediately.
    //
    tinfo->_os.str("");
    tinfo->_os << "Thread " << std::dec << tid << " uses " << size << " bytes of stack.";
    PIN_ApplicationBreakpoint(ctxt, tid, FALSE, tinfo->_os.str());
}

The analysis routine OnStackChangeIf() keeps track of some metrics on stack usage and tests whether the threshold has been reached. If the threshold is crossed, it returns non-zero, and Pin executes the DoBreakpoint() analysis routine.

The interesting part of DoBreakpoint() is at the very end, where it calls PIN_ApplicationBreakpoint(). This API causes Pin to stop the execution of all threads and trigger a breakpoint in the debugger. PIN_ApplicationBreakpoint() also takes a string parameter, which is displayed in Visual Studio when the breakpoint triggers. A Pintool can use this string to tell the user why the breakpoint triggered. In our example tool, this string says something like "Thread 10 uses 4000 bytes of stack".

We can see the breakpoint feature in action in our example tool by typing this command in the Pin Console window:

>stackbreak 4000
Will break when thread uses more than 4000 bytes of stack.

Then press F5 to continue execution. The application should stop in the debugger again with a message like this:

When you are done, you can either continue the application with F5 or terminate it with SHIFT-F5.



Applying a Pintool to an Application


An application and a tool are invoked as follows:

pin [pin-option]... -t [toolname] [tool-options]... -- [application] [application-option]...

These are a few of the Pin options that are currently available. See Command Line Switches for the complete list.

  • -t toolname: Specifies the Pintool to use. If you are running a 32-bit application on an IA-32 architecture, or a 64-bit application on an Intel(R) 64 architecture, only -t <toolname> is needed. The same is true on an Intel(R) 64 architecture when all of the components in the chain are either 32-bit or 64-bit, but not a mix of both. If the components in the chain are both 32-bit and 64-bit, use -t64 <64-bit toolname> to specify the 64-bit tool binary, followed by -t <32-bit toolname> to specify the 32-bit tool binary and the tool options. For more information, see Instrumenting Applications on Intel(R) 64 Architectures.
  • -t64 toolname: Specify 64-bit tool binary for Intel(R) 64 architecture. If you are running an application on an Intel(R) 64 architecture, where components in the chain are both 32-bit and 64-bit, use -t64 together with -t as described above. See Instrumenting Applications on Intel(R) 64 Architectures.
    Important: Using -t64 without -t is not recommended, since in this case when given a 32-bit application, Pin will run the application without applying any tool.
  • -pause_tool n: Prints the process ID and pauses Pin for n seconds so that you can attach gdb to it. See Tips for Debugging a Pintool.
  • -follow_execv: Runs all processes spawned by execv-class system calls under Pin as well.

The tool-options follow immediately after the tool specification and depend on the tool used.

Everything following the -- is the command line for the application.
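The "--" convention can be illustrated with a small standalone helper. This is a hypothetical sketch of how such a command line divides, not the launcher's actual parser: SplitArgs and SplitAtDoubleDash() are invented for illustration, and the sketch assumes that only the first "--" acts as the separator, so a later "--" is passed through to the application unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Result of splitting a Pin-style argument vector (hypothetical helper).
struct SplitArgs
{
    std::vector<std::string> pinAndTool;  // Pin options, -t toolname, tool options
    std::vector<std::string> application; // application and its own options
    bool hasSeparator = false;
};

static SplitArgs SplitAtDoubleDash(const std::vector<std::string> &argv)
{
    SplitArgs out;
    for (std::size_t i = 0; i < argv.size(); i++)
    {
        // The first "--" switches from Pin/tool arguments to application arguments.
        if (!out.hasSeparator && argv[i] == "--")
        {
            out.hasSeparator = true;
            continue;
        }
        (out.hasSeparator ? out.application : out.pinAndTool).push_back(argv[i]);
    }
    return out;
}
```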

For example, to apply the itrace example (Instruction Address Trace (Instruction Instrumentation)) to a run of the "ls" program:

../../../pin -t obj-intel64/itrace.so -- /bin/ls

To get a listing of the available command line options for Pin:

pin -help

To get a listing of the available command line options for the itrace example:

../../../pin -t obj-intel64/itrace.so -help -- /bin/ls

Note that in the last case /bin/ls is necessary on the command line but will not be executed.

Instrumenting Applications on Intel(R) 64 Architectures

The Pin kit for IA-32 and Intel(R) 64 architectures is a combined kit. Both a 32-bit version and a 64-bit version of Pin are present in the kit. This allows Pin to instrument complex applications on Intel(R) 64 architectures which may have 32-bit and 64-bit components.

An application and a tool are invoked in "mixed-mode" as follows:

pin [pin-option]... -t64 <64-bit toolname> -t <32-bit toolname> [tool-options]...
-- <application> [application-option]...

Please note:

  • The -t64 option must precede the -t option.
  • When using -t64 together with -t, -t specifies the 32-bit tool. Using -t64 without -t is not recommended, since in this case when given a 32-bit application, Pin will run the application without applying any tool.
  • The [tool-options] apply to both the 64-bit and the 32-bit tools and must be specified after -t <32-bit toolname>. It is not possible to specify a different set of options for the 64-bit and the 32-bit tools.
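The first two rules above can be checked mechanically. The following hypothetical helper (not part of the Pin kit) scans the Pin portion of a command line, that is, the arguments before "--", and returns a warning string when -t64 is used without -t or appears after it. It does not check where the tool options are placed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns an empty string if -t64/-t usage follows the rules, otherwise a warning.
static std::string CheckToolOptions(const std::vector<std::string> &pinArgs)
{
    int posT64 = -1, posT = -1;
    for (std::size_t i = 0; i < pinArgs.size(); i++)
    {
        if (pinArgs[i] == "-t64" && posT64 < 0)
            posT64 = static_cast<int>(i);
        else if (pinArgs[i] == "-t" && posT < 0)
            posT = static_cast<int>(i);
    }
    if (posT64 >= 0 && posT < 0)
        return "-t64 without -t: a 32-bit application would run with no tool";
    if (posT64 >= 0 && posT64 > posT)
        return "-t64 must precede -t";
    return "";
}
```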

See source/tools/CrossIa32Intel64/makefile for a few examples.

The file "pin" is a launcher executable that prepares the correct environment for Pin's instrumentation engine, starts pind, Pin's server (see Executing Remote Procedures), and begins the instrumentation by instructing Pin either to inject into an existing process or to launch a new process. By default the launcher assumes a certain layout of the Pin kit installation, but it can be instructed to assume a different layout. For more information about the knobs that modify the launcher's behavior, run pin -help from the command line. Pin provides two launchers, "pin" and "pin32". The default "pin" launcher is a 64-bit binary and "pin32" is a 32-bit binary. Both can launch instrumentation for applications of any architecture; "pin32" is provided to support native 32-bit systems.

Note
Users should not try to invoke Pin's instrumentation engine directly but rather always use the launcher.

IMPORTANT: The invocation described above assumes that the application is a program binary (not a shell script). If your application is invoked indirectly (from a shell script or using 'exec'), you need to change the actual invocation of the program binary so that it is prefixed with the Pin/Pintool options. Here's one way of doing that:

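 # Track down the actual application binary, say it is 'application_binary'.
 % mv application_binary application_binary.real

 # Write a shell script named 'application_binary' with the following contents,
 # and make it executable (chmod +x application_binary).
 # (change 'itrace' to your desired tool; the relative paths below are resolved
 # against the current directory, so absolute paths are more robust)

 #!/bin/sh
 exec ../../../pin -t obj-intel64/itrace.so -- application_binary.real "$@"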

After you do this, whenever 'application_binary' is invoked indirectly (from some shell script or using 'exec'), the real binary will get invoked with the right Pin/Pintool options.

Restrictions

There is a known problem with using Pin on systems protected by the "McAfee Host Intrusion Prevention"* antivirus software. We have not tested the coexistence of Pin with other antivirus products that perform run-time execution monitoring.

There is a known limitation of using Pin on Linux systems that prevent the use of ptrace attach via the sysctl /proc/sys/kernel/yama/ptrace_scope. Pin will still work when launching applications with the pin command line. However, Pin will fail in attach mode (that is, using the -pid knob). To resolve this, do the following (as root):

$ echo 0 > /proc/sys/kernel/yama/ptrace_scope



Tips for Debugging a Pintool


Using gdb on Linux

When running an application under the control of Pin and a Pintool, there are two different programs residing in the address space: the application, and the Pin instrumentation engine together with your Pintool. The Pintool is normally a shared object loaded by Pin. This section describes how to use gdb to find bugs in a Pintool. You cannot run Pin directly from gdb since Pin uses the debugging API to start the application. Instead, you must invoke Pin from the command line with the -pause_tool switch, and use gdb to attach to the Pin process from another window. The -pause_tool n switch makes Pin print out the process identifier (pid) and pause for n seconds.

Pin locates the tool using an internal search algorithm, so in many cases gdb is unable to load the debug info for the tool. There are several options to help gdb find the debug info.

 Option 1 is to use the full path to the tool when running pin.

 Option 2 is to tell gdb to load the debugging information of the tool.
 Pin prompts with the exact gdb command to be used in this case.

To check that gdb loaded the debugging info for the tool, use the command "info sharedlibrary"; you should see that gdb has read the symbols for your tool (as in the example below).

(gdb) info sharedlibrary
From        To          Syms Read   Shared Object Library
0x001b3ea0  0x001b4d80  Yes         /lib/libdl.so.2
0x003b3820  0x00431d74  Yes         /usr/intel/pkgs/gcc/4.2.0/lib/libstdc++.so.6
0x0084f4f0  0x00866f8c  Yes         /lib/i686/libm.so.6
0x00df8760  0x00dffcc4  Yes         /usr/intel/pkgs/gcc/4.2.0/lib/libgcc_s.so.1
0x00e5fa00  0x00f60398  Yes         /lib/i686/libc.so.6
0x40001c50  0x4001367f  Yes         /lib/ld-linux.so.2
0x008977f0  0x00af7784  Yes         ./dcache.so

For example, if your tool is called opcodemix and the application is /bin/ls, you can use gdb as described below. The following example is for the Intel(R) 64 Linux platform. Substitute "ia32" for the IA-32 architecture.

Change directory to the directory where your tool resides, and start gdb with pin, but do not use the run command.
$ /usr/bin/gdb ../../../intel64/bin/pinbin
GNU gdb Red Hat Linux (6.3.0.0-1.132.EL4rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/tls/libthread_db.so.1"
(gdb)

In another window, start your application with the -pause_tool switch.

$ ../../../pin -pause_tool 10 -t obj-intel64/opcodemix.so -- /bin/ls
Pausing for 10 seconds to attach to process with pid 28769
To load the tool's debug info to gdb use:
   add-symbol-file .../source/tools/SimpleExamples/obj-intel64/opcodemix.so 0x2a959e9830

Then go back to gdb and attach to the process.

(gdb) attach 28769
Attaching to program: .../intel64/bin/pinbin, process 28769
0x000000314b38f7a2 in ?? ()
(gdb)

Now tell gdb to load the Pintool debugging information by copying the command that pin printed when it was invoked with the -pause_tool switch.

(gdb) add-symbol-file .../source/tools/SimpleExamples/obj-intel64/opcodemix.so 0x2a959e9830
add symbol table from file ".../source/tools/SimpleExamples/obj-intel64/opcodemix.so" at
        .text_addr = 0x2a959e9830
        (y or n) y
        Reading symbols from .../source/tools/SimpleExamples/obj-intel64/opcodemix.so...done.
(gdb)

Now, instead of using the gdb run command, you use the cont command to continue execution. You can also set breakpoints as normal.

(gdb) b opcodemix.cpp:447
Breakpoint 1 at 0x2a959ecf60: file opcodemix.cpp, line 447.
(gdb) cont
Continuing.

Breakpoint 1, main (argc=7, argv=0x3ff00f12f8) at opcodemix.cpp:447
447     int main(int argc, CHAR *argv[])
(gdb)

If the program does not exit, then you should detach so gdb will release control.

(gdb) detach
Detaching from program: .../intel64/bin/pinbin, process 28769
(gdb)

If you recompile your program and then use the run command, gdb will notice that the binary has been changed and reread the debug information from the file. This does not always happen automatically when using attach. In this case you must use the "add-symbol-file" command again to make gdb reread the debug information.

Using the Visual Studio Debugger on Windows

When running an application under the control of Pin and a Pintool, there are two different programs residing in the address space: the application, and the Pin instrumentation engine together with your Pintool. The Pintool is a dynamic-link library (.dll) loaded by Pin. This section describes how to use the Visual Studio Debugger to find bugs in a Pintool. You cannot run Pin directly from the debugger since Pin uses the debugging API to start the application. Instead, you must invoke Pin from the command line with the -pause_tool switch, and use Visual Studio to attach to the Pin process from another window. The -pause_tool n switch makes Pin print out the process identifier (pid) and pause for n seconds. You have n seconds (20 in our example) to attach the debugger to the process. Note that the application resumes once the timeout expires; attaching the debugger later will not have the desired effect.

 % pin <pin options> -pause_tool 20 -t <tool name>  <tool options> -- <app name> <app options>
Pausing for 20 seconds to attach to process with pid 28769

In the Visual Studio window, attach to the application process using the "Debug"->"Attach to Process" menu selection and wait until a breakpoint occurs. Then you can set breakpoints in your tool in the usual way.

Note that you must build your Pintool with debug symbols if you want symbolic information.

Using the WinDbg Debugger on Windows

The WinDbg debugger is the only available option for debugging a Pintool when you need to attach to an instrumented process after Pin has initialized. It can also be used instead of the Visual Studio Debugger in the scenario described above. The debugger is available at https://msdn.microsoft.com/en-us/library/windows/hardware/ff551063(v=vs.85).aspx

The following steps are necessary to properly debug a Pintool in an instrumented process:

  - Install the latest WinDbg and the Process Explorer utility
    ( https://technet.microsoft.com/en-us/sysinternals/processexplorer.aspx ).
  - Add the Microsoft Symbol Server settings in WinDbg: in "File" -> "Symbol File Path"
    type srv*c:\symbols*http://msdl.microsoft.com/download/symbols .
    Create the c:\symbols directory, which will serve as the local repository for OS DLL symbols.
  - Attach WinDbg to the instrumented process. The architectures of WinDbg and the process must match.
  - Use Process Explorer to find the location of the hidden DLLs (the Pintool DLL, its dependencies and pinvm.dll).
    Select the process of interest in the Process View, press Ctrl-D, then double-click
    each hidden DLL of interest in the DLL View to get its location info.
  - When WinDbg stops after attaching, enter the following command for each hidden DLL:
.reload /f <name>=<address>,<size>

where <name> is the DLL base name, <address> is its actual base address and <size> is its actual size in memory. Example:

.reload /f mytool.dll=0x50200000,0x420000
  - From now on you can set breakpoints using the symbolic info of the DLLs and see comprehensive call stacks.



Logging Messages from a Pintool


Pin provides a mechanism to write messages from a Pintool to a log file. To use this capability, call the LOG() API with your message. The default file name is pintool.log, and it is created in the current working directory. Use the -logfile switch after the tool name to change the path and file name of the log file.

LOG( "Replacing function in " + IMG_Name(img) + "\n" );
LOG( "Address = " + hexstr( RTN_Address(rtn)) + "\n" );
LOG( "Image ID = " + decstr( IMG_Id(img) ) + "\n" );



Performance Considerations When Writing a Pintool


The way a Pintool is written can have a great impact on the performance of the tool, i.e. how much it slows down the applications it is instrumenting. This section demonstrates some techniques that can be used to improve tool performance. Let's start with an example. The following piece of code is derived from source/tools/SimpleExamples/edgcnt.cpp.

The instrumentation component of the tool is shown below:

VOID Instruction(INS ins, void *v)
{
...
if ( [ins is a branch or a call instruction] )
{
INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR) docount2,
IARG_INST_PTR,
IARG_BRANCH_TARGET_ADDR,
IARG_BRANCH_TAKEN,
IARG_END);
}
...
}

The analysis component looks like this:

VOID docount2( ADDRINT src, ADDRINT dst, INT32 taken )
{
if(!taken) return;
COUNTER *pedg = Lookup( src,dst );
pedg->_count++;
}

The purpose of the tool is to count how often each control-flow-changing edge in the control-flow graph is traversed. The tool considers both calls and branches, but for brevity we will mention only branches in our description. The tool works as follows: the instrumentation component instruments each branch with a call to docount2. As parameters, we pass in the origin and the target of the branch and whether the branch was taken or not. The branch origin and target represent the source and destination of the control-flow edge. If a branch is not taken, the control flow does not change and hence the analysis routine returns right away. If the branch is taken, we use the src and dst parameters to look up the counter associated with this edge (Lookup will create a new one if this edge has not been seen before) and increment the counter. Note that the tool could have been simplified somewhat by using the IPOINT_TAKEN_BRANCH option with INS_InsertCall().
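The text does not show how Lookup() is implemented. A minimal stand-alone sketch of one way such an edge table could work, assuming a std::map keyed by the (source, destination) pair; ADDRINT and COUNTER are redefined here as hypothetical stand-ins for the Pin and edgcnt.cpp types, and this is not the actual edgcnt.cpp code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical stand-ins for Pin's ADDRINT and the tool's COUNTER type.
using ADDRINT = std::uintptr_t;
struct COUNTER { std::uint64_t _count = 0; };

// One counter per control-flow edge, keyed by (source, destination).
static std::map<std::pair<ADDRINT, ADDRINT>, COUNTER> edges;

// Returns the counter for an edge, creating a zeroed one on first use.
COUNTER* Lookup(ADDRINT src, ADDRINT dst)
{
    return &edges[std::make_pair(src, dst)];
}

// Analysis routine from the text: only taken branches are counted.
void docount2(ADDRINT src, ADDRINT dst, std::int32_t taken)
{
    if (!taken) return;
    COUNTER* pedg = Lookup(src, dst);
    pedg->_count++;
}
```

Because std::map does not move its elements when new keys are inserted, a COUNTER pointer returned by this Lookup() stays valid for the lifetime of the table, so it can safely be kept and reused.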

Shifting Computation from Analysis to Instrumentation Code

About every 5th instruction executed in a typical application is a branch. Lookup will be called whenever these instructions are executed, causing significant application slowdown. To improve the situation, note that the instrumentation code is typically called only once for every instruction, while the analysis code is called every time the instruction is executed. If we can somehow shift computation from the analysis code to the instrumentation code, we will improve the overall performance. Our example tool offers multiple such opportunities, which we will explore in turn. The first observation is that for most branches we can find out inside of Instruction() what the branch target will be. For those branches we can call Lookup inside of Instruction() rather than in docount2(). For indirect branches, which are relatively rare, we still have to use our original approach. All this is reflected in the following code. We add a second, "lighter" analysis function, docount(), while the original docount2() remains unchanged:

VOID docount( COUNTER *pedg, INT32 taken )
{
if( !taken ) return;
pedg->_count++;
}

And the instrumentation will be somewhat more complex:

VOID Instruction(INS ins, void *v)
{
...
if (INS_IsDirectControlFlow(ins))
{
COUNTER *pedg = Lookup( INS_Address(ins), INS_DirectControlFlowTargetAddress(ins) );
INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR) docount,
IARG_ADDRINT, pedg,
IARG_BRANCH_TAKEN,
IARG_END);
}
else
{
INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR) docount2,
IARG_INST_PTR,
IARG_BRANCH_TARGET_ADDR,
IARG_BRANCH_TAKEN,
IARG_END);
}
...
}
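A stand-alone simulation of the effect of this change, assuming a hypothetical std::map-based Lookup() as a stand-in for the real one: InstrumentAndRun() (a made-up harness) performs the lookup once, as Instruction() would at instrumentation time, and then runs the lighter docount() for each simulated execution. The lookupCalls counter exists only to show that the expensive lookup no longer happens per execution.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

using ADDRINT = std::uintptr_t;               // stand-in for Pin's ADDRINT
struct COUNTER { std::uint64_t _count = 0; }; // stand-in for the tool's type

static std::map<std::pair<ADDRINT, ADDRINT>, COUNTER> edges;
static int lookupCalls = 0;                   // how often the expensive lookup runs

COUNTER* Lookup(ADDRINT src, ADDRINT dst)
{
    ++lookupCalls;
    return &edges[std::make_pair(src, dst)];
}

// "Lighter" analysis routine: the edge has already been resolved.
void docount(COUNTER* pedg, std::int32_t taken)
{
    if (!taken) return;
    pedg->_count++;
}

// Simulates instrumenting one direct branch once, then executing it
// 'executions' times, with the branch taken on every other execution.
COUNTER* InstrumentAndRun(ADDRINT src, ADDRINT dst, int executions)
{
    COUNTER* pedg = Lookup(src, dst);         // once, at "instrumentation time"
    for (int i = 0; i < executions; ++i)
        docount(pedg, i % 2 == 0);            // on every "execution"
    return pedg;
}
```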

Eliminating Control Flow

The code for docount() is very compact, which provides performance advantages; it may also allow Pin to inline it, thereby avoiding the overhead of a call. The heuristics for when an analysis routine is inlined by Pin are subject to change, but small routines without any control flow (a single basic block) are almost guaranteed to be inlined. Unfortunately, docount() does have (albeit limited) control flow. Observing that the parameter 'taken' will be zero or one, we can eliminate the remaining control flow as follows:

VOID docount( COUNTER *pedg, INT32 taken )
{
pedg->_count += taken;
}

Now docount() can be inlined.
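A small stand-alone check, assuming 'taken' is always 0 or 1 as stated above, that the branch-free docount() counts exactly what the branching version counts. COUNTER is a stand-in type and SameResult() is a hypothetical harness. The check also shows why the assumption matters: any non-zero value other than 1 makes the two versions disagree.

```cpp
#include <cassert>
#include <cstdint>

struct COUNTER { std::uint64_t _count = 0; };  // stand-in for the tool's type

// Version with a branch, as first written.
void docount_branchy(COUNTER* pedg, std::int32_t taken)
{
    if (!taken) return;
    pedg->_count++;
}

// Branch-free version: correct only if 'taken' is exactly 0 or 1.
void docount_branchless(COUNTER* pedg, std::int32_t taken)
{
    pedg->_count += taken;
}

// Feeds the same taken/not-taken pattern to both versions and reports
// whether they end up with the same count.
bool SameResult(const std::int32_t* pattern, int n)
{
    COUNTER a, b;
    for (int i = 0; i < n; ++i) {
        docount_branchy(&a, pattern[i]);
        docount_branchless(&b, pattern[i]);
    }
    return a._count == b._count;
}
```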

Compiler Considerations for Inlining

The way that the tool is built affects inlining as well. If an analysis routine has a function call to another function, it would not be a candidate for inlining by Pin unless the function call was inlined by the compiler. If the function call is inlined by the compiler, the analysis routine would be a candidate for inlining by Pin. Therefore, it is advisable to write any subroutines called by the analysis routine in a way that allows the compiler to inline the subroutines.

Pintools are built as Position Independent Code (PIC), so the compiler will not inline any globally visible function, because such a function may be preempted (interposed) by another definition at load time. Therefore, on Linux it is advisable to declare the subroutines called by the analysis function as 'static'.
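A minimal sketch of that advice, using hypothetical names (buckets, bucketIndex, RecordAccess) and a stand-in ADDRINT: because bucketIndex() has internal linkage, it cannot be interposed by another shared object, so the compiler is free to inline it into the analysis routine, leaving a single straight-line body that Pin may in turn inline. Whether inlining actually happens still depends on the compiler and its optimization level.

```cpp
#include <cassert>
#include <cstdint>

using ADDRINT = std::uintptr_t;    // stand-in for Pin's ADDRINT

static std::uint64_t buckets[16];  // hypothetical histogram of accessed addresses

// 'static' gives the helper internal linkage: in a PIC build it cannot be
// preempted at load time, so the compiler may inline it.
static inline unsigned bucketIndex(ADDRINT addr)
{
    return static_cast<unsigned>((addr >> 6) & 15);  // 64-byte lines, 16 buckets
}

// Analysis routine: once the helper is inlined there are no calls and
// no branches, which is the shape Pin prefers to inline.
void RecordAccess(ADDRINT addr)
{
    buckets[bucketIndex(addr)]++;
}
```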

Letting Pin Decide Where to Instrument

At times we do not care about the exact point where calls to analysis code are inserted, as long as it is within a given basic block. In this case we can let Pin decide where to insert them. This has the advantage that Pin can select an insertion point that requires minimal register saving and restoring. The following code from ManualExamples/inscount2.cpp shows how this is done for the instruction count example, using IPOINT_ANYWHERE with BBL_InsertCall().

/*
* Copyright (C) 2004-2025 Intel Corporation.
* SPDX-License-Identifier: MIT
*/
#include <iostream>
#include <fstream>
#include "pin.H"
std::ofstream OutFile;
// The running count of instructions is kept here
// make it static to help the compiler optimize docount
static UINT64 icount = 0;
// This function is called before every block
// Use the fast linkage for calls
VOID PIN_FAST_ANALYSIS_CALL docount(ADDRINT c) { icount += c; }
// Pin calls this function every time a new basic block is encountered
// It inserts a call to docount
VOID Trace(TRACE trace, VOID* v)
{
// Visit every basic block in the trace
for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl))
{
// Insert a call to docount for every bbl, passing the number of instructions.
// IPOINT_ANYWHERE allows Pin to schedule the call anywhere in the bbl to obtain best performance.
// Use a fast linkage for the call.
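BBL_InsertCall(bbl, IPOINT_ANYWHERE, (AFUNPTR)docount, IARG_FAST_ANALYSIS_CALL, IARG_UINT32, BBL_NumIns(bbl),
IARG_END);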
}
}
KNOB< std::string > KnobOutputFile(KNOB_MODE_WRITEONCE, "pintool", "o", "inscount.out", "specify output file name");
// This function is called when the application exits
VOID Fini(INT32 code, VOID* v)
{
// Write to a file since std::cout and std::cerr may be closed by the application
OutFile.setf(std::ios::showbase);
OutFile << "Count " << icount << std::endl;
OutFile.close();
}
/* ===================================================================== */
/* Print Help Message */
/* ===================================================================== */
INT32 Usage()
{
std::cerr << "This tool counts the number of dynamic instructions executed" << std::endl;
std::cerr << std::endl << KNOB_BASE::StringKnobSummary() << std::endl;
return -1;
}
/* ===================================================================== */
/* Main */
/* ===================================================================== */
int main(int argc, char* argv[])
{
// Initialize pin
if (PIN_Init(argc, argv)) return Usage();
OutFile.open(KnobOutputFile.Value().c_str());
// Register Trace to be called to instrument basic blocks
TRACE_AddInstrumentFunction(Trace, 0);
// Register Fini to be called when the application exits
PIN_AddFiniFunction(Fini, 0);
// Start the program, never returns
PIN_StartProgram();
return 0;
}
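Adding BBL_NumIns(bbl) once per block gives the same total as counting each instruction because, barring an exception that leaves the block early, every instruction of a basic block executes whenever the block is entered. A stand-alone illustration of that equivalence, with basic blocks modeled as hypothetical instruction counts and an execution trace of block indices (no Pin API involved):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model: numIns[b] is the instruction count of basic block b,
// and trace lists the blocks in the order they were executed.
std::uint64_t CountPerInstruction(const std::vector<unsigned>& numIns,
                                  const std::vector<int>& trace)
{
    std::uint64_t icount = 0;
    for (int b : trace)
        for (unsigned i = 0; i < numIns[b]; ++i)
            icount++;              // one analysis call per instruction
    return icount;
}

std::uint64_t CountPerBlock(const std::vector<unsigned>& numIns,
                            const std::vector<int>& trace)
{
    std::uint64_t icount = 0;
    for (int b : trace)
        icount += numIns[b];       // one analysis call per block, like docount(c)
    return icount;
}
```

The per-block version makes one analysis call per block instead of one per instruction, which is where the speedup comes from.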

Using Fast Call Linkages

For very small analysis functions, the overhead to call the function can be comparable to the work done in the function. Some compilers offer optimized call linkages that eliminate some of the overhead. For example, gcc for the IA-32 architecture has a regparm attribute for passing arguments in registers. Pin supports a limited number of alternate linkages. To use them, you must annotate the declaration of the analysis function with PIN_FAST_ANALYSIS_CALL, and the InsertCall function must pass IARG_FAST_ANALYSIS_CALL. If you change one without changing the other, the arguments will not be passed correctly. See the inscount2.cpp example in the previous section for a sample use. For large analysis functions the benefit may not be significant, but it is unlikely that PIN_FAST_ANALYSIS_CALL would ever cause a slowdown.

Another call linkage optimization is to eliminate the frame pointer. We recommend using -fomit-frame-pointer to compile tools with gcc; see the gcc documentation for an explanation of what it does. The standard Pintool makefiles include -fomit-frame-pointer. Like PIN_FAST_ANALYSIS_CALL, the benefit is largest for small analysis functions. Debuggers rely on frame pointers to display stack traces, so drop this option when trying to debug a Pintool. If you are using a standard Pintool makefile, you can do this by overriding the definition of OPT on the command line with

make OPT=-O0

Rewriting Conditional Analysis Code to Help Pin Inline

Pin improves instrumentation performance by automatically inlining analysis routines that have no control-flow changes. Of course, many analysis routines do have control-flow changes. One particularly common case is that an analysis routine has a single "if-then" test, where a small amount of analysis code plus the test is always executed but the "then" part is executed only once in a while. To inline this common case, Pin provides a set of conditional instrumentation APIs that let the tool writer rewrite analysis routines into a form that has no control-flow changes. The following example from source/tools/ManualExamples/isampling.cpp illustrates how such rewriting can be done:

/*
* Copyright (C) 2005-2021 Intel Corporation.
* SPDX-License-Identifier: MIT
*/
/*
* This file contains a Pintool for sampling the IPs of instruction executed.
* It serves as an example of a more efficient way to write analysis routines
* that include conditional tests.
* Currently, it works on IA-32 and Intel(R) 64 architectures.
*/
#include <stdio.h>
#include <stdlib.h>
#include "pin.H"
FILE* trace;
const INT32 N = 100000;
const INT32 M = 50000;
INT32 icount = N;
/*
* IP-sampling could be done in a single analysis routine like:
*
* VOID IpSample(VOID *ip)
* {
* --icount;
* if (icount == 0)
* {
* fprintf(trace, "%p\n", ip);
* icount = N + rand() % M;
* }
* }
*
* However, we break IpSample() into two analysis routines,
* CountDown() and PrintIp(), to facilitate Pin inlining CountDown()
* (which is the much more frequently executed one than PrintIp()).
*/
ADDRINT CountDown()
{
--icount;
return (icount == 0);
}
// The IP of the current instruction will be printed and
// the icount will be reset to a random number between N and N+M.
VOID PrintIp(VOID* ip)
{
fprintf(trace, "%p\n", ip);
// Prepare for next period
icount = N + rand() % M; // random number from N to N+M
}
// Pin calls this function every time a new instruction is encountered
VOID Instruction(INS ins, VOID* v)
{
// CountDown() is called for every instruction executed
INS_InsertIfCall(ins, IPOINT_BEFORE, (AFUNPTR)CountDown, IARG_END);
// PrintIp() is called only when the last CountDown() returns a non-zero value.
INS_InsertThenCall(ins, IPOINT_BEFORE, (AFUNPTR)PrintIp, IARG_INST_PTR, IARG_END);
}
// This function is called when the application exits
VOID Fini(INT32 code, VOID* v)
{
fprintf(trace, "#eof\n");
fclose(trace);
}
/* ===================================================================== */
/* Print Help Message */
/* ===================================================================== */
INT32 Usage()
{
PIN_ERROR("This Pintool samples the IPs of instruction executed\n" + KNOB_BASE::StringKnobSummary() + "\n");
return -1;
}
/* ===================================================================== */
/* Main */
/* ===================================================================== */
int main(int argc, char* argv[])
{
trace = fopen("isampling.out", "w");
// Initialize pin
if (PIN_Init(argc, argv)) return Usage();
// Register Instruction to be called to instrument instructions
INS_AddInstrumentFunction(Instruction, 0);
// Register Fini to be called when the application exits
PIN_AddFiniFunction(Fini, 0);
// Start the program, never returns
PIN_StartProgram();
return 0;
}

In the above example, the original analysis routine IpSample() has a conditional control-flow change. It is rewritten into two analysis routines: CountDown() and PrintIp(). CountDown() is the simpler of the two and has no control-flow change; it performs the original conditional test and returns the test result. We use the conditional instrumentation APIs INS_InsertIfCall() and INS_InsertThenCall() to tell Pin that the analysis routine specified by an INS_InsertThenCall() (i.e. PrintIp() in this example) is executed only if the result of the analysis routine specified by the previous INS_InsertIfCall() (i.e. CountDown() in this example) is non-zero. Now CountDown(), the common case, can be inlined by Pin, and only once in a while does Pin need to execute PrintIp(), the non-inlined case.
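Pin performs the If/Then dispatch itself; the following stand-alone sketch only models its semantics, assuming a plain loop that stands in for executed instructions and a fixed reset value instead of N + rand() % M so the result is deterministic. RESET, samples and RunInstructions() are hypothetical names used for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

static const int RESET = 4;       // fixed period instead of N + rand() % M
static int icount = RESET;
static std::vector<std::uintptr_t> samples;

// "If" routine: straight-line code, returns non-zero when a sample is due.
std::uintptr_t CountDown()
{
    --icount;
    return icount == 0;
}

// "Then" routine: rarely executed, so it may contain arbitrary code.
void PrintIp(std::uintptr_t ip)
{
    samples.push_back(ip);
    icount = RESET;
}

// Models Pin's behavior at each instrumented instruction: the Then call
// runs only if the preceding If call returned non-zero.
void RunInstructions(std::uintptr_t firstIp, int n)
{
    for (int i = 0; i < n; ++i) {
        std::uintptr_t ip = firstIp + i;
        if (CountDown() != 0)
            PrintIp(ip);
    }
}
```

With a period of 4, running ten instructions starting at IP 100 samples the 4th and 8th instructions (IPs 103 and 107).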

Optimizing Instrumentation of REP Prefixed Instructions

The IA-32 and Intel(R) 64 architectures include REP prefixed string instructions. These use a REP prefix on a string operation to repeat the execution of the inner operation. For some instructions the repeat count is determined solely by the value in the count register. For others (SCAS, CMPS), the count register provides an upper limit on the number of iterations, while the REP prefix (REPE or REPNE) provides a condition to be tested, which can exit the REP loop before the full number of iterations has been executed.

Pin treats REP prefixed instructions as an implicit loop around the inner instruction, so IPOINT_BEFORE and IPOINT_AFTER instrumentation is executed for that instruction once for each iteration of the (implicit) loop. Since each execution of the inner instruction is instrumented, IARG_MEMORY{READ,READ2,WRITE}_SIZE can be determined statically from the instruction (1, 2, 4 or 8 bytes), and IARG_MEMORY{OP,READ,READ2,WRITE}_EA can also be determined (even if DF==1, so the inner instructions are decrementing their arguments and moving backwards through memory).

REP prefixed instructions are treated as predicated, where the predicate is that the count register is non-zero. Therefore canonical instrumentation for memory accesses such as

if (INS_MemoryOperandIsRead(ins,memOp))
{
INS_InsertPredicatedCall(ins, IPOINT_BEFORE,(AFUNPTR)logMemory,
IARG_MEMORYOP_EA, memOp,
IARG_END);
}

will see all of the memory accesses made by the REP prefixed operations.

To allow tools to count entries into a REP prefixed instruction, and to optimize, Pin provides IARG_FIRST_REP_ITERATION, which can be passed as an argument to an analysis routine. It is TRUE if this is the first iteration of a REP prefixed instruction, FALSE otherwise.

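Thus, to perform an action only on the first iteration of a REP prefixed instruction, one can use code like this (assuming that "takeAction" wants to be called on the first iteration of all REP prefixed instructions, even ones with a zero repeat count):

static ADDRINT returnArg (BOOL arg)
{
return arg;
}
...
INS_InsertIfCall(ins, IPOINT_BEFORE, (AFUNPTR)returnArg, IARG_FIRST_REP_ITERATION, IARG_END);
INS_InsertThenCall(ins, IPOINT_BEFORE, (AFUNPTR)takeAction, IARG_END);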

To obtain the repeat count, you can use

IARG_REG_VALUE, INS_RepCountRegister(ins),

which will pass the value in the appropriate count register (one of REG_CX,REG_ECX,REG_RCX depending on the instruction).

As an example, here is code which counts the number of times REP prefixed instructions are executed, optimizing cases in which the REP prefixed instruction only depends on the count register.

class stats
{
UINT64 count; // Times we start the REP prefixed op
UINT64 repeatedCount; // Times we execute the inner instruction
UINT64 zeroLength; // Times we start but don't execute the inner instruction because count is zero
public:
stats() : count(0), repeatedCount(0), zeroLength(0) {}
VOID output() const;
VOID add(UINT32 firstRep, UINT32 repCount)
{
count += firstRep;
repeatedCount += repCount;
if (repCount == 0)
zeroLength += 1;
}
BOOL empty() const { return count == 0; }
stats& operator+= (const stats &other)
{
count += other.count;
repeatedCount += other.repeatedCount;
zeroLength += other.zeroLength;
return *this;
}
};
// Trivial analysis routine to pass its argument back in an IfCall so that we can use it
// to control the next piece of instrumentation.
static ADDRINT returnArg (BOOL arg)
{
return arg;
}
// Analysis functions for execution counts.
// Analysis routine, FirstRep and Executing tell us the properties of the execution.
static VOID addCount (UINT32 opIdx, UINT32 firstRep, UINT32 repCount)
{
stats * s = &statistics[opIdx];
s->add(firstRep, repCount);
}
// Instrumentation routines.
// Insert code for counting how many times the instruction is executed
static VOID insertRepExecutionCountInstrumentation (INS ins, UINT32 opIdx)
{
if (takesConditionalRep(opIdx))
{
// We have no smart way to lessen the number of
// instrumentation calls because we can't determine when
// the conditional instruction will finish. So we just
// let the instruction execute and have our
// instrumentation be called on each iteration. This is
// the simplest way of handling REP prefixed instructions, where
// each iteration appears as a separate instruction, and
// is independently instrumented.
INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)addCount,
IARG_UINT32, opIdx,
IARG_FIRST_REP_ITERATION,
IARG_EXECUTING,
IARG_END);
}
else
{
// The number of iterations is determined solely by the count register value,
// therefore we can log all we need at the start of each REP "loop", and skip the
// instrumentation on all the other iterations of the REP prefixed operation. Simply use
// IF/THEN instrumentation which tests IARG_FIRST_REP_ITERATION.
//
INS_InsertIfCall(ins, IPOINT_BEFORE, (AFUNPTR)returnArg, IARG_FIRST_REP_ITERATION, IARG_END);
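INS_InsertThenCall(ins, IPOINT_BEFORE, (AFUNPTR)addCount,
IARG_UINT32, opIdx,
IARG_UINT32, 1, // firstRep: always 1, since the Then call only runs on the first iteration
IARG_REG_VALUE, INS_RepCountRegister(ins),
IARG_END);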
}
}
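A stand-alone model showing that the two strategies used above produce the same totals for a REP instruction whose length depends only on the count register. It assumes a simplified copy of the stats class and plain loops standing in for Pin's callbacks: per-iteration instrumentation calls add(first, executing) once per iteration, or once with executing == 0 when the count is zero (as the zeroLength logic implies), while the IF/THEN version calls add(1, count) once. perIteration() and firstIterationOnly() are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Simplified copy of the stats class from the example above.
struct stats {
    std::uint64_t count = 0, repeatedCount = 0, zeroLength = 0;
    void add(std::uint32_t firstRep, std::uint32_t repCount)
    {
        count += firstRep;
        repeatedCount += repCount;
        if (repCount == 0) zeroLength += 1;
    }
};

// Per-iteration strategy: one call per iteration, or a single
// non-executing call when the count register starts at zero.
void perIteration(stats& s, std::uint32_t rcx)
{
    if (rcx == 0) { s.add(1, 0); return; }
    for (std::uint32_t i = 0; i < rcx; ++i)
        s.add(i == 0, 1);       // first-iteration flag, executing = 1
}

// IF/THEN strategy: one call on the first iteration with the full count.
void firstIterationOnly(stats& s, std::uint32_t rcx)
{
    s.add(1, rcx);
}
```

The IF/THEN version makes one analysis call per REP instruction instead of one per iteration, while reporting identical counts.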

To perform this optimization when collecting memory access addresses, you also need to take the state of EFLAGS.DF into account, since the string operations work from high address to low address when EFLAGS.DF==1.
(Note: the REG_EFLAGS enum represents the eflags register and is used on 32-bit systems only. On 64-bit systems use the REG_RFLAGS enum, or the REG_GFLAGS enum, which represents either the rflags or the eflags register depending on the system architecture.)

Here is an example which shows how to handle that.

// Compute the base address of the whole access given the initial address,
// repeat count and element size. It has to adjust for DF if it is asserted.
static ADDRINT computeEA (ADDRINT firstEA, UINT32 eflags, UINT32 count, UINT32 elementSize)
{
enum {
DF_MASK = 0x0400
};
if (eflags & DF_MASK)
{
ADDRINT size = elementSize*count;
// The string ops post-decrement, so the lowest address is one elementSize above
// where you might think it should be.
return firstEA - size + elementSize;
}
else
return firstEA;
}
static VOID logMemoryAddress (UINT32 op, // Index of instruction
BOOL first, // First iteration?
ADDRINT baseEA, // Effective address being accessed on this iteration
ADDRINT count, // Iteration count
UINT32 size, // Size in bytes of the per-iteration access
UINT32 eflags, // Eflags
ADDRINT tag) // Name for the type of access
{
const char * tagString = reinterpret_cast<const char *>(tag);
UINT32 width = 20;
if (!first)
{
out << " "; // Indent REP iterations
width -= 2;
}
out << opcodes[op].name << ' ' << tagString << ' ';
out << std::hex << std::setw(width) << computeEA(baseEA, eflags, count, size) << ':';
out << std::dec << std::setw(20) << size*count << std::endl;
}
// Insert instrumentation to log memory addresses accessed.
static VOID insertRepMemoryTraceInstrumentation(INS ins, UINT32 opIdx)
{
const opInfo * op = &opcodes[opIdx];
if (takesConditionalRep(opIdx))
{
if (INS_IsMemoryRead(ins))
{
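INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)logMemoryAddress,
IARG_UINT32, opIdx,
IARG_FIRST_REP_ITERATION,
IARG_MEMORYREAD_EA,
IARG_EXECUTING, // Count: 1 for an executed iteration, 0 for a zero-length REP
IARG_UINT32, op->size,
IARG_UINT32, 0, // Fake Eflags, since we're called at each iteration it doesn't matter
IARG_ADDRINT, (ADDRINT)"Read ",
IARG_END);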
}
// And similar code for MEMORYREAD2, MEMORYWRITE
}
else
{
if (INS_IsMemoryRead(ins))
{
INS_InsertIfCall(ins, IPOINT_BEFORE, (AFUNPTR)returnArg, IARG_FIRST_REP_ITERATION, IARG_END);
INS_InsertThenCall(ins, IPOINT_BEFORE, (AFUNPTR)logMemoryAddress,
IARG_UINT32, opIdx,
IARG_BOOL, TRUE, // First must be TRUE else we wouldn't be called
IARG_MEMORYREAD_EA,
IARG_REG_VALUE, INS_RepCountRegister(ins),
IARG_UINT32, op->size,
IARG_REG_VALUE, REG_GFLAGS, // REG_GFLAGS is eflags on 32-bit systems and rflags on 64-bit systems
IARG_ADDRINT, (ADDRINT)"Read ",
IARG_END);
}
// And similar code for MEMORYREAD2, MEMORYWRITE
}
}
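A stand-alone version of the computeEA() arithmetic with worked numbers, assuming ADDRINT is a 64-bit unsigned integer (a stand-in for Pin's type). With DF set, a 4-element REP of 4-byte operations starting at 0x1010 touches 0x1010, 0x100c, 0x1008 and 0x1004, so the lowest address is 0x1010 - 16 + 4 = 0x1004.

```cpp
#include <cassert>
#include <cstdint>

using ADDRINT = std::uint64_t;  // stand-in for Pin's ADDRINT

// Returns the lowest address touched by a REP string operation of
// 'count' elements of 'elementSize' bytes that starts at 'firstEA'.
static ADDRINT computeEA(ADDRINT firstEA, std::uint32_t eflags,
                         std::uint32_t count, std::uint32_t elementSize)
{
    const std::uint32_t DF_MASK = 0x0400;   // direction flag, bit 10 of EFLAGS
    if (eflags & DF_MASK) {
        ADDRINT size = static_cast<ADDRINT>(elementSize) * count;
        // Moving downwards: the last element starts (count - 1) elements below firstEA.
        return firstEA - size + elementSize;
    }
    return firstEA;                         // moving upwards: the first element is lowest
}
```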

Since some real programs execute a significant proportion of REP prefixed instructions, using IARG_FIRST_REP_ITERATION to collect information at the beginning of the REP "loop" while skipping it for the later iterations can be a significant optimization.

A tool which demonstrates all of these techniques can be found in source/tools/ManualExamples/countreps.cpp, from which these (slightly edited) code snippets were taken.



Memory management


Pin's dynamic memory allocation

Pin allows the Pintool to dynamically allocate memory using standard C and C++ library routines without interfering with the execution of the application that is run under Pin. In order to achieve this, Pin implements its own memory allocator which is separate from the application's memory allocator, and allocates memory in different memory regions.

Restrict Pin's dynamic memory allocation regions

By default, the memory address region used by Pin to dynamically allocate memory for both Pin usage and Pintool usage is unrestricted. However, if Pin's memory allocation should be restricted to specific memory regions, the -pin_memory-range knob can be used on Pin's command line to make Pin allocate memory only inside the specified regions. Note that restricting Pin's memory allocation to specific regions doesn't mean that it will allocate or reserve all of the memory available in those regions!

Note
This knob is currently not supported by Pin 4.x

Limit the maximum memory that Pin can allocate

Pin can be forced to limit the amount of memory (in bytes) it can allocate by using the -pin_memory_size knob on Pin's command line. When a Pintool cannot allocate more memory due to the -pin_memory_size limitation, its out-of-memory callback is called (see PIN_AddOutOfMemoryFunction()). By default, the number of bytes that Pin can allocate is unlimited. If a memory limit is specified, we recommend that it be at least 30MB. Regardless of this value, Pin always allocates additional memory required for its internal O/S abstraction layer; this additional amount cannot be changed.

Note
This knob is currently not supported by Pin 4.x

JIT mode

In JIT mode, Pin needs to manage memory for the code cache in addition to the dynamically allocated memory. This means that the memory regions specified by -pin_memory-range restrict both the dynamically allocated memory and the code cache blocks allocated by Pin.

In order to limit the code cache memory allocation, one can specify the -cc_memory_size knob in Pin's command line. Note that the specified limit must be a multiple of the code cache block size (specified with -cache_block_size).
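Picking a valid -cc_memory_size value amounts to rounding the desired budget up to a whole number of code cache blocks. The sketch below shows that arithmetic; round_up_to_block is a hypothetical helper, not part of Pin, and the sizes in the checks are illustrative rather than Pin defaults.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Pin): round a desired code cache budget up to
// the next multiple of the code cache block size, as -cc_memory_size requires.
// Returns 0 if blockSize is 0 or if the rounded value would overflow.
constexpr std::uint64_t round_up_to_block(std::uint64_t desired, std::uint64_t blockSize)
{
    if (blockSize == 0) return 0;
    std::uint64_t blocks = desired / blockSize + (desired % blockSize != 0 ? 1 : 0);
    if (blocks > UINT64_MAX / blockSize) return 0;
    return blocks * blockSize;
}

// Illustrative values only.
static_assert(round_up_to_block(100u * 1024 * 1024, 256u * 1024) == 100u * 1024 * 1024, "already a multiple");
static_assert(round_up_to_block(1000, 256) == 1024, "rounded up to 4 blocks");
```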

Pin image loader (dynamic linker)

Another component that requires memory while running Pin on an application is the set of images of Pin, the Pintool, and their shared libraries (also known as dynamic link libraries).

To restrict the memory that the Pin image loader uses when placing the images mentioned above, use the -restrict_memory knob in Pin's command line. This knob specifies memory regions that the Pin loader should not use. Note that the logic of -restrict_memory is the reverse of all the other Pin memory range knobs: it specifies which memory regions the Pin loader should NOT use.

Note
This knob is currently not supported by Pin 4.x



Executing Remote Procedures


Beginning with version 4.0, Pin introduces a dedicated server process, known as pind, for each process that is being instrumented. The purpose of this server process is to improve resource isolation and to enable out-of-process services. On Windows, pind enables Pin to effectively instrument Low Integrity Processes (LIP), which may have various limitations on resource usage and creation.

The pind server supports plugins developed as DLLs. Pin users can develop their own plugins, instruct pind to load them, and use them to process data outside of the Pintool (an out-of-process service). These pind plugin DLLs are not linked with Pin's runtime libraries and are not subject to the limitations imposed on Pintools (see Pintool Information and Restrictions), which simplifies the use of third-party libraries.

Pin interacts with pind using a Remote Procedure Call (RPC) protocol. Pintool developers can use PIN_DoRPC to execute RPCs implemented by their plugins. We also provide remote::do_rpc, a templated wrapper for PIN_DoRPC that allows developers to call RPCs as if they were local functions, without dealing with the low-level details of RPC arguments and return values.

Pin's RPC protocol is synchronous, meaning that the thread calling PIN_DoRPC blocks until the plugin has executed the request and Pin has received a response. Developers who need asynchronous behavior can build it on top of the synchronous protocol using a poll-based scheme. Another option is to use E_pin_rpc_flags to tell PIN_DoRPC or remote::do_rpc not to wait for a response.
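The poll-based approach can be sketched without any Pin APIs. In the sketch below, two hypothetical synchronous calls stand in for RPCs a plugin might expose: submit_job() starts work and returns a ticket at once, and poll_job() reports whether the result is ready. PollingService and its methods are assumptions for illustration only; in a real Pintool each call would be an RPC issued through remote::do_rpc, and the plugin would run the work on its own thread.

```cpp
#include <chrono>
#include <cstdint>
#include <map>
#include <mutex>
#include <optional>
#include <thread>
#include <vector>

// Hypothetical stand-in for a plugin exposing two short synchronous RPCs, so a
// caller blocked on a synchronous call never waits for the whole computation.
class PollingService
{
  public:
    ~PollingService()
    {
        for (auto& t : workers_) t.join();
    }

    // "RPC" 1: start the work in the background and return a ticket immediately.
    std::uint32_t submit_job(std::uint64_t n)
    {
        std::lock_guard< std::mutex > lock(mtx_);
        std::uint32_t ticket = nextTicket_++;
        workers_.emplace_back([this, ticket, n] {
            std::uint64_t sum = 0;
            for (std::uint64_t i = 1; i <= n; ++i) sum += i; // the "long" computation
            std::lock_guard< std::mutex > resultLock(mtx_);
            results_[ticket] = sum;
        });
        return ticket;
    }

    // "RPC" 2: return the result if it is ready, otherwise nothing.
    std::optional< std::uint64_t > poll_job(std::uint32_t ticket)
    {
        std::lock_guard< std::mutex > lock(mtx_);
        auto it = results_.find(ticket);
        if (it == results_.end()) return std::nullopt;
        return it->second;
    }

  private:
    std::mutex mtx_;
    std::uint32_t nextTicket_ = 1;
    std::map< std::uint32_t, std::uint64_t > results_;
    std::vector< std::thread > workers_;
};

// Caller side: after submitting (and possibly doing other work), poll until done.
std::uint64_t wait_for_result(PollingService& service, std::uint32_t ticket)
{
    for (;;)
    {
        if (auto result = service.poll_job(ticket)) return *result;
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}
```

A caller is free to do other work between polls; wait_for_result only shows the simplest loop.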

Note
The default behavior of PIN_DoRPC is to assert if a response is not received within an internal timeout. This behavior can be changed using the flags argument of PIN_DoRPC. See E_pin_rpc_flags and "Flags for remote::do_rpc" for details.


Security Considerations

PIN_DoRPC allows executing code on behalf of a Pintool in a process other than the one being instrumented. This process (pind) runs as the same user as the instrumented application, but it may end up with different permissions. In particular, when instrumenting LIP applications on Windows, pind has elevated permissions relative to the instrumented application, which are required to support instrumentation. For this reason, developers of pind RPC plugins should treat every request they receive as untrusted. We recommend that pind RPC plugin developers sanitize input, prevent unauthorized access to system resources, and make sure that the plugin cannot be used for arbitrary resource access or code execution. We further recommend that pind RPC plugins undergo fuzz testing of their inputs.
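As an illustration of that advice, the sketch below validates a buffer argument before a plugin interprets it as an array of fixed-size records, which is what the buffer_offload plugin's memory-analysis RPC does with its MEMREF buffer. The Record layout, the per-request limit, and the function name are hypothetical; a real plugin would take the pointer and size from the argData and argDataSize fields of t_rpc_arg.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size record received in an RPC buffer argument.
struct Record
{
    std::uint64_t address;
    std::uint32_t size;
};

// Hypothetical upper bound on the records accepted in a single request.
constexpr std::size_t kMaxRecordsPerRequest = 4096;

// Validate an untrusted buffer before treating it as an array of Record.
// Returns the number of records, or 0 if the buffer must be rejected.
std::size_t validated_record_count(const void* data, std::size_t dataSize)
{
    if (data == nullptr || dataSize == 0) return 0;
    if (dataSize % sizeof(Record) != 0) return 0; // truncated or padded payload
    std::size_t count = dataSize / sizeof(Record);
    if (count > kMaxRecordsPerRequest) return 0;  // bound the work one request can cause
    const Record* records = static_cast< const Record* >(data);
    for (std::size_t i = 0; i < count; ++i)
    {
        // A zero size would underflow "address + size - 1"; a huge one would overflow it.
        if (records[i].size == 0) return 0;
        if (records[i].address > UINT64_MAX - (records[i].size - 1)) return 0;
    }
    return count;
}
```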

Understanding RPC Argument Encoding & Decoding

Pin's RPC support describes each message with a schema, defined using the t_rpc_message_schema structure. A message schema contains the RPC ID, an argument schema for each argument of the RPC, and an argument schema for the return value. Argument schemas are described by t_rpc_arg_schema values.

An actual argument passed to the RPC is represented by a t_rpc_arg structure. This structure contains the actual argument data (or a pointer to it) and the schema for the argument as a t_rpc_arg_schema.

PIN_DoRPC accepts three arguments: rpcSchema, rpcArgs, and flags. The rpcSchema is a pointer to a t_rpc_message_schema structure, and rpcArgs is an array of t_rpc_arg structures. The flags argument is of type E_pin_rpc_flags; its default value is PinRpcFlagsNone.

An RPC argument can have one of the types described by E_rpc_arg_type:

E_rpc_arg_type (the type of an RSC RPC argument, defined in rscprotomsgtypes.h) has the following enumerators:

RpcBoolean (= 0): Boolean argument type.
RpcInt: Integer (possibly negative) argument type.
RpcUInt: Unsigned integer argument type.
RpcChar: Character argument type (char, wchar_t, char16_t, char32_t).
RpcFloat: Floating-point argument type (float, double, long double).
RpcBuffer: A memory buffer argument type.
RpcOOBRef: Currently not supported.
RpcRecord
RpcArray: An array argument which may contain up to 4096 entries of the same type.
RpcNil: Indicates no data argument.
RpcVoid: The same as RpcNil.
RpcPaddingNoEncode
Note
RpcOOBRef and RpcPaddingNoEncode are not supported and are reserved for future use.
The convenience wrappers provided by remote::do_rpc do not support E_rpc_arg_type::RpcRecord and E_rpc_arg_type::RpcArray.

RPC arguments are written into an RPC request message using a compact encoding that tries to minimize bandwidth usage. When the arguments are decoded from an RPC request or response, they are decoded according to the schema and copied to the args provided to PIN_DoRPC. If a provided argument does not have enough space to receive the decoded value, PIN_DoRPC or remote::do_rpc returns an error.

For convenience we provide C++ templated helpers for creating argument and message schemas. For instance, creating an argument schema for an int32_t and a uint32_t looks like:

auto intSchema = pinrt::rscschema::Int_schema_v<int32_t>;
auto uintSchema = pinrt::rscschema::Uint_schema_v<uint32_t>;

A simple schema for an RPC that returns a boolean and receives two integers of different sizes looks like this:

using namespace pinrt::rscschema;
constexpr auto RPCID = RPCID_MIN + 42;
auto rpcMsgSchema_42 = RPC_message_schema_wrapper<RPCID, // The RPC request ID
Bool_schema_v, // Return value schema
Int_schema_v<int32_t>, // First argument schema
Int_schema_v<int16_t> // Second argument schema
>::schema;

A schema for an RPC that returns no value and receives no arguments looks like this:

using namespace pinrt::rscschema;
constexpr auto RPCID = RPCID_MIN + 43;
auto rpcMsgSchema_43 = RPC_message_schema_wrapper<RPCID, // The RPC request ID
Void_schema_v // Return value schema (no trailing comma: there are no argument schemas)
>::schema;

Although we mention PIN_DoRPC because it is the underlying API for executing RPC requests, this manual exclusively uses remote::do_rpc and the accompanying C++ convenience helpers. We do not show how to create argument and message schemas directly without these helpers, nor how to use PIN_DoRPC directly.

Note
Using the C++ wrappers and helper classes requires adding -std=c++17 (or -std:c++17 on Windows) to the compiler command.

Dealing with E_rpc_arg_type::RpcBuffer OUT Arguments and Return Values

An RPC message schema does not declare whether an argument is an IN, OUT or IN/OUT argument; this is determined by the actual behavior of the Pintool calling remote::do_rpc and of the pind RPC plugin implementation. Specifying E_rpc_arg_flags::RpcArgFlagsDataEmpty in the flags member of t_rpc_arg forces the RPC protocol implementation to refrain from sending any data for the argument. Plugin writers should use this flag to avoid sending the data of IN arguments back to the Pintool. Pintool writers should use it to avoid data transfer for pure OUT E_rpc_arg_type::RpcBuffer arguments by specifying E_rpc_arg_flags::RpcArgFlagsDataEmpty in the flags argument of remote::rpc_buffer. This flag directs PIN_DoRPC not to copy the data from the buffer to the RPC request and to minimize the space the argument occupies in the outgoing message.

More details on minimizing data transfer for OUT arguments when calling remote::do_rpc can be found in the Interacting With a pind RPC Plugin from a Pintool subsection below and in the documentation for remote::rpc_buffer.

Implementing a pind RPC Plugin

Note
The pind server plugin interface is under active development and subject to change in future releases of Pin. The documentation and samples will be updated accordingly. Although the main functionality is expected to remain stable, backward compatibility is not guaranteed.

Plugins for pind are implemented in dynamically loaded libraries (DLL/SO). A single dynamic library may contain more than one plugin. A library should publicly export just two functions, with "C" naming and standard "C" calling conventions:

PLUGIN_EXTERNC PLUGIN__DLLVIS IPindPlugin* PLUGIN_DLLVIS load_plugin(const char* name);
PLUGIN_EXTERNC PLUGIN__DLLVIS void PLUGIN_DLLVIS unload_plugin(IPindPlugin* plugin);

When pind loads a plugin, it calls load_plugin with the name of the plugin to load. If the name is supported by the plugin library, load_plugin should return a pointer to an IPindPlugin structure whose function pointers point to the functions that actually implement the plugin functionality.

For pind RPC plugins the actual structure allocated should be of type IRPCPlugin that has IPindPlugin as its first member allowing direct cast from IRPCPlugin to IPindPlugin.
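The direct cast from IRPCPlugin to IPindPlugin relies on a C++ layout rule: a pointer to a standard-layout struct may be reinterpret_cast to a pointer to its first non-static data member and back. The self-contained sketch below mirrors that arrangement with simplified, hypothetical stand-in types (BasePlugin, RpcPluginTable, MyPlugin) rather than the real declarations from ipind_plugin.h, and uses static_assert so that a layout change breaks the build instead of the cast.

```cpp
#include <cstddef>
#include <type_traits>

// Simplified stand-ins for IPindPlugin and IRPCPlugin (hypothetical, not the real header).
struct BasePlugin
{
    bool (*init)(BasePlugin* self);
};

struct RpcPluginTable
{
    BasePlugin base_; // must stay the first member
    int (*do_work)(BasePlugin* self, int value);
};

// A plugin object that embeds the function table as its first member,
// followed by its own state.
struct MyPlugin
{
    RpcPluginTable table;
    int multiplier;
};

static_assert(std::is_standard_layout_v< RpcPluginTable >, "cast requires standard layout");
static_assert(std::is_standard_layout_v< MyPlugin >, "cast requires standard layout");
static_assert(offsetof(RpcPluginTable, base_) == 0, "base_ must be the first member");
static_assert(offsetof(MyPlugin, table) == 0, "table must be the first member");

// Inside a callback, recover the full plugin object from the base pointer.
int multiply_work(BasePlugin* self, int value)
{
    auto* plugin = reinterpret_cast< MyPlugin* >(self);
    return plugin->multiplier * value;
}

BasePlugin* make_plugin(int multiplier)
{
    auto* plugin = new MyPlugin{}; // value-initialized: unused function pointers are null
    plugin->multiplier = multiplier;
    plugin->table.do_work = multiply_work;
    return reinterpret_cast< BasePlugin* >(plugin);
}
```

When such an object is freed, it should be deleted through a pointer to the type that was allocated (MyPlugin here), not through the base pointer.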

// t_rpc_id is a uint32_t and t_arg_count is a uint8_t (see rscprotomsgtypes.h)
struct IPindPlugin
{
    // Internal - Should not be set by user
    bool verbose_;
    // Internal - Should not be set by user
    const char* logName_;
    // Internal - Should not be set by user
    void (*plugin_log)(IPindPlugin* self, const char* message, size_t length);
    E_plugin_type (*get_plugin_type)(IPindPlugin* self);
    bool (*init)(IPindPlugin* self, int argc, const char* const argv[]);
    void (*uninit)(IPindPlugin* self);
};
struct IRPCPlugin
{
    IPindPlugin base_; // Common pind plugin functionality (base class); must be the first member
    const char* (*get_injection_data)(IPindPlugin* self);
    t_rpc_message_schema const* (*get_rpc_schema)(IPindPlugin* self, t_rpc_id rpcId);
    void (*do_rpc)(IPindPlugin* self, t_rpc_id rpcId, t_arg_count argCount, t_rpc_arg* rpcArgs, t_rpc_ret* retRpcArg);
};

The function pointers declared in IPindPlugin and IRPCPlugin are called by pind at different stages of its operation.

IPindPlugin::get_plugin_type returns the type of the plugin. This function must be implemented and cannot be NULL. User developed plugins must always return E_plugin_type::RPC.

IPindPlugin::init is called after load_plugin is called to initialize the plugin. In OOP terms this function can be considered the constructor of the plugin. If this function returns false then the plugin is immediately unloaded and IPindPlugin::uninit is not called. IPindPlugin::init may be NULL.

IPindPlugin::init() receives plugin-specific arguments through the "arguments" member set in the plugin JSON configuration file (i.e. "ExtraRPCPlugins/i/arguments" or "/RSCServer/RPCPlugins/i/arguments", where i is the index of the plugin entry).

These arguments are passed through argc and argv[], similar to how command line arguments are passed to an application's main function.

Example for the arguments member:

{
"ExtraRPCPlugins": [
{
"lib": "plugin_lib.so",
"name": "plugin 1",
"arguments": [
"-dummy-knob", "1"
]
},
{
"lib": "plugin_lib.so",
"name": "plugin 2",
"arguments": [
"-verbose", "1",
"-o", "/some/where"
]
},
...,
{
"lib": "another_plugin_lib.so",
"name": "plugin n"
}
]
}
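An init implementation typically walks argv in knob/value pairs. The sketch below parses arguments shaped like those in the JSON above into a map and rejects malformed input, since returning false from init makes pind unload the plugin. The function name and the strict pairs-only convention are assumptions; a plugin is free to interpret its arguments however it likes.

```cpp
#include <map>
#include <string>

// Hypothetical helper for IPindPlugin::init(): parse "-knob value" pairs into a map
// keyed by the knob name without its leading '-'.
// Returns false on malformed input so that init() can fail and the plugin is unloaded.
bool parse_plugin_args(int argc, const char* const argv[], std::map< std::string, std::string >& knobs)
{
    if (argc < 0 || argc % 2 != 0) return false; // every knob needs exactly one value
    for (int i = 0; i < argc; i += 2)
    {
        std::string name = argv[i];
        if (name.size() < 2 || name[0] != '-') return false;                  // knob names start with '-'
        if (!knobs.emplace(name.substr(1), argv[i + 1]).second) return false; // duplicate knob
    }
    return true;
}
```

For "plugin 2" above, init would receive argc == 4 and the map would hold verbose=1 and o=/some/where.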

IPindPlugin::uninit is called just before the plugin is unloaded. In OOP terms this function can be considered the destructor of the plugin. IPindPlugin::uninit may be NULL.

IRPCPlugin::get_injection_data returns a string of KEY=VALUE pairs separated by semicolons. These values will be injected into Pin as environment variables. IRPCPlugin::get_injection_data may be NULL.

Note
Support for this feature is incomplete: if IRPCPlugin::get_injection_data is not NULL, it is called, but the returned data is not injected into Pin.
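Producing that string amounts to joining KEY=VALUE pairs with semicolons. The sketch below does the joining; the helper name, the rule of rejecting keys that contain '=' or ';' and values that contain ';', and the example variables are assumptions. Because get_injection_data returns a const char*, the joined string must outlive the call, for example by storing it in the plugin's state and returning its c_str().

```cpp
#include <map>
#include <string>

// Hypothetical helper: join environment-style pairs as "KEY1=VALUE1;KEY2=VALUE2".
// Returns false if a key or value would make the string ambiguous.
bool build_injection_data(const std::map< std::string, std::string >& vars, std::string& out)
{
    out.clear();
    for (const auto& [key, value] : vars)
    {
        if (key.empty() || key.find_first_of("=;") != std::string::npos) return false;
        if (value.find(';') != std::string::npos) return false;
        if (!out.empty()) out += ';';
        out += key + '=' + value;
    }
    return true;
}
```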

IRPCPlugin::get_rpc_schema is called before an RPC is processed to retrieve the schema for the given rpcId. If the function returns NULL for a given rpcId, then pind will not use this plugin to process this specific RPC request. IRPCPlugin::get_rpc_schema must be specified and cannot be NULL.

IRPCPlugin::do_rpc is called by pind to actually process an RPC request. IRPCPlugin::do_rpc must be specified and cannot be NULL.

The following pseudocode illustrates how plugins are used to process RPC requests:

for(auto plugin : plugins)
{
auto schema = plugin->get_rpc_schema(plugin, rpcId);
if(nullptr != schema)
{
plugin->do_rpc(plugin, rpcId, argCount, rpcArgs, retRpcArg);
break;
}
}
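The loop above can be exercised as a small self-contained simulation. MockPlugin below is a hypothetical stand-in for IRPCPlugin that only records the RPC ids for which its get_rpc_schema would return a schema, and the ids are arbitrary. Since the first plugin that reports a schema handles the request, an earlier plugin shadows any later plugin that claims the same id.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for IRPCPlugin: only models get_rpc_schema's yes/no answer.
struct MockPlugin
{
    std::string name;
    std::vector< std::uint32_t > supportedIds; // ids for which get_rpc_schema would be non-null

    bool has_schema(std::uint32_t rpcId) const
    {
        for (auto id : supportedIds)
            if (id == rpcId) return true;
        return false;
    }
};

// Mirrors the dispatch loop: the first plugin that reports a schema handles the request.
// Returns the handling plugin's name, or an empty string if no plugin supports rpcId.
std::string dispatch(const std::vector< MockPlugin >& plugins, std::uint32_t rpcId)
{
    for (const auto& plugin : plugins)
    {
        if (plugin.has_schema(rpcId))
        {
            return plugin.name; // a real server would call do_rpc(...) here and stop
        }
    }
    return "";
}
```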

The server, pind, uses a thread pool to process RPC requests. This means that IRPCPlugin::do_rpc may be called from multiple threads concurrently. Therefore, IRPCPlugin::do_rpc must be thread safe.

The number of threads in the pool can be controlled by editing the RSCServer/ThreadPoolCount entry in pind's configuration file, located at <pinkit>/<ia32|intel64>/pinrt/bin/pind_config.json. The default is 2 threads. A value of 0 means that the number of threads equals the number of CPU cores on the system; any other value is interpreted as the number of threads in the pool.

RPC Ids are 32-bit unsigned numbers that should be greater than or equal to RPCID_MIN and less than or equal to RPCID_MAX. Numbers less than RPCID_MIN are reserved for Pin.
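One way to meet that requirement is to keep all mutable plugin state behind a single mutex and take it in every handler, including handlers that only read. The sketch below does this for a set of merged address ranges similar to the buffer_offload plugin's state. RangeSet is a hypothetical name, and the threads in the usage check merely stand in for pind's thread pool.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <mutex>
#include <set>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical plugin state: disjoint, non-adjacent [low, high] ranges guarded by
// one mutex, so concurrent do_rpc() calls can neither corrupt nor half-read it.
class RangeSet
{
  public:
    using Range = std::pair< std::uint64_t, std::uint64_t >;

    // Add [low, high] (expects low <= high), merging overlapping or adjacent ranges.
    void add(std::uint64_t low, std::uint64_t high)
    {
        std::lock_guard< std::mutex > lock(mtx_);
        Range range{low, high};
        auto it = ranges_.lower_bound(range);
        if (it != ranges_.begin())
        {
            auto prev = std::prev(it);
            if (touches(prev->second, range.first)) // overlaps or directly follows prev
            {
                range.first = prev->first;
                range.second = std::max(range.second, prev->second);
                ranges_.erase(prev);
            }
        }
        while (it != ranges_.end() && touches(range.second, it->first)) // absorb following ranges
        {
            range.second = std::max(range.second, it->second);
            it = ranges_.erase(it);
        }
        ranges_.insert(range);
    }

    // Readers take the same lock and return a copy, never a reference into the set.
    std::vector< Range > snapshot() const
    {
        std::lock_guard< std::mutex > lock(mtx_);
        return std::vector< Range >(ranges_.begin(), ranges_.end());
    }

  private:
    // True if a range starting at nextLow overlaps or directly follows one ending at high
    // (written without "high + 1" so that it cannot overflow).
    static bool touches(std::uint64_t high, std::uint64_t nextLow) { return nextLow <= high || nextLow - high == 1; }

    mutable std::mutex mtx_;
    std::set< Range > ranges_;
};
```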

Note
Pin plugins will always be considered for RPC request processing before user developed plugins. This means that selecting RPC Ids less than RPCID_MIN for user developed plugins may cause the user plugin not to be considered for the RPC request processing.
This convention is not currently enforced in code. Violating this convention is considered undefined behavior.

We provide an example of writing and using a pind RPC plugin with the buffer_offload example under source/tools/ManualExamples.

Below is the code for the buffer_offload plugin:

/*
* Copyright (C) 2024-2025 Intel Corporation.
* SPDX-License-Identifier: MIT
*/
#include <cassert>
#include <cstring>
#include <fstream>
#include <utility>
#include <set>
#include <map>
#include <iomanip>
#include <mutex>
#include <iostream>
#include <string>
#include <cstdlib>  // malloc, free
#include <iterator> // std::prev
#include <new>      // std::nothrow
#if __has_include(<filesystem>)
#include <filesystem>
#endif
#if defined(TARGET_WINDOWS)
#include <process.h>
#define getpid() _getpid()
#else
#include <unistd.h>
#endif
#include "ipind_plugin.h"
#include "buffer_offload.h"
struct Buffer_offload_plugin
{
static constexpr int MAX_TOP_RANGES = 3;
IRPCPlugin rpcPlugin;
struct Buffer_offload_plugin_state
{
std::ofstream outfile;
std::set< std::pair< uint8_t*, uint8_t* > > accesses;
void add_access(uint8_t* low, uint8_t* high)
{
static std::mutex mtx;
// We need to use a lock to protect writes to 'accesses' since do_rpc() may be called from
// multiple threads concurrently.
std::lock_guard< std::mutex > lock(mtx);
auto range = std::make_pair(low, high);
auto it = accesses.lower_bound(range);
// Merge with the preceding range if the new range overlaps it or directly follows it.
if (accesses.begin() != it)
{
auto prev = std::prev(it);
if (prev->second + 1 >= range.first)
{
range.first = prev->first;
if (prev->second > range.second) range.second = prev->second; // keep the larger end
accesses.erase(prev);
}
}
// Merge with every following range that the new range overlaps or directly precedes.
while ((accesses.end() != it) && (range.second + 1 >= it->first))
{
if (it->second > range.second) range.second = it->second;
it = accesses.erase(it);
}
accesses.insert(range);
}
};
Buffer_offload_plugin_state* state_;
void do_open_outfile(t_arg_count argCount, t_rpc_arg* rpcArgs, t_rpc_ret* retRpcArg)
{
std::string outfilePath = (char*)rpcArgs[0].argData;
state_->outfile.open(outfilePath);
std::cout << "Log Open Requested for: " << outfilePath << std::endl;
rpcArgs[0].flags = RpcArgFlagsDataEmpty; // This is an IN argument. Mark no data when returning
// Fill in the return value of this operation, in this case a boolean
// that is set to true if the file opened successfully, false otherwise.
retRpcArg->argSchema = OPEN_OUT_FILE_SCHEMA.returnValueSchema;
retRpcArg->argDataSize = ARG_SCHEMA_SIZE(OPEN_OUT_FILE_SCHEMA.returnValueSchema);
retRpcArg->argData = (uint64_t)(state_->outfile.is_open() && state_->outfile.good());
retRpcArg->flags = RpcArgFlagsNone;
}
void do_analyze_mem_access(t_arg_count argCount, t_rpc_arg* rpcArgs, t_rpc_ret* retRpcArg)
{
MEMREF* refs = reinterpret_cast< MEMREF* >(rpcArgs[0].argData);
auto count = rpcArgs[0].argDataSize / sizeof(MEMREF);
for (size_t i = 0; i < count; i++)
{
auto low = refs[i].ea;
auto high = refs[i].ea + refs[i].size - 1;
state_->add_access(low, high);
}
rpcArgs[0].flags = RpcArgFlagsDataEmpty; // This is an IN argument. Mark no data when returning
// Fill in the return value of this operation, in this case it is a Nil
// which means there is no return value. We should still fill in this
// struct so that the server will know what kind of return value to encode
// in the response.
retRpcArg->argSchema = MEM_ANALYZE_SCHEMA.returnValueSchema;
retRpcArg->argDataSize = ARG_SCHEMA_SIZE(MEM_ANALYZE_SCHEMA.returnValueSchema);
retRpcArg->argData = 0;
retRpcArg->flags = RpcArgFlagsNone;
}
void do_get_mem_access_info(t_arg_count argCount, t_rpc_arg* rpcArgs, t_rpc_ret* retRpcArg)
{
// Fill in the [out] args
retRpcArg->argData = 0; // Start with setting the ret to false (failure)
if (!state_->accesses.empty())
{
// We get the maximum ranges we can place in the out buffer from the client
auto maxClientTopRangeCount = (int)rpcArgs[2].argData;
// For this RPC we treat the arguments as OUT arguments, so we explicitly
// make sure that all schema and size fields are properly initialized.
rpcArgs[0].argData = uintptr_t(state_->accesses.begin()->first);
rpcArgs[0].argSchema = GET_MEM_ACCESS_INFO_SCHEMA.argSchemaArray[0];
rpcArgs[0].argDataSize = ARG_SCHEMA_SIZE(GET_MEM_ACCESS_INFO_SCHEMA.argSchemaArray[0]);
rpcArgs[0].flags = RpcArgFlagsNone;
rpcArgs[1].argData = uintptr_t(state_->accesses.rbegin()->second);
rpcArgs[1].argSchema = GET_MEM_ACCESS_INFO_SCHEMA.argSchemaArray[1];
rpcArgs[1].argDataSize = ARG_SCHEMA_SIZE(GET_MEM_ACCESS_INFO_SCHEMA.argSchemaArray[1]);
rpcArgs[1].flags = RpcArgFlagsNone;
auto maxTopRangesCount = maxClientTopRangeCount < MAX_TOP_RANGES ? maxClientTopRangeCount : MAX_TOP_RANGES;
rpcArgs[2].argData = maxTopRangesCount;
// Allocate memory for the top ranges
size_t bufferSize = maxTopRangesCount * sizeof(MEMREF);
MEMREF* topRanges = (MEMREF*)malloc(bufferSize);
if (nullptr != topRanges)
{
memset(topRanges, 0, bufferSize);
// Find the maxTopRangesCount largest ranges
for (auto& range : state_->accesses)
{
auto rangeSize = range.second - range.first + 1;
for (int i = 0; i < maxTopRangesCount; ++i)
{
if (rangeSize > topRanges[i].size)
{
for (int j = maxTopRangesCount - 1; j > i; --j)
{
topRanges[j] = topRanges[j - 1];
}
topRanges[i].ea = range.first;
topRanges[i].size = rangeSize;
break;
}
}
}
if (rpcArgs[3].deleter)
{
// Free data currently in arg if argData was allocated for us by pind
rpcArgs[3].deleter(reinterpret_cast< void* >(rpcArgs[3].argData));
}
rpcArgs[3].deleter = free; // Our deleter for memory we allocate - pind will call this to free our memory
// after the response to the client is encoded
rpcArgs[3].argData = reinterpret_cast< uint64_t >(topRanges);
rpcArgs[3].argDataSize = bufferSize;
rpcArgs[3].argSchema = GET_MEM_ACCESS_INFO_SCHEMA.argSchemaArray[3];
rpcArgs[3].flags = RpcArgFlagsNone;
retRpcArg->argData = 1; // Mark success
}
} // NO NEED FOR ELSE
// Fill in the schema of the return value, in this case a boolean that is
// set to true (1) if the OUT arguments were filled in, false (0) otherwise.
retRpcArg->argSchema = GET_MEM_ACCESS_INFO_SCHEMA.returnValueSchema;
retRpcArg->argDataSize = ARG_SCHEMA_SIZE(GET_MEM_ACCESS_INFO_SCHEMA.returnValueSchema);
retRpcArg->flags = RpcArgFlagsNone;
}
void write_report()
{
if (state_->accesses.size() == 0) return;
// We want to create a map of (key=block size) and (value=number of blocks of that size).
// Go over the map of accesses. For each access calculate its size.
// Then increment the appropriate entry in the map.
std::map< size_t, uint32_t > blocks;
for (auto const& pair : state_->accesses)
{
auto accessSize = (pair.second - pair.first + 1);
blocks[accessSize]++;
}
state_->outfile << "Overall " << std::dec << state_->accesses.size() << " accesses to contiguous memory ranges."
<< std::endl;
state_->outfile << "Breakdown by contiguous range size:" << std::endl;
state_->outfile << std::left << std::setw(10) << "block size"
<< " | # blocks" << std::endl;
state_->outfile << "---------- | ----------" << std::endl;
for (auto const& block : blocks)
{
state_->outfile << std::left << std::setw(10) << std::dec << block.first << " | " << block.second << std::endl;
}
}
};
/*
* This function is called by the server to query whether this plugin supports the input rpcId.
* If it does then the function should return a pointer to the appropriate message schema.
* If it doesn't then the function should return nullptr;
*/
t_rpc_message_schema const* get_rpc_schema(IPindPlugin* self, t_rpc_id rpcId)
{
if (rpcId == OPEN_OUT_FILE_SCHEMA.rpcId)
{
return &OPEN_OUT_FILE_SCHEMA;
}
else if (rpcId == MEM_ANALYZE_SCHEMA.rpcId)
{
return &MEM_ANALYZE_SCHEMA;
}
else if (rpcId == GET_MEM_ACCESS_INFO_SCHEMA.rpcId)
{
return &GET_MEM_ACCESS_INFO_SCHEMA;
}
return nullptr;
}
/*
* This function is called by the server only if get_rpc_schema(rpcId) returned a non-null schema.
*/
void do_rpc(IPindPlugin* self, t_rpc_id rpcId, t_arg_count argCount, t_rpc_arg* rpcArgs, t_rpc_ret* retRpcArg)
{
auto this_ = reinterpret_cast< Buffer_offload_plugin* >(self);
if (rpcId == OPEN_OUT_FILE_SCHEMA.rpcId)
{
return this_->do_open_outfile(argCount, rpcArgs, retRpcArg);
}
else if (rpcId == MEM_ANALYZE_SCHEMA.rpcId)
{
return this_->do_analyze_mem_access(argCount, rpcArgs, retRpcArg);
}
else if (rpcId == GET_MEM_ACCESS_INFO_SCHEMA.rpcId)
{
return this_->do_get_mem_access_info(argCount, rpcArgs, retRpcArg);
}
}
/*
* This function should always return RPC.
*/
E_plugin_type get_plugin_type(IPindPlugin* self) { return RPC; }
bool init(IPindPlugin* self, int argc, const char* const argv[])
{
assert(2 == argc);
assert(std::string(argv[0]) == "-dummy-knob");
assert(std::string(argv[1]) == "1");
auto this_ = reinterpret_cast< Buffer_offload_plugin* >(self);
this_->state_ = new Buffer_offload_plugin::Buffer_offload_plugin_state;
return true;
}
void uninit(IPindPlugin* self)
{
plugin_log_verbose(self, "buffer offload plugin is being unloaded\n");
#if __has_include(<filesystem>)
// Check that plugin log file was created
std::string filename = "buffer_offload_plugin.log.";
filename += std::to_string(getpid());
if (!std::filesystem::exists(filename))
{
std::cerr << filename << " plugin log file does not exist" << std::endl;
// Doing exit(-1) from the plugin doesn't make the application crash since it's being done from the pind process.
// Checking it from the makefile.
// exit(-1);
}
#endif
auto this_ = reinterpret_cast< Buffer_offload_plugin* >(self);
this_->write_report();
delete this_->state_;
}
/*
* Load the plugin.
* The plugin must fill in all the entries of the struct with valid function pointers.
* The implementation can be empty but it must be a valid function.
*/
PLUGIN_EXTERNC PLUGIN__DLLVIS struct IPindPlugin* load_plugin(const char* name)
{
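static const char OFFLOAD_PLUGIN_NAME[] = "buffer offload plugin";
// sizeof of the array (not of a pointer) includes the terminating NUL, so this is an exact match
if ((nullptr != name) && (0 == strncmp(OFFLOAD_PLUGIN_NAME, name, sizeof(OFFLOAD_PLUGIN_NAME))))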
{
Buffer_offload_plugin* plugin = new (std::nothrow) Buffer_offload_plugin;
if (nullptr != plugin)
{
IRPCPlugin* rpcPlugin = &plugin->rpcPlugin;
memset(rpcPlugin, 0, sizeof(IRPCPlugin));
rpcPlugin->base_.get_plugin_type = get_plugin_type; // Mandatory, must not be NULL
rpcPlugin->base_.init = init;
rpcPlugin->base_.uninit = uninit;
rpcPlugin->get_rpc_schema = get_rpc_schema;
rpcPlugin->do_rpc = do_rpc;
return (IPindPlugin*)rpcPlugin;
}
}
return nullptr; // We don't know the requested plugin or couldn't allocate it
}
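/*
* Unload the plugin - release the object allocated by load_plugin().
* The report itself is written by uninit().
*/
PLUGIN_EXTERNC PLUGIN__DLLVIS void unload_plugin(struct IPindPlugin* plugin)
{
// Delete through the type load_plugin() allocated; IPindPlugin is its first member, not a base class.
delete reinterpret_cast< Buffer_offload_plugin* >(plugin);
}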

Building, Installing and Loading pind RPC Plugins

Building pind RPC Plugins

pind RPC plugin libraries are regular dynamic link libraries. They can be built using any compiler and linker that can produce standard DLLs/SOs. Plugins are built against the native runtime libraries and should not use Pin Runtime libraries.

To use the C++ helpers for message and argument schemas defined in rscschema.h, the plugin source code must be compiled with C++17 or later.

All the headers required by the plugin source code are installed in the Pin kit under <ia32|intel64>/pinrt/include/adaptor.

The header defining IPindPlugin and IRPCPlugin is ipind_plugin.h and is the only header that must be included directly from the plugin source code. If using the C++ helpers for argument and message schemas then rscschema.h should be included as well.

Here is the makefile rule used to build buffer_offload_plugin on Linux in source/tools/ManualExamples/makefile.rules:

$(OBJDIR)buffer_offload_plugin$(DLL_SUFFIX): buffer_offload_plugin.cpp
    $(APP_CXX) $(RPC_PLUGIN_CXXFLAGS) $(DLL_CXXFLAGS) $(COMP_EXE)$@ $< $(APP_LDFLAGS) $(DLL_LDFLAGS) $(APP_LIBS)
    # Copy test plugin to the pind plugin directory
    $(CP) $(OBJDIR)buffer_offload_plugin$(DLL_SUFFIX) $(PIN_ROOT)/$(TARGET)/pinrt/bin/plugins/

The above rule expands to the following build command when executing make DEBUG=1 buffer_plugin.test:

g++ -std=c++17 -I../../../intel64/pinrt/include/adaptor -O0 -g -fPIC -o obj-intel64/buffer_offload_plugin.so \
    buffer_offload_plugin.cpp -no-pie  -g -shared -Wl,--as-needed -lm -ldl -lpthread
cp obj-intel64/buffer_offload_plugin.so ../../../intel64/pinrt/bin/plugins/

The copy command is required for running Pin with the new plugin. We will discuss the installation requirements in detail in the following section.

Installing and Loading pind RPC Plugins

A pind RPC plugin library must be installed in the plugins directory, which is located in the same directory as pind. In the standard Pin kit, pind is installed under <pinkit>/<ia32|intel64>/pinrt/bin, and the kit already contains the plugins directories at the correct locations: <pinkit>/ia32/pinrt/bin/plugins and <pinkit>/intel64/pinrt/bin/plugins.

For security reasons plugins may only be loaded from this plugins directory. This means that pind RPC plugin dynamic libraries must be copied to the corresponding plugins directory.

Note
Both 32-bit and 64-bit versions of the plugin should be built and installed if Pin is expected to work with both 32-bit and 64-bit applications. The same applies to the Pintool that uses the plugin to process RPC requests.

Placing the plugin library files in the plugins directory is not enough for pind to load the plugins. For pind to know which plugins to load, Pin should be called with the -rpc_plugin knob specifying a JSON file with the list of plugins to load.

The JSON file takes the following form:

{
"ExtraRPCPlugins": [
{
"lib": "plugin_lib.so",
"name": "plugin 1"
},
{
"lib": "plugin_lib.so",
"name": "plugin 2"
},
...,
{
"lib": "another_plugin_lib.so",
"name": "plugin 1"
}
]
}

For each plugin that pind should load, add an entry to the ExtraRPCPlugins JSON array. Each entry has two mandatory members: lib is the filename of the plugin library binary (only simple filenames are allowed; full or relative paths are rejected), and name is the plugin name passed to load_plugin.

The JSON for the Linux version of the buffer_offload example looks like this:

{
"ExtraRPCPlugins": [
{
"lib": "buffer_offload_plugin.so",
"name": "buffer offload plugin"
}
]
}

In addition, every plugin may use IPindPlugin::plugin_log (function pointer set by pind) to write a log message into a dedicated plugin log file managed by pind.

The path of the plugin log file can be set through the optional "logfilename" member. It can be relative (to the pin.log location) or absolute.

If this member is not set, a default name is derived from the plugin name: the final log name is name + .log + .<PID>, with spaces in name replaced by underscores.
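Following that rule, the plugin named "buffer offload plugin" logs to buffer_offload_plugin.log.<PID>, the same name the buffer_offload plugin's uninit checks for. The sketch below computes the default name; default_plugin_log_name is a hypothetical illustration of the documented rule, not a function from ipind_plugin.h, and it replaces only space characters.

```cpp
#include <string>

// Hypothetical helper reproducing the documented default: name + ".log." + PID,
// with spaces in the plugin name replaced by underscores.
std::string default_plugin_log_name(std::string name, unsigned long pid)
{
    for (char& c : name)
    {
        if (c == ' ') c = '_';
    }
    return name + ".log." + std::to_string(pid);
}
```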

To write log messages, a plugin can call IPindPlugin::plugin_log() directly or use the auxiliary functions plugin_log() and plugin_log_verbose() (recommended).

The additional member log is used together with plugin_log_verbose() to emit log messages conditionally, based on the value of this member (as opposed to unconditionally).

Using this mechanism is not mandatory. Plugins can open their own files and emit logs into them as they see fit.

More information can be found in ipind_plugin.h.

Example:

{
    "ExtraRPCPlugins": [
        {
            "lib": "buffer_offload_plugin.so",
            "name": "buffer offload plugin",
            "log" : true,
            "comment" : "logfilename may be relative (to the pin.log location) or absolute",
            "logfilename" : "my_buffer_offload_plugin.log",
            "arguments": [
                "-dummy-knob", "1"
            ]
        }
    ]
}

The command for executing the buffer_offload example from source/tools/ManualExamples is as follows:

../../../pin   -rpc_plugin buffer_offload.json -t obj-intel64/buffer_offload.so -o obj-intel64//memory_analysis.log \
        -- ../../../source/tools/Utils/obj-intel64/cp-pin.exe makefile obj-intel64/buffer_offload.makefile.copy > \
        obj-intel64/buffer_offload.out 2>&1

Interacting With a pind RPC Plugin from a Pintool

A Pintool issues an RPC request and the pind RPC plugin serves it. The main Pin API for issuing such a request is PIN_DoRPC. However, using PIN_DoRPC requires low-level knowledge of how RPC arguments are constructed, which results in verbose, hard-to-read code. For that reason Pin provides a C++ templated wrapper, remote::do_rpc, that uses compile-time type deduction to construct the RPC arguments for the request.

Generally speaking, if the RPC uses only primitive type arguments (either IN or OUT) then calling remote::do_rpc should be as simple as calling a local function.

Note
Passing a constant or a constant reference as an argument that the RPC modifies (an OUT or IN/OUT argument) will cause remote::do_rpc to fail at runtime, even though the RPC itself may actually complete successfully.

For functions that accept E_rpc_arg_type::RpcBuffer arguments there is more to consider. If the argument is an IN argument then remote::do_rpc can automatically deal with passing std::string arguments as well as other containers of standard layout underlying types that provide the data() method (such as std::vector<T> where T is a standard layout type). remote::do_rpc can do the same for IN/OUT arguments if the returned data size from the RPC fits into the underlying container.

For arbitrary buffer types Pin provides the remote::rpc_buffer API, which wraps a data buffer given a pointer to the data and the size of the data. remote::rpc_buffer can optionally be passed the flag E_rpc_arg_flags::RpcArgFlagsDataEmpty. When this flag is passed, Pin encodes no data for the buffer when sending the request, which can be used to minimize data transfer for OUT arguments. The buffer must still be large enough to receive the output data from the RPC response.

The buffer_offload example shows how to use all these variants:

/*
* Copyright (C) 2024-2025 Intel Corporation.
* SPDX-License-Identifier: MIT
*/
#include <iostream>
#include <filesystem>
#include <cstdlib>
#include <cstddef>
#include <unistd.h>
#include "pin.H"
#include "buffer_offload.h"
/*
* This pintool demonstrates Pin 4.x new RPC (Remote Procedure Call) mechanism which allows Pintools to offload
* processing to a remote process.
* This pintool collects information about memory accesses in a trace buffer, and when the buffer
* gets full transmits the buffer to a remote function for further processing.
* The pintool is responsible for the instrumentation and data collection. The remote function implemented as part of a
* Pin server plugin (buffer_offload_plugin.cpp), is responsible for analyzing the collected data. The schema for the
* RPC is shared between the Pintool and the plugin and is located in buffer_offload.h.
*/
// A knob for setting the name of the file into which the remote plugin will write the analysis report
KNOB< std::string > KnobOutputFile(KNOB_MODE_WRITEONCE, "pintool", "o", "memory_analysis.log", "output file");
// The id of the trace buffer assigned by Pin
BUFFER_ID buffeId = 0;
static bool open_out_file(const std::string& outFileName)
{
    bool fileOpened = false;
    return remote::do_rpc< OPEN_OUT_FILE_SCHEMA >(fileOpened, outFileName) && fileOpened;
}
static bool mem_analyze(VOID* buffer, uint32_t bufferSize)
{
    return remote::do_rpc< MEM_ANALYZE_SCHEMA, void >(remote::rpc_buffer(buffer, bufferSize));
}
static bool get_mem_access_info(ADDRINT& addrMin, ADDRINT& addrMax, unsigned& topRangesCount, MEMREF* topRanges)
{
    bool ret = false;
    if (remote::do_rpc< GET_MEM_ACCESS_INFO_SCHEMA >(
            ret, addrMin, addrMax, topRangesCount,
            remote::rpc_buffer(topRanges, topRangesCount * sizeof(MEMREF), RpcArgFlagsDataEmpty)))
    {
        return ret;
    }
    return false;
}
static void InitializeRemoteLogger()
{
    std::filesystem::path outfile = KnobOutputFile.Value();
    if (outfile.is_relative())
    {
        outfile = std::filesystem::current_path() / outfile;
    }
#if (TARGET_WINDOWS)
    // The Windows LSC implementation for ::getcwd (Source/pinrt/pinos/lsc/support/windows/syscallimp/getcwd.cpp)
    // prefixes the path with '/' - intentionally. However std::ofstream::open is not happy with the '/' so we remove it.
    std::string fullpath = outfile.string();
    if (0 == fullpath.find("/"))
    {
        outfile = fullpath.substr(1, fullpath.size() - 1);
    }
#endif
    std::cout << "Report will be written to " << outfile << std::endl;
    ASSERTX(open_out_file(outfile.string()));
}
/*
* This function is called by Pin when the trace buffer gets full.
* In this function we do not process the trace buffer but rather transmit it to the remote
* plugin for further processing.
*/
static VOID* BufferFull(BUFFER_ID id, THREADID tid, const CONTEXT* ctxt, VOID* buffer, UINT64 numElements, VOID* v)
{
    ASSERTX(mem_analyze(buffer, numElements * sizeof(MEMREF)));
    return buffer;
}
VOID Fini(INT32 code, VOID* v)
{
    ADDRINT addrMin = 0, addrMax = 0;
    MEMREF topRanges[6] {};
    unsigned topRangesCount = sizeof(topRanges) / sizeof(MEMREF);
    ASSERTX(get_mem_access_info(addrMin, addrMax, topRangesCount, topRanges));
    std::cout << "Lowest address accessed 0x" << std::hex << addrMin << " ; Highest address accessed 0x" << std::hex << addrMax
              << std::endl;
    std::cout << "Largest " << topRangesCount << " ranges are:" << std::endl;
    for (unsigned i = 0; i < topRangesCount; ++i)
    {
        std::cout << '\t' << "Base: 0x" << std::hex << uintptr_t(topRanges[i].ea) << std::dec << " Size: " << topRanges[i].size
                  << " bytes" << std::endl;
    }
}
VOID Trace(TRACE trace, VOID* v)
{
    for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl))
    {
        for (INS ins = BBL_InsHead(bbl); INS_Valid(ins); ins = INS_Next(ins))
        {
            UINT32 memoryOperands = INS_MemoryOperandCount(ins);
            for (UINT32 memOp = 0; memOp < memoryOperands; memOp++)
            {
                UINT32 numBytesAccessed = INS_MemoryOperandSize(ins, memOp);
                INS_InsertFillBuffer(ins,                                                   // The application instruction
                                     IPOINT_BEFORE,                                         // before the instruction executes
                                     buffeId,                                               // The id of the buffer whose record is filled
                                     IARG_MEMORYOP_EA, memOp, offsetof(MEMREF, ea),         // effective address
                                     IARG_UINT32, numBytesAccessed, offsetof(MEMREF, size), // number of bytes read/written
                                     IARG_END);
            }
        }
    }
}
INT32 Usage()
{
    std::cerr
        << "This tool demonstrates offloading analysis work to a remote process. "
        << "Instead of doing the processing in the analysis routine we send the data using an RPC message to the remote process"
        << std::endl;
    std::cerr << std::endl << KNOB_BASE::StringKnobSummary() << std::endl;
    return -1;
}
int main(int argc, char* argv[])
{
    if (PIN_Init(argc, argv))
    {
        return Usage();
    }
    // There is a physical limit for transmitting buffers to the remote process - 65535 bytes.
    // However we need to leave some room for RSC-RPC message headers.
    // We limit the trace buffer size so that it does not exceed that limit, since we want to transmit the full buffer.
    const size_t bufferSizeLimit = PIN_CalculateSafeRPCDataSize(&MEM_ANALYZE_SCHEMA);
    const size_t pageSize = getpagesize();
    ASSERTX(bufferSizeLimit >= pageSize);
    auto numPages = bufferSizeLimit / pageSize;
    buffeId = PIN_DefineTraceBuffer(sizeof(MEMREF), numPages, BufferFull, 0);
    if (buffeId == BUFFER_ID_INVALID)
    {
        std::cerr << "Error: could not allocate initial buffer" << std::endl;
        return 1;
    }
    InitializeRemoteLogger();
    // Register the instrumentation and exit callbacks defined above
    TRACE_AddInstrumentFunction(Trace, 0);
    PIN_AddFiniFunction(Fini, 0);
    // Start the application; PIN_StartProgram() never returns
    PIN_StartProgram();
    return 0;
}

In function open_out_file, we have a call to remote::do_rpc that returns a boolean and passes a std::string as an input argument. Handling of this RPC is done in Buffer_offload_plugin::do_open_outfile of buffer_offload_plugin.cpp.

In function mem_analyze we have a call to remote::do_rpc that returns no value and accepts an arbitrary buffer as an input argument. Handling of this RPC is done in Buffer_offload_plugin::do_analyze_mem_access of buffer_offload_plugin.cpp.

Notice how both of the implementations in the plugin code make sure to indicate to pind that no data should be transmitted for these IN arguments in the response. This is done by using E_rpc_arg_flags::RpcArgFlagsDataEmpty.

In function get_mem_access_info we have a call to remote::do_rpc that returns a boolean, passes two ADDRINT arguments as OUT arguments, one unsigned int as an IN/OUT argument and a buffer as an OUT argument. It uses remote::rpc_buffer with E_rpc_arg_flags::RpcArgFlagsDataEmpty to indicate to Pin that the buffer does not contain any valid data when calling the RPC. The RPC handling in Buffer_offload_plugin::do_get_mem_access_info of buffer_offload_plugin.cpp makes sure to fill the output arguments with the required data.

E_pin_rpc_flags Flags for remote::do_rpc

Pin allows modifying the behavior of remote::do_rpc by passing an E_pin_rpc_flags Flags template argument:

enum E_pin_rpc_flags
{
    PinRpcFlagsNone,                 // Normal operation. Will wait for a response with some "reasonable" internal timeout.
                                     // If a timeout occurs the program will abort
    PinRpcFlagsNoResponseRequired,   // No response is expected from the server. Can return immediately
    PinRpcFlagsCanBlockIndefinitely, // Can block indefinitely waiting for a response
};

The default value for the flags argument is PinRpcFlagsNone. This means that remote::do_rpc will block until a response is received or a timeout occurs. The timeout is a "reasonable" timeout for a regular function to complete. If the timeout occurs, the program will abort with an assertion failure.

If the PinRpcFlagsNoResponseRequired flag is set, then remote::do_rpc will not wait for a response and will return after the request is written to the RPC request queue. Caution should be taken when using this flag, as the Pintool will not be able to know if the request was successfully processed by the plugin or not. This flag is useful for fire-and-forget RPC requests that do not require a response. The plugin should be prepared to be called from the same client thread even though the previous request may not have completed.

If the PinRpcFlagsCanBlockIndefinitely flag is set, then remote::do_rpc will block indefinitely waiting for a response. This is useful for RPC requests that are expected to take a long time to complete and the Pintool is willing to wait for the response without any timeout. This flag should be used with caution, as it may cause the Pintool to hang indefinitely if the plugin does not respond or if there is a deadlock in the plugin code.



Pintool Information and Restrictions


Pin Runtime Library (PinRT)

Pin is built on top of several components that provide O/S-agnostic and compiler-agnostic system-level APIs. Pin uses an O/S abstraction layer dubbed PINOS, which provides an O/S-independent system-call layer. On top of PINOS, Pin uses and provides a dynamic linker, a C runtime library (CRT) based on Musl, and a C++ runtime library and STL headers based on LLVM. Together, these libraries and components are called PinRT.

Pin CRT is ISO C and POSIX compliant and provides standard POSIX API on both Linux & Windows. Pin C++ Runtime is C++17 compatible and provides full C++17 support on both Linux & Windows. Pin's dynamic linker allows loading DLLs/SOs built with PinRT libraries.

Note
Pin C++ Runtime does not support RTTI. C++ exceptions are currently supported only for Linux.
For deviation from standard API documentation see Pin C Runtime Deviations From Standard Documentation.
For requirements and limitations please refer to the README.
Pin's dynamic linker (loader) cannot be used to load DLLs/SOs that were not built against PinRT.

General

Pintools and dependencies must be built against (compile and link with) PinRT instead of any system runtime. Pintools and dependency libraries should refrain from using any native system-calls or APIs, and use PinRT APIs for any needed functionality. For more information on how and under which conditions some of these restrictions might be alleviated refer to the O/S specific sections below (Windows OS or Linux OS).

There are several things that a Pintool writer must be aware of.

  • IARG_REG_VALUE cannot be used to pass floating point register values to an analysis routine.
  • Also, see the O/S specific restrictions below (Windows OS or Linux OS).
  • Manually loading a DLL/SO that was not built against Pin RT by explicitly using the application's loader API (dlopen() / LoadLibrary()) is the same as if the application had done it. It will cause an Image Load callback (see IMG: Image Object), it will affect the application's loaded image list and loading order, and on Linux it may also directly affect the application's symbol resolution. If this cannot be avoided, it is best to call these functions through PIN_CallApplicationFunction.
  • Instrumentation objects like INS, BBL, TRACE, RTN and IMG are only valid during the lifetime of the instrumentation function in which they were created. Pintool writers must not store them for later access from analysis routines or from other instrumentation functions. If an object is required in an analysis routine or an instrumentation routine where it was not directly provided, the Pin API should be used to obtain it. See Instrumentation Granularity.

Often, a Pintool writer wants to run the SPEC benchmarks to see the results of their research. There are many ways one can update the scripts to invoke Pin on the SPEC tests; this is one. In your $SPEC/config file, add the following two lines:

submit=$PIN_HOME/pin -t /my/pin/tool -- $command
use_submit_for_speed=yes

Now the SPEC harness will automatically run Pin with whatever benchmarks it runs. Note that you need the full path name for Pin and Pintool binaries.

Best Known Methods for cross platform Pintool development

Pin RT aims to provide the same functionality in the same way on both Linux & Windows. However, cross-platform compatibility is not perfect, due to inherent differences between the facilities provided by the corresponding O/S kernels and to differences in code generation conventions. The BKMs in the list below are designed to mitigate some of these limitations:

  • Avoid using O/S specific APIs and system-calls.
  • Prefer C++ STL to CRT.
    C++ uses a subset of the CRT that is well tested cross-platform.
  • Write for Windows as if you were writing for Linux.
    Pin's CRT is based on Musl which is ISO C and POSIX compliant. Musl relies on certain behavior of system-calls. While developing the Windows port we relied mostly on the Linux manual pages and the Linux kernel source code as reference for the system-calls behavior. For more information see Pin C Runtime Deviations From Standard Documentation.
  • Avoid using the long data type.
    long has different sizes in 64-bit applications on Linux and Windows (64 bits on Linux, 32 bits on Windows). Where long is required, use _arch_long, which is provided by Pin RT, instead. _arch_long is guaranteed to be 32-bit for 32-bit Pintools and 64-bit for 64-bit Pintools. Pin's CRT uses _arch_long in all places where the original Musl CRT uses long. For example, the standard function atol, which the standard declares as long atol(const char*), is declared in Pin CRT as _arch_long atol(const char*). If long is used on Windows for the variable receiving the return value, the value will be truncated to 32 bits even when building a 64-bit tool.
  • CRT definitions for LONG_MIN, LONG_MAX and ULONG_MAX from limits.h are changed to match _arch_long. This primarily affects Pintools written for 64-bit Windows. The C++ limits header is not affected, so std::numeric_limits<long> will have the minimum and maximum values for long and not for _arch_long.
  • Avoid using %l for printf and scanf when writing or reading values of _arch_long, as these functions actually expect a variable of type long. Using %l may result in compilation warnings or errors.
  • Prefer using explicitly sized types from stdint.h/cstdint.
  • Use the printf and scanf sized-type format macros defined in the header <inttypes.h>. These macros are designed to properly select the format specifier for sized types.
  • Avoid using long double data type.
    long double has different sizes on Linux and Windows. Using long double will result in different results for the same computation on the different platforms.

Linux OS

Pin identifies system-calls at the actual system-call trap instruction, not the libc function call wrapper. Pintools need to be aware of this when interpreting system-call arguments, etc.

Pintools should not try to intercept or unblock signals directly in JIT mode. Instead, Pintools should use the APIs PIN_InterceptSignal and PIN_UnblockSignal. In probe mode it is possible to use the Pin CRT signal API. However, Pintool writers should be aware that by doing so they may break the instrumented application, and should take measures to make sure that signals intended for the application are properly propagated.

As a rule, Pintools should not use system-calls directly, however it is sometimes required to use a system-call when the functionality is not provided by Pin RT. For more information see Pin Runtime Direct Syscall Interface.

Please report missing functionality on pinheads (see Questions? Bugs?).

Windows OS

Pintools should not call any Win32 APIs. All system interaction should go through Pin Runtime Library (PinRT).

Pin on Windows separates DLLs loaded by the tool from the application DLLs: it makes separate copies of any DLL loaded by Pin and the Pintool using the PinRT dynamic linker. Separate copies of system DLLs are not supported by the OS. To avoid isolation problems, a Pintool should not dynamically load any system DLL. For the same reason, a Pintool should avoid linking to any system DLL import library.

In probe mode, the application runs natively, and the probe is placed in the original code. If a tool replaces a function shared by the tool and the application, the behavior is undefined. Using only Pin Runtime Library (PinRT) APIs from analysis routines ensures this cannot happen.

Conflicts between Pin and Windows

Pin uses some base types that conflict with Windows types. If you use "windows.h", you may see compilation errors. To avoid this problem, Pin provides a header "pinrt_windows.h" that should be included wherever "windows.h" is required. Inside "pinrt_windows.h" the header "windows.h" is wrapped inside a namespace WINDOWS to prevent type name clashes with Pin if "pin.H" is also included.

The following BKM should be followed when including "pinrt_windows.h":

  • "pinrt_windows.h" must be the first Windows header included in a translation unit
  • If "pin.H" is also included in a translation unit then:
    • It should be included before "pinrt_windows.h"
    • All other windows headers must be included under namespace WINDOWS.
    • using namespace WINDOWS; should never be used.
  • If "pin.H" is not included then other windows headers should not be included under namespace WINDOWS.

The following example illustrates these guidelines (the two halves represent separate translation units):

// When pin.H is included
#include <pin.H>
#include <pinrt_windows.h>
namespace WINDOWS
{
#include <winsock2.h>
}
// Explicit namespace usage - declaring 'using namespace WINDOWS' will cause errors
WINDOWS::DWORD g_DWORD = 0;
// When pin.H is not included
#include <pinrt_windows.h>
#include <winsock2.h>
DWORD g_DWORD = 0;
Note
It is important to remember that using Windows APIs directly may break isolation and modify the instrumented application's behavior in unintended ways. Never use objects, handles or file descriptors returned from Pin Runtime APIs or Pin APIs with native functions, and vice versa.

Constructing PinTools from multiple DLLs on Windows

A Pintool can be composed from multiple DLLs:

  • "main DLL", which is specified in the Pin command line after "-t" switch
  • a number of "secondary DLLs", linked to the "main DLL" statically.

When considering this configuration, take into account that a multi-DLL Pintool may increase memory fragmentation and cause layout conflicts with application images. If there is no compelling reason to use multiple DLLs, build your tool as a single DLL to reduce the risk of memory conflicts.

Limitations and instructions:

  • Don't use any Pin API in "secondary DLLs". Only the "main DLL" can use Pin API!
  • In order to use a multi DLL Pintool, put "main DLL" and its "secondary DLLs" in the same directory.
  • IMPORTANT: Build each DLL with the recommended Pintool building flags (see Building Your Own Tool).
  • Remove /EXPORT:main link flag and don't reference pin.lib for "secondary DLLs".

Supported executables

Pin can instrument Windows* subsystem executables.
It can't instrument other executables (such as MS-DOS, Win16 or a POSIX subsystem executables).



Building Your Own Tool


Table of Contents

To write your own tool, copy one of the example directories and edit the makefile.rules file to add your tool. The sample tool MyPinTool is recommended. This tool allows you to build either inside or outside the kit directory tree. See Adding Tests, Tools and Applications to the makefile and Defining Build Rules for Tools and Applications for further details on makefile modification.

Note
Supported Compiler Families: On Linux, PinTools can be built using the GCC C/C++ compiler family and the Intel Compiler.
On Windows PinTools can be built using LLVM's clang-cl and the Intel Compiler.
For specific versions please refer to the release notes.


Note
Building On Windows: Since the tools are built using make, be sure to first install Cygwin with make, or a Mingw-based environment (like MSYS2 or GitBash) with make (Pin provides Cygwin & Mingw compatible makefiles and compiler wrappers). If access to Windows headers or libraries is required, make sure to open your build shell (Cygwin/MSYS/GitBash) from a Visual Studio Developer shell.


Note
Pin uses compiler and linker wrapper scripts. By default these scripts use the default compiler and linker installed on the system. It may be required or desirable to override these defaults. On Linux, users may use the PIN_WRAPPER_GCC environment variable to override the default gcc used. If this variable is not set, the default gcc installed on the system is used. On Linux, gcc is also used as the linker.
On Windows users may override both the compiler and linker used with the PIN_WRAPPER_COMPILER and PIN_WRAPPER_LINKER environment variables. If these are not set then the default clang-cl, and lld-link are used. On Windows it may be required to specify these variables if the compiler and linker cannot be found on the system PATH.

Building a Tool From Within the Kit Directory Tree

You may either modify MyPinTool or copy it as directed above. If you're using MyPinTool, and the default build rule suffices, you may not have to change makefile.rules. If you are adding a new tool, or you require special build flags for your tool, you will need to modify the makefile.rules file to add your tool and/or specify a customized build rule.

Building YourTool.so (from YourTool.cpp):

make obj-intel64/YourTool.so

For the IA-32 architecture, use "obj-ia32" instead of "obj-intel64". See Useful make Variables and Flags for commonly used make flags to add to your build.

Building a Tool Out of the Kit Directory Tree

Copy the MyPinTool directory to a place of your choosing. This directory will serve as a basis for your tool. Modify the makefile.rules file to add your tool and/or specify a customized build rule.

Building YourTool.so (from YourTool.cpp):

make PIN_ROOT=<path to Pin kit> obj-intel64/YourTool.so

For the IA-32 architecture, use "obj-ia32" instead of "obj-intel64". See Useful make Variables and Flags for commonly used make flags to add to your build.

For changing the directory where the tool will be created, override the OBJDIR variable from the command line:

make PIN_ROOT=<path to Pin kit> OBJDIR=<path to output dir> <path to output dir>/YourTool.so

Building Pintools on Linux Without Pin's Makefile Infrastructure

The easiest way to build a Pintool is using Pin's makefile Infrastructure. However it is possible to build a Pintool directly using gcc/g++.

Pin provides wrappers for gcc/g++ named pin-gcc and pin-g++, which can be found under <pinkit>/<arch>/pinrt/bin. These wrappers hide away some of the details of building against Pin Runtime Library (PinRT). Using pin-g++ to build a Pintool is quite easy. Here is an example for building a 64-bit Pintool:

 # Replace <pinkit> with the path to the installed Pin kit
 # compiling
 <pinkit>/intel64/pinrt/bin/pin-g++ -std=c++17 -Wall -Werror -Wno-unknown-pragmas \
    -fno-stack-protector \
    -funwind-tables -fasynchronous-unwind-tables -fno-rtti -fPIC -DTARGET_LINUX -faligned-new \
    -O3 -fomit-frame-pointer -fno-strict-aliasing \
    -isystem <pinkit>/intel64/pinrt/include/adaptor \
    -I<pinkit>/source/include/pin \
    -I<pinkit>/source/include/pin/gen \
    -I<pinkit>/extras/components/include \
    -I<pinkit>/extras/xed-intel64/include/xed \
    -I<pinkit>/source/tools/Utils \
    -c -o obj-intel64/MyPinTool.o MyPinTool.cpp
 # linking
 <pinkit>/intel64/pinrt/bin/pin-g++ -shared -Wl,-Bsymbolic \
    -Wl,--version-script=../../../source/include/pin/pintool.ver \
    -L<pinkit>/intel64/lib \
    -L<pinkit>/extras/xed-intel64/lib \
    -lpin -lpinrt-adaptor-static -lxed -lpindwarf -ldwarf -lunwind-dynamic \
    -o obj-intel64/MyPinTool.so obj-intel64/MyPinTool.o 

Some users may wish to use gcc directly without using the supplied wrappers. This option is not recommended and requires extra effort. The main difficulty with this approach is that Pintools are built against Pin Runtime Library (PinRT) and require that -nostdlib be passed to gcc for linking. When this flag is passed, gcc will not automatically find the standard library, the compiler runtime library, or crtbeginS.o and crtendS.o, so the user must specify them explicitly in the correct order.

Note
Although the Pin API is C++ and most Pintools are written solely in C++, it is not possible to use g++ to build Pintools directly. Users wishing to build Pintools without pin-g++ must use gcc for building C++ code as well.

The Makefile code below can be used to build MyPinTool.cpp from <pinkit>/source/tools/MyPinTool without pin-g++ and should be used as reference. To build using this makefile follow these steps:

 cd <pinkit>/source/tools/MyPinTool
 make -f makefile.nopincc TARGET=<arch>
 touch makefile

Replace <arch> with either ia32 for building a 32-bit tool or intel64 for building a 64-bit tool.

  # build 32 bit tool
  make -f makefile.nopincc TARGET=ia32
  # build 64 bit tool
  make -f makefile.nopincc TARGET=intel64

The makefile code is also included here for reference:

#
# Copyright (C) 2024-2024 Intel Corporation.
# SPDX-License-Identifier: MIT
#
# Example makefile for building a tool without pin-gcc/pin-g++
CC ?= gcc
COMPILER ?= $(CC)
PINKIT := ../../../
PINRTDIR := $(PINKIT)/$(TARGET)/pinrt
OBJ_SUFFIX := .o
PINTOOL_SUFFIX := .so
OBJDIR := nopincc-obj-$(TARGET)
TOOL_CXX := $(CC)
TOOL_LINKER := $(CC)
TOOL_CPPFLAGS := -DPIN_CRT=1 -DPIN_RT -DTARGET_LINUX -D_LIBCPP_HAS_MUSL_LIBC -D_GNU_SOURCE
TOOL_CXXFLAGS := -nostdinc -fPIC -fno-stack-protector -fno-rtti
TOOL_USER_CXXFLAGS := -std=c++17 -funwind-tables -fasynchronous-unwind-tables -faligned-new -fno-strict-aliasing \
-O3 -fomit-frame-pointer -Wall -Werror
TOOL_LDFLAGS := -shared -Wl,-Bsymbolic -Wl,--version-script=$(PINKIT)/source/include/pin/pintool.ver \
-Wl,--dynamic-linker,$(PINRTDIR)/bin/pin.ld.so -nostdlib -fPIC
TOOL_LIB_PATH := -L$(PINKIT)/$(TARGET)/lib \
-L$(PINKIT)/extras/xed-$(TARGET)/lib \
-L$(PINRTDIR)/lib
TOOL_LIBS := -lpin \
-lxed \
-lpindwarf \
-ldwarf \
-lunwind-dynamic \
-lpinrt-adaptor-static \
-lc++abi \
-lc++ \
-lpincrt
ifeq ($(TARGET),ia32)
TOOL_CPPFLAGS += -D_arch_long=long -DTARGET_IA32 -DHOST_IA32
TOOL_CXXFLAGS += -m32
TOOL_LDFLAGS += -Wl,-melf_i386
GCC_OBJ_PATH := $(shell dirname $(shell gcc -print-file-name=crtbeginS.o))/32
else
TOOL_CPPFLAGS += -D_arch_long=long -DTARGET_IA32E -DHOST_IA32E
TOOL_CXXFLAGS += -m64
TOOL_LDFLAGS += -Wl,-melf_x86_64
GCC_OBJ_PATH := $(shell dirname $(shell gcc -print-file-name=crtbeginS.o))
endif
ifeq ($(COMPILER), icx)
TOOL_CPPFLAGS += -D_LIBCPP_DISABLE_AVAILABILITY
endif
ifeq ($(COMPILER), gcc)
TOOL_CXXFLAGS += -fabi-version=10
endif
TOOL_LDFLAGS += $(TOOL_LIB_PATH)
TOOL_START_FILES := $(PINRTDIR)/lib/crti.o \
$(GCC_OBJ_PATH)/crtbeginS.o
TOOL_END_FILES := $(GCC_OBJ_PATH)/libgcc.a \
$(GCC_OBJ_PATH)/libgcc_eh.a \
$(GCC_OBJ_PATH)/crtendS.o \
$(PINRTDIR)/lib/crtn.o
TOOL_SYSTEM_INCLUDES := -isystem $(PINRTDIR)/include/c++ \
-isystem $(PINRTDIR)/include/adaptor \
-isystem $(PINRTDIR)/include \
-isystem $(PINRTDIR)/include/pinos \
-isystem $(GCC_OBJ_PATH)/include
TOOL_PIN_INCLUDES := -I $(PINKIT)/source/include/pin \
-I $(PINKIT)/source/include/pin/gen \
-I $(PINKIT)/extras/components/include \
-I $(PINKIT)/extras/xed-$(TARGET)/include/xed \
-I $(PINKIT)/source/tools/Utils
TOOL_OBJECTS = $(OBJDIR)/MyPinTool$(OBJ_SUFFIX)
.PHONY: all ;
all: MyPinTool$(PINTOOL_SUFFIX) ;
clean:
	rm -rf $(OBJDIR)
$(OBJDIR):
	mkdir -p $(OBJDIR)
MyPinTool$(PINTOOL_SUFFIX) : $(TOOL_OBJECTS) | dir
	$(TOOL_LINKER) $(TOOL_LDFLAGS) -o $(OBJDIR)/$@ $(TOOL_START_FILES) $< $(TOOL_LIBS) $(TOOL_END_FILES)
$(OBJDIR)/%$(OBJ_SUFFIX) : %.cpp | dir
	$(TOOL_CXX) $(TOOL_CPPFLAGS) $(TOOL_PIN_INCLUDES) $(TOOL_SYSTEM_INCLUDES) $(TOOL_CXXFLAGS) $(TOOL_USER_CXXFLAGS) -c -o $@ $<
dir: $(OBJDIR)

As can be seen, building without pin-g++ requires extra work to find the compiler's runtime library, start files, end files and include path. Care must be taken to properly add all Pin Runtime Library (PinRT) libraries and include paths. It is also important to take care of the order in which object files, libraries, start files and end files are specified in the link command.

Note
All flags specified in TOOL_CPPFLAGS, TOOL_CXXFLAGS and TOOL_LDFLAGS are required. The following flags from TOOL_USER_CXXFLAGS are recommended: -std=c++17 -funwind-tables -fasynchronous-unwind-tables -faligned-new -fno-strict-aliasing

Building Pintools on Windows Without Pin's Makefile Infrastructure

The easiest way to build a Pintool is using Pin's makefile Infrastructure. However it is possible to build a Pintool directly using clang-cl or the Intel compiler, icx.

Pin provides wrappers for clang-cl/icx named pin-clang-cl, pin-clang-cl++, pin-icx and pin-icx++. Additionally, Pin provides pin-ld, a linker wrapper for LLVM's lld-link. These wrappers can be found under <pinkit>/<arch>/pinrt/bin and hide away some of the details of building against Pin Runtime Library (PinRT). Using pin-clang-cl++ or pin-icx++ to build a Pintool is quite easy. Here is an example for building a 64-bit Pintool:

 # Replace <pinkit> with the path to the installed Pin kit
 # compiling
 <pinkit>/intel64/pinrt/bin/pin-clang-cl++ -Wno-non-c-typedef-for-linkage -Wno-microsoft-include -Wno-unicode \
    -I<pinkit>/source/include/pin \
    -I<pinkit>/source/include/pin/gen \
    -I<pinkit>/intel64/pinrt/include/adaptor \
    -I<pinkit>/extras/components/include \
    -I<pinkit>/extras/xed-intel64/include/xed \
    -I<pinkit>/source/tools/Utils \
    -std:c++17 -MD -O2  -c -Foobj-intel64/MyPinTool.obj MyPinTool.cpp
 # linking
 <pinkit>/intel64/pinrt/bin/pin-ld -dll -c++ -EXPORT:main -INCREMENTAL:NO -IGNORE:4210 \
    -IGNORE:4049 -DYNAMICBASE -NXCOMPAT -OPT:REF \
    -out:obj-intel64/MyPinTool.dll obj-intel64/MyPinTool.obj \
    -LIBPATH:<pinkit>/intel64/lib \
    -LIBPATH:<pinkit>/extras/xed-intel64/lib \
    pin.lib pinrt-adaptor-static.lib xed.lib kernel32.lib 

To use the Intel compiler in the example above, pin-clang-cl++ should be replaced with pin-icx++.

Building Tools in Visual Studio

An example Visual Studio project that builds a Pintool in the Visual Studio IDE can be found in the \source\tools\MyPinTool directory. Enter this directory and open the project or solution file. To build the tool, select "Build Solution".

To run an application instrumented by MyPinTool, run Pin from the command line and pass it the created DLL. For example:

 <pinkit>\pin.exe -t x64\Debug\MyPinTool-clang.dll -count 1 -- "C:\Users\..\my_app.exe"

You can use MyPinTool as a template for your own project. Look carefully at the compilation and linking switches in the MyPinTool property pages. For more information about these switches, refer to Building Tools From The Command Line Without Wrappers.

Building Tools From The Command Line Without Wrappers

It is possible to build a Pintool directly with the compiler and linker, without the convenience wrappers supplied by Pin.

Note
The following instructions assume that we are building from a Visual Studio Developer Command Prompt and that clang-cl is installed as part of the Visual Studio installation (it is also possible to work with a stand-alone toolchain).

The following common definitions should be passed:

  # Common Macro Definitions
  -DPIN_CRT=1 -DPIN_RT -DTARGET_WINDOWS
  # For getting full CRT functionality we add
  -D_GNU_SOURCE -D_XOPEN_SOURCE=700 -D_POSIX_C_SOURCE=200809L
  # For C++ we add
  -D_LIBCPP_HAS_MUSL_LIBC -D_LIBCPP_NO_VCRUNTIME -D_LIBCPP_DISABLE_AVAILABILITY

The following architecture specific macro definitions should be passed:

  # 32 bits
  -DTARGET_IA32 -DHOST_IA32 -D_arch_long=long -D__i386__
  # 64 bits
  -DTARGET_IA32E -DHOST_IA32E "-D_arch_long=long long" -D__x86_64__
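Because the TARGET_* macros and the -m32/-m64 flags are passed separately, the two can accidentally disagree. As a minimal sketch (not part of Pin; TargetArchDir is a hypothetical helper name), a Pintool source file could verify at compile time that the macro it was given matches the pointer width the compiler is actually producing:

```cpp
// Hypothetical compile-time sanity check for a Pintool source file.
// It inspects only the TARGET_IA32 / TARGET_IA32E macros described above
// and the pointer size selected by -m32 / -m64; no Pin API is used.

#if defined(TARGET_IA32E)
static_assert(sizeof(void*) == 8, "TARGET_IA32E must be combined with -m64");
#elif defined(TARGET_IA32)
static_assert(sizeof(void*) == 4, "TARGET_IA32 must be combined with -m32");
#endif

// Returns the Pin kit sub-directory matching the build ("intel64" or
// "ia32"), or "unknown" when no TARGET_* macro was defined.
inline const char* TargetArchDir() {
#if defined(TARGET_IA32E)
    return "intel64";
#elif defined(TARGET_IA32)
    return "ia32";
#else
    return "unknown";
#endif
}
```

With -DTARGET_IA32E combined with -m32, for example, the static_assert stops the build with a readable message instead of letting the mismatch surface later.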

The include paths should be specified in the following order:

 # First come the project-specific include paths
 # Followed by Pin headers
 # Followed by PinRT headers (using -internal-isystem)
 # Followed by Windows headers (using -internal-isystem; required because of -nostdinc)
 # Here is an example for 64 bits (note that the actual Windows paths should be taken from the INCLUDE environment variable)
 -I<pinkit>\source\include\pin
 -I<pinkit>\source\include\pin\gen
 -I<pinkit>\source\tools\Utils
 -I<pinkit>\extras\components\include
 -I<pinkit>\extras\xed-intel64\include\xed
 -I<pinkit>\intel64\pinrt\include\adaptor
 -Xclang -internal-isystem -Xclang <pinkit>\intel64\pinrt\include\c++
 -Xclang -internal-isystem -Xclang <pinkit>\intel64\pinrt\include
 -Xclang -internal-isystem -Xclang <pinkit>\intel64\pinrt\include\pinos
 -Xclang -internal-isystem -Xclang "C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.34.31933\include"
 -Xclang -internal-isystem -Xclang "C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.34.31933\ATLMFC\include"
 -Xclang -internal-isystem -Xclang "C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Auxiliary\VS\include"
 -Xclang -internal-isystem -Xclang "C:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt"
 -Xclang -internal-isystem -Xclang "C:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\um"
 -Xclang -internal-isystem -Xclang "C:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\shared"
 -Xclang -internal-isystem -Xclang "C:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\winrt"
 -Xclang -internal-isystem -Xclang "C:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\cppwinrt"
 -Xclang -internal-isystem -Xclang "C:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um"
Note
Windows include paths must be specified explicitly because -nostdinc must be passed in the compiler flags (see below). It may be possible to omit -nostdinc and thereby avoid specifying the Windows headers explicitly; this option, however, has not been tested and is not officially supported.

The following compilation flags should be used:

 # Common flags
 -nostdinc -fno-builtin /GS- /EHa- /EHs- /EHc- /Oi- /Gy /wd4530 /GR- /Zc:threadSafeInit- /wd5208
 # Pin recommended flags
 -Wno-non-c-typedef-for-linkage -Wno-microsoft-include -Wno-unicode /fp:strict
 # 32 bit flags
 -m32
 # 64 bit flags
 -m64
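Among the common flags above, /EHa- /EHs- /EHc- disable C++ exception handling, /GR- disables RTTI and /Zc:threadSafeInit- turns off thread-safe initialization of function-local statics. Tool code built this way therefore cannot use throw/catch or RTTI-based dynamic_cast and typeid, and should not depend on local statics being initialized safely under concurrency. The following is an illustrative sketch only: ParseCount and ParseStatus are hypothetical names, used to show errors being reported through a return status instead of an exception (for instance, when validating a numeric value such as the -count argument passed to MyPinTool earlier).

```cpp
// Illustrative sketch: error handling without exceptions or RTTI, suitable
// for code compiled with /EHa- /EHs- /EHc- /GR-. The names are hypothetical.
#include <cstdint>
#include <limits>

enum class ParseStatus { kOk, kEmpty, kNotANumber, kOverflow };

// Parses a non-negative decimal integer from a NUL-terminated string.
// On success stores the value in *out and returns ParseStatus::kOk;
// on failure leaves *out unchanged and returns the reason.
ParseStatus ParseCount(const char* text, std::uint64_t* out) {
    if (text == nullptr || *text == '\0') return ParseStatus::kEmpty;
    const std::uint64_t kMax = std::numeric_limits<std::uint64_t>::max();
    std::uint64_t value = 0;
    for (const char* p = text; *p != '\0'; ++p) {
        if (*p < '0' || *p > '9') return ParseStatus::kNotANumber;
        const std::uint64_t digit = static_cast<std::uint64_t>(*p - '0');
        // value * 10 + digit must not exceed kMax.
        if (value > (kMax - digit) / 10) return ParseStatus::kOverflow;
        value = value * 10 + digit;
    }
    *out = value;
    return ParseStatus::kOk;
}
```

The caller checks the returned status explicitly, which keeps the control flow visible without relying on stack unwinding.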

Putting it all together, a full 64-bit compile command should look similar to:

 clang-cl.exe -D TARGET_IA32E -D HOST_IA32E -D __x86_64__ ^
              -D PIN_CRT=1 -D PIN_RT -D "_arch_long=long long" -D TARGET_WINDOWS ^
              -D _LIBCPP_HAS_MUSL_LIBC -D _LIBCPP_NO_VCRUNTIME -D _LIBCPP_DISABLE_AVAILABILITY -D _GNU_SOURCE ^
              -m64 -fno-builtin -nostdinc /GS- /EHa- /EHs- /EHc- /Oi- /Gy /wd4530 /GR- /Zc:threadSafeInit- /wd5208 ^
              -O2 /std:c++17 -Wno-non-c-typedef-for-linkage -Wno-microsoft-include -Wno-unicode ^
              -I <pinkit>/source/include/pin ^
              -I <pinkit>/source/include/pin/gen ^
              -I <pinkit>/source/tools/Utils ^
              -I <pinkit>/extras/components/include ^
              -I <pinkit>/extras/xed-intel64/include/xed ^
              -I <pinkit>/intel64/pinrt/include/adaptor ^
              -Xclang -internal-isystem -Xclang <pinkit>/intel64/pinrt/include/c++ ^
              -Xclang -internal-isystem -Xclang <pinkit>/intel64/pinrt/include ^
              -Xclang -internal-isystem -Xclang <pinkit>/intel64/pinrt/include/pinos ^
              -Xclang -internal-isystem -Xclang "C:/Program Files/Microsoft Visual Studio/2022/Professional/VC/Tools/MSVC/14.34.31933/include" ^
              -Xclang -internal-isystem -Xclang "C:/Program Files/Microsoft Visual Studio/2022/Professional/VC/Tools/MSVC/14.34.31933/ATLMFC/include" ^
              -Xclang -internal-isystem -Xclang "C:/Program Files/Microsoft Visual Studio/2022/Professional/VC/Auxiliary/VS/include" ^
              -Xclang -internal-isystem -Xclang "C:/Program Files (x86)/Windows Kits/10/include/10.0.22000.0/ucrt" ^
              -Xclang -internal-isystem -Xclang "C:/Program Files (x86)/Windows Kits/10/include/10.0.22000.0/um" ^
              -Xclang -internal-isystem -Xclang "C:/Program Files (x86)/Windows Kits/10/include/10.0.22000.0/shared" ^
              -Xclang -internal-isystem -Xclang "C:/Program Files (x86)/Windows Kits/10/include/10.0.22000.0/winrt" ^
              -Xclang -internal-isystem -Xclang "C:/Program Files (x86)/Windows Kits/10/include/10.0.22000.0/cppwinrt" ^
              -Xclang -internal-isystem -Xclang "C:/Program Files (x86)/Windows Kits/NETFXSDK/4.8/include/um" ^
              -o obj-intel64/MyPinTool.obj -c MyPinTool.cpp

For linking, the following library paths should be specified:

 # 32 bits
 -LIBPATH:<pinkit>/ia32/pinrt/lib
 -LIBPATH:<pinkit>/ia32/lib
 -LIBPATH:<pinkit>/extras/xed-ia32/lib
 # 64 bits
 -LIBPATH:<pinkit>/intel64/pinrt/lib
 -LIBPATH:<pinkit>/intel64/lib
 -LIBPATH:<pinkit>/extras/xed-intel64/lib

The following start files should be added:

  crtbeginS.obj stdlib_new_delete.obj

The following libraries should be added (note that we always use /NODEFAULTLIB):

  pin.lib pinrt-adaptor-static.lib xed.lib c++.lib pincrt4.lib kernel32.lib

The following link flags should be used:

  -dll /EXPORT:main /NODEFAULTLIB /SAFESEH:NO /SUBSYSTEM:CONSOLE /INCREMENTAL:NO /IGNORE:4210 /IGNORE:4049 /DYNAMICBASE /NXCOMPAT
  # For optimized builds (for debug builds use /OPT:NOREF)
  /OPT:REF
  # For 32 bits we add
  /MACHINE:X86
  # For 64 bits we add
  /MACHINE:X64

Putting it all together, a full 64-bit link command should look similar to:

 lld-link.exe -dll -LIBPATH:<pinkit>/intel64/pinrt/lib ^
                  -LIBPATH:<pinkit>/intel64/lib ^
                  -LIBPATH:<pinkit>/extras/xed-intel64/lib ^
                  /EXPORT:main /NODEFAULTLIB /SAFESEH:NO /SUBSYSTEM:CONSOLE /INCREMENTAL:NO ^
                  /IGNORE:4210 /IGNORE:4049 /DYNAMICBASE /NXCOMPAT /OPT:REF /MACHINE:X64 ^
                  crtbeginS.obj stdlib_new_delete.obj c++.lib pincrt4.lib ^
                  pin.lib pinrt-adaptor-static.lib xed.lib kernel32.lib ^
                  -out:obj-intel64/MyPinTool.dll obj-intel64/MyPinTool.obj

In the above commands replace <pinkit> with the path to the Pin installation directory.

As can be seen, building without Pin's compiler wrappers is more complex and requires manually specifying Windows SDK and compiler paths. Care should be taken to pass the correct flags and to specify all required libraries.



Pin's makefile Infrastructure



Using Pin's makefile Infrastructure

Pintools are built using make on all target platforms. This section describes the basic flags available in Pin's makefile infrastructure. This is not a makefile tutorial. For general information about makefiles, refer to the makefile manual available at http://www.gnu.org/software/make/manual/make.html.

The Config Directory

The source/tools/Config directory holds the common make configuration files, which should not be changed, and template files, which may serve as a basis for your own makefiles. This section gives a short overview of the most notable files in the directory. Experienced users are welcome to read through the complete set of configuration files to better understand the tools' build process.

makefile.config: This is the first file to be included in the make include chain. It holds documentation of all the relevant flags and variables available to users, both within the makefile and from the command shell. Also, this file includes the OS-specific configuration files.

makefile.unix.config: This file holds the Unix definitions of the makefile variables. See makefile.win.config for the Windows definitions.

unix.vars: This file holds the Unix definitions of some architectural variables and utilities used by the makefiles. See win.vars for the Windows definitions.

makefile.default.rules: This file holds the default make targets, test recipes and build rules.

The Test Directories

Each test directory in source/tools/ contains two files in the makefile chain.

makefile: This is the makefile that is invoked when running make. This file should not be changed. It holds the include directives for all the relevant configuration files of the makefile chain, in the correct order; changing this order may result in unexpected behavior. This is a generic file that is identical in all test directories.

makefile.rules: This is the directory-specific makefile. It holds the logic of the current directory. All tools, applications and tests that should be built and run in a directory are defined in this file. See Adding Tests, Tools and Applications to the makefile for adding tests, tools and applications to makefile.rules.

Adding Tests, Tools and Applications to the makefile

This section describes how to define your applications, tools and tests in the makefile. The sections below describe how to build the binaries and how to run the tests.

The variables detailed below hold the test, application and tool definitions. They are defined in the "Test targets" section of makefile.rules; see that section for additional variables and more detailed documentation of each variable.

TOOL_ROOTS: Define the name of your tool here, without the file extension. The correct extension, according to the OS, will be added automatically by make. For example, for adding YourTool.so:

TOOL_ROOTS := YourTool

APP_ROOTS: Define your application here, without the file extension. The correct extension, according to the OS, will be added automatically by make. For example, for adding YourApp.exe:

APP_ROOTS := YourApp

TEST_ROOTS: Define your tests here without the .test suffix. This suffix will be added automatically by make. For example, for adding YourTest.test:

TEST_ROOTS := YourTest

Defining Build Rules for Tools and Applications

Default build rules for tools and applications are defined in source/tools/Config/makefile.default.rules. The default tool build rule expects a single C/C++ source file and generates a tool with the same name; for example, for YourTool.cpp, make generates YourTool.so. If your tool requires more than one source file, or you need a customized build rule, add your rule at the bottom of makefile.rules in the "Build rules" section. There is no need to add the $(OBJDIR) dependency to the build rule; it is added automatically. This dependency creates the build output directory obj-intel64 (or obj-ia32 for the IA-32 architecture). See source/tools/Config/makefile.config for all available compilation and link flags.

Here are a few useful examples:

Building an unoptimized tool from a single source:

# Build the intermediate object file.
$(OBJDIR)YourTool$(OBJ_SUFFIX): YourTool.cpp
	$(CXX) $(TOOL_CXXFLAGS_NOOPT) $(COMP_OBJ)$@ $<

# Build the tool as a dll (shared object).
$(OBJDIR)YourTool$(PINTOOL_SUFFIX): $(OBJDIR)YourTool$(OBJ_SUFFIX)
	$(LINKER) $(TOOL_LDFLAGS_NOOPT) $(LINK_EXE)$@ $< $(TOOL_LPATHS) $(TOOL_LIBS)

Building an optimized tool from several source files:

# Build the intermediate object file.
$(OBJDIR)Source1$(OBJ_SUFFIX): Source1.cpp
	$(CXX) $(TOOL_CXXFLAGS) $(COMP_OBJ)$@ $<

# Build the intermediate object file.
$(OBJDIR)Source2$(OBJ_SUFFIX): Source2.c Source2.h
	$(CC) $(TOOL_CXXFLAGS) $(COMP_OBJ)$@ $<

# Build the tool as a dll (shared object).
$(OBJDIR)YourTool$(PINTOOL_SUFFIX): $(OBJDIR)Source1$(OBJ_SUFFIX) $(OBJDIR)Source2$(OBJ_SUFFIX) Source2.h
	$(LINKER) $(TOOL_LDFLAGS) $(LINK_EXE)$@ $(^:%.h=) $(TOOL_LPATHS) $(TOOL_LIBS)

Defining Test Recipes in makefile.rules

A default test recipe is defined in source/tools/Config/makefile.default.rules. For most users, this recipe is insufficient. You may specify your own test recipes in makefile.rules in the "Test recipes" section. There is no need to add the $(OBJDIR) dependency to the test recipe; it is added automatically. This dependency creates the build output directory obj-intel64 (or obj-ia32 for the IA-32 architecture).

Example:

YourTest.test: $(OBJDIR)YourTool$(PINTOOL_SUFFIX) $(OBJDIR)YourApp$(EXE_SUFFIX)
	$(PIN) -t $< -- $(OBJDIR)YourApp$(EXE_SUFFIX)

Useful make Variables and Flags

For a complete list of all the available variables and flags, see source/tools/Config/makefile.config. Here is a short list of the most useful flags:
PIN_ROOT: Specify the location of the Pin kit when building a tool outside of the kit.
CC: Override the default C compiler for tools.
CXX: Override the default C++ compiler for tools.
APP_CC: Override the default C compiler for applications. If not defined, APP_CC will be the same as CC.
APP_CXX: Override the default C++ compiler for applications. If not defined, APP_CXX will be the same as CXX.
TARGET: Override the default target architecture, e.g. for cross-compilation.
ICC: Specify ICC=1 when building tools with the Intel compiler.
DEBUG: When DEBUG=1 is specified, debug information is generated when building tools and applications, and no compilation or link optimizations are performed.



Libraries for Linux




Installing Pin


To install Pin, unpack the downloaded kit and change to the resulting directory.

For Linux kits, use tar xzf <kit name> to unpack the kit.

For Windows kits, use the built-in zip folder support in Windows or any unzip tool to unpack the kit.

Kit names are of the form:

  pin-4.<minor>-<build>-g<commit>-<compiler>-<platform>.tar.gz for Linux kits
  pin-4.<minor>-<build>-g<commit>-<compiler>-<platform>.zip for Windows kits

For example:

  pin-4.0-99625-gc5b279576-gcc-linux.tar.gz
  pin-4.0-99625-gc5b279576-clang-windows.zip
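The fields of a kit name can be illustrated with a small sketch. The helper below (ParseKitName and KitName) is hypothetical and is not shipped with Pin; it assumes that none of the fields contains a '-' character, which holds for the examples above.

```cpp
// Hypothetical helper that splits a Pin kit file name of the form
//   pin-4.<minor>-<build>-g<commit>-<compiler>-<platform>.tar.gz   (Linux)
//   pin-4.<minor>-<build>-g<commit>-<compiler>-<platform>.zip      (Windows)
// into its fields. Assumes no field contains a '-' character.
#include <string>
#include <vector>

struct KitName {
    std::string minor, build, commit, compiler, platform, archive;
};

static bool EndsWith(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Returns true and fills *kit if `name` matches the pattern above.
bool ParseKitName(std::string name, KitName* kit) {
    std::string archive;
    if (EndsWith(name, ".tar.gz")) archive = "tar.gz";
    else if (EndsWith(name, ".zip")) archive = "zip";
    else return false;
    name.erase(name.size() - archive.size() - 1);  // drop ".<archive>"

    // Split the remaining name on '-'.
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type dash = name.find('-', start);
        parts.push_back(name.substr(start, dash - start));
        if (dash == std::string::npos) break;
        start = dash + 1;
    }
    if (parts.size() != 6 || parts[0] != "pin") return false;
    if (parts[1].compare(0, 2, "4.") != 0 || parts[1].size() == 2) return false;
    if (parts[3].size() < 2 || parts[3][0] != 'g') return false;

    *kit = KitName{parts[1].substr(2), parts[2], parts[3].substr(1),
                   parts[4], parts[5], archive};
    return true;
}
```

For instance, ParseKitName on the first example yields minor 0, build 99625, commit c5b279576, compiler gcc and platform linux.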

For better security, install the kit in a secure location, such as a directory that cannot be modified by untrusted users.



Questions? Bugs?


Send bug reports, questions and feature requests to https://groups.io/g/pinheads. Complete bug reports that are easy to reproduce are fixed faster, so try to provide as much information as possible. Include the kit number, your OS version and your compiler version. Try to reproduce the problem in a simple example that you can send us.



Disclaimer and Legal Information


The information in this manual is subject to change without notice and Intel Corporation assumes no responsibility or liability for any errors or inaccuracies that may appear in this document or any software that may be provided in association with this document. This document and the software described in it are furnished under license and may only be used or copied in accordance with the terms of the license. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. The information in this document is provided in connection with Intel products and should not be construed as a commitment by Intel Corporation.

EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications.

Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.

The software described in this document may contain software defects which may cause the product to deviate from published specifications. Current characterized software defects are available on request.

Microsoft, Windows, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries.

Java is a registered trademark of Oracle and/or its affiliates.

Other names and brands may be claimed as the property of others.

Copyright 2004-2026 Intel Corporation.

Intel Corporation, 2200 Mission College Blvd., Santa Clara, CA 95052-8119, USA.