• 2019 Update 3
  • 03/07/2019

Using VTserver

Processes in non-MPI applications or systems are created and communicate using non-standard and varying methods. The communication may be slow or unsuitable for Intel® Trace Collector communication patterns. Therefore, a special version of the Intel® Trace Collector library was developed that relies neither on MPI nor on the application's communication, but instead implements its own communication layer using TCP/IP. This is why it is called client-server.
library allows the generation of executables that work without MPI. Linking is accomplished by adding
on Microsoft* Windows* OS) and the libraries it needs to the link line:
. The application has to call
to generate a tracefile. Function tracing can be used with and without further Intel® Trace Collector API calls to actually generate trace events.
This section describes the design, implementation and usage of Intel® Trace Collector for distributed applications.
The application has to meet the following requirements:
  • The application handles startup and termination of all processes itself. Both startup with a fixed number of processes and dynamic spawning of processes are supported, but spawning processes is an expensive operation and should not be done too frequently.
  • For a reliable startup, the application has to gather a short string from every process in one place to bootstrap the TCP/IP communication in Intel® Trace Collector. Alternatively, one process is started first and its string is passed to the others. In this case you can assume that the string is always the same for each program run, but this is less reliable because the string encodes a dynamically chosen port which may change.
  • Map the hostname to an IP address that all processes can connect to.
This is not the case if
lists the hostname as alias for and processes are started on different hosts. As a workaround for that case the hostname is sent to other processes, which then requires a working name lookup on their host systems.
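The hostname requirement can be checked before tracing starts. The following is a minimal sketch in Python, not part of Intel® Trace Collector: the helper names `is_loopback` and `check_local_hostname` are hypothetical, and the check only assumes that an address in the loopback range cannot be reached from other hosts.

```python
import ipaddress
import socket


def is_loopback(address: str) -> bool:
    """Return True if the IP address is only reachable from the local host."""
    return ipaddress.ip_address(address).is_loopback


def check_local_hostname() -> tuple:
    """Resolve this machine's hostname and report whether peers could reach it.

    Returns (hostname, resolved address or None, usable flag).
    """
    hostname = socket.gethostname()
    try:
        address = socket.gethostbyname(hostname)
    except OSError:
        # The name lookup failed entirely, so no process can use this name.
        return hostname, None, False
    return hostname, address, not is_loopback(address)


if __name__ == "__main__":
    name, addr, usable = check_local_hostname()
    print(f"{name} -> {addr} (usable from other hosts: {usable})")
```

If the flag is false and processes run on different hosts, the name resolution setup on that machine is worth reviewing before starting a traced run.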
Intel® Trace Collector for distributed applications consists of a special library (
) that is linked into the application's processes and the
executable, which connects to all processes and coordinates the trace file writing. Linking with
is required to keep the overhead of logging events as small as possible, while
can be run easily in a different process.
Alternatively, the functionality of the
can be accomplished with another API call by one of the processes.
This is how the application starts, collects trace data and terminates:
  1. The application initializes itself and its communication.
  2. The application initializes communication between VTserver and processes.
  3. Trace data is collected locally by each process.
  4. VT data collection is finalized, which moves the data from the processes to the VTserver, where it is written into a file.
  5. The application terminates.
The application may repeat steps 2 through 4 several times. Looping over only step 3 and the trace data collection part of step 4 is not supported at the moment, because:
  • it requires a more complex communication between the application and VTserver
  • the startup time for 2 is expected to be sufficiently small
  • reusing the existing communication would only work well if the selection of active processes does not change
If the startup time turns out to be unacceptably high, then the protocol between application and Intel® Trace Collector could be revised to support reusing the established communication channels.

Initialize and Finalize

The application has to bootstrap the communication between the VTserver and its clients. This is done as follows:
  1. The application server initiates its processes.
  2. Each process calls
  3. VT_clientinit() allocates a port for TCP/IP communication with the VTserver or other clients and generates a string which identifies the machine and this port.
  4. Each process gets its own string as result of
  5. The application collects these strings in one place and, as soon as all clients are ready, starts VTserver with all of the strings. The VT configuration is passed to the VTserver as a file or through command-line options.
  6. Each process calls
    to actually establish communication.
  7. The VTserver establishes communication with the processes, then waits for them to finalize the trace data collection.
  8. Trace data collection is finalized when all processes have called
  9. Once the VTserver has written the trace file, it quits with a return code indicating success or failure.
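The port allocation and string gathering in the steps above can be illustrated with a small conceptual model. This Python sketch does not call the VT API: `allocate_contact` and `gather_contacts` are hypothetical helpers, and the `host:port` layout is an assumption made for illustration, since the real contact string format is not specified here.

```python
import socket


def allocate_contact(host="127.0.0.1"):
    """Bind a listening TCP socket to an OS-chosen port and describe it.

    The 'host:port' format is an assumption for this sketch only.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0 asks the OS for any free port
    sock.listen()
    port = sock.getsockname()[1]
    return sock, f"{host}:{port}"


def gather_contacts(num_processes):
    """Simulate every process creating its string and one place collecting them."""
    listeners, contacts = [], []
    for _ in range(num_processes):
        sock, contact = allocate_contact()
        listeners.append(sock)
        contacts.append(contact)
    return listeners, contacts


if __name__ == "__main__":
    listeners, contacts = gather_contacts(3)
    print(contacts)
    for sock in listeners:
        sock.close()
```

Because the port is chosen dynamically, each run generally produces different strings, which is why gathering them at startup is more reliable than assuming a fixed value.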
Some of the VT API calls may block, especially
. Execute them in a separate thread if the process wants to continue. These pending calls can be aborted with
, for example if another process failed to initialize trace data collection. The application itself has to communicate this failure and terminate the VTserver by sending it a kill signal, because there is no guarantee that all processes and the VTserver will detect every failure that might prevent communication from being established.
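The pattern of running a blocking call in its own thread and aborting it from elsewhere can be sketched as follows. This is a generic Python illustration under stated assumptions: `PendingCall` is a hypothetical stand-in for a blocking call, not a VT API function, and the abort is modeled with a `threading.Event`.

```python
import threading


class PendingCall:
    """Stand-in for a blocking call that another thread can abort."""

    def __init__(self):
        self._abort = threading.Event()
        self._done = threading.Event()
        self.result = None

    def run(self):
        # Block until the simulated connection completes or an abort arrives.
        while not self._done.is_set():
            if self._abort.wait(timeout=0.01):
                self.result = "aborted"
                return
        self.result = "connected"

    def complete(self):
        """Simulate the peer finishing its side of the setup."""
        self._done.set()

    def abort(self):
        """Request that the pending call returns early."""
        self._abort.set()


def start_in_thread(call):
    """Run the blocking call in a worker so the caller can continue."""
    worker = threading.Thread(target=call.run, daemon=True)
    worker.start()
    return worker


if __name__ == "__main__":
    call = PendingCall()
    worker = start_in_thread(call)
    call.abort()  # for example, after learning that another process failed
    worker.join()
    print(call.result)
```

The main thread stays free to receive failure notifications from the application's own communication and to decide whether to abort.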

Running without VTserver

Instead of starting VTserver as rank 0 with the contact strings of all application processes, one application process can take over that role. It becomes rank 0 and calls
with the information normally given to VTserver. This changes the application startup only slightly.
A more fundamental change is supported by first starting one process with rank 0 as server, then taking its contact string and passing it to the other processes. These processes then give this string as the initial value of the contact parameter in
. To distinguish this kind of startup from the dynamic spawning of processes described in the next section, the prefix
needs to be added by the application before calling
. An example where this kind of startup is useful is a process which preforks several child processes to do some work.
In both cases it may be useful to note that the command line arguments previously passed to VTserver can be given in the
array as described in the documentation of

Spawning Processes

Spawning new processes is expensive because it involves, among other things, setting up TCP communication, synchronizing clocks, and broadcasting the configuration. Its flexibility is also restricted because the new processes must be mapped into the model of communicators that provide the context for all communication events. This model follows the one used in MPI and implies that only processes inside the same communicator can communicate at all.
For spawned processes, the following model is currently supported: one of the existing processes starts one or more new processes. These processes need to know the contact string of the spawning process and call
with that information; in contrast to the startup model from the previous section, no prefix is used. Then while all spawned processes are inside
, the spawning process calls
which does all the work required to connect with the new processes.
The results of this operation are:
  • a new
    which contains all of the spawned processes, but not the spawning process
  • a communicator which contains the spawning process and the spawned ones; the spawning process gets it as result from
    and the spawned processes by calling
The first of these communicators can be used to log communication among the spawned processes, the second for communication with their parent. There's currently no way to log communication with other processes, even if the parent has a communicator that includes them.
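The two resulting communicators and the logging restriction can be modeled with plain sets. This Python sketch is only a conceptual model: `attach` and `can_log` are hypothetical names, process identifiers are arbitrary integers, and nothing here calls the VT API.

```python
def attach(parent, children):
    """Return the two communicators that result from spawning `children`.

    The first contains only the spawned processes, the second contains
    the spawning process together with the spawned ones.
    """
    children_comm = frozenset(children)
    parent_comm = frozenset([parent]) | children_comm
    return children_comm, parent_comm


def can_log(sender, receiver, communicators):
    """A message can be logged only if one communicator contains both ends."""
    return any(sender in comm and receiver in comm for comm in communicators)


if __name__ == "__main__":
    # Process 0 spawns processes 1 and 2; process 3 existed beforehand.
    spawned_only, with_parent = attach(0, [1, 2])
    known_to_children = [spawned_only, with_parent]
    print(can_log(1, 2, known_to_children))  # among spawned processes
    print(can_log(1, 0, known_to_children))  # with the parent
    print(can_log(1, 3, known_to_children))  # with an unrelated process
```

In this model, process 3 is in neither communicator known to the spawned processes, which mirrors the restriction that communication with other processes cannot be logged.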

Tracing Events

Once a process' call to
has completed successfully it can start calling VT API functions that log events. These events will be associated with a time stamp generated by Intel® Trace Collector and with the thread that calls the function.
If the need arises, VT API functions could be provided that allow one thread to log events from several different sources instead of just itself.
Event types supported at the moment are those also provided in the normal Intel® Trace Collector, like state changes (
) and sending and receiving of data (
). The resulting trace file is in a format that can be loaded and analyzed with Intel® Trace Analyzer.
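The association of each event with a time stamp and the calling thread can be pictured with a small model. The Python sketch below is illustrative only: `Event` and `EventLog` are hypothetical types, and the monotonic clock stands in for whatever clock the library actually uses.

```python
import threading
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str          # for example a state change or a message transfer
    timestamp: float   # taken when the logging function is called
    thread_id: int     # identifies the thread that made the call


class EventLog:
    """Collects events locally, tagging each with time and calling thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self.events = []

    def log(self, kind):
        event = Event(kind, time.monotonic(), threading.get_ident())
        with self._lock:  # several threads may log concurrently
            self.events.append(event)
        return event


if __name__ == "__main__":
    log = EventLog()
    log.log("enter state")
    log.log("leave state")
    for event in log.events:
        print(event)
```

In this model the caller supplies only the event kind, while the time stamp and thread identity are filled in by the logging layer, as described above.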


Executables in the application are linked with
. It is possible to have processes implemented in different languages, as long as they use the same version of the
The VTserver has the following synopsis:
<contact infos>
[config options]
Each contact info is guaranteed to be one word and their order on the command line is irrelevant. The configuration options can be specified on the command line by adding the prefix
and listing its arguments after the keyword. This is an example for contacting two processes and writing into the file
in STF format:
<contact1> <contact2>
--logfile-name example.stf
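An application that launches the server itself needs to assemble this command line from the gathered contact strings. The following Python sketch shows one way to do that: `build_server_argv` is a hypothetical helper, the executable name is passed in by the caller rather than assumed, and the `--` prefix is taken from the example above.

```python
def build_server_argv(executable, contacts, options):
    """Assemble an argument list: executable, contact infos, then options.

    `options` maps a configuration keyword to the list of its arguments.
    """
    for contact in contacts:
        # Each contact info has to be exactly one word on the command line.
        if len(contact.split()) != 1:
            raise ValueError(f"contact info is not a single word: {contact!r}")
    argv = [executable, *contacts]
    for keyword, values in options.items():
        argv.append("--" + keyword)
        argv.extend(str(value) for value in values)
    return argv


if __name__ == "__main__":
    argv = build_server_argv(
        "VTserver",  # assumed to be found on PATH under this name
        ["<contact1>", "<contact2>"],
        {"logfile-name": ["example.stf"]},
    )
    print(argv)
```

Passing the result as a list to a process-launching function such as `subprocess.run` avoids shell quoting issues with the contact strings.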
All options can be given as environment variables. The format of the configuration file and the environment variables are described in more detail in the chapter about


uses the same techniques as fail-safe MPI tracing to handle failures inside the application, therefore it will generate a trace even if the application segfaults or is aborted with Ctrl + C.
When only one process runs into a problem, then
tries to notify the other processes, which then should stop their normal work and enter trace file writing mode. If this fails and the application hangs, then it might still be possible to generate a trace by sending a
to all processes manually.


There are two examples that use MPI as the means of communication and process handling. Because they are not linked against the normal Intel® Trace Collector library, MPI has to be traced with Intel® Trace Collector API calls.
is a full-blown example that simulates and handles various error conditions. It uses threads and fork/exec to run API functions and VTserver concurrently.
is a stripped down version that is easier to read, but does not check for errors.
The dynamic spawning of processes is demonstrated by
. It first initializes one process as server with no clients, then forks to create new processes and connects to them with
. This is repeated recursively. Communication is done through pipes and logged in the new communicators.
is a variation of the previous example which also uses fork and pipes, but creates the additional processes at the beginning without relying on dynamic spawning.
