Intel® Trace Analyzer and Collector 8.1

Understand MPI application behavior, quickly find bottlenecks, and achieve high performance for parallel cluster applications

  • Powerful MPI Communications Profiling and Analysis
  • Scalable - Low Overhead & Effective Visualization
  • Flexible to Fit Workflow – Compile, Link or Run

Analyze MPI application behavior, locate bottlenecks, and improve performance with this MPI profiling tool for Windows* and Linux*. Intel® Trace Analyzer and Collector is available only as part of Intel® Cluster Studio 2013 or Intel® Cluster Studio XE 2013.

Intel® Trace Analyzer and Collector is a graphical tool for understanding MPI application behavior, quickly finding bottlenecks, improving correctness, and achieving high performance for parallel cluster applications based on Intel architecture. New features include trace file comparison, counter data displays, extensively detailed and aligned timelines, and an MPI correctness checking library. Improve weak and strong scaling for small and large applications, all with Intel® Trace Analyzer and Collector MPI software.

Benefits:

  • Visualize and understand parallel application behavior
  • Evaluate profiling statistics and load balancing
  • Analyze performance of subroutines or code blocks
  • Learn about communication patterns, parameters, and performance data
  • Identify communication hotspots
  • Decrease time to solution and increase application efficiency

Features

MPI Checking

  • A unique MPI Correctness Checker detects deadlocks, data corruption, and errors with MPI parameters, data types, buffers, communicators, point-to-point messages, and collective operations.
  • The Correctness Checker allows the user to scale to extremely large systems and detect errors even among a large number of processes.
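As a sketch of how the checker is typically enabled at run time with the Intel® MPI Library launcher (the `-check_mpi` option preloads the checking library; "myapp" is a hypothetical program name, so adjust to your build):

```shell
# Write a small launch script that enables the MPI Correctness Checker.
# Assumption: Intel MPI Library is installed and "myapp" is your MPI binary.
cat > run_checked.sh <<'EOF'
#!/bin/sh
# -check_mpi preloads the correctness-checking library at startup
mpirun -check_mpi -n 4 ./myapp
EOF
chmod +x run_checked.sh
```

Errors found during the run (deadlocks, bad parameters, buffer overlaps) are reported as the application executes and can also be examined afterwards in the Intel Trace Analyzer GUI.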

Interface and Displays

  • Intel® Trace Analyzer and Collector includes a full-color, customizable GUI with many drill-down view options.
  • The analyzer rapidly unwinds the call stack and uses debug information to map instruction addresses to source code.
  • With both command-line and GUI interfaces, the user can additionally set up batch runs or do interactive debugging.

Scalability

  • Low overhead and random access to portions of a trace make the tool suitable for analyzing large amounts of performance data.
  • Thread safety lets you apply event-based tracing to multithreaded MPI applications as well as to non-MPI threaded applications.

Instrumentation and Tracing

  • Low-intrusion instrumentation supports MPI applications with C, C++, or Fortran.
  • Intel® Trace Analyzer and Collector automatically records performance data from parallel threads in C, C++, or Fortran.

Product in-depth

Videos to help you get started.

Register for future Webinars


Previously recorded Webinars:

  • Increase Cluster MPI Application Performance with a "MPI Tune" Up
  • MPI on Intel® Xeon Phi™ coprocessor
  • Quickly discover performance issues with the Intel® Trace Analyzer and Collector 9.0 Beta

More Tech Articles

Intel® Cluster Tools Open Source Downloads
By Gergana Slavova (Intel), posted 03/06/2014
This article makes available third-party libraries and sources that were used in the creation of Intel® Software Development Products. Intel provides this software pursuant to their applicable licenses. Products and Versions: Intel® Trace Analyzer and Collector for Linux* gcc-3.2.3-42.zip (whi...
Using the Intel® MPI Library on Intel® Xeon Phi™ Coprocessor Systems
By loc-nguyen (Intel), posted 03/19/2013
Download Article Download Using the Intel® MPI Library on Intel® Xeon Phi™ Coprocessor Systems [PDF 499KB] Table of Contents Chapter 1 – Introduction 1.1 – Overview 1.2 – Compatibility Chapter 2 – Installing the Intel® MPI Library 2.1 – Installing the Intel MPI Library 2.2 – Preparation Chapter 3...
Intel® Trace Collector Filtering
By James Tullos (Intel), posted 03/14/2013
Filtering in the Intel® Trace Collector will apply specified filters to the trace collection process.  This directly reduces the amount of data collected.  The filter rules can be applied either via command line arguments or in a configuration file (specified by the environment variable VT_CONFIG...

Supplemental Documentation

Intel® Trace Analyzer and Collector 8.1 Update 3 Readme
By Gergana Slavova (Intel), posted 08/13/2013
The Intel® Trace Analyzer and Collector 8.1 Update 3 for Linux* and Windows* is a low-overhead scalable event-tracing library with graphical analysis that reduces the time it takes an application developer to enable maximum performance of cluster applications. This package is for users who dev...
Intel® Trace Analyzer and Collector 8.1 Update 2 Readme
By Gergana Slavova (Intel), posted 06/07/2013
The Intel® Trace Analyzer and Collector 8.1 Update 2 for Linux* and Windows* is a low-overhead scalable event-tracing library with graphical analysis that reduces the time it takes an application developer to enable maximum performance of cluster applications. This package is for users who develo...
Intel® Trace Analyzer and Collector 8.1 Update 1 Readme
By Gergana Slavova (Intel), posted 04/05/2013
The Intel® Trace Analyzer and Collector 8.1 Update 1 for Linux* and Windows* is a low-overhead scalable event-tracing library with graphical analysis that reduces the time it takes an application developer to enable maximum performance of cluster applications. This package is for users ...
Intel® Trace Analyzer and Collector Guides
By James Tullos (Intel), posted 03/15/2013
This is currently a placeholder for Intel® Trace Analyzer and Collector usage guides.  Until articles are added, please visit the Intel® Trace Analyzer and Collector product page.  You can also view the documentation.

You can reply to any of the forum topics below by clicking on the title. Please do not include private information such as your email address or product serial number in your posts. If you need to share private information with an Intel employee, they can start a private thread for you.

Run Intel MPI without mpirun/mpiexec
By Jackey Y.
Hi, I am wondering does Intel MPI support a MPI run without mpirun/mpiexec in the command line? I know that in MPI-2 standard, it supports the “dynamic process” feature, i.e., dynamically generate/spawn processes from existing MPI process. What I am trying to do here is 1) Firstly, launch a singleton MPI process without mpirun/mpiexec in the command line; 2) Secondly, use MPI_Comm_spawn to spawn a set of process on the different host machines. I tried to do that, but it seems that the Intel MPI cannot find the host file. Because I did not use mpirun in the command line, I used environment variable I_MPI_HYDRA_HOST_FILE to set the host file. But, still it seems it cannot find the host file. Any idea? Here is my package info: Package ID: l_mpi_p_4.1.3.049 Package Contents: Intel(R) MPI Library for Linux* OS   Thanks,   Jackey
Difference between mpicc and mpiicc
By Fuli F.
I write a simple mpi program as follow: #include "mpi.h" #include <stdio.h> #include <math.h> void main(argc,argv) int argc; char *argv[]; {     int myid,numprocs;     int namelen;     char pro_name[MPI_MAX_PROCESSOR_NAME];     MPI_Init(&argc,&argv);     MPI_Comm_rank(MPI_COMM_WORLD,&myid);     MPI_Comm_size(MPI_COMM_WORLD,&numprocs);     MPI_Get_processor_name(pro_name,&namelen);     printf("Process %d of %d on %s\n",myid,numprocs,pro_name);     MPI_Finalize(); } When I compile it with "mpicc -o xxx xxx.c" and run it with "mpirun -np 8 ./xxx", it rightly creates 8 processes. But when I compile it with "mpiicc -o xxx xxx.c" and run with the same order as above, it only creates 1 process. I want to know what's the difference between the mpicc and mpiicc. Is it caused by some faults made during my installment? And how can I fix it? By the way, I install the impi and compiler of intel by installing the intel cluster studio (l_ics_2013.1.0...
What/where is DAPL provider libdaplomcm.so.2 ?
By Beaver6675
DAPL providers ucm, scm are frequently mentioned, but what is libdaplomcm.so.2? Could someone point me to a description of the use case for the DAPL provider libdaplomcm.so.2? I am currently using the Intel MPI Library 4.1 for Linux with Mellanox OFED 2.1; shm:dapl and shm:ofa both seem to work, but with shm:dapl I get warning messages about not being able to find libdaplomcm.so.2. Mellanox DAPL does have this file. This file does not appear in DAPL 2.0.41 either: http://www.openfabrics.org/downloads/dapl/ I found the file in MPSS 3.2; can I just drop this file into a Mellanox 2.1 /usr/lib64 installation?
IMPI dapl fabric error
By san5
Hi, I'm trying to run HPL benchmark on an Ivybridge Xeon processor with 2 Xeon Phi 7120P MIC cards. I'm using offload xhpl binary from Intel Linpack. It throws following error $ bash runme_offload_intel64 This is a SAMPLE run script.  Change it to reflect the correct number of CPUs/threads, number of nodes, MPI processes per node, etc.. MPI_RANK_FOR_NODE=1 NODE=1, CORE=, MIC=1, SHARE= MPI_RANK_FOR_NODE=0 NODE=0, CORE=, MIC=0, SHARE= [1] MPI startup(): dapl fabric is not available and fallback fabric is not enabled [0] MPI startup(): dapl fabric is not available and fallback fabric is not enabled I checked the same errors on this forum and got to know that to unset I_MPI_DEVICES variable. This made the HPL to run. But performance is very low, just 50%. On the other node, with same hardware, HPL efficiency is 84%. Following is the short output of openibd status from both systems, which shows the difference. ON NODE with HPL 84%                                                 ON ...
Memory Leak detected by Inspector XE in MPI internal buffer
By burnesrr2
I am interested in finding out if there is a way to configure Intel's MPI libraries to alter what the threshold is for the creation of internal buffers so I can verify the source of a memory leak detected by Inspector XE. Please refer to my post in Intel's Inspector XE forum, which includes a simple Fortran program that demonstrates the issue: http://software.intel.com/en-us/forums/topic/508656 It appears once an MPI operation is sending or receiving more than 64K of information an internal buffer may be created and Inspector is reporting a memory leak when that happens. I am hoping there is a way to configure the MPI libraries to alter the behavior of the creation and destruction of internal buffers so I can confirm the source of the reported memory leak. I am hoping someone here in the MPI forums has a suggestion of a way to do this. Even reducing the size of the data transfer that triggers the generation of internal buffers would be helpful. I am reluctant to just write this off ...
How to free a MPI communicator created w MPI_Comm_spawn
By Florentino S.
Hi, I'm trying to free a communicator created with this call: int MPI_Comm_spawn(char *command, char *argv[], int maxprocs,    MPI_Info info, int root, MPI_Comm comm,    MPI_Comm *intercomm, int array_of_errcodes[]) <-- The comunicator created it's intercommAs far as I know, according to the standard, MPI_Free is a collective operation, although they suggest to implement it locally, however on Intel MPI it's a collective operation (according to my own experience and to http://software.intel.com/sites/products/documentation/hpc/ics/itac/81/I... ). However I have a problem here, father/spawners process/es will have a communicator which contains his sons, and the spawned processes/sons will have the communicator which contains the masters. How I can free the communicator of the master with this layout? I know that I can create a new communicator with both sons and masters and free with that, but then that won't be the same communicator that I want to free. Thanks beforehand,
MPI doesn't work (Fatal error in MPI_Init)
By Ivan I.
Hi, I have the following problem: I have two nodes and config file: -n 1 -host node0 myapp -n 1 -host node1 myappIn this way it works fine. However If I change the order of lines in config to: -n 1 -host node1 myapp -n 1 -host node0 myappIt fails with the error: Fatal error in MPI_Init: Other MPI error, error stack: MPIR_Init_thread(658)................: MPID_Init(195).......................: channel initialization failed MPIDI_CH3_Init(104)..................: MPID_nem_tcp_post_init(344)..........: MPID_nem_newtcp_module_connpoll(3102): gen_cnting_fail_handler(1816)........: connect failed - The semaphore timeout period has expired. (errno 121) job aborted: rank: node: exit code[: error message] 0: node1: 1: process 0 exited without calling finalize 1: node0: 123What can be the reason for? Any ideas?
Segfault in DAPL with Mellanox OFED 2.1
By Ben2
Hi, We're having a problem with the Intel MPI library crashing since we've updated to the latest Mellanox OFED 2.1. For example, the test program supplied with Intel MPI (test/test.f90) crashes with a segfault. I compiled it using mpif90 -debug all /apps/intel-mpi/4.1.1.036/test/test.f90 -o test.xand managed to get a back trace from the crash using idbc: #0 0x00007fcb9418f078 in ?? () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4 #1 0x00007fcb94190bf7 in ?? () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4 #2 0x00007fcb94191543 in MPID_nem_dapl_rc_init_20 () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4 #3 0x00007fcb941de883 in MPID_nem_dapl_init () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4 #4 0x00007fcb94276fc6 in ?? () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4 #5 0x00007fcb9427547c in MPID_nem_init_ckpt () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4 #6 0x00007fcb94276ca7 in MPID_nem_init () from /apps/intel-mpi/...

Frequently Asked Questions

  • What are some key things I can learn about my program using Intel® Trace Analyzer and Collector?
  • The Intel® Trace Analyzer and Collector is a graphical tool used primarily for MPI-based programs. It helps you understand your application's behavior across its full runtime.  It can help find temporal dependencies in your code and communication bottlenecks across the MPI ranks.  It also checks the correctness of your application and points you to potential programming errors, buffer overlaps, and deadlocks.
  • Will Intel® Trace Analyzer and Collector only work with Intel® MPI Library?
  • No, the Intel® Trace Analyzer and Collector supports all major MPICH2-based implementations.  If you're wondering whether your MPI library can be profiled with the Intel Trace Analyzer and Collector, you can run a simple ABI-compatibility check by compiling the provided mpiconstants.c file and verifying the values against the ones provided in the Intel® Trace Collector Reference Guide.
  • Can Intel® Trace Analyzer and Collector be used on applications for Intel® Many Integrated Core Architecture (Intel® MIC Architecture)?
  • Yes, Intel® MIC Architecture is fully supported by the Intel® Trace Analyzer and Collector.
  • What file and directory permissions are required to use Intel® Trace Analyzer and Collector?
  • You do not need to install special drivers, kernels, or acquire extra permissions.  Simply install the Intel® Trace Analyzer and Collector in the $HOME directory and link it with your application of choice from there.
  • Should I recompile/relink my application to collect information?
  • It depends on your application. For Windows* OS, you have to relink your application using the -trace link-time flag.
    For Linux* OS (and if your application is dynamically linked), you do not need to relink or recompile.  Simply use the -trace option at runtime (for example: mpirun -trace).
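A minimal sketch of both paths ("myapp" is a hypothetical program name; adjust to your own build):

```shell
# Linux* OS: generate a launch script -- dynamically linked apps need no
# relink, just the -trace option at run time.  "myapp" is hypothetical.
cat > run_traced.sh <<'EOF'
#!/bin/sh
mpirun -trace -n 4 ./myapp     # writes the trace (STF files) on exit
EOF
chmod +x run_traced.sh

# Windows* OS: the trace library must instead be bound in at link time:
#   mpiicc myapp.c -trace -o myapp.exe
```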
  • How do I control which part of my application should be profiled?
  • The Intel® Trace Collector provides several options to control the data collection.  By default, only information about MPI calls is collected.  If you'd like to filter which MPI calls should be traced, create a configuration file and set the VT_CONFIG environment variable.

    If you'd like to expand the information collected beyond MPI and include all user-level routines, recompile your application with the -tcollect switch available as part of the Intel® Compilers.  In this case, Intel Trace Collector will gather information about all routines in the application, not just MPI.  You can similarly filter this via the -tcollect-filter compiler option.

    If you'd like to be explicit about which parts of the code should be profiled, use the Intel Trace Collector API calls.  You can manually turn tracing on and off via a quick API call.
    For more Information on all of these methods, refer to the Intel® Trace Collector Reference Guide.
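For instance, a filter file could look like the sketch below. The directive style follows the Intel® Trace Collector Reference Guide, but treat the exact rules here as illustrative assumptions, and "myapp" as a hypothetical program name:

```shell
# Create a hypothetical filter file: keep MPI tracing on overall, but drop
# two high-frequency point-to-point calls from the trace.
cat > itc_filter.conf <<'EOF'
ACTIVITY MPI ON
SYMBOL MPI_Isend OFF
SYMBOL MPI_Irecv OFF
EOF

# Point the collector at the file, then launch as usual:
export VT_CONFIG=$PWD/itc_filter.conf
#   mpirun -trace -n 4 ./myapp
```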
  • What file format is the trace data collected in?
  • Intel® Trace Collector stores all collected data in the Structured Tracefile Format (STF), which allows for better scalability across both time and processes.  For more details, refer to the "Structured Tracefile Format" section of the Intel® Trace Collector Reference Guide.
  • Can I import or export trace data to/from Intel® Trace Analyzer and Collector?
  • Yes, you can export the data from any of the Profile charts (Function Profile, Message Profile, and Collective Operations Profile) as part of the Intel® Trace Analyzer interface. To do this, open one of these profiles in the GUI, right-click to bring up the Context Menu, and select the "Export Data" option.  The data will be saved in simple text format for easy reading.

    At a separate level, you can save your current working Intel® Trace Analyzer environment via the Project Menu.  If you choose to "Save Project", your current open trace view and associated charts will be recorded as they are open on your screen.  You can later choose to "Load Project" from this same menu, which will bring up a previously-saved session.
  • What size MPI application can I analyze with Intel® Trace Analyzer and Collector?
  • It depends on how large or complex your application is, how many MPI calls you are making, and how long you run.  There are no internal limits on the size of the MPI job, but there are plenty of external ones: how much memory is available on the system (per core) for the application, the MPI library, and the Intel® Trace Collector processes, as well as how much disk space is available.  Any additional flags enabled (for example, storing call stacks and source code locations) increase the size of the trace file.  Filtering out unimportant information is always a good way to reduce trace file size.
  • How can I control the amount of data collected to a reasonable amount?  What is a reasonable amount?
  • Each application is different in terms of the profiling data it can provide.  The longer an application runs, and the more MPI calls it makes, the larger the STF files will be.  You can filter some of the unnecessary information out by applying appropriate filters (see Question #6 for more details or check out some tips on Intel® Trace Collector Filtering).

    Additionally, you can be restricted by the resources allocated to your account; consult your cluster administration about quotas and recommendations.
  • How can I analyze the collected information?
  • Once you have collected the trace data, you can analyze it via the Graphical Interface called the Intel® Trace Analyzer.  Simply call the command ($ traceanalyzer) or double-click on the Intel Trace Analyzer icon and navigate to your STF files via the File Menu.
    You can get started by opening up the Event Timeline chart (under the Charts Menu) and zooming in at an appropriate level.

    Check out the Detecting and Removing Unnecessary Serialization Tutorial on ideas how to get started. For details on all Intel Trace Analyzer functionality, refer to the Intel® Trace Analyzer Reference Guide.
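A sketch of launching the analyzer from a shell, assuming a traced run has already produced a trace ("myapp.stf" is a hypothetical file name):

```shell
# Open the trace in the GUI if the tool is on PATH; otherwise record a hint.
# "myapp.stf" is a hypothetical trace file from an earlier traced run.
if command -v traceanalyzer >/dev/null 2>&1; then
    traceanalyzer myapp.stf
else
    echo "traceanalyzer not on PATH; source the product environment scripts first" > itac_hint.txt
fi
```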
  • Can I use Intel® Trace Analyzer and Collector with Intel® VTune™ Amplifier XE, Intel® Inspector XE, or other analysis tools?
  • While these tools would collect information separate from each other, in their own format, it's easy enough to use the Intel® VTune™ Amplifier XE and Intel® Inspector XE tools under an MPI environment.  Check each tool's respective User's Guide for more info on Viewing Collected MPI Data.

    You can use tools such as Intel VTune Amplifier XE and Intel Inspector XE for node-level analysis, and use the Intel Trace Analyzer and Collector for cluster-level analysis.

Intel® Trace Analyzer & Collector

Getting Started?

Click the Learn tab for guides and links that will quickly get you started.

Get Help or Advice

Search Support Articles
Forums - The best place for timely answers from our technical experts and your peers. Use it even for bug reports.
Support - For secure, web-based, engineer-to-engineer support, visit our Intel® Premier Support web site. Intel Premier Support registration is required.
Download, Registration and Licensing Help - Specific help for download, registration, and licensing questions.

Resources

Release Notes - View Release Notes online!
Intel® Trace Analyzer and Collector Product Documentation - View documentation online!
Documentation for other software products