# Intel® Trace Analyzer and Collector

Understand MPI application behavior, quickly find bottlenecks, and achieve high performance for parallel cluster applications.

• Powerful MPI Communications Profiling and Analysis
• Scalable - Low Overhead & Effective Visualization
• Flexible to Fit Workflow – Compile, Link or Run

Intel® Trace Analyzer and Collector 9.0 is a graphical tool for understanding MPI application behavior, quickly finding bottlenecks, improving correctness, and achieving high performance for parallel cluster applications based on Intel architecture. Improve weak and strong scaling for small and large applications with Intel Trace Analyzer and Collector.

### Benefits:

• Visualize and understand parallel application behavior
• Evaluate profiling statistics and load balancing
• Analyze performance of subroutines or code blocks
• Learn about communication patterns, parameters, and performance data
• Identify communication hotspots
• Decrease time to solution and increase application efficiency

### MPI checking

• A unique MPI Correctness Checker detects deadlocks, data corruption, and errors with MPI parameters, data types, buffers, communicators, point-to-point messages and collective operations.
• The Correctness Checker allows the user to scale to extremely large systems and detect errors even among a large number of processes.

### Interface and Displays

• Intel® Trace Analyzer and Collector includes a full-color, customizable GUI with many drill-down view options.
• The analyzer can rapidly unwind the call stack and use debug information to map instruction addresses to source code.
• With both command-line and GUI interfaces, the user can additionally set up batch runs or do interactive debugging.

### Scalability

• Low overhead allows random access to portions of a trace, making it suitable for analyzing large amounts of performance data.
• Thread safety allows you to trace multithreaded MPI applications for event-based tracing as well as non-MPI threaded applications.

### Instrumentation and Tracing

• Low-intrusion instrumentation supports MPI applications with C, C++, or Fortran.
• Intel Trace Analyzer and Collector automatically records performance data from parallel threads in C, C++, or Fortran.

### What’s new

• MPI Communications Profile Summary Overview
  • Quickly Understand Computation vs Communications
  • Identify which MPI communications are being most used
• Expanded Standards Support with MPI 3.0
• Automated MPI Communications Analysis with Performance Assistant
  • Detect common MPI performance issues
  • Automated tips on potential solutions

#### Previously recorded Webinars:

• Increase Cluster MPI Application Performance with a "MPI Tune" Up
• MPI on Intel® Xeon Phi™ coprocessor
• Quickly discover performance issues with the Intel® Trace Analyzer and Collector 9.0 Beta

You can reply to any of the forum topics below by clicking on the title. Please do not include private information such as your email address or product serial number in your posts. If you need to share private information with an Intel employee, they can start a private thread for you.

Problems reading HDF5 files with IntelMPI?
0
Hi, is anyone aware of troubles with PHDF5 and IntelMPI? A test code that reads an HDF5 file in parallel has trouble when scaling if I run it with IntelMPI, but no trouble if I run it, for example, with POE. I'm using Intel compilers 13.0.1, IntelMPI 4.1.3.049, and HDF5 1.8.10. The code just reads an 800x800x800 HDF5 file, and the times I get for reading it are:

    128 procs  - 0.7262E+01
    1024 procs - 0.9815E+01
    1280 procs - 0.9930E+01
    1600 procs - ??????? (it gets stalled...)

But the same code (compiled with the above modules), submitted with IBM's POE instead of IntelMPI, has no trouble with 1600 procs (actually no trouble at all with up to 4096 procs) and it reads the file in 0.8963E+01 secs. Any help appreciated,
Intel MPI issue with the usage of Slurm
2
To whom it may concern, Hello. We are using Slurm to manage our cluster. However, we have met a new issue with Intel MPI under Slurm. When a node reboots, Intel MPI fails on that node, but manually restarting the Slurm daemon fixes it. I also tried adding "service slurm restart" to /etc/rc.local, which runs at the end of booting, but the issue is still there. Moreover, I submitted this issue to slurm-dev, but they believed that it was due to the Infiniband+IMPI configuration. They suggested that I configure dat.conf and set up some Intel MPI variables. However, I don't know how to set them. Here is an example:

    $ salloc -N1 -n12 -w cn117    # cn117 is the node just rebooted
    salloc: Granted job allocation 1201
    $ module list
    Currently Loaded Modulefiles:
      1) modules   2) null   3) intelics/2013.1.039
    $ export I_MPI_PMI_LIBRARY=/gpfs/slurm/lib/libpmi.so
    $ export I_MPI_FABRICS=shm:ofa
    $ srun ./hello
    [3] MPI startup(): ofa fabric is not available and fallba...

[6] Assertion failed in file ../../segment.c
2
Hi, we have compiled our parallel code using the latest Intel software stack. We use a lot of passive RMA one-sided PUT/GET operations along with derived datatypes. Now we are experiencing a problem: sometimes our application fails with either a segmentation fault or the following error message:

    [6] Assertion failed in file ../../segment.c at line 669: cur_elmp->curcount >= 0
    [6] internal ABORT - process 6

Intel Inspector shows a problem inside the Intel MPI library:

    libmpi_dbg.so.4!MPID_Segment_blkidx_m2m - segment_packunpack.c:313
    libmpi_dbg.so.4!MPID_Segment_manipulate - segment.c:552
    libmpi_dbg.so.4!MPID_Segment_unpack - segment_packunpack.c:88
    libmpi_dbg.so.4!MPIDI_CH3U_Receive_data_found - ch3u_handle_recv_pkt.c:190
    libmpi_dbg.so.4!MPIDI_CH3_PktHandler_GetResp - ch3u_rma_sync.c:3691
    libmpi_dbg.so.4!MPID_nem_handle_pkt - ch3_progress.c:1477
    libmpi_dbg.so.4!MPIDI_CH3I_Progress - ch3_progress.c:498
    libmpi_dbg.so.4!MPIDI_Win_unlock - c...

-perhost not working with IntelMPI v.5.0.1.035
1
The -perhost option does not work as expected with IntelMPI v.5.0.1.035, though it does work with IntelMPI v.4.1.0.024:

    $ qsub -I -lnodes=2:ppn=16:compute,walltime=0:15:00
    qsub: waiting for job 5731.hpc-class.its.iastate.edu to start
    qsub: job 5731.hpc-class.its.iastate.edu ready
    $ mpirun -n 2 -perhost 1 uname -n
    hpc-class-40.its.iastate.edu
    hpc-class-40.its.iastate.edu
    $ export I_MPI_ROOT=/shared/intel//impi/4.1.0.024
    $ PATH="${I_MPI_ROOT}/intel64/bin:${PATH}"; export PATH
    $ mpirun -n 2 -perhost 1 uname -n
    hpc-class-40.its.iastate.edu
    hpc-class-39.its.iastate.edu

I also ran the same commands with I_MPI_HYDRA_DEBUG set to 1 (see attached files mpirun-perhost.txt and mpirun-perhost-4.1.0.024.txt). Note that the first two lines of the output in mpirun-perhost.txt suggest that -perhost works (two different hostnames are printed), but at the end it's still printing the same hostname twice. In mpirun-perhost.txt I_MPI_PERHOST is said to be allcores. In another run (see attached f...
Problem in running mpirun command through newly created user
0
Hi Team, I am facing a problem while running mpirun command through newly created user. Package ID: l_mpi_p_4.0.3.008 Any help will be highly appreciated. Regards.
Intel MPI 5.0.1.037
4
Hi, I have two questions about Intel MPI on Microsoft Windows 7 64-bit. The first one concerns Intel MPI 5.0.1.037. If I execute C:\Program Files (x86)\Intel\MPI-RT\5.0.1.037\em64t\bin\smpd -version I get "3.1". If I execute C:\Program Files (x86)\Intel\MPI-RT\4.1.3.045\em64t\bin\smpd -version I get "4.1.3". This is a problem for us, because our product is compiled with Intel MPI 4.1.3.045 and needs smpd version 4.1.3. After updating to runtime Intel MPI 5.0.1.037 we get an smpd mismatch error! The same issue occurs with the hydra service. Why is the C:\Program Files (x86)\Intel\MPI-RT\5.0.1.037\em64t\bin\smpd version 3.1 and not 5.0.1? The second question is: does Intel MPI 4.1.3.045 / 5.0.1.037 support Windows 8? Thank you in advance
Fortran code aborts beyond 108 nodes
0
Hi, There is an issue we have been facing for the past few months. We used to use a C code for our simulations. It used to run successfully on 108 nodes (each node has 16 processors), but we could not make the code run on more than 108 nodes. Right now I am using a Fortran 90 code (it is just a Fortran version of the above C code; both C and F90 codes have the same functionality) which runs successfully even on 256 nodes, i.e. 4096 processors, but the success is limited. When I try to write binary data from each individual processor, errors crop up after some number of processors write data. Data is written properly when I use only 108 nodes. If I do not write outputs, then the Fortran code executes properly even on 256 nodes. The errors start only when the data write process begins. The code then aborts after some data is written. The machine has dual Intel Xeon E5-2670 8-core processors at 2.6 GHz, Linux OS. The Intel compiler version is intel-cluster-studio-2013. We use Intel mpi...
cpuinfo output from system call different
2
Hello, I'm using Intel MPI 5.0 and am making a system call inside my Fortran program and it returns different values depending on the env. variable I_MPI_PIN_DOMAIN. Why is that? How do I make it give consistent output? Sample Fortran (Intel Fortran 13.1) program that can reproduce this:

    Program tester
    call system("cpuinfo|grep 'Packages(sockets)'| &
    tr -d ' '|cut -d ':' -f 2")
    stop
    end

    $ mpirun -genv I_MPI_PIN_DOMAIN node -np 1 ./a.out
    2
    $ mpirun -genv I_MPI_PIN_DOMAIN socket -np 1 ./a.out
    1

The command line output of the same in a shell is 2, so why is "socket" giving different output? Thanks!

• What are some key things I can learn about my program using Intel® Trace Analyzer and Collector?
• The Intel Trace Analyzer and Collector is a graphical tool used primarily for MPI-based programs. It helps you understand your application's behavior across its full runtime. It can help find temporal dependencies in your code and communication bottlenecks across the MPI ranks. It also checks the correctness of your application and points you to potential programming errors, buffer overlaps, and deadlocks.

• Will Intel Trace Analyzer and Collector only work with Intel MPI Library?
• No, the Intel Trace Analyzer and Collector supports all major MPICH2-based implementations. If you're wondering whether your MPI library can be profiled using the Intel Trace Analyzer and Collector, you can run a simple ABI-compatibility check by compiling the provided mpiconstants.c file and comparing the values with the ones provided in the Intel Trace Collector Reference Guide.

• Can Intel Trace Analyzer and Collector be used on applications for Intel® Many Integrated Core Architecture (Intel® MIC Architecture)?
• Yes, Intel MIC Architecture is fully supported by the Intel Trace Analyzer and Collector.

• What file and directory permissions are required to use Intel Trace Analyzer and Collector?
• You do not need to install special drivers, kernels, or acquire extra permissions. Simply install the Intel Trace Analyzer and Collector in the $HOME directory and link it with your application of choice from there.

• Should I recompile/relink my application to collect information?
• It depends on your application. For Windows* OS, you have to relink your application by using the -trace link-time flag. For Linux* OS (and if your application is dynamically linked), you do not need to relink or recompile. Simply use the -trace option at runtime (for example: mpirun -trace).

• How do I control which part of my application should be profiled?
• The Intel Trace Collector provides several options to control the data collection. By default, only information about MPI calls is collected. If you'd like to filter which MPI calls should be traced, create a configuration file and set the VT_CONFIG environment variable. If you'd like to expand the information collected beyond MPI and include all user-level routines, recompile your application with the -tcollect switch available as part of the Intel® Compilers. In this case, Intel Trace Collector will gather information about all routines in the application, not just MPI. You can similarly filter this via the -tcollect-filter compiler option. If you'd like to be explicit about which parts of the code should be profiled, use the Intel Trace Collector API calls. You can manually turn tracing on and off via a quick API call. For more information on all of these methods, refer to the Intel Trace Collector Reference Guide.

• What file format is the trace data collected in?
• Intel Trace Collector stores all collected data in Structured Tracefile Format (STF), which allows for better scalability across both time and processes. For more details, refer to the "Structured Tracefile Format" section of the Intel Trace Collector Reference Guide.

• Can I import or export trace data to/from Intel Trace Analyzer and Collector?
• Yes, you can export the data from any of the Profile charts (Function Profile, Message Profile, and Collective Operations Profile) as part of the Intel Trace Analyzer interface. To do this, open one of these profiles in the GUI, right-click to bring up the Context Menu, and select the "Export Data" option. The data will be saved in simple text format for easy reading. Separately, you can save your current working Intel Trace Analyzer environment via the Project Menu. If you choose "Save Project", your current open trace view and associated charts will be recorded as they are open on your screen. You can later choose "Load Project" from this same menu, which will bring up a previously saved session.

• What size MPI application can I analyze with Intel Trace Analyzer and Collector?
• It depends on how large or complex your application is, how many MPI calls you are making, and for how long you are running. There are no internal limitations on the size of the MPI job, but there are plenty of external ones. It all depends on how much memory is available on the system (per core) for the application, the MPI library, and the Intel Trace Collector processes, as well as on disk space availability. Any additional flags enabled (for example, storing call stack and source code locations) increase the size of the trace file. Filtering out unimportant information is always a good way to reduce trace file size.

• How can I control the amount of data collected to a reasonable amount? What is a reasonable amount?
• Each application is different in terms of the profiling data it can provide. The longer an application runs, and the more MPI calls it makes, the larger the STF files will be. You can filter some of the unnecessary information out by applying appropriate filters (see the answer to "How do I control which part of my application should be profiled?" above for more details, or check out some tips on Intel Trace Collector Filtering). Additionally, you can be restricted by the resources allocated to your account; consult your cluster administrator about quotas and recommendations.

• How can I analyze the collected information?
• Once you have collected the trace data, you can analyze it via the graphical interface called the Intel Trace Analyzer. Simply call the command ($ traceanalyzer) or double-click on the Intel Trace Analyzer icon and navigate to your STF files via the File Menu.

You can get started by opening up the Event Timeline chart (under the Charts Menu) and zooming in at an appropriate level.

Check out the Detecting and Removing Unnecessary Serialization Tutorial for ideas on how to get started. For details on all Intel Trace Analyzer functionality, refer to the Intel Trace Analyzer Reference Guide.
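
The collection and analysis steps described in the answers above can be scripted for batch runs. The following is a minimal Python sketch, not an official tool: it only assembles the `mpirun -trace` and `traceanalyzer` command lines mentioned in this FAQ and sets `VT_CONFIG` when a configuration file is given. The function names, the `./myapp` executable, and the rank count are hypothetical. It also assumes that the trace is written as `<executable name>.stf` in the working directory and that `traceanalyzer` accepts a trace file as an argument; check both against your installation.

```python
import os
import shlex


def collect_command(executable, ranks):
    """Build the run-time tracing command from the FAQ (Linux, dynamically linked)."""
    return ["mpirun", "-trace", "-n", str(ranks), executable]


def collect_environment(config_file=None, base_env=None):
    """Return a copy of the environment, with VT_CONFIG set if a config file is given."""
    env = dict(os.environ if base_env is None else base_env)
    if config_file is not None:
        env["VT_CONFIG"] = os.path.abspath(config_file)
    return env


def analyze_command(executable):
    """Build the command that opens the trace in the Intel Trace Analyzer GUI.

    Assumption: the collector writes <executable name>.stf in the current directory.
    """
    trace_file = os.path.basename(executable) + ".stf"
    return ["traceanalyzer", trace_file]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch works without MPI.
    print(shlex.join(collect_command("./myapp", 4)))
    print(shlex.join(analyze_command("./myapp")))
```

To actually launch a run, the command list and environment could be passed to `subprocess.run(cmd, env=env, check=True)`. Building the commands as lists rather than strings avoids shell-quoting problems with unusual paths.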

• Can I use Intel Trace Analyzer and Collector with Intel® VTune™ Amplifier XE, Intel® Inspector XE, or other analysis tools?
• These tools each collect their own data in their own format, but it is straightforward to run Intel VTune Amplifier XE and Intel Inspector XE in an MPI environment. Check each tool's User's Guide for more information on viewing collected MPI data.

You can use tools such as Intel VTune Amplifier XE and Intel Inspector XE for node-level analysis, and use the Intel Trace Analyzer and Collector for cluster-level analysis.
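
For the earlier question about keeping the collected data to a reasonable amount, one practical check is to measure how much disk space a trace occupies and compare it with your quota. The sketch below is an illustration under stated assumptions, not part of the product: it assumes that a trace consists of a main `.stf` file plus companion files whose names start with the same path, and the function names and the budget value are hypothetical.

```python
import glob
import os


def trace_footprint_bytes(trace_path):
    """Sum the sizes of the main trace file and any files sharing its path as a prefix.

    Assumption: companion files are named <trace_path><suffix>.
    """
    pattern = glob.escape(trace_path) + "*"
    files = [p for p in glob.glob(pattern) if os.path.isfile(p)]
    return sum(os.path.getsize(p) for p in files)


def within_budget(trace_path, budget_bytes):
    """Return True if the trace footprint does not exceed the given budget."""
    return trace_footprint_bytes(trace_path) <= budget_bytes
```

For example, `within_budget("myapp.stf", 2 * 1024**3)` would report whether a hypothetical `myapp.stf` trace fits in 2 GiB. If it does not, applying filters as described above is the usual next step.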

## Intel® Trace Analyzer & Collector

### Getting Started?

Click the Learn tab for guides and links that will quickly get you started.

Forums - The best place for timely answers from our technical experts and your peers. Use it even for bug reports.
Support - For secure, web-based, engineer-to-engineer support, visit our Intel® Premier Support web site. Intel Premier Support registration is required.