Intel® Trace Analyzer and Collector

Understand MPI application behavior, quickly find bottlenecks, and achieve high performance for parallel cluster applications

  • Powerful MPI Communications Profiling and Analysis
  • Scalable - Low Overhead & Effective Visualization
  • Flexible to Fit Workflow – Compile, Link or Run

Intel® Trace Analyzer and Collector 9.0 is a graphical tool for understanding MPI application behavior, quickly finding bottlenecks, improving correctness, and achieving high performance for parallel cluster applications based on Intel architecture. Improve weak and strong scaling for small and large applications with Intel Trace Analyzer and Collector.

Benefits:

  • Visualize and understand parallel application behavior
  • Evaluate profiling statistics and load balancing
  • Analyze performance of subroutines or code blocks
  • Learn about communication patterns, parameters, and performance data
  • Identify communication hotspots
  • Decrease time to solution and increase application efficiency

MPI checking

  • A unique MPI Correctness Checker detects deadlocks, data corruption, and errors with MPI parameters, data types, buffers, communicators, point-to-point messages, and collective operations (see the sketch below).
  • The Correctness Checker scales to extremely large systems and detects errors even among a large number of processes.
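
To illustrate the kind of defect the Correctness Checker targets, here is a minimal, hypothetical C sketch of a head-to-head deadlock: both ranks enter a blocking MPI_Send before posting a receive, so with messages too large for eager buffering neither call can complete. A checker can flag the pattern even on runs where internal buffering happens to hide it.

    /* deadlock.c - hypothetical sketch; run with exactly 2 ranks. */
    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        const int N = 1 << 20;   /* large enough to defeat eager buffering */
        int rank;
        double *out, *in;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        out = malloc(N * sizeof(double));
        in  = malloc(N * sizeof(double));

        /* Both ranks block in MPI_Send: a classic head-to-head deadlock. */
        MPI_Send(out, N, MPI_DOUBLE, 1 - rank, 0, MPI_COMM_WORLD);
        MPI_Recv(in, N, MPI_DOUBLE, 1 - rank, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

        free(out);
        free(in);
        MPI_Finalize();
        return 0;
    }

With the Intel MPI Library, checking is typically enabled at launch time (for example, mpirun -check_mpi -n 2 ./deadlock); see the Intel Trace Collector Reference Guide for the exact options on your system.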

Interface and Displays

  • Intel® Trace Analyzer and Collector includes a full-color, customizable GUI with many drill-down view options.
  • The analyzer can rapidly unwind the call stack and use debug information to map instruction addresses to source code.
  • With both command-line and GUI interfaces, you can set up batch runs or debug interactively.

Scalability

  • Low overhead and random access to portions of a trace make the tool suitable for analyzing large amounts of performance data.
  • Thread safety allows event-based tracing of multithreaded MPI applications as well as tracing of non-MPI threaded applications (see the sketch after this list).
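
As a sketch of the multithreaded case, consider this minimal, hypothetical hybrid MPI/OpenMP program; with a thread-safe collector, events from the OpenMP region can be traced alongside the MPI calls.

    /* hybrid.c - minimal MPI + OpenMP sketch (illustrative only). */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int provided, rank;

        /* Funneled model: only the main thread makes MPI calls. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        #pragma omp parallel
        printf("rank %d, thread %d\n", rank, omp_get_thread_num());

        MPI_Barrier(MPI_COMM_WORLD);   /* main thread only */
        MPI_Finalize();
        return 0;
    }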

Instrumentation and Tracing

  • Low-intrusion instrumentation supports MPI applications written in C, C++, or Fortran.
  • Intel Trace Analyzer and Collector automatically records performance data from parallel threads in C, C++, or Fortran; a bare-bones example follows.
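
Because the default MPI tracing is interposed at the MPI library level, a plain MPI program needs no source changes at all. The sketch below (file and names are hypothetical) can be compiled normally and launched with the -trace option (for example, mpirun -trace -n 4 ./ring) to produce a trace.

    /* ring.c - bare-bones MPI program; no tracing calls in the source. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size, token;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Pass a token around a ring; each exchange appears in the trace. */
        token = rank;
        MPI_Sendrecv_replace(&token, 1, MPI_INT,
                             (rank + 1) % size, 0,          /* destination */
                             (rank + size - 1) % size, 0,   /* source */
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        printf("rank %d now holds token %d\n", rank, token);
        MPI_Finalize();
        return 0;
    }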

What’s new

  • MPI Communications Profile Summary Overview
    • Quickly understand computation vs. communication
    • Identify which MPI communications are used the most
    • Advice on where to start your analysis

  • Expanded standards support with MPI 3.0

  • Automated MPI Communications Analysis with the Performance Assistant
    • Detect common MPI performance issues
    • Automated tips on potential solutions

Videos to help you get started.

Register for future Webinars


Previously recorded Webinars:

  • Increase Cluster MPI Application Performance with a "MPI Tune" Up
  • MPI on Intel® Xeon Phi™ coprocessor
  • Quickly discover performance issues with the Intel® Trace Analyzer and Collector 9.0 Beta

More Tech Articles

Using Intel® MPI Library 5.0 with MPICH based applications
By Dmitry Sivkov (Intel), published 08/25/2014
Why is it needed? Different MPI implementations have their own benefits and advantages, so in a specific cluster environment an HPC application may perform better with a different MPI implementation. Intel® MPI Library has the following benefits: Support of the wide range of cl...
Intel® Cluster Tools Open Source Downloads
By Gergana Slavova (Intel), published 03/06/2014
This article makes available third-party libraries and sources that were used in the creation of Intel® Software Development Products. Intel provides this software pursuant to their applicable licenses. Products and Versions: Intel® Trace Analyzer and Collector for Linux* gcc-3.2.3-42.zip (whi...
Using the Intel® MPI Library on Intel® Xeon Phi™ Coprocessor Systems
By loc-nguyen (Intel), published 03/19/2013
Download Article Download Using the Intel® MPI Library on Intel® Xeon Phi™ Coprocessor Systems [PDF 499KB] Table of Contents Chapter 1 – Introduction 1.1 – Overview 1.2 – Compatibility Chapter 2 – Installing the Intel® MPI Library 2.1 – Installing the Intel MPI Library 2.2 – Preparation Chapter 3...
Intel® Trace Collector Filtering
By James Tullos (Intel), published 03/14/2013
Filtering in the Intel® Trace Collector will apply specified filters to the trace collection process.  This directly reduces the amount of data collected.  The filter rules can be applied either via command line arguments or in a configuration file (specified by the environment variable VT_CONFIG...

Supplemental Documentation

Intel® Parallel Studio XE 2015 Update 2 Cluster Edition Readme
By Gergana Slavova (Intel), published 02/06/2015
The Intel® Parallel Studio XE 2015 Update 2 Cluster Edition for Linux* and Windows* combines all Intel® Parallel Studio XE and Intel® Cluster Tools into a single package. This multi-component software toolkit contains the core libraries and tools to efficiently develop, optimize, run, and distrib...
Intel® Parallel Studio XE 2015 Update 1 Cluster Edition Readme
By Gergana Slavova (Intel), published 11/24/2014
The Intel® Parallel Studio XE 2015 Update 1 Cluster Edition for Linux* and Windows* combines all Intel® Parallel Studio XE and Intel® Cluster Tools into a single package. This multi-component software toolkit contains the core libraries and tools to efficiently develop, optimize, run, and distrib...
Intel® Parallel Studio XE 2015 Cluster Edition Initial Release Readme
By Gergana Slavova (Intel), published 08/15/2014
The Intel® Parallel Studio XE 2015 Cluster Edition for Linux* and Windows* combines all Intel® Parallel Studio XE and Intel® Cluster Tools into a single package. This multi-component software toolkit contains the core libraries and tools to efficiently develop, optimize, run, and distribute paral...
Intel® Trace Analyzer and Collector 8.1 Update 3 Readme
By Gergana Slavova (Intel), published 08/13/2013
The Intel® Trace Analyzer and Collector 8.1 Update 3 for Linux* and Windows* is a low-overhead scalable event-tracing library with graphical analysis that reduces the time it takes an application developer to enable maximum performance of cluster applications. This package is for users who dev...

You can reply to any of the forum topics below by clicking on the title. Please do not include private information such as your email address or product serial number in your posts. If you need to share private information with an Intel employee, they can start a private thread for you.



MPI equivalent of KMP_PLACE_THREADS on MIC
By Pramod K.
Hello All, When I run pure OpenMP example on MIC, I find KMP_PLACE_THREADS very useful. (for example, I can benchmark using 8 cores and 3 threads on every core with "KMP_PLACE_THREADS=8c,3t,0O" What is MPI equivalent for this? (I am running pure mpi application natively on MIC with Intel MPI)  In the documentation I see I_MPI_PIN_PROCESSOR_LIST where I can provide the list of specific processors. Is there any other way? Thanks.
Intel ITAC error
By Nitin Kundapur B.
Hello, I am using Intel Traceanalyzer to profile an application called CESM. Since I want to profile the user defined functions, I instrument the code using the -tcollect option during compilation. This happens successfully.  After compilation, I use the -trace option in the mpirun command.  After the run is completed, I see that the several trace files are generated.  When I open the $application.stf file (the appropriate stf file), I get the following error in the Intel Traceanalyzer.  "The file $application.stf cannot not be read. Check name, permissions, whether this is really a trace file and the trace is valid. Also check that all parts of the trace file, if they exist, are located in the same directory." Please find attached the snapshot of the error which I receive when I open traceanalyzer.    What might be the probable causes for this? The generated insturmented code is extremely large as the job (with instrumentation) took 6 hours to run.    
Does n550 cpu support parallel programming(mpi)?
By elifnur k.
Hello everyone, i have samsung n150 plus netbook. it has intel n550 cpu. i want to develop some mpi programs with this device but i can't install MPICH libraries or any others. When i tried, i get warning like this. 'This installation package is not supported by this processor type. Contact the product vendor.'   But when i searched, learned that n550 has 2 cores. What is the problem, i can't solve it. Is there any program(or supporting thing) for this processor (n550), using for parallel programming?   i asked this question before in processors title.    https://communities.intel.com/thread/61782   Thanks for your helps and times already.  :)  
Problem: Sending more than two process
By Rodrigo Antonio F.
Hi, I have faced with a problem when my program try to send structure data more than two process. I created data structure mpi_docking_t as below  typedef struct s_docking{ char receptor[MAX_FILE_NAME]; char compound[MAX_FILE_NAME]; }docking_t; /************* mpi_docking_t ***************************/ const int nitems_dock=2; int blocklengths_dock[nitems_dock] = {MAX_PATH, MAX_PATH}; MPI_Datatype types_dock[nitems_dock] = {MPI_CHAR, MPI_CHAR}; MPI_Aint offsets_dock[nitems_dock]; offsets_dock[0] = offsetof(docking_t, receptor); offsets_dock[1] = offsetof(docking_t, compound); MPI_Type_create_struct(nitems_dock, blocklengths_dock, offsets_dock, types_dock, &mpi_docking_t); MPI_Type_commit(&mpi_docking_t); /************* mpi_docking_t end ***************************/i tried to send data based on this code: //Preparing buffer to be sent docking_t *buff = NULL; int buffer_dock; buffer_dock = sizeof(docking_t)*number_dock*MPI_BSEND_OVERHEAD; ...
Profiling a complex MPI Application : CESM (Community Earth System Model)
By Nitin Kundapur B.
Hello.  CESM is a complex MPI climate model which is a highly parallel application.  I am looking for ways to profile CESM runs. The default profiler provides profiling data for only a few routines. I have tried using external profilers like TAU, HPC Toolkit, Allinea Map, ITAC Traceanalyzer and VTune.  As I was running CESM across a cluster (with 8 nodes - 16 processors each), it was most beneficial to use HPC Toolkit and Allinea Map for profiling. However, I am keen on finding two metrics for each CESM routine executed.  These are : 1) Total execution time of the function 2) Number of function calls made Both of these do not provide the number of function calls made for a routine.  The number of function calls made is important because this will help me find the time taken for execution of each call of a function. Just wanted to know if this has been achieved by anyone. Is there a way to do this with any of these tools?    Thanks, Nitin K Bhat SERC, Indian Institute of Science
Performance issues of Intel MPI 5.0.2.044 on Windows 7 SP 1 with 2x18 cores cpus.
By Frank R.
Dear support team, I have a question about a performance difference between Windows 7 SP 1 and RHEL 6.5. The situation is as follows: The hardware is a DELL precision rack 7910, see link for exact specification (click on components):http://www.dell.com/support/home/us/en/19/product-support/servicetag/3X8GG42/configuration We installed Linux RHEL 6.5 on this machine and ran our product (compiled with Intel C/C++/Fortran 13.1.3 (gcc version 4.4.7 compatibility) and Intel MPI 5.0.2.044 on Linux). After that, we installed Windows 7 SP 1 on this machine and ran our product (compiled with Intel C/C++/Fortran 13.1.3.198 and Intel MPI 5.0.2.044 on Windows) again. What we observed is a big performance drop on 1 and 2 cpu on Windows in comparison to Linux. If we go up to 8, 16, 32 cpus we got nearly the same performance on Windows as on Linux, but we got heavy oscillation in computation time only on Windows (sometimes 16 cpus faster than 32 cpu). On Intel MPI 4.1.3.045 we didn't see this os...
MPI_Init_thread or MPI_Init failed in child process
By Yongjun L.
I have two programs, A and B. They all are developed with MPI. A will call B.  If I directly start A and call B, every thing is OK. If I start A with mpiexec, like mpiexec -localonly 2 A.exe, and call B. MPI_Init_thread or MPI_Init will fail in B.  Below is the error message I got. [01:2668]..ERROR:Error while connecting to host, No connection could be made because the target machine actively refused it. (10061) [01:2668]..ERROR:Connect on sock (host=localhost, port=53649) failed, exhaused all end points SMPDU_Sock_post_connect failed. [1] PMI_ConnectToHost failed: unable to post a connect to localhost:53649, error: Undefined dynamic error code uPMI_ConnectToHost returning PMI_FAIL [1] PMI_Init failed. Fatal error in PMPI_Init_thread: Other MPI error, error stack: MPIR_Init_thread(659): MPID_Init(154).......: channel initialization failed MPID_Init(448).......: PMI_Init returned -1 Can anyone tell me what is the problem? How to solve it? Thanks Yongjun
need to type "Enter" ?
By dingjun.chencmgl.ca
Hi, Everyone, I am running my hybrid MPI/OpenMP jobs on 3-nodes Infiniband PCs Linux cluster. each node has one MPI process that has 15 OpenMP threads. This means my job runs with 3 MPI processes and each MPI process has 15 threads. the hosts.txt file is given as follows: coflowrhc4-5:1 coflowrhc4-6:1 coflowrhc4-7:1  I wrote the following batch file as follows: /************** batch file******************/ export CMG_LIC_HOST=rlmserv export exe=/cmg/dingjun/imexLocal/imex_xsamg_dave.exe export LD_LIBRARY_PATH=/cmg/dingjun/imexLocal/linux_x64/lib export OMP_SCHEDULE=static,1 export KMP_AFFINITY=compact,0 export datadir=/cmg/dingjun/imexdatasets/7testproblems/mx1041_rb cd /cmg/dingjun/imexdatasets/7testproblems/mx1041_rb mpirun -machinefile hosts.txt ${exe} -fgmres -f ${datadir}/mx1041x105x10loa2_rb_xsamg.dat -log -jacdoms 16 -parasol 16 -o mx1041x105x10loa2_rb_xsamg_3MPI15threads_run7 export datadir=/cmg/dingjun/imexdatasets/7testproblems/mx521_rb cd /cmg/dingjun/imexdataset...
  • What are some key things I can learn about my program using Intel® Trace Analyzer and Collector?
  • The Intel Trace Analyzer and Collector is a graphical tool used primarily for MPI-based programs. It helps you understand your application's behavior across its full runtime. It can help find temporal dependencies in your code and communication bottlenecks across the MPI ranks. It also checks the correctness of your application and points you to potential programming errors, buffer overlaps, and deadlocks.

  • Will Intel Trace Analyzer and Collector only work with Intel MPI Library?
  • No, the Intel Trace Analyzer and Collector supports all major MPICH2-based implementations. If you're wondering whether your MPI library can be profiled with the Intel Trace Analyzer and Collector, you can run a simple ABI-compatibility check by compiling the provided mpiconstants.c file and verifying the values against the ones provided in the Intel Trace Collector Reference Guide.

  • Can Intel Trace Analyzer and Collector be used on applications for Intel® Many Integrated Core Architecture (Intel® MIC Architecture)?
  • Yes, Intel MIC Architecture is fully supported by the Intel Trace Analyzer and Collector.

  • What file and directory permissions are required to use Intel Trace Analyzer and Collector?
  • You do not need to install special drivers, kernels, or acquire extra permissions. Simply install the Intel Trace Analyzer and Collector in the $HOME directory and link it with your application of choice from there.

  • Should I recompile/relink my application to collect information?
  • It depends on your application. For Windows* OS, you have to relink your application using the -trace link-time flag.

    For Linux* OS (and if your application is dynamically linked), you do not need to relink or recompile. Simply use the -trace option at runtime (for example: mpirun -trace).

  • How do I control which part of my application should be profiled?
  • The Intel Trace Collector provides several options to control the data collection. By default, only information about MPI calls is collected. If you'd like to filter which MPI calls should be traced, create a configuration file and set the VT_CONFIG environment variable.

    If you'd like to expand the information collected beyond MPI and include all user-level routines, recompile your application with the -tcollect switch available as part of the Intel® Compilers. In this case, Intel Trace Collector will gather information about all routines in the application, not just MPI. You can similarly filter this via the -tcollect-filter compiler option.

    If you'd like to be explicit about which parts of the code should be profiled, use the Intel Trace Collector API calls. You can manually turn tracing on and off via a quick API call.
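
    For example, here is a minimal sketch of the API approach, assuming the VT.h header and the VT_traceon/VT_traceoff pair described in the Intel Trace Collector Reference Guide (the application routines are hypothetical):

        /* selective.c - sketch of selective tracing via the API. */
        #include <mpi.h>
        #include <VT.h>

        void expensive_setup(void);    /* hypothetical app routines */
        void solver_iteration(void);

        int main(int argc, char **argv) {
            MPI_Init(&argc, &argv);

            VT_traceoff();             /* skip the uninteresting setup phase */
            expensive_setup();
            VT_traceon();              /* record only the solver loop */

            for (int i = 0; i < 100; ++i)
                solver_iteration();

            MPI_Finalize();
            return 0;
        }

    The application must be linked against the trace collector library (for example, with the -trace link-time flag) so that the VT_* symbols resolve.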

    For more information on all of these methods, refer to the Intel Trace Collector Reference Guide.

  • What file format is the trace data collected in?
  • Intel Trace Collector stores all collected data in Structured Tracefile Format (STF) which allows for better scalability across both time and processes. For more details, refer to the "Structured Tracefile Format" section of Intel Trace Collector Reference Guide.

  • Can I import or export trace data to/from Intel Trace Analyzer and Collector?
  • Yes, you can export the data from any of the Profile charts (Function Profile, Message Profile, and Collective Operations Profile) as part of the Intel Trace Analyzer interface. To do this, open one of these profiles in the GUI, right-click to bring up the Context Menu, and select the "Export Data" option. The data will be saved in simple text format for easy reading.

    At a separate level, you can save your current working Intel Trace Analyzer environment via the Project Menu. If you choose to "Save Project", your current open trace view and associated charts will be recorded as they are open on your screen. You can later choose to "Load Project" from this same menu, which will bring up a previously-saved session.

  • What size MPI application can I analyze with Intel Trace Analyzer and Collector?
  • It depends on how large or complex your application is, how many MPI calls it makes, and how long it runs. There are no internal limitations on the size of the MPI job, but there are plenty of external ones. It all depends on how much memory is available on the system (per core) for the application, the MPI library, and the Intel Trace Collector processes, as well as on disk space availability. Any additional flags enabled (for example, storing call stacks and source code locations) increase the size of the trace file. Filtering out unimportant information is always a good way to reduce trace file size.

  • How can I control the amount of data collected to a reasonable amount? What is a reasonable amount?
  • Each application is different in terms of the profiling data it can provide. The longer an application runs, and the more MPI calls it makes, the larger the STF files will be. You can filter some of the unnecessary information out by applying appropriate filters (see Question #6 for more details or check out some tips on Intel Trace Collector Filtering).

    Additionally, you can be restricted by the resources allocated to your account; consult your cluster administration about quotas and recommendations.

  • How can I analyze the collected information?
  • Once you have collected the trace data, you can analyze it via the Graphical Interface called the Intel Trace Analyzer. Simply call the command ($ traceanalyzer) or double-click on the Intel Trace Analyzer icon and navigate to your STF files via the File Menu.

    You can get started by opening up the Event Timeline chart (under the Charts Menu) and zooming in at an appropriate level.

    Check out the Detecting and Removing Unnecessary Serialization Tutorial on ideas how to get started. For details on all Intel Trace Analyzer functionality, refer to the Intel Trace Analyzer Reference Guide.

  • Can I use Intel Trace Analyzer and Collector with Intel® VTune™ Amplifier XE, Intel® Inspector XE, or other analysis tools?
  • While each tool collects its information separately and in its own format, it's easy enough to use the Intel VTune Amplifier XE and Intel Inspector XE tools in an MPI environment. Check each tool's respective User's Guide for more information on viewing collected MPI data.

    You can use tools such as Intel VTune Amplifier XE and Intel Inspector XE for node-level analysis, and use the Intel Trace Analyzer and Collector for cluster-level analysis.

Intel® Trace Analyzer & Collector

Getting Started?

Click the Learn tab for guides and links that will quickly get you started.

Get Help or Advice

Search Support Articles
Forums - The best place for timely answers from our technical experts and your peers. Use it even for bug reports.
Support - For secure, web-based, engineer-to-engineer support, visit our Intel® Premier Support web site. Intel Premier Support registration is required.
Download, Registration and Licensing Help - Specific help for download, registration, and licensing questions.

Resources

Release Notes - View Release Notes online!
Intel® Trace Analyzer and Collector Product Documentation - View documentation online!
Documentation for other software products