Processor Tuning Guides

Learn how to perform advanced tuning for specific microarchitectures.

Profile Python* with Intel® VTune™ Amplifier

This Python* tuning demonstration uses covariance implementations built into NumPy and the Intel® Data Analytics Acceleration Library. It includes code snippets.
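For orientation, here is a minimal sketch of the kind of covariance computation the demonstration profiles. It is not taken from the demonstration itself: the function name `covariance_matrix` and the sample data are hypothetical, and it uses only the Python standard library rather than NumPy or the Intel Data Analytics Acceleration Library. It follows the same convention as NumPy's default `numpy.cov` (each row is a variable, normalization by N - 1).

```python
def covariance_matrix(rows):
    """Return the sample covariance matrix of `rows`.

    Each inner list is one variable; each position is one observation.
    Hypothetical helper for illustration only, not the demonstration's code.
    """
    n_obs = len(rows[0])
    if n_obs < 2 or any(len(r) != n_obs for r in rows):
        raise ValueError("need at least two observations and equal-length rows")

    # Center every variable on its own mean.
    means = [sum(r) / n_obs for r in rows]
    centered = [[x - m for x in r] for r, m in zip(rows, means)]

    # Entry (i, j) is the dot product of centered rows i and j over N - 1.
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (n_obs - 1) for cj in centered]
        for ci in centered
    ]


# Assumed sample data: the second variable is exactly twice the first.
data = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]]
cov = covariance_matrix(data)
```

A pure-Python loop like this is the sort of code where a profiler typically reports the inner sum as the hot spot, which is what motivates comparing it against library implementations.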

Use Intel® Advisor & Intel VTune Amplifier with Message Passing Interface (MPI)

Get instructions for collecting performance data for MPI and hybrid MPI-plus-threads codes in a Linux* environment. The workflow lets you profile all ranks or just a subset.

Webinars

With Intel® Performance Libraries, learn which conditions make it possible to build serial and parallel applications that deliver repeatable results. (61:11 min)

Find out how Intel® VTune™ Amplifier monitors HPC workloads and generates reports to optimize various platform components. (56 min)

See how to tune and accelerate compute-intensive performance with the features and architecture of Intel® Xeon® Scalable processors and Intel® Software Development Tools.

Locate performance and scalability issues, and identify whether imbalance, lock contention, creation overhead, or scheduling overhead causes them.

Identify where code performance needs to improve and learn how to fix it.

Overcome small matrix multiplication challenges and enable optimizations for matrix-matrix multiplication using Intel® Math Kernel Library.

Use key components in Intel® System Studio to correct hot spots, power inefficiencies, memory leaks, non-optimized threads, and other system issues.

Application Performance Snapshot

See a demonstration of the Application Performance Snapshot. Quickly discover untapped performance and make the best use of your computer hardware. (20:27 min)

Analysis of Python Applications

This video discusses the needs, advantages, and common tools and techniques used to profile Python applications. It includes a demo and code sample. (47:28 min)

How NUMA Affects Workloads

In multisocket non-uniform memory access (NUMA) systems, get the best performance through memory object placement on the memory subsystem. (58:39 min)

Performance Tune-Up for Your Hybrid Program

Find the root causes of a hybrid application's underperformance with this walk-through of uncovering and resolving each issue. (43:49 min)

Strategies for Tuning Multilevel Parallelism

Learn where to add parallelism to your application and determine how scalable it can be. (57:41 min)

Performance Analysis Cookbooks

Discover the benefits of analysis techniques using Intel VTune Amplifier. Use recipes that explore performance tuning use cases and environment-specific configurations.

False Sharing

Explore profiling a memory-bound linear-regression application using the General Exploration and Memory Access analyses.

I/O Issues: Remote Socket Accesses

Analyze an application based on the Data Plane Development Kit (DPDK) for potential misconfiguration problems on a multisocket system using the General Exploration analysis.

OpenMP* Imbalance & Scheduling Overhead

Detect and fix frequent parallel bottlenecks of OpenMP programs, such as imbalance on barriers and scheduling overhead.

Profile a Java* Application in a Docker* Container

Configure a Docker* container for the Intel VTune Amplifier analysis to identify hot spots in a Java* application running in the isolated container environment.

Profile JavaScript* Code in Node.js*

Rebuild Node.js* and enable the performance analysis for JavaScript* code, including mixed-mode call stacks containing JavaScript and native frames.

Profile a Microsoft .NET Core Application

Use Intel VTune Amplifier for .NET Core dynamic-code profiling to locate performance hot spots in the managed code and optimize the application turnaround.

More Cookbook Recipes

Training Samples

Learn how to use Intel VTune Amplifier with prewritten sample code projects.

Installation

To install and set up the Intel VTune Amplifier sample code:

  1. Copy the .zip file from the installation directory to a writable directory or share on your system.
  2. Extract the sample from the archive.
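The two steps above can also be scripted. The following is a minimal sketch using only the Python standard library; the function name `install_sample` and both path arguments are hypothetical, and it assumes you already know where the sample archive lives in your installation.

```python
import shutil
import zipfile
from pathlib import Path


def install_sample(archive: Path, work_dir: Path) -> Path:
    """Copy a sample .zip into a writable directory and extract it there.

    `archive` and `work_dir` are supplied by the caller; nothing here
    assumes a particular installation layout.
    """
    # Step 1: copy the archive to a writable location.
    work_dir.mkdir(parents=True, exist_ok=True)
    local_copy = Path(shutil.copy2(archive, work_dir))

    # Step 2: extract the sample into a folder named after the archive.
    target = work_dir / archive.stem
    with zipfile.ZipFile(local_copy) as zf:
        zf.extractall(target)
    return target
```

For example, `install_sample(Path(path_to_archive), Path.home() / "vtune_samples")` would leave the extracted project under a folder named after the archive, where `path_to_archive` is whatever location applies to your system.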

Notes:

  • The samples are nondeterministic. Your screens may vary from the screenshots shown throughout these tutorials.
  • The samples are designed only to illustrate Intel VTune Amplifier features and do not represent best practices for tuning any particular code. Results may vary depending on the nature of the analysis and the code to which it is applied.

Samples

Name Description
tachyon_find_hotspots
  • Description: 2D ray tracer and renderer that shows how Intel VTune Amplifier analysis helps identify hot spots and performance bottlenecks in a C++ application
  • Performance issues addressed: Wrong algorithm choice and ineffective parallelization
  • Location: \samples\<locale>\C++\tachyon_vtune_amp_xe.zip
tachyon_analyze_locks
  • Description: Identifies locks preventing efficient parallelism in a C++ application
  • Performance issues addressed: Locks and waits
  • Location: \samples\<locale>\C++\tachyon_vtune_amp_xe.zip
matrix
  • Description: Calculates matrix transformations and identifies general hardware issues in a C++ application on the host system and on an Intel® Xeon Phi™ coprocessor
  • Performance issues addressed: Poor cycles per instruction (CPI) rate, cache misses, retire stalls, execution stalls, and others
  • Location: \samples\<locale>\C++\matrix_vtune_amp_xe.zip
nqueens_parallel
  • Description: Solves the n-queens problem for various board sizes and identifies general hardware issues in a Fortran application
  • Performance issues addressed: Thread contention and ineffective parallelization
  • Location: \samples\<locale>\Fortran\nqueens_parallel.zip
serial_nqueens_csharp & parallel_nqueens_csharp
  • Description: Computes the number of solutions to the n-queens problem for a given board size and identifies hot spots in a Microsoft Visual C#* application
  • Performance issues addressed: Demonstrates basic performance analysis
  • Location: \samples\<locale>\C#\serial_nqueens_csharp.zip
jitprofiling
  • Description: Example of instrumenting an application with the JIT profiling API (This API is typically useful for users with their own compilers or where code is generated dynamically.)
  • Performance issues addressed: Demonstrates profiling a JIT-compiled application
  • Location: \samples\<locale>\C++\jitprofiling_vtune_amp_xe.zip
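
For readers unfamiliar with the workload behind the n-queens samples listed above, the following is a minimal serial sketch of counting n-queens solutions by backtracking. It is not the code shipped in the samples (those are Fortran and C# projects); the function name `count_n_queens` is hypothetical and the sketch is written in Python purely for illustration.

```python
def count_n_queens(n: int) -> int:
    """Count placements of n non-attacking queens on an n x n board."""

    def place(row, cols, diag1, diag2):
        # All rows filled: one complete solution found.
        if row == n:
            return 1
        total = 0
        for col in range(n):
            d1, d2 = row - col, row + col
            # Skip squares attacked along a column or either diagonal.
            if col in cols or d1 in diag1 or d2 in diag2:
                continue
            total += place(row + 1, cols | {col}, diag1 | {d1}, diag2 | {d2})
        return total

    return place(0, frozenset(), frozenset(), frozenset())


solutions_for_8 = count_n_queens(8)  # 92 solutions on a standard chessboard
```

The work grows rapidly with board size and the recursive search dominates run time, which is why this problem is a common target for hot-spot and parallelization analysis.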