Processor Tuning Guides

Learn how to perform advanced tuning for specific microarchitectures.

Profile Python* with Intel® VTune™ Amplifier

This Python* tuning demonstration uses covariance implementations built into NumPy and the Intel® Data Analytics Acceleration Library. It includes code snippets.
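
As a rough illustration of the kind of workload such a demonstration profiles, the sketch below computes a sample covariance matrix in pure Python and profiles it with the standard-library cProfile module. It is a stand-in: it is not the NumPy or Intel Data Analytics Acceleration Library implementation, and cProfile is used here only because it ships with Python. The names covariance_matrix and profile_covariance are hypothetical.

```python
import cProfile
import io
import pstats
import random


def covariance_matrix(rows):
    """Sample covariance (divides by n - 1) of the columns of `rows`.

    `rows` is a list of equal-length observations.
    """
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two observations")
    dims = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(dims)]
    cov = [[0.0] * dims for _ in range(dims)]
    for j in range(dims):
        for k in range(j, dims):
            total = sum((row[j] - means[j]) * (row[k] - means[k]) for row in rows)
            cov[j][k] = cov[k][j] = total / (n - 1)
    return cov


def profile_covariance(n_rows=2000, n_cols=8, seed=0):
    """Profile covariance_matrix on random data and return the report text."""
    rng = random.Random(seed)
    data = [[rng.random() for _ in range(n_cols)] for _ in range(n_rows)]
    profiler = cProfile.Profile()
    profiler.enable()
    covariance_matrix(data)
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_covariance())
```

The report lists the functions with the largest cumulative time first, which is the same "find the hotspot, then optimize it" workflow a dedicated profiler supports in more detail.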

Use Intel® Advisor & Intel VTune Amplifier with Message Passing Interface (MPI)

Get step-by-step instructions for collecting performance data for MPI and hybrid MPI-plus-threads codes in a Linux* environment. You can profile all ranks or just a subset.

Webinars

Application Performance Snapshot

See a demonstration of the Application Performance Snapshot. It offers fast ways to discover untapped performance and make the best use of your computer hardware. (20:27 min)

Analysis of Python Applications

This video discusses the needs, advantages, and common tools and techniques for profiling Python applications. It includes a demo and code sample. (47:28 min)

How NUMA Affects Workloads

On multisocket non-uniform memory access (NUMA) systems, get the best performance by controlling where memory objects are placed in the memory subsystem. (58:39 min)

Performance Tune-Up for Your Hybrid Program

Are you working with a hybrid program that just isn't performing? Find out how to give it a jolt with Intel's performance analysis tools. (43:49 min)

Strategies for Tuning Multilevel Parallelism

Where should I start to add parallelism? How scalable is my application? What sort of speed-up can I expect? This webinar answers these questions and more. (57:41 min)

Performance Analysis Cookbooks

Discover the benefits of Intel VTune Amplifier analysis techniques with recipes that explore performance tuning use cases and environment-specific configurations.

False Sharing

Explore profiling a memory-bound linear-regression application using the General Exploration and Memory Access analyses.

I/O Issues: Remote Socket Accesses

Use the General Exploration analysis to check an application based on the Data Plane Development Kit (DPDK) for potential misconfiguration problems on a multisocket system.

OpenMP* Imbalance & Scheduling Overhead

Detect and fix common parallel bottlenecks in OpenMP* programs, such as imbalance on barriers and scheduling overhead.
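
To see why the scheduling policy matters, here is a small pure-Python model (not OpenMP itself) of how a loop with uneven per-iteration cost is distributed across threads. It assumes static scheduling hands each thread one contiguous block of iterations, and it models dynamic scheduling as giving the next chunk to whichever thread has the least accumulated work. The function names and the triangular cost profile are made up for illustration.

```python
import heapq


def static_loads(costs, n_threads):
    """Per-thread work when iterations are split into contiguous, near-equal blocks."""
    base, extra = divmod(len(costs), n_threads)
    loads, start = [], 0
    for t in range(n_threads):
        size = base + (1 if t < extra else 0)
        loads.append(sum(costs[start:start + size]))
        start += size
    return loads


def dynamic_loads(costs, n_threads, chunk=1):
    """Per-thread work when each chunk goes to the currently least-loaded thread."""
    heap = [(0, t) for t in range(n_threads)]  # (accumulated work, thread id)
    loads = [0] * n_threads
    for start in range(0, len(costs), chunk):
        work, t = heapq.heappop(heap)
        work += sum(costs[start:start + chunk])
        loads[t] = work
        heapq.heappush(heap, (work, t))
    return loads


def imbalance(loads):
    """Ratio of the busiest thread's work to the average; 1.0 is perfectly balanced."""
    return max(loads) / (sum(loads) / len(loads))


if __name__ == "__main__":
    costs = [i + 1 for i in range(8)]  # iteration i costs i + 1 units
    for name, loads in (("static", static_loads(costs, 4)),
                        ("dynamic", dynamic_loads(costs, 4))):
        print(name, loads, round(imbalance(loads), 2))
```

With these made-up costs, the static split leaves the last thread with 15 of the 36 work units, while the dynamic model caps the busiest thread at 12. The model ignores the per-chunk scheduling overhead that real dynamic scheduling adds, which is the other bottleneck named above.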

Profiling a Java* Application in a Docker* Container

Configure a Docker* container for Intel VTune Amplifier analysis to identify hotspots in a Java* application running in the isolated container.

Profiling JavaScript* Code in Node.js*

Rebuild Node.js* and enable performance analysis for JavaScript* code, including mixed-mode call stacks that contain both JavaScript and native frames.

Profile a Microsoft .NET Core Application

Use Intel VTune Amplifier for .NET Core dynamic-code profiling to locate performance hotspots in the managed code and reduce the application's turnaround time.

More Cookbook Recipes

Code Samples

Learn how to use Intel VTune Amplifier with prewritten sample code projects.

Installation

To install and set up the Intel VTune Amplifier sample code:

  1. Copy the .zip file from the installation directory to a writable directory or share on your system.
  2. Extract the sample from the archive.
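
The two steps above can also be scripted. Here is a minimal sketch using only the Python standard library; the function name install_sample is hypothetical, and both paths are placeholders to be replaced with the actual archive location in your installation directory and a writable directory of your choice.

```python
import shutil
import zipfile
from pathlib import Path


def install_sample(archive, work_dir):
    """Copy a sample .zip into a writable directory and extract it there.

    Returns the directory that holds the extracted files.
    """
    archive = Path(archive)
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)

    # Step 1: copy the archive out of the installation directory.
    local_copy = Path(shutil.copy2(archive, work_dir))

    # Step 2: extract the sample from the copied archive.
    target = work_dir / archive.stem
    with zipfile.ZipFile(local_copy) as zf:
        zf.extractall(target)
    return target
```

Extracting into a subdirectory named after the archive keeps several samples from overwriting one another in the same working directory.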

Notes:

  • The samples are nondeterministic. What you see on screen may differ from the screenshots shown throughout these tutorials.
  • The samples are designed only to illustrate Intel VTune Amplifier features and do not represent best practices for tuning any particular code. Results may vary depending on the nature of the analysis and the code to which it is applied.

Samples

Name Description
tachyon_find_hotspots
  • Description: 2D ray tracer and renderer that shows how Intel VTune Amplifier analysis helps identify hotspots and performance bottlenecks in a C++ application
  • Performance issues addressed: Wrong algorithm choice and ineffective parallelization
  • Location: \samples\<locale>\C++\tachyon_vtune_amp_xe.zip
tachyon_analyze_locks
  • Description: Identifies locks preventing efficient parallelism in a C++ application
  • Performance issues addressed: Locks and waits
  • Location: \samples\<locale>\C++\tachyon_vtune_amp_xe.zip
matrix
  • Description: Calculates matrix transformations and identifies general hardware issues in a C++ application on the host system and on an Intel® Xeon Phi™ coprocessor
  • Performance issues addressed: Poor cycles per instruction (CPI) rate, cache misses, retire stalls, execution stalls, and others
  • Location: \samples\<locale>\C++\matrix_vtune_amp_xe.zip
nqueens_parallel
  • Description: Solves the n-queens problem for various board sizes and identifies general hardware issues in a Fortran application
  • Performance issues addressed: Thread contention and ineffective parallelization
  • Location: \samples\<locale>\Fortran\nqueens_parallel.zip
serial_nqueens_csharp & parallel_nqueens_csharp
  • Description: Computes the number of solutions to the n-queens problem for a given board size and identifies hotspots in a Microsoft Visual C#* application
  • Performance issues addressed: Demonstrates basic performance analysis
  • Location: \samples\<locale>\C#\serial_nqueens_csharp.zip
jitprofiling
  • Description: Example of instrumenting an application with the JIT profiling API, which is typically useful for users with their own compilers or for code that is generated dynamically
  • Performance issues addressed: Demonstrates profiling a JIT-compiled application
  • Location: \samples\<locale>\C++\jitprofiling_vtune_amp_xe.zip
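
For readers unfamiliar with the problem behind the n-queens samples listed above, the sketch below counts the solutions for a given board size with a simple backtracking search. It is written in Python for brevity and is not the code shipped in the Fortran or C# samples; count_nqueens is a hypothetical name.

```python
def count_nqueens(n):
    """Count placements of n queens on an n x n board with no two attacking."""
    cols = set()       # columns already holding a queen
    diag_down = set()  # row - col is constant along "\" diagonals
    diag_up = set()    # row + col is constant along "/" diagonals

    def place(row):
        # All rows filled: one complete solution found.
        if row == n:
            return 1
        total = 0
        for col in range(n):
            if col in cols or (row - col) in diag_down or (row + col) in diag_up:
                continue  # square is attacked by an earlier queen
            cols.add(col)
            diag_down.add(row - col)
            diag_up.add(row + col)
            total += place(row + 1)
            # Undo the placement before trying the next column.
            cols.remove(col)
            diag_down.remove(row - col)
            diag_up.remove(row + col)
        return total

    return place(0)


if __name__ == "__main__":
    for size in (4, 6, 8):
        print(size, count_nqueens(size))
```

The search time grows quickly with board size, which is what makes this problem a convenient target for hotspot analysis. One plausible way to parallelize it is to give each first-row column to a separate worker.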