Processor Tuning Guides

Learn how to perform advanced tuning for specific microarchitectures.

Profile Python* with Intel® VTune™ Profiler

This Python* tuning demonstration uses covariance implementations built into NumPy and the Intel® Data Analytics Acceleration Library. It includes code snippets.

Use Intel® Advisor & Intel VTune Profiler with Message Passing Interface (MPI)

Get instructions for collecting performance data for MPI and hybrid MPI plus threading applications in a Linux* environment. You can profile all ranks or only a subset.

Webinars

Learn the conditions under which Intel® Performance Libraries let you build serial and parallel applications that deliver repeatable results. (61:11 min)

Find out how Intel® VTune™ Profiler monitors high performance computing (HPC) workloads and generates reports to optimize various platform components. (56 min)

See how to tune and accelerate compute-intensive workloads using the features and architecture of Intel® Xeon® Scalable processors and Intel® Software Development Tools.

Locate performance and scalability issues, and identify whether load imbalance, lock contention, thread creation overhead, or scheduling overhead is causing them.
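Lock contention, one of the causes listed above, often shows up when many threads update shared state under a single mutex. The following is a minimal, hypothetical C++17 sketch (not taken from the webinar; `count_with_shared_lock` and `count_with_local_sums` are illustrative names). The first version takes the lock on every increment. The second accumulates privately in each thread and locks only once to merge, so a threading analysis should report far less lock wait time for it.

```cpp
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Contended version: every increment acquires the same mutex,
// so the threads serialize on the lock.
std::uint64_t count_with_shared_lock(int num_threads, std::uint64_t per_thread) {
    std::uint64_t total = 0;
    std::mutex m;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (std::uint64_t i = 0; i < per_thread; ++i) {
                std::lock_guard<std::mutex> lock(m);
                ++total;
            }
        });
    }
    for (auto& w : workers) w.join();  // all threads finish before locals go out of scope
    return total;
}

// Low-contention version: each thread sums into a local variable
// and takes the lock only once to publish its partial result.
std::uint64_t count_with_local_sums(int num_threads, std::uint64_t per_thread) {
    std::uint64_t total = 0;
    std::mutex m;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            std::uint64_t local = 0;
            for (std::uint64_t i = 0; i < per_thread; ++i) ++local;
            std::lock_guard<std::mutex> lock(m);
            total += local;
        });
    }
    for (auto& w : workers) w.join();
    return total;
}
```

Both functions return `num_threads * per_thread`; only the amount of time spent waiting on `m` differs.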

Identify where code performance needs to improve and learn how to fix it.

Overcome small matrix multiplication challenges and enable optimizations for matrix-matrix multiplication using Intel® Math Kernel Library.

Use key components in Intel® System Studio to correct hot spots, power inefficiencies, memory leaks, non-optimized threads, and other system issues.

Application Performance Snapshot

See a demonstration of the Application Performance Snapshot. Quickly discover untapped performance and make the best use of your computer hardware. (20:27 min)

Analysis of Python Applications

This video discusses why Python applications need profiling, the advantages of doing so, and common tools and techniques for it. It includes a demo and a code sample. (47:28 min)

How NUMA Affects Workloads

On multisocket non-uniform memory access (NUMA) systems, learn how to get the best performance by placing memory objects appropriately in the memory subsystem. (58:39 min)
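One widely used placement technique on Linux* is first-touch initialization: under the default memory policy, a page is placed on the NUMA node of the thread that first writes to it. Initializing data with the same OpenMP* static schedule as the compute loop therefore tends to keep each thread's chunk in local memory. The sketch below illustrates that assumption and is not material from the webinar. `first_touch_sum` is a hypothetical name, and it assumes threads stay pinned to cores (for example via the `OMP_PROC_BIND` environment variable). Compile with `-fopenmp` to run it in parallel; without OpenMP the pragmas are ignored and the code runs serially with the same result.

```cpp
#include <cstddef>
#include <memory>

// Sums n elements after initializing them with a first-touch pattern.
double first_touch_sum(std::size_t n) {
    // new double[n] leaves the values uninitialized, so for large arrays
    // the allocation typically does not write to (touch) the pages yet.
    std::unique_ptr<double[]> data(new double[n]);
    const auto count = static_cast<std::ptrdiff_t>(n);
    double sum = 0.0;

    #pragma omp parallel
    {
        // First touch: each thread writes its own static chunk, so those
        // pages land on that thread's NUMA node.
        #pragma omp for schedule(static)
        for (std::ptrdiff_t i = 0; i < count; ++i) {
            data[i] = 1.0;
        }
        // Same iteration count and static schedule within the same parallel
        // region, so each thread reads back the chunk it initialized.
        #pragma omp for schedule(static) reduction(+ : sum)
        for (std::ptrdiff_t i = 0; i < count; ++i) {
            sum += data[i];
        }
    }
    return sum;
}
```

Keeping both loops inside one `parallel` region matters. The OpenMP specification guarantees the same iteration-to-thread assignment for two `schedule(static)` loops only when they bind to the same parallel region.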

Performance Tune-Up for Your Hybrid Program

Use this walk-through of uncovering and resolving each issue to find the root causes of a hybrid application's unexpected performance. (43:49 min)

Strategies for Tuning Multilevel Parallelism

Learn where to add parallelism to your application and determine how scalable it can be. (57:41 min)
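A common first-order estimate of how scalable an application can be is Amdahl's law. When a fraction p of the serial runtime can run in parallel, the speedup on N workers is bounded by S(N) = 1 / ((1 - p) + p / N). The webinar may use other models. The helper below is only an illustrative sketch, and `amdahl_speedup` is a hypothetical name.

```cpp
#include <stdexcept>

// Amdahl's law: upper bound on speedup with n workers when a fraction p
// (0 <= p <= 1) of the serial runtime can run in parallel.
double amdahl_speedup(double p, int n) {
    if (p < 0.0 || p > 1.0 || n < 1) {
        throw std::invalid_argument("require 0 <= p <= 1 and n >= 1");
    }
    return 1.0 / ((1.0 - p) + p / n);
}
```

For example, with p = 0.9 and N = 8 the bound is 1 / (0.1 + 0.1125), about 4.71. No thread count can push it past 1 / (1 - p) = 10, which is why shrinking the serial fraction often pays off more than adding threads.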

Performance Analysis Cookbooks

Discover the benefits of analysis techniques using Intel VTune Profiler. Use recipes that explore performance tuning use cases and environment-specific configurations.

False Sharing

Explore profiling a memory-bound linear-regression application using the General Exploration and Memory Access analyses.
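False sharing occurs when threads write to distinct variables that happen to share a cache line. The line then bounces between cores even though no data is logically shared. The sketch below is a generic C++17 illustration, not the linear-regression code from the recipe. It assumes a 64-byte cache line, which is typical of current x86 processors, and pads each per-thread counter to that size with `alignas` so neighboring counters cannot share a line. `PaddedCounter` and `parallel_count` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumed cache-line size; 64 bytes is typical on current x86 CPUs.
constexpr std::size_t kCacheLine = 64;

// Each counter occupies its own cache line, so writes by one thread
// do not invalidate the line holding another thread's counter.
struct alignas(kCacheLine) PaddedCounter {
    std::uint64_t value = 0;
};
static_assert(sizeof(PaddedCounter) == kCacheLine, "counter should fill one line");

// Each thread increments only its own slot; the slots are summed at the end.
std::uint64_t parallel_count(int num_threads, std::uint64_t per_thread) {
    std::vector<PaddedCounter> counters(num_threads);
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counters, t, per_thread] {
            for (std::uint64_t i = 0; i < per_thread; ++i) {
                ++counters[t].value;
            }
        });
    }
    for (auto& w : workers) w.join();

    std::uint64_t total = 0;
    for (const auto& c : counters) total += c.value;
    return total;
}
```

Dropping `alignas(kCacheLine)` (and the `static_assert`) lets several 8-byte counters pack into one line. That unpadded layout is the pattern a memory access analysis would be expected to surface.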

I/O Issues: Remote Socket Accesses

Use the General Exploration analysis to check an application based on the Data Plane Development Kit (DPDK) for potential misconfiguration on a multisocket system.

OpenMP* Imbalance & Scheduling Overhead

Detect and fix common parallel bottlenecks in OpenMP* programs, such as imbalance on barriers and scheduling overhead.
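A typical source of imbalance on barriers is a loop whose iterations do unequal work, such as a triangular loop nest. The sketch below assumes that case. With `schedule(static)`, the threads that receive the cheap low-index block finish early and wait at the implicit barrier. `schedule(dynamic, chunk)` hands out iterations on demand instead, at the cost of some scheduling overhead that a larger chunk reduces. The function name and the chunk size of 16 are illustrative choices, not values from the recipe. Compile with `-fopenmp` (GCC/Clang) to enable the pragma; without it the loop runs serially and returns the same result.

```cpp
#include <cstdint>

// Triangular workload: iteration i does i + 1 units of work, so an even
// static split leaves the threads holding low i values idle at the barrier.
std::int64_t triangular_work(int n) {
    std::int64_t total = 0;
    // Dynamic scheduling balances the uneven iterations; the chunk size (16)
    // trades balance against per-chunk scheduling overhead.
    #pragma omp parallel for schedule(dynamic, 16) reduction(+ : total)
    for (int i = 0; i < n; ++i) {
        std::int64_t row = 0;
        for (int j = 0; j <= i; ++j) {
            row += j;  // stand-in for real per-element work
        }
        total += row;
    }
    return total;
}
```

The result equals (n - 1) * n * (n + 1) / 6 for any schedule. Changing the schedule affects only how the work is distributed, which is what a threading analysis measures as imbalance and scheduling overhead.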

Profile a Java* Application in a Docker* Container

Configure a Docker* container for Intel VTune Profiler analysis to identify hot spots in a Java* application running in the isolated container environment.

Profile JavaScript* Code in Node.js*

Rebuild Node.js* and enable performance analysis of JavaScript* code, including mixed-mode call stacks that contain both JavaScript and native frames.

Profile a Microsoft .NET Core Application

Use Intel VTune Profiler for .NET Core dynamic-code profiling to locate performance hot spots in managed code and optimize application turnaround.

More Cookbook Recipes

Training Samples

Learn how to use Intel VTune Profiler with prewritten sample code projects.

Installation

To install and set up the Intel VTune Profiler sample code:

  1. Copy the .zip file from the installation directory to a writable directory or network share on your system.
  2. Extract the sample from the archive.

Notes

  • The samples are nondeterministic. Your screens may vary from the screenshots shown throughout these tutorials.
  • The samples are designed only to illustrate Intel VTune Profiler features and do not represent best practices for tuning any particular code. Results may vary depending on the nature of the analysis and the code to which it is applied.


Samples

matrix
  • Description: Calculates matrix transformations and identifies general hardware issues in a C++ application
  • Performance issues addressed: Poor cycles per instruction (CPI) rate, cache misses, retire stalls, execution stalls, and others
  • Location: \samples\<locale>\C++\matrix
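One common way to reduce cache misses in matrix multiplication is loop interchange. The sketch below is not the sample's source code. It assumes row-major `std::vector<double>` storage, and `multiply_ijk` and `multiply_ikj` are hypothetical names. In the naive i-j-k order, the innermost loop walks `b` down a column with a stride of n elements, touching a new cache line on almost every step. The i-k-j order walks `b` and `c` along rows with unit stride.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<double>;  // n x n, row-major: element (r, c) at r * n + c

// Naive i-j-k order: the inner loop reads b[k * n + j] with stride n,
// which causes frequent cache misses for large n.
Matrix multiply_ijk(const Matrix& a, const Matrix& b, std::size_t n) {
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    return c;
}

// Interchanged i-k-j order: the inner loop reads b and updates c with
// unit stride, so each fetched cache line is fully used.
Matrix multiply_ikj(const Matrix& a, const Matrix& b, std::size_t n) {
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j];
        }
    return c;
}
```

Both functions accumulate over k in the same order, so they should produce the same values; only the memory access pattern changes. Comparing the two under VTune Profiler is one way to see the effect on cache-miss and CPI metrics.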