Learn how to perform advanced tuning for specific microarchitectures.
This Python* tuning demonstration uses covariance implementations built into NumPy and the Intel® Data Analytics Acceleration Library. It includes code snippets.
Get instructions for collecting performance data for message passing interface (MPI) and hybrid MPI plus thread codes in a Linux* environment. You can profile all ranks or just a subset.
With Intel® Performance Libraries, learn which conditions make it possible to build serial and parallel applications that deliver repeatable results. (61:11 min)
Find out how Intel® VTune™ Profiler monitors high performance computing (HPC) workloads and generates reports to optimize various platform components. (56 min)
See how to tune and accelerate compute-intensive performance with the features and architecture of Intel® Xeon® Scalable processors and Intel® Software Development Tools.
Locate performance and scalability issues, and identify whether imbalance, lock contention, creation overhead, or scheduling overhead causes them.
Identify where code performance needs to improve and learn how to fix it.
Overcome small matrix multiplication challenges and enable optimizations for matrix-matrix multiplication using Intel® Math Kernel Library.
Use key components in Intel® System Studio to correct hot spots, power inefficiencies, memory leaks, non-optimized threads, and other system issues.
See a demonstration of the Application Performance Snapshot. Quickly discover untapped performance and make the best use of your computer hardware. (20:27 min)
This video discusses the needs, advantages, and common tools and techniques used to profile Python applications. It includes a demo and code sample. (47:28 min)
In multisocket non-uniform memory access (NUMA) systems, get the best performance by controlling where memory objects are placed in the memory subsystem. (58:39 min)
Use this walk-through to uncover and resolve, issue by issue, the root causes of a hybrid application that is not performing as expected. (43:49 min)
Explore profiling a memory-bound linear-regression application using the General Exploration and Memory Access analyses.
Use the General Exploration analysis to check an application based on the Data Plane Development Kit (DPDK) for potential misconfiguration problems on a multisocket system.
Detect and fix frequent parallel bottlenecks of OpenMP* programs, such as imbalance on barriers and scheduling overhead.
Configure a Docker* container for the Intel VTune Profiler analysis to identify hot spots in a Java* application running in the isolated container environment.
Use Intel VTune Profiler for .NET Core dynamic-code profiling to locate performance hot spots in the managed code and optimize the application turnaround.