Design Code for Efficient Vectorization, Threading, Memory Usage, and Accelerator Offloading
Efficiently Offload Your Code to GPUs
Use Offload Advisor to determine whether your code would benefit from porting to a GPU.
- Identify the offload opportunities that pay off the most.
- Quantify the potential performance speedup from GPU offloading.
- Locate bottlenecks and estimate the potential performance gain from fixing each one.
- Estimate data-transfer costs and get guidance on how to optimize data transfer.
Optimize for Memory and Compute
Automated Roofline Analysis provides an intuitive visual representation of application performance against hardware-imposed limitations, such as memory bandwidth and compute capacity.
With automated roofline analysis, you can:
- See performance headroom against hardware limitations
- Get insights into an effective optimization roadmap
- Identify high-impact optimization opportunities
- Detect and prioritize bottlenecks by potential performance gain, and understand their likely causes, such as whether a loop is memory bound or compute bound
- Pinpoint exact memory bottlenecks (L1, L2, L3, or DRAM)
- Visualize optimization progress
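The hardware limits in a roofline chart follow the classic roofline formula: attainable performance is the smaller of peak compute and arithmetic intensity multiplied by memory bandwidth. The sketch below shows that formula only. The function names and units are assumptions for illustration and are not part of Intel Advisor.

```cpp
#include <algorithm>

// Attainable GFLOP/s under the classic roofline model:
// min(peak compute, arithmetic intensity * memory bandwidth).
// flops_per_byte is the kernel's arithmetic intensity.
double attainable_gflops(double peak_gflops,
                         double bandwidth_gb_per_s,
                         double flops_per_byte) {
    return std::min(peak_gflops, flops_per_byte * bandwidth_gb_per_s);
}

// A kernel is memory bound when its arithmetic intensity lies below the
// ridge point, peak_gflops / bandwidth_gb_per_s.
bool is_memory_bound(double peak_gflops,
                     double bandwidth_gb_per_s,
                     double flops_per_byte) {
    return flops_per_byte < peak_gflops / bandwidth_gb_per_s;
}
```

For example, with an assumed peak of 1000 GFLOP/s and 100 GB/s of bandwidth, a kernel at 0.25 FLOP/byte is capped at 25 GFLOP/s and is memory bound. Each memory level (L1, L2, L3, DRAM) has its own bandwidth, so evaluating the same formula per level gives one roof per level.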
Optimize Vectorization for Better Performance
Vectorization uses Single Instruction Multiple Data (SIMD) instructions to operate on multiple data objects in parallel within a single CPU core. This can greatly increase performance by reducing loop overhead and making better use of the multiple math units in each core.
- Find loops that will benefit from better vectorization.
- Identify where it is safe to force compiler vectorization.
- Pinpoint memory-access issues that may cause slowdowns.
- Get actionable, code-centric guidance to improve vectorization efficiency.
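As an illustration of a loop where forcing vectorization is safe, here is a minimal sketch that uses the OpenMP `simd` directive. The function name `saxpy` and the use of `std::vector` are choices made for this example. The pragma takes effect only when OpenMP SIMD support is enabled (for example, `-fopenmp-simd` on GCC or Clang); otherwise the compiler ignores it.

```cpp
#include <cstddef>
#include <vector>

// Computes y[i] = a * x[i] + y[i] for every element both vectors share.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    const float* xp = x.data();
    float* yp = y.data();

    // Safe to force: each iteration reads and writes only index i, so no
    // iteration depends on another. Access is unit stride, which avoids
    // the gather/scatter cost of strided or indirect access.
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

A loop such as `y[i] = y[i - 1] + x[i]` would not qualify, because each iteration depends on the result of the previous one.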
Model, Tune, and Test Multiple Threading Designs
Threading Advisor helps you quickly prototype multiple threading options, project scaling on larger systems, optimize faster, and implement with confidence.
- Identify issues and fix them before implementing parallelism.
- Add threading to C, C++, C#, and Fortran code.
- Prototype the performance impact of different threading designs, and project how they scale on systems with larger core counts, without disrupting development or implementation.
- Find and eliminate data-sharing issues during design (when they're less expensive to fix).
Create, Visualize, and Analyze Task and Dependency Computation Graphs
Flow Graph Analyzer (FGA) is a rapid visual prototyping environment for applications that can be expressed as flow graphs using Intel® Threading Building Blocks (Intel® TBB).
- Construct, validate, and model application design and performance before generating Intel TBB code.
- Get insight into nested or top-level data-parallel algorithm efficiency.
In addition to Intel TBB, FGA helps you to:
- Visualize and interact with DPC++ asynchronous task graphs
- Get insights into DPC++ task scheduling inefficiencies
- Speed up algorithm design and express data-parallel constructs efficiently
- Visualize and analyze OpenMP task dependence graphs for performance bottlenecks
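One quantity that bottleneck analysis of a task dependence graph commonly relies on is the critical path: the longest chain of dependent tasks. The sketch below computes it with standard C++ only. The `TaskGraph` struct is a hypothetical representation invented for this example. It is not an FGA, Intel TBB, DPC++, or OpenMP data structure, and it does not show how FGA itself performs the analysis.

```cpp
#include <algorithm>
#include <cstddef>
#include <queue>
#include <stdexcept>
#include <vector>

// Hypothetical task-graph representation for this sketch.
struct TaskGraph {
    std::vector<double> cost;                    // cost[i]: run time of task i
    std::vector<std::vector<std::size_t>> succ;  // succ[i]: tasks that depend on task i
};

// Length of the longest dependency chain (critical path), found by
// visiting tasks in topological order (Kahn's algorithm).
double critical_path(const TaskGraph& g) {
    const std::size_t n = g.cost.size();
    if (g.succ.size() != n) throw std::invalid_argument("size mismatch");

    std::vector<std::size_t> pending(n, 0);  // unfinished predecessors per task
    for (const auto& out : g.succ) {
        for (std::size_t v : out) {
            if (v >= n) throw std::invalid_argument("edge target out of range");
            ++pending[v];
        }
    }

    std::vector<double> finish(n, 0.0);  // earliest finish time per task
    std::queue<std::size_t> ready;
    for (std::size_t i = 0; i < n; ++i) {
        if (pending[i] == 0) { finish[i] = g.cost[i]; ready.push(i); }
    }

    std::size_t visited = 0;
    double longest = 0.0;
    while (!ready.empty()) {
        const std::size_t u = ready.front();
        ready.pop();
        ++visited;
        longest = std::max(longest, finish[u]);
        for (std::size_t v : g.succ[u]) {
            finish[v] = std::max(finish[v], finish[u] + g.cost[v]);
            if (--pending[v] == 0) ready.push(v);
        }
    }
    if (visited != n) throw std::runtime_error("task graph contains a cycle");
    return longest;
}
```

Dividing the total cost of all tasks by the critical-path length gives an upper bound on the speedup that any scheduler can reach for that graph, which helps show whether a bottleneck lies in the graph's shape or in how it is scheduled.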
Try It Out
Follow the Get Started Guide and use an introductory code sample to see how Intel Advisor works.
Learn High-Performance Code Design
Browse the cookbooks for recipes to help you design and optimize high-performing code for modern computer architectures.