Intel® Advisor

Vectorization Optimization and Thread Prototyping

  • Vectorize and thread code or performance dies
  • Easy workflow + data + tips = faster code, faster
  • Prioritize, prototype, and predict performance gain
  • Take advantage of Priority Support. Connect privately with Intel engineers for technical questions.

Vectorization and Threading Are Crucial to Performance

On modern processors, software must be both vectorized and threaded to realize the full performance potential of the processor. Vectorization relies on single instruction, multiple data (SIMD) instructions such as Intel® Advanced Vector Extensions (Intel® AVX). In some cases, code that is vectorized and threaded can be up to 130 times faster than code that is neither threaded nor vectorized, and much faster than code that is only threaded or only vectorized. That gap is growing with every new processor generation.


Benchmark source: Intel Corporation. See Configurations. See also note 1 and the disclaimers below.

Threaded plus vectorized can be much faster than either one alone. The gap is growing with each new hardware generation. For more information, see Details.

Advisor Survey

Intel Advisor gives you data to forecast the performance gain before you invest significant effort in implementation. Implement only the options that have a high return on investment.

Data-Driven Vectorization Optimization and Threading Design

You need good data to make good design decisions. Which loops should be threaded and vectorized first? Is the performance gain worth the effort? Will the threading performance scale to larger core counts? Does this loop have a dependency that prevents vectorization? What are the trip counts and memory access patterns? Have you vectorized efficiently with the latest Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions, or are you still using older SIMD instructions?
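To make the dependency question concrete, here is a minimal sketch of why a loop-carried dependency blocks naive vectorization. It is an illustration only, not how Intel Advisor detects dependencies. The function names and the `width` parameter are hypothetical, and plain Python lists stand in for the arrays of a C, C++, or Fortran loop. The "SIMD-style" version loads a whole batch of inputs before storing any result, which is roughly what a vector instruction does.

```python
def scalar_loop(a, b):
    """Reference loop: iteration i reads the value written by iteration i - 1."""
    a = list(a)
    for i in range(1, len(a)):
        a[i] = a[i - 1] + b[i]
    return a


def simd_style_loop(a, b, width=4):
    """Same loop body, but each batch of `width` iterations loads first, then stores.

    This mimics a vector instruction that processes several iterations at once.
    """
    a = list(a)
    for start in range(1, len(a), width):
        stop = min(start + width, len(a))
        # All loads for the batch happen before any store in the batch.
        loaded = [a[i - 1] for i in range(start, stop)]
        for k, i in enumerate(range(start, stop)):
            a[i] = loaded[k] + b[i]
    return a


if __name__ == "__main__":
    a = [1, 0, 0, 0, 0]
    b = [0, 1, 1, 1, 1]
    print("scalar    :", scalar_loop(a, b))
    print("SIMD-style:", simd_style_loop(a, b, width=4))
```

The two results differ because later iterations in a batch read stale values. A loop whose iterations are independent, such as `c[i] = a[i] + b[i]`, gives the same answer either way, which is why it can be vectorized safely.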

Vectorization Optimization: Guidance to Speed Up Your Application

Quickly find what’s blocking vectorization in the locations that matter the most. Intel Advisor sorts your loops by potential gain, makes compiler reports easier to read by showing messages on your source, and gives you tips for effective vectorization. It provides key data like trip counts, data dependencies, and memory access patterns to let you vectorize safely and efficiently. 

Find effective optimization strategies using Intel Advisor’s cache-aware Roofline Analysis. It visualizes actual performance against hardware-imposed performance ceilings (rooflines), such as memory bandwidth and compute capacity. If the application does not work optimally on current memory and compute resources, roofline analysis identifies bottlenecks that limit performance and loops that will benefit the most from optimization.
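The idea behind a roofline can be sketched with a little arithmetic. The following is a simplified illustration, not Intel Advisor's implementation: it assumes a single memory-bandwidth roof and a single compute roof, and the machine peaks, loop names, and measurements are all hypothetical values chosen for the example.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Basic roofline bound: min(compute roof, bandwidth roof * intensity).

    arithmetic_intensity is in FLOP per byte, bandwidth in GB/s.
    """
    if arithmetic_intensity <= 0:
        raise ValueError("arithmetic intensity must be positive")
    return min(peak_gflops, peak_bandwidth_gbs * arithmetic_intensity)


def ridge_point(peak_gflops, peak_bandwidth_gbs):
    """Intensity at which the bandwidth roof meets the compute roof."""
    return peak_gflops / peak_bandwidth_gbs


def describe(name, intensity, measured_gflops, peak_gflops, peak_bandwidth_gbs):
    """Report which roof limits a loop and how close the loop is to it."""
    roof = attainable_gflops(intensity, peak_gflops, peak_bandwidth_gbs)
    if intensity < ridge_point(peak_gflops, peak_bandwidth_gbs):
        bound = "memory-bound"
    else:
        bound = "compute-bound"
    return f"{name}: {bound}, {100.0 * measured_gflops / roof:.0f}% of its roof"


if __name__ == "__main__":
    # Hypothetical machine: 1000 GFLOPS peak compute, 100 GB/s memory bandwidth.
    PEAK_GFLOPS, PEAK_BW = 1000.0, 100.0
    # Hypothetical loops: (FLOP per byte, measured GFLOPS).
    loops = {"stencil": (0.5, 30.0), "matmul": (20.0, 400.0)}
    for name, (ai, measured) in loops.items():
        print(describe(name, ai, measured, PEAK_GFLOPS, PEAK_BW))
```

Loops far below their roof are the ones with the most headroom. Whether a loop sits left or right of the ridge point suggests whether to work on memory access or on computation first.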

Threading Design: Fast Prototypes Equal Better Performance, Fewer Bugs

Fast prototyping lets you explore several alternative threading designs and pick the best one before investing in an implementation. Intel Advisor has a simple workflow that gets you the data and tips you need to make design and optimization decisions faster. Add threading to C, C++, C#, and Fortran code. Quickly model and compare the performance scaling of different parallel designs without the cost and disruption of implementation. Delayed implementation means your code remains serial during the design phase, so you can release at any time without worrying about threading bugs. Find and eliminate data sharing issues during design, when they’re less expensive to fix. Model the performance impact of adding synchronization and project the scaling on systems with larger core counts.
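As a rough illustration of what projecting scaling means, the sketch below applies Amdahl's law with an added synchronization term. This page does not describe Intel Advisor's own scalability model, so treat this as a generic back-of-the-envelope estimate. The function name, the `sync_overhead` parameter, and the linear per-thread overhead model are all assumptions made for the example.

```python
def projected_speedup(parallel_fraction, threads, sync_overhead=0.0):
    """Amdahl's-law speedup with a hypothetical synchronization cost.

    parallel_fraction: share of serial run time that can run in parallel (0..1).
    threads:           number of threads (>= 1).
    sync_overhead:     assumed extra time per additional thread, expressed as a
                       fraction of the serial run time (a simple linear model).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    serial_time = 1.0 - parallel_fraction
    parallel_time = parallel_fraction / threads
    overhead_time = sync_overhead * (threads - 1)
    return 1.0 / (serial_time + parallel_time + overhead_time)


if __name__ == "__main__":
    # Compare two hypothetical designs across core counts.
    for threads in (1, 4, 16, 64):
        lock_free = projected_speedup(0.95, threads)
        with_locks = projected_speedup(0.95, threads, sync_overhead=0.002)
        print(f"{threads:3d} threads: {lock_free:5.2f}x without sync, "
              f"{with_locks:5.2f}x with sync")
```

Even this crude model shows why comparing designs before implementing them is useful: a small synchronization cost that is invisible at 4 threads can dominate at 64.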

Flow Graph Analyzer: Design, Validate and Model for Heterogeneous Systems

Flow Graph Analyzer (FGA) is a feature of Intel Advisor and is released as a technology preview in Intel® Parallel Studio XE. FGA provides a rapid visual prototyping environment for the Intel® Threading Building Blocks (Intel® TBB) flow graph API, with built-in support for designing, validating, and modeling a graph before generating Intel TBB source code. Using this tool, you can build algorithms for heterogeneous systems. FGA also lets you collect traces from an Intel TBB flow graph application and analyze the application for performance issues.

FGA is not automatically installed with Intel® Parallel Studio XE (since it is a technology preview); however, it is available as a separate download from the registration center.

Getting Started with Flow Graph Analyzer

New for 2018

  • Find high impact, under-optimized loops with Cache-Aware Roofline Analysis.
  • Significantly simplify analysis of a large number of loops with Hierarchical Roofline based on aggregate floating point operations per second (FLOPS).
  • Get faster analysis results by limiting analysis to select modules.
  • Make better decisions with more data and recommendations.
  • Create custom metrics and reports with a script using the Python* API.
  • Prototype graph algorithms productively with the Flow Graph Analyzer.
  • Get more details for applications based on Intel® Math Kernel Library.
  • Work in the latest version of Microsoft Visual Studio*.
  • Use cross-operating system analysis with any license type. Collect data on Linux* (or any supported operating system) and analyze it in the user interface on Windows* or Linux. Just download what you need. Your license enables use on all supported operating systems.

Benefits of Priority Support

Paid licenses of Intel® Software Development Tools include Priority Support for one year from your date of purchase, with options to extend support at a reduced rate. Benefits include:

  • Direct and private interaction with Intel engineers. Submit confidential inquiries and code samples via the online service center.
  • Responsive help with your technical questions and other product needs.
  • Free access to all new product updates and access to older versions.
  • Learn from other experts via community product forums.
  • Access a vast library of self-help documents that build on decades of experience in creating high-performance code.

Specs at a Glance

Processors: Intel® Xeon®, Intel® Xeon Phi™, and Intel® Core™ processor families; 32-bit and 64-bit processors that are compatible with Intel®
Languages: C, C++, Fortran (vectorization and threading); C# (threading only)
Compilers: Works with compilers from Microsoft, the GNU Compiler Collection (GCC), Intel, and others that follow the same standards. Some features work better with the Intel compiler (for example, getting better vectorization advice).
Development Environments: Integrated with Microsoft Visual Studio or runs as a stand-alone application
Operating Systems: Windows*, Linux*
Additional Details: See the documentation and release notes

 

1. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information, visit www.intel.com/benchmarks.

Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice Revision #20110804

 


For more complete information about compiler optimizations, see the Optimization Notice.