Intel® Architecture Code Analyzer


Product Overview

Intel® Architecture Code Analyzer helps you statically analyze the data dependency, throughput and latency of code snippets on Intel® microarchitectures. Throughout the rest of this document, such a code snippet is called a kernel.

Features and Benefits

For a given binary, Intel® Architecture Code Analyzer:

  • Performs static analysis of kernel throughput and latency under ideal front-end, out-of-order engine and memory hierarchy conditions.
  • Identifies the binding of the kernel instructions to the processor ports.
  • Identifies the kernel's critical path.

The Intel® Architecture Code Analyzer enables you to make a first-order estimate of relative kernel performance on different microarchitectures. It does not provide absolute performance numbers.

Intel® Architecture Code Analyzer is a command-line tool with ASCII output. It handles one or more kernels that are marked for analysis within an executable, a shared library, or an object file.

Throughput Analysis

The Throughput Analysis treats the kernel as the body of an infinite loop. It computes the kernel throughput and highlights its bottlenecks.

The Throughput Analysis report contains the following whole kernel information:

  • Throughput of the analyzed kernel, in cycles per iteration.
  • The kernel bottleneck: the front end, a specific port, the divider unit, or an inter-iteration dependency.
  • Total number of cycles, per iteration, that each processor port was bound to micro-ops.

The Throughput Analysis also provides the following information per instruction:

  • Number of instruction micro-ops.
  • Average number of cycles the instruction was bound to each processor port, per loop iteration.
  • An indication whether the instruction is on the critical path of the analyzed kernel.
  • Instruction disassembly in Intel® Software Developer’s Manual (MASM) style.
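The port-binding and throughput figures above can be approximated with a deliberately naive model. This is a sketch for intuition only and is not IACA's algorithm. It makes three simplifying assumptions. First, each micro-op is split evenly over the ports it may use. Second, micro-ops with no candidate port (for example eliminated moves and zero idioms) cost nothing on the ports. Third, the front end issues a fixed number of micro-ops per cycle. The port masks, the issue width of 4, and the names `Uop`, `port_pressure` and `throughput_bound` are all hypothetical. The model also ignores dependencies and divider occupancy.

```c
#include <stddef.h>

#define NUM_PORTS 8

/* Hypothetical: one micro-op and the set of ports it may execute on
   (bit p set = port p is a candidate). A mask of 0 means "not bound to a port". */
typedef struct {
    unsigned port_mask;
} Uop;

/* Naive even split: each micro-op adds 1/k cycles to each of its k candidate ports. */
static void port_pressure(const Uop *uops, size_t n, double pressure[NUM_PORTS]) {
    for (int p = 0; p < NUM_PORTS; ++p)
        pressure[p] = 0.0;
    for (size_t i = 0; i < n; ++i) {
        int k = 0;
        for (int p = 0; p < NUM_PORTS; ++p)
            if (uops[i].port_mask & (1u << p))
                ++k;
        if (k == 0)
            continue; /* e.g. an eliminated move: issued, but no port consumed */
        for (int p = 0; p < NUM_PORTS; ++p)
            if (uops[i].port_mask & (1u << p))
                pressure[p] += 1.0 / k;
    }
}

/* Lower bound on cycles per iteration: the larger of the busiest port's
   pressure and the front-end cost of issuing `issued` micro-ops at `width` per cycle. */
static double throughput_bound(const Uop *uops, size_t n, int issued, int width) {
    double pressure[NUM_PORTS];
    port_pressure(uops, n, pressure);
    double bound = (double)issued / width;
    for (int p = 0; p < NUM_PORTS; ++p)
        if (pressure[p] > bound)
            bound = pressure[p];
    return bound;
}
```

As an illustration, consider seven instructions. Five are single ALU micro-ops eligible for ports 0, 1, 5 and 6 (mask 0x63), and two need no port. This model gives 5/4 = 1.25 cycles on each of those ports and a front-end bound of 7/4 = 1.75 cycles, so it labels the kernel front-end bound. Splitting evenly is only exact when the competing micro-ops share the same port set.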

Technical Requirements

Intel® Architecture Code Analyzer is a command-line utility that analyzes a kernel, delimited with special markers, inside a binary file. The tool can analyze Intel® 64 code, including Intel® AVX, Intel® AVX2 and Intel® AVX-512 instructions.
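The following minimal C sketch shows how a loop kernel might be delimited. It assumes the iacaMarks.h header shipped with the tool and its IACA_START / IACA_END macros; the header also provides separate macros for 64-bit Visual C++. ANALYZE_WITH_IACA is a hypothetical project-level switch, not part of IACA. When it is undefined, the stand-in macros expand to nothing, so the same file builds and runs normally. The start marker sits at the top of the loop body and the end marker right after the loop, so the marked region is one loop iteration.

```c
#include <stddef.h>

/* ANALYZE_WITH_IACA is a hypothetical switch: define it only for the analysis build. */
#ifdef ANALYZE_WITH_IACA
#include "iacaMarks.h"   /* ships with the analyzer; defines IACA_START / IACA_END */
#else
#define IACA_START       /* no-op stand-ins so the normal build runs */
#define IACA_END
#endif

/* Scales a vector in place; the loop body is the kernel to analyze. */
void scale(float *x, size_t n, float k) {
    for (size_t i = 0; i < n; ++i) {
        IACA_START       /* start marker: top of the loop body */
        x[i] *= k;
    }
    IACA_END             /* end marker: right after the loop */
}
```

A possible workflow is to compile with -DANALYZE_WITH_IACA, point the analyzer at the resulting object file or binary, and compile without the switch for execution. Per the 1.1.1 release notes, a binary with active marks aborts if it is executed.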

Intel® Architecture Code Analyzer is available on Windows*, Linux*, and Mac OS X* operating systems. Only Intel® 64 operating systems are supported.

Release Notes for 2.3

  • Added support for Intel® microarchitecture code name Skylake (client and server).
  • Added support for Intel® Advanced Vector Extensions 512 (Intel® AVX-512).
  • Added support for tracing the execution (see user guide).
  • Dropped the -no_interiteration flag.

Release Notes for 2.2

  • Added support for Intel® microarchitecture code name Broadwell.
  • Better support for Intel® Advanced Vector Extensions (Intel® AVX) Gather operations.
  • Replaced the "InterIteration" throughput bottleneck indication with a more general "long dependency chains" indication.
  • Added an indication when front-end bubbles occur (see user guide).
  • Numerous improvements in modelling supported processors.
  • Unsupported instructions are now marked with 'X' instead of '!' for better readability.
  • The Nehalem (NHM) and Westmere (WSM) microarchitectures are no longer actively supported.
  • Removed support for running IACA on 32-bit operating systems and for analyzing 32-bit programs.
  • Dropped latency analysis support.
  • Windows* OS was not supported in this release; Windows* support is now available.
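The "long dependency chains" indication concerns loop-carried dependencies. When each iteration needs the previous iteration's result, the latency of that chain can set the throughput rather than port capacity. The C sketch below is a common illustration of our own, not taken from the IACA documentation, and the function names are hypothetical. The first loop has a single accumulator, so every addition waits for the previous one. The second splits the sum into four independent chains that the out-of-order engine can overlap.

```c
#include <stddef.h>

/* One accumulator: each addition depends on the previous one, so the chain
   through `sum` costs roughly one floating-point add latency per element
   (assuming the compiler does not reassociate, as with default FP settings). */
double sum_one_chain(const double *a, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += a[i];
    return sum;
}

/* Four accumulators: four shorter, independent chains can overlap.
   Reassociating floating-point additions can change rounding, so results may
   differ in the last bits for general inputs. */
double sum_four_chains(const double *a, size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; ++i) /* leftover elements */
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```

For loops of this shape, one would generally expect the single-accumulator version to be reported as dependency-bound and the four-accumulator version to be limited by ports or the front end instead. This is the usual behavior of such loops, not a recorded IACA result.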

Release Notes for 2.1

  • Added support for Intel® microarchitecture codenamed Haswell.
  • Added support for MSVS64 compiler.
  • Added 64-bit binaries.

Release Notes for 2.0.1

  • Fixed a bug where the -graph option failed to produce a graph file.

Release Notes for 2.0

  • Added support for Intel® microarchitecture codenamed Sandy Bridge. This replaces the Intel® AVX microarchitecture model previously included in Intel® Architecture Code Analyzer.
  • Added support for Intel® microarchitecture codenamed Ivy Bridge.
  • Added support for Mac OS X.
  • Improved analyzer algorithm for throughput analysis (new analysis output; see the User Manual for details).
  • Improved analyzer algorithm for latency analysis; the output also includes microarchitecture events that affect latency (new analysis output; see the User Manual for details).
  • Added support for graphic output of the dependency graph

Release Notes for 1.1.3

  • Fixed a bug where using -o option produced truncated output
  • Fixed IACA_UD_BYTES definition in iacaMarks.h to include {}.

Release Notes for 1.1.2

  • Intel® Architecture Code Analyzer now supports adding START and END marks in code compiled with the 64-bit Visual C++ compiler. See iacaMarks.h.
  • Intel® Architecture Code Analyzer now supports multiple block analysis. You can direct the tool to analyze the n-th block delimited with analyzer marks. With n=0, all marked blocks in the file are analyzed, and the output contains a separate report per block.

Release Notes for 1.1.1

  • Fixed incorrect identification of Intel® AVX zero-idiom instructions.
  • Fixed a crash on empty code blocks (blocks containing only zero-idiom or unsupported instructions).
  • Fixed the analyzer's Nehalem architecture option (arch nehalem) to treat AES and PCLMUL instructions as illegal, since they are not supported on Intel® microarchitecture codename Nehalem.
  • Changed the analyzer marks to abort if the binary is executed. To deactivate the marks when building for execution, #define IACA_MARKS_OFF or pass -DIACA_MARKS_OFF on the compiler command line. Binaries with active marks should be used for analysis only.

Release Notes for 1.1

  • Intel® Architecture Code Analyzer is now hosted on Linux* operating systems, in addition to Windows* operating systems. Both IA-32 and Intel® 64 operating systems are supported.
  • Intel® Architecture Code Analyzer now supports two existing Intel® microarchitectures, codenamed Nehalem and Westmere.
  • Two critical path types are detected:
    • DATA_DEPENDENCY critical path (similar to previous releases - reflects instruction data dependencies only)
    • PERFORMANCE critical path (new - reflects port conflicts and front-end pressure, as well)

Release Notes for 1.0.2

  • The pop ebx / push ebx instructions that the Intel® Architecture Code Analyzer markers add to IA-32 code are now ignored.
  • Fixed misclassifying rcp / rsqrt as divider operations

Release Notes for 1.0.1

  • Graceful handling of unsupported instructions: they are quietly ignored in the analyzed block and do not affect the throughput and latency calculations.
  • A few previously unsupported instructions, such as the CMOV instruction family, are now supported.
  • Detection of Intel® AVX to Intel® SSE code switches. The performance penalty associated with such a switch is noted but not accounted for.

26 comments

Gideon S. (Intel)

IACA version 2.1 does not work well with VS 2015. This will be fixed with the Windows OS support in version 2.2, expected at the end of Q1 or early Q2 ’17.

Thanks, /g

Alexander L.

Unfortunately, this tool (v2.1 x64) doesn't work for me either with a 64-bit DLL compiled with Visual Studio 2015 Update 3 :(

COULD NOT FIND START_MARKER NUMBER 1

Could you please fix this?

Is it possible that the reason is that the assembly contains managed code as a bridge between .NET and native methods?

Todd W.

I'm seeing IACA 2.1 fail on both debug and retail builds from Visual Studio 2015 Update 3. It'd be nice to have this fixed, as IACA is of no use to us without current VS support; consider testing with the VS 2017 previews too.

iaca.exe -64 "helloWorld.dll"

COULD NOT FIND START_MARKER NUMBER 1

Also, IACA_MARKS_OFF blanks IACA_MSC64_START/END rather than IACA_VC64_START/END.

Travis D.

IACA seems to think the latency of a mov from memory is 5 cycles on recent architectures, but it is actually 4 when you don't use complex addressing. See this forum post for details:

https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring/topic/700017

Can it be fixed in the next version?

Israel Hirsh (Intel)

We are resuming support for Intel(R) Architecture Code Analyzer, with BDW and SKL support probably before the end of 2016. A few bug fixes, including missed zero idioms, may be expected. There is no ETA for VTune integration; it may never happen.

Matthias H. (Intel)

Will there be an update for later hardware (BDW, SKL, ...)?
 

Asaf Hargil (Intel)

Hi,

"Version 2.1 is not identifying MOVZX r32/64,r8 as a zero-latency instruction"

This is a bug.

We suspended IACA support for the near future, so unfortunately I don’t expect a new version soon.

Pierre Laurent (Intel)

Is there a plan to integrate a (possibly simplified) version of IACA into Intel VTune, where all that is needed would be to click on the start location and the end location, and IACA would analyse the code between the two locations and highlight the conflicts in the critical path?

Kenny S.

Version 2.1 is not identifying MOVZX r32/64,r8 as a zero-latency instruction:

Intel(R) Architecture Code Analyzer Version - 2.1
Analyzed File - a
Binary Format - 64Bit
Architecture  - HSW
Analysis Type - Throughput

Throughput Analysis Report
--------------------------
Block Throughput: 1.75 Cycles       Throughput Bottleneck: FrontEnd

Port Binding In Cycles Per Iteration:
---------------------------------------------------------------------------------------
|  Port  |  0   -  DV  |  1   |  2   -  D   |  3   -  D   |  4   |  5   |  6   |  7   |
---------------------------------------------------------------------------------------
| Cycles | 1.2    0.0  | 1.2  | 0.0    0.0  | 0.0    0.0  | 0.0  | 1.2  | 1.2  | 0.0  |
---------------------------------------------------------------------------------------

N - port number or number of cycles resource conflict caused delay, DV - Divider pipe (on port 0)
D - Data fetch pipe (on ports 2 and 3), CP - on a critical path
F - Macro Fusion with the previous instruction occurred
* - instruction micro-ops not bound to a port
^ - Micro Fusion happened
# - ESP Tracking sync uop was issued
@ - SSE instruction followed an AVX256 instruction, dozens of cycles penalty is expected
! - instruction not supported, was not accounted in Analysis

| Num Of |                    Ports pressure in cycles                     |    |
|  Uops  |  0  - DV  |  1  |  2  -  D  |  3  -  D  |  4  |  5  |  6  |  7  |    |
---------------------------------------------------------------------------------
|   0*   |           |     |           |           |     |     |     |     |    | mov rax, rcx
|   0*   |           |     |           |           |     |     |     |     |    | xor rcx, rcx
|   1    | 0.2       | 0.2 |           |           |     | 0.2 | 0.2 |     |    | movzx rax, bl
|   1    | 0.2       | 0.2 |           |           |     | 0.2 | 0.2 |     |    | movzx rax, cl
|   1    | 0.2       | 0.2 |           |           |     | 0.2 | 0.2 |     |    | movzx rax, dl
|   1    | 0.2       | 0.2 |           |           |     | 0.2 | 0.2 |     |    | movzx rax, sil
|   1    | 0.2       | 0.2 |           |           |     | 0.2 | 0.2 |     |    | movzx rax, dil
Total Num Of Uops: 5

 

Bill T.

I'm unable to run version 2.1 on CentOS 6; it looks like it needs a newer version of glibc (?)

Where are older versions available for download?

Thanks!

 
