Intel® Software Development Emulator

 

What If Home | Product Overview | FAQ | Primary Technology Contacts | Discussion Forum | Blog

 

 


Product Overview

This emulator is called Intel® Software Development Emulator or Intel® SDE, for short.

The current version is 7.21 and was released Apr 1, 2015. This version corresponds to the programmers reference 319433-022 available on the Intel Instruction Set Architecture Extensions page. The Intel SDE release notes are here.  This is a major release including:

  • Emulation support for the additional Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions present on some future Intel® Xeon® processors scheduled to be introduced after Knights Landing. 
  • Emulation support for the Intel® Secure Hash Algorithm (Intel® SHA) extensions present on the Intel Goldmont microarchtiecture. 
  • Emulation support for the Intel® Memory Protection Extensions (Intel® MPX)  present on the Intel Skylake microarchitecture and Intel Goldmont microarchitecture.
Intel SDE continues to support the features from previous releases:
  • Intel® SSE4, AES and PCLMULQDQ and the Intel® AVX
  • Intel® AVX2, RTM, BMI1 and BMI2 instructions being introduced on the Intel Haswell microarchitecture.
  • The ADOX/ADCX instructions being introduced on the Intel Broadwell microarchitecture.
  • Support for Hardware Lock Elision introduced on the Intel Haswell microarchitecture.
  • Support for Restricted Transactional Memory introduced on the Intel Haswell microarchitecture.

Related useful materials:

  • To allow customers to use Intel SDE with the the Intel AVX-512, Intel SHA and Intel MPX extensions, we are making available sources and binaries for the GNU gcc compiler and binutils below
  • More information about the Intel SHA extension is available here and includes a sample test application.
  • An article about using Intel MPX with Intel SDE.

Intel is releasing this Intel SDE so that developers can gain familiarity with our upcoming instruction set extensions. Intel SDE can help ensure software is ready to take advantage of the opportunities created by these new instructions in our processors. We hope that developers will explore the new instructions using the currently available compilers and assemblers. 
 

Intel SDE is built upon the Pin dynamic binary instrumentation system and the XED encoder decoder. Pin controls the execution of an application. Pin examines each static instruction in the application approximately once, as it builds traces for execution. During this process, which is called instrumentation, for each instruction encountered Pin asks Intel SDE if this instruction should be emulated or not. If the instruction is to be emulated, then Intel SDE tells Pin to skip over that instruction and instead branch to the appropriate emulation routine. It also tells Pin how to invoke that emulation function, what arguments to pass, etc.

Intel SDE queries CPUID to figure out what features to emulate. It also modifies the output of CPUID so that compiled applications that check for the emulated features are told that those features exist.

Intel SDE comes with several useful emulator-enabled Pin tools and the XED disassembler:

  • The basic emulator
  • The mix histogramming tool: This Pin tool can compute histograms by any of: dynamic instructions executed, instruction length, instruction category, and ISA extension grouping. This tool can also display the top N most frequently executed basic blocks and disassemble them.
  • The debugtrace ASCII tracing tool: This versatile tool is useful for observing the dynamic behavior of your code. It prints the instructions executed, and also the registers written, memory read and written, etc.
  • The footprint tool: This simple tool counts how many unique 16 byte chunks of data were referenced during the execution of the program.
  • The XED command line tool which can disassemble PECOFF or ELF binary executables.

 


Quick links:


Installation

Download and unpack the appropriate kit for your platform. Set your PATH variable to point to that directory. You can also refer to the tools in the kit using full or relative paths. Do not rearrange the files or subdirectories in the unpacked kit. If you want to move the kit directory, move everything.

WINDOWS*: If you are using Winzip, it puts the proper permissions on the unpacked files. However, if you are using Cygwin's tar command to unpack the Windows* kit, you must execute a "chmod -R +x path-to-kit" on the unpacked kit (where "path-to-kit" is the unpacked kit directory name).

LINUX*: On some distributions, you must disable "SE Linux" to allow Intel SDE to work. On Ubuntu systems, you must disable yama as described below.


Running Intel SDE

This is the pattern for running SDE:

path-to-kit/sde [sde args] -- user-application [app args]

The double dash ("--") is important. Options to SDE go before the double dash.  Square brackets denote optional arguments. 

The two most important options are the short and long help messages. To see the short help message:

path-to-kit/sde -help

And to see the very long help message:

path-to-kit/sde -long-help

In the help messages, the command line options are often displayed using underscores between words,  but dashes may be used instead of underscores. Often the Intel SDE help messages and this web page we'll refer to command line options as "knobs" for historical reasons. 

The short help message is as follows:

Intel(R) Software Development Emulator.  Version:  7.1.0 external
Copyright (C) 2008-2014, Intel Corporation. All rights reserved.

 Usage: sde [args] -- application [application-args]
 
 For the longer tool help, use "-help-long".
 
 If one of "-mix", "-debugtrace", or "-t toolname" are not,
 supplied, then just the underlying emulator will run.
 
     -mix                Run mix histogram tool
     -omix               Set the output file name for mix, Implies -mix
                         Default is "mix.out"
 
     -footprint          Run footprint tool
     -ofootprint         Set the output file name for footprint,
                         Implies -footprint. Default is "footprint.out"
 
     -debugtrace         Run mix debugtrace tool
     -odebugtrace        Set the output file name for debugtrace,
                         Implies -debugtrace
                         Default is "debugtrace.out"
 
     -ast                Run the Intel(R) AVX/SSE transition checker
     -oast               Set the output file name for the Intel AVX/SSE
                         transition checker. Implies -ast
                         Default is "avx-sse-transition.out"
 
     -quark              Set chip-check and CPUID for Intel(R) Quark
     -p4                 Set chip-check and CPUID for Pentium4
     -p4p                Set chip-check and CPUID for Pentium4 Prescott
     -mrm                Set chip-check and CPUID for MRM
     -pnr                Set chip-check and CPUID for PNR
     -nhm                Set chip-check and CPUID for NHM
     -wsm                Set chip-check and CPUID for WSM
     -snb                Set chip-check and CPUID for SNB
     -ivb                Set chip-check and CPUID for IVB
     -hsw                Set chip-check and CPUID for HSW
     -bdw                Set chip-check and CPUID for BDW
     -skx                Set chip-check and CPUID for SKX
     -skl                Set chip-check and CPUID for SKL
     -cnl                Set chip-check and CPUID for CNL
     -knl                Set chip-check and CPUID for KNL 
     -slt                Set chip-check and CPUID for Saltwell
     -slm                Set chip-check and CPUID for Silvermont
     -glm                Set chip-check and CPUID for Goldmont
 
     -skip-int3          Skip int3 instructions in the execution
 
     -help               Intel(R) SDE driver help
     -help-long          Emit the longer help message
     -phelp              Emit the pin JIT help message
     -version            Intel(R) SDE version number
 
     -debug              Enable debugging (pause to connect to debugger)
     -debug-attach       Enable debugging (attach mid-execution)
     -debug-port port    Specify the port number when debugging with gdb
 
     -sse-sde            Use an SSE-hosted version of SDE on 
                         AVX-enabled hosts
 
     -env VAR VALUE      Set environment variable VAR to VALUE.
                         The -env option is repeatable.
     -no-follow-child    Do not follow exec or subprocess creation
                         Default is to follow exec or subprocess creation
 
     -echo               Print the internal pin command line
     -p  pinarg          Add a pin argument specifically for the pin JIT
     -pt pintoolarg      Add a pin tool argument
     -t  toolname        Specify another tool name in the kit
 
     -pin-log-file fn    Specify a pin log file other than pin-log.txt
     -auto-p32           Support running apps compiled with -auto-p32
  
NOTE: All other arguments are passed to the emulator.

 


Emulate Everything Mode

  • WINDOWS*: A file called sde-win.bat is provided in Windows* that runs a cmd.exe window under the control of Intel SDE. You can make a shortcut to it and place that shortcut on your desktop. Everything run from that window will be run under the control of Intel SDE, so you may experience a slow down even when you are not emulating anything. All it really does is:
    path-to-kit/sde -- cmd.exe
  • OS X* or LINUX*: You can run your favorite shell under the control of Intel SDE:
    path-to-kit/sde -- /bin/tcsh

    And everything you run from there will be run under the control of Intel SDE.

 


Running the Histogram Tool

To generate the instruction mix histograms by opcode (XED iclass, the default) or instruction form (iform). As of version 4.29, the instruction length and instruction category histograms are always included.

path-to-kit/sde -mix -- user-application [args] 
path-to-kit/sde -mix -iform -- user-application [args] 

Notes:

  • The ISA extension histogram is also always computed and printed as star-prefixed rows in the histograms. ISA extensions are things like (BASE, X87, MMX, SSE, SSE2, SSE3, etc.). This is useful to see which instruction set extensions are used in your application.
  • The dynamic statistics are recorded and emitted several ways: (1) per-thread, (2) per function per thread, and (3) summed for the entire run. Instruction counts by function are also emitted if symbols are found for the application.
  • The output is written to a sde-mix-out.txt file in the current directory. The output file name can be changed using the -omix option:
    path-to-kit/sde -mix -omix foo.out -- user-application [args]
  • The top 20 basic blocks are always printed in the output with their execution weights.
  • "-top_blocks N" will allow you to change 20 to N that you specifiy.
  • Iforms: "Iform" is the XED term for variants of instructions. In a simple world they would be things like reg/reg or reg/mem, but things are more complicated in general. The iform names come from XED. Consider them experimental and subject to change. To see histograms by the more detailed iforms, use the "-iform" command line option.
  • There are many command line options for the mix tool:
Mix knobs

-d  [default 0]
        Only collect dynamic profile
-demangle  [default 1]
        Control for attempting symbol demangling
-disas  [default 0]
        Show disassembly for top blocks
-iform  [default 0]
        Compute ISA iform mix
-line_info  [default 0]
        Add line info to the top hot blocks
-mapaddr  [default 0]
        Map Addresses to File/Line information
-mix  [default 0]
        Compute mix histograms.
-mix_omit_per_function_stats  [default 0]
        Omit the per-function histograms. Reduces the output file size.
-mix_omit_per_thread_stats  [default 0]
        Omit per-thread stats for smaller output files
-no_shared_libs  [default 0]
        do not instrument shared libraries
-omix  [default sde-mix-out.txt]
        specify profile file name
-s  [default 0]
        terminate after collection of static profile for main image
-top_blocks  [default 20]
        specify a maximal number of top blocks for which icounts are printed

 

Example

Command:

    % sde -mix -- mm_cmp.opt.vec.exe

Output: in sde-mix-out.txt (default file name)

    #
    # $global-dynamic-counts
    #
    # opcode count
    #
    *isa-ext-BASE   147597
    *isa-ext-MODE64    222
    *isa-ext-SSE        21
    ADD               3092
    AND               2694
    CALL_NEAR         1739
    CDQE                 3
    CLD                 35
    CMOVB              800
    CMOVBE               6
    ...
    UCOMISS             14
    XCHG                 1
    XOR               4981
    ...
    *total          147840

 

Mix Accounting


The rows in the mix output histograms come in two flavors. The rows that begin with "*" are meta-categories which sum up the data in different ways. Here are descriptions of some of the meta categories:

*scalar-simd anything with the XED_ATTRIBUTE_SIMD_SCALAR including AVX and SSE operations. The instructions that operate on one vector element and whose iclass name ends with "SS" or "SD" have this attribute.
*sse-scalar any SSE instruction with the XED_ATTRIBUTE_SIMD_SCALAR
*sse-packed any SSE instruction without the XED_ATTRIBUTE_SIMD_SCALAR
*avx-scalar Any AVX instruction with the attribute XED_ATTRIBUTE_SIMD_SCALAR
*avx128 Any AVX instruction with a 128b vector length but without the XED_ATTRIBUTE_SIMD_SCALAR
*avx256 Any AVX instruction with a 256b vector length
*avx512 Any AVX instruction with a 512b vector length.
*mem-atomic Atomic memory operations
*stack-read Stack reads
*stack-write Stack writes
*iprel-read IP-relative memory reads
*iprel-write IP-relative memory writes
*mem-read-1 Memory read, 1 byte
*mem-read-2 Memory read, 2 bytes
*mem-read-4 Memory read, 4 bytes
*mem-read-8 Memory read, 8 bytes
*mem-write-1 Memory write, 1 byte
*mem-write-2 Memory write, 2 bytes
*mem-write-4 Memory write, 4 bytes
*mem-write-8 Memory write, 8 bytes
*isa-ext-BASE The "BASE" ISA-extension (generic group of instructions. Base includes much of the older instructions
*isa-ext-LONGMODE The set of instructions added with Intel64. These may be 32b or 64b instructions
*isa-set-I186 ISA "set" is a categorization of instructions in the BASE ISA-extension. I186 includes instructions introduced on the 80186 processor.
*isa-set-I386 ISA "set" is a categorization of instructions in the BASE ISA-extension. I386 includes instructions introduced on the 80386 processor.
*isa-set-I486REAL ISA "set" is a categorization of instructions in the BASE ISA-extension. I486REAL includes instructions introduced on the 80486 processor and valid in REAL mode.
*isa-set-I86 ISA "set" is a categorization of instructions in the BASE ISA-extension. I86 includes instructions introduced on the 8086 processor.
*isa-set-LONGMODE ISA "set" is a categorization of instructions in the LONGMODE ISA-extension. LONGMODE includes instructions introduced with Intel64 mode.
*isa-set-PENTIUMREAL ISA "set" is a categorization of instructions in the BASE ISA-extension. PENTIUMREAL includes instructions introduced with Pentium and valid in REAL mode.
*isa-set-PPRO ISA "set" is a categorization of instructions in the BASE ISA-extension. PPRO includes instructions introduced with the PentiumPro.
*lock_prefix Instructions with a 0xF0 LOCK prefix
*rep_prefix Instructions with a 0xF3 REP prefix
*repne_prefix Instructions with a 0xF2 REPNE prefix
*osz_prefix Instructions with a 0x66 prefix
*rex_prefix Instructions with a REX prefix (includes the following 4 cases). REX prefixes can be sued without any of the following 4 bits set as well.
*rexw_prefix Instructions with a REX prefix with the REX.W bit set
*rexr_prefix Instructions with a REX prefix with the REX.R bit set
*rexx_prefix Instructions with a REX prefix with the REX.X bit set
*rexb_prefix Instructions with a REX prefix with the REX.B bit set
*one-memops Instructions with one memory operation
*two-memops Instructions with two memory operations
*disp_only Instructions with a memory operation that addresses memory without using a base register or index register -- just a displacement.
*base_index Instructions with a memory operation that addresses meory using a base and index register, but without a displacement.
*base_index_disp Instructions with a memory operation that addresses memory using a base, index and displacement.
*scale_1 Number of instructions with a scale=1 for the index register
*scale_2 Number of instructions with a scale=2 for the index registern
*scale_4 Number of instructions with a scale=4 for the index register
*scale_8 Number of instructions with a scale=8 for the index register
*memdisp8 Memory operations with 8-bit displacements
*memdisp32 Memory operations with 32-bit displacements

 

 


Checking for bad pointers and data misalignment

 

Two of the more common errors when bringing up new code are (a) dereferencing bad pointers, either null pointers or pointing to inaccessible memory and (b) misaligned data accesses. Intel SDE has features to help identify these situations in programs. 

The options for the pointer checker are:

-null_check  [default 0]
        Check memops for null references.
-null_check_out  [default sde-null-check.out.txt]
        Output file name for -null-check.
-ptr_breakpoint  [default 0]
        Make the ptr checker raise application break point on errors.
-ptr_check  [default 0]
        Wild pointer checker. Checks memops for accessibility.
-ptr_check_out  [default sde-ptr-check.out.txt]
        Output file name for -ptr-check.
-ptr_check_warn  [default 0]
        Make the ptr checker warn on errors. Default is do die on errors.
-ptr_raise  [default 0]
        Make the ptr checker raise exception on errors. Default is to do
        PIN_SafeCopy on so that errors are ignored in analysis routines.

The alignment checker can give profiles of data alignment throught the program as well as when and where data accesses are misaligned.

-align_checker  [default normal]
        Check for unaligned memory accesses mixing. Valid choices are: assert,
        warn, report, normal, or ignore. There are also assert-all, warn-all
        and report-all which watch all instructions, including those that do
        not require alignment.
-align_checker_256b  [default 0]
        Limit checker to only checking for 256b (32B) memory references.
-align_checker_file  [default sde-align-checker-out.txt]
        File name for messages about unaligned memory accesses.
-align_checker_image  [default ]
        Only check instructions in the named image
-align_checker_prefetch  [default 1]
        1=check prefetches, 0=ignore prefetches.
-align_checker_stderr  [default 0]
        Attempt to write messages about unaligned data types to stderr.  If
        disabled, then the output file is used.
-align_correct  [default 1]
        1=Enabled, 0=Disable the alignment checker

 

 

Running the ASCII Tracing Tool

 

path-to-kit/sde -debugtrace -- user-application [args]

The output is written to a sde-debugtrace-out.txt file in the current directory by default. There are many options. Run 'sde -debugtrace -thelp' Pin tool option to see the choices. It prints the registers and flags modified by each instruction. It also prints the memory values read/written.

    % sde -debugtrace -- il_aesdec.opt.vec.exe
    % cat sde-debugtrace-out.txt
    ...
    TID0: Read 7b5b5465_73745665_63746f72_5d53475d = *(UINT128*)0x7fffffffd500
    TID0: INS 0x000000400b13     AVX      vmovdqa xmm0, xmmword ptr [rsp+0x100]
    TID0:   XMM0 := 7b5b5465_73745665_63746f72_5d53475d
    TID0:   XMM0 :=      1.62559e+286      1.23396e+171    (doubles)
    TID0:   XMM0 := 1.13882e+36 1.93584e+31 4.50904e+21 9.51515e+17    (floats)
    TID0: Read 48692853_68617929_5b477565_726f6e5d = *(UINT128*)0x7fffffffd510
    TID0: INS 0x000000400b1c     AVX      vmovdqa xmm1, xmmword ptr [rsp+0x110]
    TID0:   XMM1 := 48692853_68617929_5b477565_726f6e5d
    TID0:   XMM1 :=       6.84853e+40      5.20343e+131    (doubles)
    TID0:   XMM1 :=   238753 4.25907e+24 5.61426e+16 4.74242e+30    (floats)
    TID0: INS 0x000000400b25     AVX      vaesdec xmm0, xmm0, xmm1
    TID0:   XMM0 := 138ac342_faea2787_b58eb95e_b730392a
    TID0:   XMM0 :=      1.55269e-214      -1.02648e-50    (doubles)
    TID0:   XMM0 := 3.50286e-27 -6.079e+35 -1.06338e-06 -1.05037e-05    (floats)
    TID0: INS 0x000000400b2a     AVX      vmovdqa xmmword ptr [rsp+0x120], xmm0
    TID0: Write *(UINT128*)0x7fffffffd520 = 138ac342_faea2787_b58eb95e_b730392a
    TID0: Read 138ac342_faea2787_b58eb95e_b730392a = *(UINT128*)0x7fffffffd520
    TID0: INS 0x000000400b33     AVX      vmovdqu xmm0, xmmword ptr [rsp+0x120]
    TID0:   XMM0 := 138ac342_faea2787_b58eb95e_b730392a
    TID0:   XMM0 :=      1.55269e-214      -1.02648e-50    (doubles)
    TID0:   XMM0 := 3.50286e-27 -6.079e+35 -1.06338e-06 -1.05037e-05    (floats)
    TID0: INS 0x000000400b3c     BASE     lea rdi, ptr [r12+0x50bd40]          | rdi = 0x50bd70
    TID0: INS 0x000000400b44     BASE     mov edx, 0x10                        | rdx = 0x10
    TID0: INS 0x000000400b49     BASE     lea rsi, ptr [rsp+0xf0]              | rsi = 0x7fffffffd4f0
    TID0: INS 0x000000400b51     AVX      vmovdqu xmmword ptr [rsi], xmm0
    TID0: Write *(UINT128*)0x7fffffffd4f0 = 138ac342_faea2787_b58eb95e_b730392a
    TID0: INS 0x000000400b55     AVX      vzeroupper
    TID0: INS 0x000000400b58     BASE     call 0x400ec0                        | rsp = 0x7fffffffd3f8

 

Using the chip-check feature

Starting with version 2.94, Intel SDE includes a filtering mechanism to restrict executed instructions to a particular microprocessor. This is intended to be a helpful diagnostic tool for use when deploying new software. In the output of "sde -thelp" there is a section describing the controls for this feature:

-chip_check  [default ]
        Restrict to a specific XED chip.
-chip_check_call_stack  [default 0]
        Emit the call stack on error
-chip_check_call_stack_depth  [default 10]
        Specify chip-check call-stack max depth
-chip_check_die  [default 1]
        Die on errors. 0=warn, 1=die
-chip_check_disable  [default 0]
        Disable the chip checking mechanism.
-chip_check_emit_file  [default 0]
        Emit messages to a file. 0=no file, 1=file
-chip_check_exe_only  [default 0]
        Check only the main executable
-chip_check_file  [default sde-chip-check.txt]
        Output file chip-check errors.
-chip_check_jit  [default 0]
        Check during JIT'ing only. Checked code might not be executed due to
        speculative JIT'ing, but this mode is a little faster.
-chip_check_list  [default 0]
        List valid chip names and exit.
-chip_check_stderr  [default 1]
        Try to emit messages to stderr. 0=no stderr, 1=stderr
-chip_check_vsyscall  [default 0]
        Enable the chip checking checking in the vsyscall area.
-chip_check_zcnt  [default 0]
        The tzcnt/lzcnt has backward compatibility, check it explicitly anyway.

To list all the chips that Intel SDE knows about, you can use "sde -chip-check-list". To limit instructions to the processor codenamed Westmere, use "sde -chip-check WESTMERE -- yourapp". By default, Intel SDE emits warnings to a file called sde-chip-check.out and also to stderr (if the application has not closed stderr). This behavior can be customized using the above knobs.


 

Using Intel® Transactional Synchronization Extensions:

Intel TSX has two primary components: Restricted Transactional Memory (RTM) and Hardware Lock Elision (HLE). Both technologies are supported in Intel SDE as of version 6.1.

The RTM and HLE options are as follows. You can see this in the long help output emitted when running "sde -long-help".

Intel(R) TSX (RTM and HLE) Options

-hle_abort_log  [default 0]
	    Collect HLE abort log events
-hle_abort_log_file  [default sde-hle-abort-log.txt]
	    HLE abort log file name
-hle_elision_entries  [default 2]
	    Maximum number of elided locks
-hle_enabled  [default 0]
	    Enable HLE functionality
-rtm_abort_reason  [default 0]
	    Select RTM abort reason code. relevant only for abort mode
-rtm_extended_abort_code  [default 0]
	    report extended abort cause codes
-rtm_mode  [default disabled]
	    Select RTM mode from [disabled|abort|full|nop]
-tsx  [default 0]
	    Enable TSX (RTM and HLE) functionality
-tsx_cache_set_size  [default 8]
	    Number of cache lines in associative cache set (not necessarily power 	of 2)
-tsx_cache_sets_num  [default 64]
    	Number of cache sets in cache
-tsx_debug_log  [default 0]
	    define debug log details level, printed to the the log file
-tsx_file_name  [default sde-tsx-out.txt]
	    TSX log file name
-tsx_log_file  [default 0]
	    create a log file (does not necessarily fill it)
-tsx_log_flush  [default 0]
	    flush data to the log file after each write
-tsx_log_inst  [default 0]
	    print the instructions to the log file
-tsx_ot_accuracy  [default moderate]
	    ownership table accuracy level : simple, moderate
-tsx_ownership_size  [default 16]
	    log2(ownership table entries) default 16
-tsx_speculation_depth  [default 7]
	    Maximum speculation depth allowed
-tsx_stats  [default 0]
	    Collect TSX statistics
-tsx_stats_call_stack  [default 0]
	    Add call stack information to TSX statistics
-tsx_stats_call_stack_size  [default 10]
	    Call stack size in TSX statistics
-tsx_stats_file  [default sde-tsx-stats.txt]
	    TSX statistics file name
-tsx_stats_max_abort  [default 1000000]
	    Maximum number of TSX statistics abort list
-tsx_stats_top_most  [default 10]
	    Number of most common aborts TSX statistics to display

(As in all SDE knobs, you can use dashes instead of underscores.)

For RTM, there are 4 different modes for rtm, disabled, abort, full and nop. Disabled is the default because emulating RTM in software is very slow. The abort modes always aborts upon executing xbegin. Full is the RTM enabled mode. And NOP treats the RTM instructions like NOPs.  Please see this blog posting for more information on how to use RTM.

Intel SDE provides statistics about the use of RTM and HLE during the execution of your program:

#LIST OF RTM COUNTERS DATA PER THREAD
#-----------------------------------------------------------------------------
# TID      XBEGIN        XEND      XABORT      ABORTS
    0          10           0           3          10
    1           0           0           0           0
    2           0           0           0           0
  TOTAL        10           0           3          10


# LIST OF HLE COUNTERS DATA PER THREAD
#--------------------------------------------------------------------------------------------------------------------------
# TID  XACQUIRE  SPECULATIVE  NON SPECULATIVE  GENERAL ABORTS  CONTENTION ABORTS  ELISION CACHE FULL  HLE IN RTM
    0        11            4                7               7                  0                  20           0
  TOTAL      11            4                7               7                  0                  20           0
 

# COUNTERS OF TSX ABORTS PER ABORT REASON
#-----------------------------------------------------------------------------
# REASON                         RTM ABORTS   HLE ABORTS
  ABORT_CONTENTION                        0           22
  UNFRIENDLY_INST                         0            2
  ABORT_SYNC_EXCEPTION                    2            0
  ABORT_CONTENTION                        2            0
  ABORT_XA_EXECUTED                       3            0
  CACHE_SET_FULL                          1            0
  UNFRIENDLY_INST                         1            0
  NESTING_TOO_DEEP                        1            0
 
 
# TOP 10 GENERAL ABORTS
#---------------------------------------------------------------------------
#               IP    COUNT    INSTRUCTION DISASSEMBLY
0x0000000000e61048     4127    pause
0x0000000077351778     1483    syscall
 
 
# TOP 10 CONTENTION ABORTS
#---------------------------------------------------------------------------
#               IP    COUNT    INSTRUCTION DISASSEMBLY
0x0000000000400be8       14    xchg dword ptr [rdx], eax
0x0000000000400b62        3    xrelease lock xchg dword ptr [rdx], eax
0x0000000000400a4e        2    xacquire lock xchg dword ptr [rdx], eax
0x0000000000400f5a        2    mov eax, dword ptr [rip+0x200c24]
0x0000000000400f95        1    mov dword ptr [rip+0x200be9], eax

# TOP 10 HLE BY LOCK AND IP ABORTS
#---------------------------------------------------------------------------
#             LOCK    COUNT                 IP    INSTRUCTION DISASSEMBLY
0x000000000120d5a0        5 0x000000000120101c    xacquire lock xchg dword ptr [rax], edx
0x000000000120d5a0        1 0x0000000001201140    xrelease lock xchg dword ptr [rax], edx
             TOTAL        6
0x000000000120d2a0        1 0x000000000120113d    mov edx, dword ptr [rbp]
             TOTAL        1

 
#LIST OF TSX GENERAL ABORT EVENTS
#---------------------------------------------------------------------------------------------------
# TID                 IP       DATA ADDRESS    TSX TYPE                         REASON 
    0 0x0000000000400a99 0x0000000000000000         RTM           ABORT_SYNC_EXCEPTION
    0 0x0000000000400aab 0x0000000000000000         RTM           ABORT_SYNC_EXCEPTION
    0 0x0000000000400ab6 0x0000000000000000         RTM                UNFRIENDLY_INST
    0                N/A                N/A         RTM              ABORT_XA_EXECUTED
    0                N/A                N/A         RTM              ABORT_XA_EXECUTED
    0 0x0000000000400b3a 0x0000000000609f40         RTM                 CACHE_SET_FULL
    0                N/A                N/A         RTM               NESTING_TOO_DEEP
    0                N/A                N/A         RTM              ABORT_XA_EXECUTED

 
# TOP 10 HLE BY LOCK AND IP ABORTS
#---------------------------------------------------------------------------
#             LOCK    COUNT                 IP    INSTRUCTION DISASSEMBLY
0x000000000120d5a0        5 0x000000000120101c    xacquire lock xchg dword ptr [rax], edx
0x000000000120d5a0        1 0x0000000001201140    xrelease lock xchg dword ptr [rax], edx
             TOTAL        6
0x000000000120d2a0        1 0x000000000120113d    mov edx, dword ptr [rbp]
             TOTAL        1

 
 
#LIST OF TSX CONTENTION ABORT EVENTS
#--------------------------------------------------------------------------------------------------------
# TID     TRANSACTION IP   KILLER TID          KILLER IP   KILLER DATA ADDRESS  INSIDE TSX  TSX TYPE
    4 0x0000000000400a4e            3 0x0000000000400f5a    0x0000000000601b84         YES       HLE
    3 0x0000000000400a4e            0 0x0000000000400be8    0x00000000022f0010          NO       HLE
    2 0x0000000000400a4e            0 0x0000000000400be8    0x00000000022f0010          NO       HLE
    3 0x0000000000400a4e            2 0x0000000000400f5a    0x0000000000601b84         YES       HLE
    2 0x0000000000400a4e            4 0x0000000000400a4e    0x00000000022f0010          NO       HLE
    4 0x0000000000400a4e            2 0x0000000000400a4e    0x00000000022f0010          NO       HLE
    4 0x0000000000400a4e            2 0x0000000000400f95    0x0000000000601b84         YES       HLE

When transaction abort, Intel SDE provides codes describing the causes of those aborts:

ABORT_SYNC_EXCEPTION - Failed to read\write from memory.
ABORT_ASYNC_EXCEPTION - Got general exception like Interrupt or signal.
ABORT_CONTENTION - Two threads colided.
ABORT_XA_EXECUTED - XABORT instruction sent.
CACHE_SET_FULL - Cache is full.
UNFRIENDLY_INST - Unfriendly instruction for RTM.
NESTING_TOO_DEEP - We reached RTM nesting limit, default 8.
ABORT_LOCK_SPLIT_CACHE - Atomic instruction with split cache lines.
ELIDED_LOCK_UPGRADE - Lock acquisition failed due to attempt to write an elided lock.
ELIDING_WRITE_LOCK - Tried to put an elided lock on a line which is already modified speculatively.
PARTIAL_ELISION_ACCESS - Found intersecting elided lock.
ELIDING_AGAIN - Tried to elide the same lock twice.
UNLOCK_DOESNT_RESTORE - Tried to release non matching elided lock.
ELISION_CACHE_NOT_EMPTY - Elision cahce is not empty before commit.
RTM_ABORT_HLE_RTM_MIX - Tried to start RTM transaction inside an HLE transaction.

Here are some simple examples of running with Intel TSX:

Running sde with full RTM support:
   % path-to-kit/sde -rtm-mode full -- application

Running sde with HLE support:
   % path-to-kit/sde -hle-enabled -- application


Running sde with both RTM and HLE support:
   % path-to-kit/sde -tsx -- application


Running sde with TSX support and statistics:
   % path-to-kit/sde -tsx -tsx-stats -- application


Running sde with TSX support and statistics and call stack information:
   % path-to-kit/sde -tsx -tsx-stats -tsx-stats-call-stack -- application

And here is a more complete example of running an RTM on Windows*

volatile int winning_thread = -1;  volatile int aborts = 0;  volatile int num_ends = 0;

unsigned __stdcall thread_worker(void * arg)
{
    int id =(int) arg;
    unsigned int status = _xbegin();
    if (status == _XBEGIN_STARTED) {
        for (int i=0; i<10000000; i++) winning_thread = id;
        num_ends++;
        _xend();
    }
    else {
        aborts++;
    }
    return 0;
}

int main() {
    HANDLE threads[10];
    for (int i=0; i<10; i++)
        threads[i] = (HANDLE) _beginthreadex(NULL, 0, &thread_worker, (void *)i, 0, NULL);
    for (int i=0; i<10; i++)
        WaitForSingleObject( threads[i], INFINITE );
    return  0;
}

Compiling with the Intel Compiler: 

% icl  /c /Foapplication.obj application.cpp
% xilink.exe /MACHINE:X64    /LARGEADDRESSAWARE:NO /OUT:application.exe application.obj
% path-to-kit/sde -tsx -- application.exe

 


Running XED Disassembler

Example:

    path-to-kit/xed -i foo.exe > dis.txt

The above command writes dis.txt.
See the help message (-help) for many options.

XED prints the ISA Extension for every instruction. This is useful for finding new instructions in your code. Example Output:

    % xed -i il_aesdec.opt.vec.exe > dis
    % cat dis
    SYM main:
    XDIS 400a40: PUSH      BASE       55                       push rbp
    XDIS 400a41: DATAXFER  BASE       4889E5                   mov rbp, rsp
    XDIS 400a44: LOGICAL   BASE       4883E480                 and rsp, 0xffffffffffffff80
    XDIS 400a48: PUSH      BASE       4154                     push r12
    XDIS 400a4a: PUSH      BASE       4155                     push r13
    XDIS 400a4c: BINARY    BASE       4881EC70010000           sub rsp, 0x170
    XDIS 400a53: DATAXFER  BASE       BEFE9F9D00               mov esi, 0x9d9ffe
    XDIS 400a58: DATAXFER  BASE       BF03000000               mov edi, 0x3
    XDIS 400a5d: CALL      BASE       E83E050000               call 0x400fa0 <__intel_new_feature_proc_init>
    XDIS 400a62: AVX       AVX        C5F8AE9C24F0000000       vstmxcsr dword ptr [rsp+0xf0]
    XDIS 400a6b: LOGICAL   BASE       33C0                     xor eax, eax
    XDIS 400a6d: LOGICAL   BASE       818C24F000000040800000   or dword ptr [rsp+0xf0], 0x8040
    XDIS 400a78: AVX       AVX        C5F8AE9424F0000000       vldmxcsr dword ptr [rsp+0xf0]
    XDIS 400a81: DATAXFER  AVX        C5FA6F15E7120000         vmovdqu xmm2, xmmword ptr [rip+0x12e7]
    XDIS 400a89: DATAXFER  AVX        C5FA6F0DEF120000         vmovdqu xmm1, xmmword ptr [rip+0x12ef]
    XDIS 400a91: DATAXFER  AVX        C5FA6F05F7120000         vmovdqu xmm0, xmmword ptr [rip+0x12f7]
    XDIS 400a99: MISC      BASE       8D1400                   lea edx, ptr [rax+rax*1]
    XDIS 400a9c: BINARY    BASE       FFC0                     inc eax
    XDIS 400a9e: SHIFT     BASE       48C1E204                 shl rdx, 0x4
    XDIS 400aa2: DATAXFER  AVX        C5FA7F9240395000         vmovdqu xmmword ptr [rdx+0x503940], xmm2
    XDIS 400aaa: DATAXFER  AVX        C5FA7F8A407B5000         vmovdqu xmmword ptr [rdx+0x507b40], xmm1
    XDIS 400ab2: DATAXFER  AVX        C5FA7F8240BD5000         vmovdqu xmmword ptr [rdx+0x50bd40], xmm0
    XDIS 400aba: DATAXFER  AVX        C5FA7F9250395000         vmovdqu xmmword ptr [rdx+0x503950], xmm2
    XDIS 400ac2: DATAXFER  AVX        C5FA7F8A507B5000         vmovdqu xmmword ptr [rdx+0x507b50], xmm1
    XDIS 400aca: DATAXFER  AVX        C5FA7F8250BD5000         vmovdqu xmmword ptr [rdx+0x50bd50], xmm0
    XDIS 400ad2: BINARY    BASE       3D00020000               cmp eax, 0x200
    XDIS 400ad7: COND_BR   BASE       72C0                     jb 0x400a99 <main+0x59>
    ...
 

Debugging Emulated Code 

Intel SDE provides support for debugging application with emulated code. A description on using the system debugger is available here.


 

Intel AVX/SSE transition checking

It is recommended that a VZEROALL or a VZEROUPPER be inserted between code that uses Intel® SSE and code that uses 256b Intel AVX instructions. Intel SDE can check for Intel SSE instructions followed by Intel AVX instructions without an intervening zeroing instruction, and vice versa.

  • Use the "-ast" Pin tool knob

  • Use the "-oast filename" to specify a filename other than "avx-sse-transition.out" . When using the "sde" driver -oast implies -ast.

    path-to-kit/sde -oast filename.out -- user-application [args]

Example:

Command:

    % sde -ast -- mm_256_cmpouunord_ps.opt.vec.exe

Output: in "avx-sse-transition.out"

   Dynamic Dynamic
    AVX to SSE SSE to AVX Static Dynamic
    BlockPC Transition Transitio Icount Executions Icount
    ================ ============ ============ ======== ========== ========
    # TID 0
    400993 1 0 16 1 16
    4009f2 6 6 4 6 24
    4009da 7 7 4 7 28
    # SUMMARY
    # AVX_to_SSE_transition_instances: 14
    # SSE_to_AVX_transition_instances: 13
    # Dynamic_insts: 147841
    # AVX_to_SSE_instances/instruction: 0.0001
    # SSE_to_AVX_instances/instruction: 0.0001
    # AVX_to_SSE_instances/100instructions: 0.0095
    # SSE_to_AVX_instances/100instructions: 0.0088

​In this case, the program counter locations implicated the isnan() calls in the following code:

    source1 = _mm256_loadu_ps(s1);
    source2 = _mm256_loadu_ps(s2);
    dest=_mm256_cmpunord_ps(source1,source2);
    _mm256_storeu_ps((float*) d, dest);
    for (i = 0; i < 8; i++) {
        if (isnan(s1[i]) || isnan(s2[i])) {
            e[i] = -1;
        }
        else {
            e[i] = 0;
        }
    }

Code Sample: AES-128 Encryption and Decryption Routines

This sample code provides a set of C routines that demonstrate encryption and decryption routines using AES-128 in ECB mode. Read the End User License Agreement before you download the code samples.


 

GCC for Intel AVX-512, Intel MPX and Intel SHA extensions

In the attachments section lower down on this page we have source code and binaries for GCC and binutils that support Intel AVX-512, Intel MPX and Intel SHA instruction set extensions.

 

Frequently Asked Questions

Q: How do I download Intel SDE?
A: Intel SDE is available on the Download Page.

Q: How do I ask questions and get support for Intel SDE?
A: The Intel® Software Development Emulator Forum has been set up to address questions. Intel engineers will be monitoring and available to answer user questions.
Q: How do I ask questions and get support for Intel® AVX and CPU instructions?
A: The Intel® AVX and CPU Instructions Forum has been set up to address questions. Intel engineers will be monitoring and available to answer user questions.

Q: What are the system requirements?
A: Intel SDE will run on IA-32 or Intel® 64 processors running Windows* or Linux* or OS X* operating systems.  OS X* is first supported in version 5.31 and only on Snow Leopard (10.6) and Lion (10.7). 

Q: What are the CPUID requirements?
A: Pin, and thus Intel SDE requires a Pentium 4 or later processor.

Q: How can I see symbols on windows?
A: Yes, for version 4.29 onwards.  For earlier versions, you need to have two copies of dbghelp.dll version 6.11.1.404 on your system in the right places for Pin to find it. You can get it from Microsoft here: http://www.microsoft.com/whdc/devtools/debugging/default.mspx . Pin, upon which Intel SDE is built, requires that the IA-32 and Intel64 architecture versions of dbghelp.dll version 6.11.1.404 be placed in the kit directory in the "ia32" and "intel64" subdirectories.

Q: What about running IA-32 architecture applications on Intel® 64 processor platforms?
A: This is supported.

Q: What about precise SSE exception handling in MXCSR?
A: Do not rely on Intel SDE to correctly set the MXCSR exception flags.

Q: What happens when my program dereferences inaccessible memory for emulated instructions?
A: Intel SDE will crash. You can use "sde -trace-execute -- your app" to get dump of the instructions executed in your task to find out what instruction was last executed. Then you can use "sde -debugtrace - your app" to look for the last write to the registers involved in the effective address computation for that last executed instruction. We are working on a better solution.

Q: Does Intel SDE handle cygwin /cygdrive/c paths or symlinks?
A: No, because Pin does not handle them, Intel SDE cannot handle them.

Q: Where can I learn more about Pin?
A: http://pintool.org and the yahoo group "pinheads". The release notes for Pin are available here. It provides additional information and restrictions about using Intel SDE, which is built on Pin.

Q: Where can I learn more about mix and debugtrace?
A: The basic sources for debugtrace and mix are available in Pin kits on the Pin website. I've modified them slightly to invoke Intel SDE and to print out the new YMM registers.

Q: What are the licensing terms?
A: See the download page.

Q: I see you are shipping "GPL with runtime exception" runtime libraries. Where can I find more information?
A: In order to comply with the license requirements of the Linux* runtime libraries used by Intel SDE, we provide the following sources:

For version 3.09 and prior: Local download link for the unmodified gcc 4.2.0.

For version 3.88 ... 4.x: Local download link for unmodified gcc 4.5.1, libelf-0.8.5, and libdwarf-20070525.

For version 5.x: Local download link for unmodified gcc-4.6.3, libelf-0.8.13, libdwarf-20111214

For version 6.1: and later: Local download link for unmodified gcc-4.7.2 libelf-0.8.13, libdwarf-20111214.

 

Licenses for runtime libraries for Intel SDE on Linux*

Q: Is there a version of GCC available that supports AES and PCLMULQDQ?
A:Yes, GCC 4.4 includes support for these instructions. Snapshots are available here: ftp://gcc.gnu.org/pub/gcc/snapshots/

Q: Is there a version of GCC available that supports Intel AVX?

A: GCC 4.5 supports the initial Intel AVX instructions present on Intel® microarchitecture code name Sandy Bridge.

GCC 4.7.x supports the Intel AVX2, BMI1/2 and RTM instructions.

Q: Is there a version of GCC available that supports Intel AVX-512, Intel SHA extensions and Intel MPX?

A: Downloads of gcc supporting these features are available in the attacments section of this page.

Q: Is there a version of the Intel® Compiler available that supports Intel® AVX?
A: Yes, the current Intel Compiler supports the Intel AVX instructions. This version also includes support for Intel® SSE4, AES and PCLMULQDQ instructions. To use the post-32nm new instructions for the processor codenamed Ivy Bridge, it is required that you use Intel® Parallel Composer 2011 Update 2 or Intel® Composer XE 2011 Update 2. The compiler version is 12.0.2.x.

 

Q: How can I get Intel SDE to work on Ubuntu 10 or later?

A: There is a known problem of using Intel SDE on Linux* systems that prevents the use of ptrace attach via the sysctl /proc/sys/kernel/yama/ptrace_scope. In this case Pin is not able to use its default (parent) injection mode. To resolve this, execute the following echo command as root. (SDE does not need to run as root.)

$ echo 0 > /proc/sys/kernel/yama/ptrace_scope

 


Primary Technology Contact

Ady Tal
Ady is a senior software engineer in Intel's Software and Services Group. Ady joined Intel in 1996. Ady works on emulation of new instructions in support of the compiler, architecture and the enabling teams.  

 

Para obter informações mais completas sobre otimizações do compilador, consulte nosso aviso de otimização.