Intel® Software Development Emulator



Product Overview

This emulator is called Intel® Software Development Emulator or Intel® SDE, for short.

The current version is 5.38 and was released January 4, 2013. This version corresponds to the programmer's reference 319433-014 available on the Intel® AVX page. The Intel SDE release notes are here. This is a major release that includes support for the OS X* operating system and for the Restricted Transactional Memory (RTM) introduced on the processor codenamed Haswell. This version includes support for:

The first version of Intel SDE was released in August 2008. Older versions are also available on the download page.

Intel is releasing this Intel SDE so that developers can gain familiarity with our upcoming instruction set extensions. Intel SDE can help ensure software is ready to take advantage of the opportunities created by these new instructions in our processors. We hope that developers will explore the new instructions using the currently available compilers and assemblers. 

Intel SDE is built upon the Pin dynamic binary instrumentation system and the XED encoder decoder. Pin controls the execution of an application. Pin examines each static instruction in the application approximately once, as it builds traces for execution. During this process, which is called instrumentation, for each instruction encountered Pin asks Intel SDE if this instruction should be emulated or not. If the instruction is to be emulated, then Intel SDE tells Pin to skip over that instruction and instead branch to the appropriate emulation routine. It also tells Pin how to invoke that emulation function, what arguments to pass, etc.
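The instrumentation flow described above can be modeled in a few lines. The following is a toy sketch in Python, not the Pin or Intel SDE API: the names `instrument`, `execute` and `EMULATED_OPCODES` are hypothetical. It only illustrates that the emulate-or-not decision is made once per static instruction, while the chosen routine runs every time that instruction executes.

```python
# Toy model of instrumentation-time dispatch. All names are hypothetical;
# this is not the Pin or Intel SDE API.

# Opcodes this toy "emulator" claims to handle (an assumption for the demo).
EMULATED_OPCODES = {"vaddps", "vmovups"}

def run_native(opcode, log):
    log.append(("native", opcode))

def run_emulated(opcode, log):
    log.append(("emulated", opcode))

def instrument(program):
    """Visit each static instruction once and pick its execution routine."""
    plan = {}
    for address, opcode in program.items():
        plan[address] = run_emulated if opcode in EMULATED_OPCODES else run_native
    return plan

def execute(program, plan, dynamic_addresses):
    """Replay a dynamic instruction stream using the precomputed plan."""
    log = []
    for address in dynamic_addresses:
        plan[address](program[address], log)
    return log

# Three static instructions; the loop body at 0x14/0x18 executes twice.
program = {0x10: "mov", 0x14: "vaddps", 0x18: "jnz"}
plan = instrument(program)
trace = execute(program, plan, [0x10, 0x14, 0x18, 0x14, 0x18])
print(trace)
```

The decision table `plan` has one entry per static instruction, even though the emulated instruction appears twice in the dynamic trace.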

Intel SDE queries CPUID to figure out what features to emulate. It also modifies the output of CPUID so that compiled applications that check for the emulated features are told that those features exist.
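As a rough illustration of what "modifying the output of CPUID" means, the sketch below decodes a few feature bits from the ECX register returned by CPUID leaf 1. The bit positions are architectural, but the `advertise` helper and the sample register values are made up for this example; which bits Intel SDE actually sets is not described here.

```python
# Feature bits reported in ECX by CPUID leaf 1 (architectural positions).
LEAF1_ECX_BITS = {
    "SSE3": 0,
    "SSSE3": 9,
    "FMA": 12,
    "SSE4.1": 19,
    "SSE4.2": 20,
    "AES": 25,
    "OSXSAVE": 27,
    "AVX": 28,
}

def decode_leaf1_ecx(ecx):
    """Return the names of the known feature bits that are set in ecx."""
    return sorted(name for name, bit in LEAF1_ECX_BITS.items() if (ecx >> bit) & 1)

def advertise(ecx, features):
    """Hypothetical helper: set extra feature bits, as an emulator might."""
    for name in features:
        ecx |= 1 << LEAF1_ECX_BITS[name]
    return ecx

# Made-up host value: SSE3, SSSE3, SSE4.1 and SSE4.2 only.
host_ecx = (1 << 0) | (1 << 9) | (1 << 19) | (1 << 20)
emulated_ecx = advertise(host_ecx, ["AVX", "OSXSAVE"])

print(decode_leaf1_ecx(host_ecx))
print(decode_leaf1_ecx(emulated_ecx))
```

An application that tests the AVX bit would take its AVX code path under the second value but not under the first.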

Intel SDE comes with several useful emulator-enabled Pin tools and the XED disassembler:

  • The basic emulator
  • The mix histogramming tool: This Pin tool can compute histograms by any of: dynamic instructions executed, instruction length, instruction category, and ISA extension grouping. This tool can also display the top N most frequently executed basic blocks and disassemble them.
  • The debugtrace ASCII tracing tool: This versatile tool is useful for observing the dynamic behavior of your code. It prints the instructions executed, and also the registers written, memory read and written, etc.
  • The footprint tool: This simple tool counts how many unique 16 byte chunks of data were referenced during the execution of the program.
  • The XED command line tool which can disassemble PECOFF or ELF binary executables.
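To make the footprint metric concrete, here is a minimal sketch that counts unique 16-byte chunks for a list of (address, size) accesses. It assumes that an access spanning a chunk boundary touches every chunk it overlaps; how the real tool treats such accesses is not stated above, and the function names are illustrative only.

```python
CHUNK = 16  # bytes per chunk, as in the footprint tool description

def touched_chunks(address, size):
    """Chunk indices overlapped by an access (assumed spanning behavior)."""
    first = address // CHUNK
    last = (address + size - 1) // CHUNK
    return range(first, last + 1)

def footprint(accesses):
    """Count the unique 16-byte chunks referenced by (address, size) pairs."""
    seen = set()
    for address, size in accesses:
        seen.update(touched_chunks(address, size))
    return len(seen)

accesses = [
    (0x1000, 4),   # chunk 0x100
    (0x1008, 8),   # same chunk again
    (0x100c, 8),   # spans chunks 0x100 and 0x101
    (0x2000, 32),  # chunks 0x200 and 0x201
]
print(footprint(accesses))
```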

 



 


Installation

Download and unpack the appropriate kit for your platform. Set your PATH variable to point to that directory. You can also refer to the tools in the kit using full or relative paths. Do not rearrange the files or subdirectories in the unpacked kit. If you want to move the kit directory, move everything.

WINDOWS*: If you are using Winzip, it puts the proper permissions on the unpacked files. However, if you are using Cygwin's tar command to unpack the Windows* kit, you must execute a "chmod -R +x path-to-kit" on the unpacked kit (where "path-to-kit" is the unpacked kit directory name).

OS X*: If you are using OS X*, you must also execute two commands as root to enable Intel SDE to work:

% sudo chgrp procmod path-to-kit/i*/pinbin
% sudo chmod g+s path-to-kit/i*/pinbin

where "path-to-kit" is the name of the unpacked Intel SDE kit.

LINUX*: On some distributions, you must disable SELinux to allow Intel SDE to work. On Ubuntu systems, you must disable yama as described below.


Running Intel SDE

This is the pattern for running SDE:

path-to-kit/sde [sde args] -- user-application [app args]

The double dash ("--") is important. Options to SDE go before the double dash.  Square brackets denote optional arguments. 
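When launching Intel SDE from a script, it helps to keep the two argument groups separate and insert the double dash in exactly one place. The helper below is a small sketch of that idea; `build_sde_command` is a hypothetical name, and the paths and arguments are placeholders.

```python
import shlex

def build_sde_command(sde_path, sde_args, app, app_args=()):
    """Assemble an argument list: SDE options, then "--", then the application."""
    return [sde_path, *sde_args, "--", app, *app_args]

cmd = build_sde_command(
    "path-to-kit/sde",
    ["-mix", "-omix", "foo.out"],
    "user-application",
    ["arg1"],
)
# Show the command as it would be typed in a shell.
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting problems in the application arguments.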

The two most important options are the short and long help messages. To see the short help message:

path-to-kit/sde -help

To see the very long help message ("t" stands for "tool", as in "pin tool"):

path-to-kit/sde -thelp

The knobs listed use underscores, but dashes can be used instead of underscores.

The short help message is as follows:

%  sde-bdw-external-5.31.0-2012-11-01-mac/sde -help
Intel (R) Software Development Emulator.  Version:  5.31.0 bdw-external
Copyright (C) 2008-2012, Intel Corporation. All rights reserved.

 Usage: sde [args] -- application [application-args]
 
 For the longer tool help, use "-thelp".
 
 If one of "-mix", "-debugtrace", or "-t toolname" are not,
 supplied, then just the underlying emulator will run.
 
     -mix                Run mix histogram tool
     -omix               Set the output file name for mix, Implies -mix
                         Default is "mix.out"
 
     -footprint          Run footprint tool
     -ofootprint         Set the output file name for footprint,
                         Implies -footprint. Default is "footprint.out"
 
     -debugtrace         Run mix debugtrace tool
     -odebugtrace        Set the output file name for debugtrace,
                         Implies -debugtrace
                         Default is "debugtrace.out"
 
     -ast                Run the Intel(R) AVX/SSE transition checker
     -oast               Set the output file name for the Intel AVX/SSE
                         transition checker. Implies -ast
                         Default is "avx-sse-transition.out"
 
     -mrm                Set chip-check and CPUID for MRM
     -pnr                Set chip-check and CPUID for PNR
     -nhm                Set chip-check and CPUID for NHM
     -wsm                Set chip-check and CPUID for WSM
     -snb                Set chip-check and CPUID for SNB
     -ivb                Set chip-check and CPUID for IVB
     -hsw                Set chip-check and CPUID for HSW
     -bdw                Set chip-check and CPUID for BDW
 
     -slt                Set chip-check and CPUID for Silverthorne
 
     -skip-int3          Skip int3 instructions in the execution
 
     -help               Intel(R) SDE driver help
     -thelp              Emit the emulator tool's help message
     -phelp              Emit the pin JIT help message
     -version            Intel(R) SDE version number
 
     -debug              Enable debugging (pause to connect to debugger)
     -debug-attach       Enable debugging (attach mid-execution)
 
     -sse-sde            Use an SSE-hosted version of SDE on 
                         AVX-enabled hosts
 
     -env VAR VALUE      Set environment variable VAR to VALUE.
                         The -env option is repeatable.
     -no-follow-child    Do not follow exec or subprocess creation
                         Default is to follow exec or subprocess creation
 
     -echo               Print the internal pin command line
     -p  pinarg          Add a pin argument specifically for the pin JIT
     -pt pintoolarg      Add a pin tool argument
     -t  toolname        Specify another tool name in the kit
 
     -pin-log-file fn    Specify a pin log file other than pin-log.txt
  
NOTE: All other arguments are passed to the emulator.

 


Emulate Everything Mode

  • WINDOWS*: A file called sde-win.bat is provided in Windows* that runs a cmd.exe window under the control of Intel SDE. You can make a shortcut to it and place that shortcut on your desktop. Everything run from that window will be run under the control of Intel SDE, so you may experience a slowdown even when you are not emulating anything. All it really does is:
    path-to-kit/sde -- cmd.exe
  • OS X* or LINUX*: You can run your favorite shell under the control of Intel SDE:
    path-to-kit/sde -- /bin/tcsh

    And everything you run from there will be run under the control of Intel SDE.

 


Running the Histogram Tool

The mix tool generates instruction mix histograms by opcode (XED iclass, the default) or by instruction form (iform). As of version 4.29, the instruction length and instruction category histograms are always included.

path-to-kit/sde -mix -- user-application [args] 
path-to-kit/sde -mix -iform -- user-application [args] 

Notes:

  • The ISA extension histogram is also always computed and printed as star-prefixed rows in the histograms. ISA extensions are things like (BASE, X87, MMX, SSE, SSE2, SSE3, etc.). This is useful to see which instruction set extensions are used in your application.
  • The dynamic statistics are recorded and emitted several ways: (1) per-thread, (2) per function per thread, and (3) summed for the entire run. Instruction counts by function are also emitted if symbols are found for the application.
  • The output is written to a sde-mix-out.txt file in the current directory. The output file name can be changed using the -omix option:
    path-to-kit/sde -mix -omix foo.out -- user-application [args]
  • The top 20 basic blocks are always printed in the output with their execution weights.
  • "-top_blocks N" changes the number of top blocks printed from 20 to the N that you specify.
  • Iforms: "Iform" is the XED term for variants of instructions. In a simple world they would be things like reg/reg or reg/mem, but things are more complicated in general. The iform names come from XED. Consider them experimental and subject to change. To see histograms by the more detailed iforms, use the "-iform" knob.
  • There are many options for the mix tool:
Mix knobs

-d  [default 0]
        Only collect dynamic profile
-demangle  [default 1]
        Control for attempting symbol demangling
-disas  [default 0]
        Show disassembly for top blocks
-iform  [default 0]
        Compute ISA iform mix
-line_info  [default 0]
        Add line info to the top hot blocks
-mapaddr  [default 0]
        Map Addresses to File/Line information
-mix  [default 0]
        Compute mix histograms.
-mix_omit_per_function_stats  [default 0]
        Omit the per-function histograms. Reduces the output file size.
-mix_omit_per_thread_stats  [default 0]
        Omit per-thread stats for smaller output files
-no_shared_libs  [default 0]
        do not instrument shared libraries
-omix  [default sde-mix-out.txt]
        specify profile file name
-s  [default 0]
        terminate after collection of static profile for main image
-top_blocks  [default 20]
        specify a maximal number of top blocks for which icounts are printed

 

Example

Command:

sde -mix -- mm_cmp.opt.vec.exe

Output: in sde-mix-out.txt (default file name)

#
# $global-dynamic-counts
#
# opcode count
#
*isa-ext-BASE   147597
*isa-ext-MODE64    222
*isa-ext-SSE        21
ADD               3092
AND               2694
CALL_NEAR         1739
CDQE                 3
CLD                 35
CMOVB              800
CMOVBE               6
...
UCOMISS             14
XCHG                 1
XOR               4981
...
*total          147840
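The global counts section shown above is easy to post-process. The following sketch parses name/count rows into two dictionaries, one for the star-prefixed meta rows and one for opcodes. It assumes the simple two-column layout of this excerpt; a real mix output file contains additional sections that this sketch does not handle, and `parse_mix_counts` is an illustrative name.

```python
def parse_mix_counts(text):
    """Split two-column mix rows into (meta_rows, opcode_rows) dictionaries."""
    meta, opcodes = {}, {}
    for line in text.splitlines():
        parts = line.split()
        # Skip comments, blank lines, and anything that is not "name count".
        if len(parts) != 2 or parts[0].startswith("#") or not parts[1].isdigit():
            continue
        name, count = parts
        target = meta if name.startswith("*") else opcodes
        target[name] = int(count)
    return meta, opcodes

sample = """\
#
# $global-dynamic-counts
#
# opcode count
#
*isa-ext-BASE   147597
*isa-ext-MODE64    222
*isa-ext-SSE        21
ADD               3092
XOR               4981
*total          147840
"""

meta, opcodes = parse_mix_counts(sample)
hottest = max(opcodes, key=opcodes.get)
print(hottest, opcodes[hottest])
print(meta["*isa-ext-SSE"] / meta["*total"])
```

In this excerpt the three ISA-extension rows add up to the `*total` row, which is a convenient sanity check when parsing.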

 

Mix Accounting


The rows in the mix output histograms come in two flavors. The rows that begin with "*" are meta-categories which sum up the data in different ways. Here are descriptions of some of the meta categories:

*scalar-simd anything with the XED_ATTRIBUTE_SIMD_SCALAR including AVX and SSE operations. The instructions that operate on one vector element and whose iclass name ends with "SS" or "SD" have this attribute.
*sse-scalar any SSE instruction with the XED_ATTRIBUTE_SIMD_SCALAR
*sse-packed any SSE instruction without the XED_ATTRIBUTE_SIMD_SCALAR
*avx-scalar Any AVX instruction with the attribute XED_ATTRIBUTE_SIMD_SCALAR
*avx128 Any AVX instruction with a 128b vector length but without the XED_ATTRIBUTE_SIMD_SCALAR
*avx256 Any AVX instruction with a 256b vector length.
*mem-atomic Atomic memory operations
*stack-read Stack reads
*stack-write Stack writes
*iprel-read IP-relative memory reads
*iprel-write IP-relative memory writes
*mem-read-1 Memory read, 1 byte
*mem-read-2 Memory read, 2 bytes
*mem-read-4 Memory read, 4 bytes
*mem-read-8 Memory read, 8 bytes
*mem-write-1 Memory write, 1 byte
*mem-write-2 Memory write, 2 bytes
*mem-write-4 Memory write, 4 bytes
*mem-write-8 Memory write, 8 bytes
*isa-ext-BASE The "BASE" ISA-extension (a generic group of instructions). BASE includes most of the older instructions.
*isa-ext-LONGMODE The set of instructions added with Intel64. These may be 32b or 64b instructions
*isa-set-I186 ISA "set" is a categorization of instructions in the BASE ISA-extension. I186 includes instructions introduced on the 80186 processor.
*isa-set-I386 ISA "set" is a categorization of instructions in the BASE ISA-extension. I386 includes instructions introduced on the 80386 processor.
*isa-set-I486REAL ISA "set" is a categorization of instructions in the BASE ISA-extension. I486REAL includes instructions introduced on the 80486 processor and valid in REAL mode.
*isa-set-I86 ISA "set" is a categorization of instructions in the BASE ISA-extension. I86 includes instructions introduced on the 8086 processor.
*isa-set-LONGMODE ISA "set" is a categorization of instructions in the LONGMODE ISA-extension. LONGMODE includes instructions introduced with Intel64 mode.
*isa-set-PENTIUMREAL ISA "set" is a categorization of instructions in the BASE ISA-extension. PENTIUMREAL includes instructions introduced with Pentium and valid in REAL mode.
*isa-set-PPRO ISA "set" is a categorization of instructions in the BASE ISA-extension. PPRO includes instructions introduced with the PentiumPro.
*lock_prefix Instructions with a 0xF0 LOCK prefix
*rep_prefix Instructions with a 0xF3 REP prefix
*repne_prefix Instructions with a 0xF2 REPNE prefix
*osz_prefix Instructions with a 0x66 prefix
*rex_prefix Instructions with a REX prefix (includes the following 4 cases). REX prefixes can be used without any of the following 4 bits set as well.
*rexw_prefix Instructions with a REX prefix with the REX.W bit set
*rexr_prefix Instructions with a REX prefix with the REX.R bit set
*rexx_prefix Instructions with a REX prefix with the REX.X bit set
*rexb_prefix Instructions with a REX prefix with the REX.B bit set
*one-memops Instructions with one memory operation
*two-memops Instructions with two memory operations
*disp_only Instructions with a memory operation that addresses memory without using a base register or index register -- just a displacement.
*base_index Instructions with a memory operation that addresses memory using a base and index register, but without a displacement.
*base_index_disp Instructions with a memory operation that addresses memory using a base, index and displacement.
*scale_1 Number of instructions with a scale=1 for the index register
*scale_2 Number of instructions with a scale=2 for the index register
*scale_4 Number of instructions with a scale=4 for the index register
*scale_8 Number of instructions with a scale=8 for the index register
*memdisp8 Memory operations with 8-bit displacements
*memdisp32 Memory operations with 32-bit displacements
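The three addressing-mode rows (*disp_only, *base_index, *base_index_disp) can be read as a small decision table. The sketch below encodes only those three definitions; operand forms that the list above does not name (for example base plus displacement) return None. The function and its argument convention are hypothetical and are not taken from XED.

```python
def addressing_category(base, index, disp):
    """Map one memory operand to a mix meta-category, or None if unlisted.

    base and index are register names or None; disp is an integer
    displacement or None when the operand has no displacement.
    """
    has_base = base is not None
    has_index = index is not None
    has_disp = disp is not None
    if not has_base and not has_index and has_disp:
        return "*disp_only"
    if has_base and has_index and not has_disp:
        return "*base_index"
    if has_base and has_index and has_disp:
        return "*base_index_disp"
    return None  # form not covered by the three rows described above

print(addressing_category(None, None, 0x605290))
print(addressing_category("rax", "rbx", None))
print(addressing_category("rax", "rbx", 0x10))
print(addressing_category("rsp", None, 0x110))
```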

 

 


Checking for bad pointers and data misalignment

 

Two of the more common errors when bringing up new code are (a) dereferencing bad pointers, either null pointers or pointing to inaccessible memory and (b) misaligned data accesses. Intel SDE has features to help identify these situations in programs. 

The options for the pointer checker are:

-null_check  [default 0]
        Check memops for null references.
-ptr_check  [default 0]
        Wild pointer checker. Checks memops for accessibility.
-ptr_check_warn  [default 0]
        Make the ptr checker warn on errors. Default is do die on errors.
-ptr_raise  [default 0]
        Make the ptr checker raise exception on errors. Default is to do
        PIN_SafeCopy on so that errors are ignored in analysis routines.

The alignment checker can give profiles of data alignment throughout the program as well as when and where data accesses are misaligned.

-align_checker  [default normal]
        Check for unaligned memory accesses mixing. Valid choices are: assert,
        warn, report, normal, or ignore. There are also assert-all, warn-all
        and report-all which watch all instructions, including those that do
        not require alignment.
-align_checker_256b  [default 0]
        Limit checker to only checking for 256b (32B) memory references.
-align_checker_file  [default sde-align-checker-out.txt]
        File name for messages about unaligned memory accesses.
-align_checker_image  [default ]
        Only check instructions in the named image
-align_checker_prefetch  [default 1]
        1=check prefetches, 0=ignore prefetches.
-align_checker_start_address 
        Address and count to trigger a start 
-align_checker_stderr  [default 0]
        Attempt to write messages about unaligned data types to stderr.  If
        disabled, then the output file is used.
-align_checker_stop_address 
        Address and count to trigger a stop 
-align_correct  [default 1]
        1=Enabled, 0=Disable the alignment checker
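For readers unfamiliar with the term, the sketch below shows one common definition of a misaligned access: the address is not a multiple of the access size. This natural-alignment rule is an assumption made for the example; the exact criteria the Intel SDE alignment checker applies per instruction are not spelled out in the knob descriptions above.

```python
def is_misaligned(address, size):
    """True if the access does not start on a multiple of its own size."""
    return address % size != 0

# (address, size in bytes); 16 = 128b access, 32 = 256b access.
accesses = [
    (0x7fbffff380, 16),
    (0x7fbffff388, 16),
    (0x7fbffff3a0, 32),
    (0x7fbffff390, 32),
]
for address, size in accesses:
    status = "misaligned" if is_misaligned(address, size) else "aligned"
    print(f"{address:#x} size {size}: {status}")
```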

Running the ASCII Tracing Tool

 

path-to-kit/sde -debugtrace -- user-application [args]

The output is written to a debugtrace.out file in the current directory by default. There are many options; run 'sde -debugtrace -thelp' to see the choices. The tool prints the registers and flags modified by each instruction. It also prints the memory values read and written.

% sde -debugtrace -- il_aesdec.opt.vec.exe
% cat debugtrace.out
...
0x0000000000400a1a movdqa xmmword ptr [rsp+0x110], xmm1
Write *(UINT128)0x7fbffff380 = 48692853_68617929_5b477565_726f6e5d
Read 7b5b5465_73745665_63746f72_5d53475d = *(UINT128)0x7fbffff370
0x0000000000400a23 movdqa xmm0, xmmword ptr [rsp+0x100]
Read 48692853_68617929_5b477565_726f6e5d = *(UINT128)0x7fbffff380
0x0000000000400a2c movdqa xmm1, xmmword ptr [rsp+0x110]
0x0000000000400a35 aesdec xmm0, xmm1
0x0000000000400a3a movdqa xmmword ptr [rsp+0x120], xmm0
Write *(UINT128)0x7fbffff390 = 138ac342_faea2787_b58eb95e_b730392a
Read 138ac342_faea2787_b58eb95e_b730392a = *(UINT128)0x7fbffff390
0x0000000000400a43 movdqa xmm0, xmmword ptr [rsp+0x120]
0x0000000000400a4c movdqa xmmword ptr [rsp+0xc0], xmm0
Write *(UINT128)0x7fbffff330 = 138ac342_faea2787_b58eb95e_b730392a
0x0000000000400a55 lea rdi, ptr [r12+0x50cac0] | rdi = 0x50cad0
0x0000000000400a5d lea rsi, ptr [rsp+0xc0] | rsi = 0x7fbffff330
0x0000000000400a65 mov edx, 0x10 | rdx = 0x10
0x0000000000400a6a call 0x400e50 | rsp = 0x7fbffff268
Write *(UINT64*)0x7fbffff268 = 0x400a6f
0x0000000000400e50 test rdx, rdx | rflags = 0x202
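The memory records in a trace like the one above can be extracted with two regular expressions. This is a sketch based only on the line shapes visible in this excerpt ("Write *(TYPE)addr = value" and "Read value = *(TYPE)addr"); other debugtrace options may produce lines that it does not recognize.

```python
import re

# Patterns inferred from the excerpt above; the type may be followed by "*".
WRITE_RE = re.compile(r"^Write \*\((\w+)\*?\)(0x[0-9a-fA-F]+) = (\S+)$")
READ_RE = re.compile(r"^Read (\S+) = \*\((\w+)\*?\)(0x[0-9a-fA-F]+)$")

def parse_memory_records(lines):
    """Return (kind, type, address, value_text) tuples for Read/Write lines."""
    records = []
    for line in lines:
        line = line.strip()
        match = WRITE_RE.match(line)
        if match:
            type_name, address, value = match.groups()
            records.append(("write", type_name, int(address, 16), value))
            continue
        match = READ_RE.match(line)
        if match:
            value, type_name, address = match.groups()
            records.append(("read", type_name, int(address, 16), value))
    return records

sample = [
    "0x0000000000400a1a movdqa xmmword ptr [rsp+0x110], xmm1",
    "Write *(UINT128)0x7fbffff380 = 48692853_68617929_5b477565_726f6e5d",
    "Read 48692853_68617929_5b477565_726f6e5d = *(UINT128)0x7fbffff380",
    "Write *(UINT64*)0x7fbffff268 = 0x400a6f",
]
for record in parse_memory_records(sample):
    print(record)
```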

 


Using the chip-check feature

Starting with version 2.94, Intel SDE includes a filtering mechanism to restrict executed instructions to a particular microprocessor. This is intended to be a helpful diagnostic tool for use when deploying new software. In the output of "sde -thelp" there is a section describing the controls for this feature:

-chip_check  [default ]
        Restrict to a specific XED chip.
-chip_check_die  [default 1]
        Die on errors. 0=warn, 1=die
-chip_check_disable  [default 0]
        Disable the chip checking mechanism.
-chip_check_emit_file  [default 0]
        Emit messages to a file. 0=no file, 1=file
-chip_check_file  [default sde-chip-check.txt]
        Output file chip-check errors.
-chip_check_jit  [default 0]
        Check during JIT'ing only. Checked code might not be executed due to
        speculative JIT'ing, but this mode is a little faster.
-chip_check_list  [default 0]
        List valid chip names and exit.
-chip_check_stderr  [default 1]
        Try to emit messages to stderr. 0=no stderr, 1=stderr
-chip_check_vsyscall  [default 0]
        Enable the chip checking checking in the vsyscall area.

To list all the chips that Intel SDE knows about, you can use "sde -chip-check-list". The Intel® microarchitecture code name Sandy Bridge chip ("SANDYBRIDGE") includes all the Intel® AVX instructions and the "FUTURE" chip includes the FMA instructions as well. To limit instructions to the processor codenamed Westmere, use "sde -chip-check WESTMERE -- yourapp". By default, Intel SDE emits warnings to a file called sde-chip-check.txt and also to stderr (if the application has not closed stderr). This behavior can be customized using the above knobs.


Using Restricted Transactional Memory

The RTM options are as follows. You can see this in the long help output emitted when running "sde -thelp".

-rtm_cache_set_size  [default 8]
        Number of cache lines in associative cache set(not necessarily power
        of 2)
-rtm_cache_sets_num  [default 64]
        Number of cache sets in cache
-rtm_debug_log  [default 0]
        define debug log details level, printed to the the log file
-rtm_log_file  [default 0]
        create a log file (does not necessarily fill it)
-rtm_log_flush  [default 0]
        flush data to the log file after each write
-rtm_log_inst  [default 0]
        print the instructions to the log file
-rtm_mode  [default disabled]
        Select rtm mode from [disabled|abort|full|nop]
-rtm_ownership_size  [default 16]
        log2(ownership table entries) default 16
-rtm_speculation_depth  [default 7]
        Maximum speculation depth allowed

(As in all SDE knobs, you can use dashes instead of underscores.)

There are 4 different modes for RTM: disabled, abort, full and nop. Disabled is the default because emulating RTM in software is very slow. The abort mode always aborts upon executing xbegin. Full is the RTM-enabled mode. Nop treats the RTM instructions like NOPs. Please see this blog posting for more information on how to use RTM.
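To get a feel for the default RTM knob values, the sketch below derives the size of the modeled cache and ownership table from them. The knob names and defaults come from the listing above; the 64-byte cache line size is an assumption that the help text does not state, and how Intel SDE uses these capacities internally is not described here.

```python
# Defaults copied from the RTM knob listing above.
RTM_DEFAULTS = {
    "rtm_cache_set_size": 8,    # cache lines per associative set
    "rtm_cache_sets_num": 64,   # number of sets
    "rtm_ownership_size": 16,   # log2(ownership table entries)
}

ASSUMED_LINE_BYTES = 64  # assumption: not stated in the knob help

def rtm_model_summary(knobs, line_bytes=ASSUMED_LINE_BYTES):
    """Derive capacities implied by the RTM cache and ownership knobs."""
    cache_lines = knobs["rtm_cache_set_size"] * knobs["rtm_cache_sets_num"]
    return {
        "cache_lines": cache_lines,
        "cache_bytes": cache_lines * line_bytes,
        "ownership_entries": 2 ** knobs["rtm_ownership_size"],
    }

print(rtm_model_summary(RTM_DEFAULTS))
```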

 


Running XED Disassembler

Example:

path-to-kit/xed -i foo.exe > dis.txt

The above command writes dis.txt.
See the help message (-help) for many options.

XED prints the ISA Extension for every instruction. This is useful for finding new instructions in your code. Example Output:

% xed -i il_aesdec.opt.vec.exe > dis
% cat dis
SYM main:
XDIS 40097c: PUSH BASE 4154 push r12
XDIS 40097e: PUSH BASE 53 push rbx
XDIS 40097f: BINARY BASE 4881EC38010000 sub rsp, 0x138
XDIS 400986: DATAXFER BASE BF03000000 mov edi, 0x3
XDIS 40098b: CALL BASE E8CA150000 call 0x401f5a
XDIS 400990: DATAXFER SSE2 660F6F15581F0000 movdqa xmm2, xmmword ptr [rip+0x1f58]
XDIS 400998: DATAXFER SSE2 660F6F0D601F0000 movdqa xmm1, xmmword ptr [rip+0x1f60]
XDIS 4009a0: DATAXFER SSE2 660F6F05681F0000 movdqa xmm0, xmmword ptr [rip+0x1f68]
XDIS 4009a8: SSE SSE 0FAE9C24C0000000 stmxcsr dword ptr [rsp+0xc0]
XDIS 4009b0: LOGICAL BASE 818C24C000000040800000 or dword ptr [rsp+0xc0], 0x8040
XDIS 4009bb: SSE SSE 0FAE9424C0000000 ldmxcsr dword ptr [rsp+0xc0]
XDIS 4009c3: LOGICAL BASE 33C0 xor eax, eax
XDIS 4009c5: DATAXFER SSE2 660F7F90C0465000 movdqa xmmword ptr [rax+0x5046c0], xmm2
XDIS 4009cd: DATAXFER SSE2 660F7F88C0885000 movdqa xmmword ptr [rax+0x5088c0], xmm1
XDIS 4009d5: DATAXFER SSE2 660F7F80C0CA5000 movdqa xmmword ptr [rax+0x50cac0], xmm0
XDIS 4009dd: BINARY BASE 4883C010 add rax, 0x10
XDIS 4009e1: BINARY BASE 483D00400000 cmp rax, 0x4000
...
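Because every XDIS line carries the ISA extension, a short script can summarize which extensions a binary uses. The sketch below assumes the six-field layout shown in the example output (tag, address, category, extension, raw bytes, disassembly) and skips anything else, such as SYM lines; the function names are illustrative.

```python
from collections import Counter

def parse_xdis(line):
    """Parse one 'XDIS addr: CATEGORY EXTENSION HEXBYTES disassembly' line."""
    tag, address, category, extension, raw, text = line.split(None, 5)
    if tag != "XDIS":
        raise ValueError(f"not an XDIS line: {line!r}")
    return {
        "address": int(address.rstrip(":"), 16),
        "category": category,
        "extension": extension,
        "bytes": bytes.fromhex(raw),
        "text": text,
    }

def extension_histogram(lines):
    """Count instructions per ISA extension, ignoring non-XDIS lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("XDIS "):
            counts[parse_xdis(line)["extension"]] += 1
    return counts

sample = [
    "SYM main:",
    "XDIS 40097c: PUSH BASE 4154 push r12",
    "XDIS 400990: DATAXFER SSE2 660F6F15581F0000 movdqa xmm2, xmmword ptr [rip+0x1f58]",
    "XDIS 4009a8: SSE SSE 0FAE9C24C0000000 stmxcsr dword ptr [rsp+0xc0]",
    "XDIS 4009c3: LOGICAL BASE 33C0 xor eax, eax",
]
print(extension_histogram(sample))
```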


Debugging Emulated Code on Windows*

Starting with Intel SDE version 5.38, we support debugging with emulated instructions and registers when using Microsoft Visual Studio 2012*. You can compile your application with earlier versions of Visual Studio if required. But to use the debugging features with SDE, you must use Microsoft Visual Studio 2012.  Installing the MSI on the Intel SDE download page adds the required support for debugging.

The Intel SDE extensions to MSVS give you the following new features:

  • The ability to launch a debug session under SDE.
  • An extension to the standard MSVS disassembly window that knows about the new instructions. You can set breakpoints and single-step over new instructions.
  • A new emulated register window that displays the new register values that are emulated by SDE. The values can be displayed in various formats.
  • A new “SDE console” window which can be used to pass additional commands to SDE for debugging.

Prerequisites:

  • Windows 7 (or Windows Server 2008)
  • Visual Studio 2012 – You can use any of the editions (Ultimate, Professional, etc.), except the Express edition. Also, make sure you have an RTM release.
  • Installing the latest SDE kit (5.38 or later)

Instructions: First, install the prerequisites listed above. Then, do the following to enable debugging with SDE:

  • If you do not already have an MSVS project file, you must create one.  I typically create an empty project and specify the executable that I want to debug in the “Application Command” property.  Also set the “Application Command Arguments”, if you have any. 
  • Under Debug->Options and Settings...->Intel® SDE Debugger, set the “SDE Kit Directory” property to point to the root of your (unpacked) SDE kit.  There is a <Browse> option that allows you to choose the directory with the file browser.  Click OK to save your changes.
  • Open the project properties. At the top of the project properties there is a drop-down labeled “Debugger to launch”.  Change this drop-down to “SDE Debugger”. 
  • If you have a pre-built executable, you can replace the Application Command field with the path to your executable.
  • Click OK to save your changes.
  • Before starting the debugger, you probably want to set some breakpoints in your application.  Do this by opening a source file in MSVS, navigating to a line, and pressing F9.  When you are ready, start debugging either by pressing F5 or by clicking on
  • Once a debugger session is started, you can view the emulated registers and see the emulated instructions.  Open the Emulated Registers window by choosing “DEBUG->Windows->Emulated Registers” in the IDE.  You can change the output format for individual registers by right clicking.  Open the disassembly window by choosing “DEBUG->Windows->Disassembly”.
  • By right clicking on a register, you can change its display format to various signed/unsigned integer widths, single, double, hex and binary.
  • In the SDE console window, you have various lower level options for talking to SDE using the same command syntax we use with GDB on Linux.  Open this window by choosing “DEBUG->Windows->Intel(R) SDE Console”.

Known issues and limitations of the Intel SDE debugging feature on Windows*:

  1. You cannot attach to running processes.
  2. You cannot use the "Express" edition of MSVS; it does not support the required extensions. You cannot use windbg; it does not support the required APIs.

 



Debugging Emulated Code on Linux*

In general, you need two windows to do this.

For version 5.31 and later: You must use gdb 7.4 (or later), which provides an XML protocol for communication between Intel SDE and gdb.


In window #1:

% sde -debug -- yourapp

In window #2:

% gdb yourapp


Then from within gdb, you issue a "target remote :portnumber" where "portnumber" is the number shown in window #1. Don't forget the colon before the port number.
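If you script this workflow, the port number can be picked out of the text that Intel SDE prints in window #1. The sketch below assumes the message contains the string "target remote :" followed by the port, as in the debugging example on this page; `find_gdb_port` is a hypothetical helper. The printed command uses gdb's `-ex` option, which runs a command at startup.

```python
import re

def find_gdb_port(sde_output):
    """Return the port from a 'target remote :PORT' line, or None."""
    match = re.search(r"target remote :(\d+)", sde_output)
    return int(match.group(1)) if match else None

sample = """\
Application stopped until continued from debugger
Start GDB, then issue this command at the (gdb) prompt:
target remote :39464
"""

port = find_gdb_port(sample)
# Suggest a one-line gdb invocation that connects immediately.
print(f"gdb yourapp -ex 'target remote :{port}'")
```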

For debugging emulated code on Windows*, see the "Debugging Emulated Code on Windows*" section above.

 

Debugging Example

In one window:

% sde-avx-external-2.94-2009-12-31-lin-intel64-and-ia32/sde -debug --
_mm256_cmp_ps.intel64.opt.vec.exe
Application stopped until continued from debugger
Start GDB, then issue this command at the (gdb) prompt:
target remote :39464

In another window:

% gdb _mm256_cmp_ps.intel64.opt.vec.exe
GNU gdb 6.8.50.20080304-cvs
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
(no debugging symbols found)
(gdb) target remote :39464
Remote debugging using :39464
(no debugging symbols found)
0x00000032ea300a80 in _start () from /lib64/ld-linux-x86-64.so.2
(gdb) break main
Breakpoint 1 at 0x400894
(gdb) c
Continuing.
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)

Breakpoint 1, 0x0000000000400894 in main ()
(gdb) x/20i $pc
0x400894 <main+4>: and rsp,0xffffffffffffff80
0x400898 <main+8>: push rbx
0x400899 <main+9>: sub rsp,0x78
0x40089d <main+13>: mov edi,0x3
0x4008a2 <main+18>: call 0x402ce0 <__intel_new_proc_init_G>
0x4008a7 <main+23>: vstmxcsr DWORD PTR [rsp]
0x4008ac <main+28>: or DWORD PTR [rsp],0x8040
0x4008b3 <main+35>: vldmxcsr DWORD PTR [rsp]
0x4008b8 <main+40>: xor eax,eax
0x4008ba <main+42>: mov rbx,rax
0x4008bd <main+45>: vmovss xmm0,DWORD PTR [rbx*4+0x605290]
0x4008c6 <main+54>: vmovss xmm1,DWORD PTR [rbx*4+0x6052b0]
0x4008cf <main+63>: call 0x4017c0 <__isunorderedf>
0x4008d4 <main+68>: test eax,eax
0x4008d6 <main+70>: jne 0x4008f5 <main+101>
0x4008d8 <main+72>: vmovss xmm0,DWORD PTR [rbx*4+0x605290]
0x4008e1 <main+81>: vucomiss xmm0,DWORD PTR [rbx*4+0x6052b0]
0x4008ea <main+90>: jne 0x4008f5 <main+101>
0x4008ec <main+92>: jp 0x4008f5 <main+101>
0x4008ee <main+94>: mov edx,0xffffffff

(gdb) break *0x400907
Breakpoint 2 at 0x400907
(gdb) c
Continuing.

 

Then we hit our breakpoint at an Intel AVX instruction. Because this gdb knows about these instructions, it is able to disassemble the instruction:

 

Breakpoint 2, 0x0000000000400907 in main ()
(gdb) x/i $pc
0x400907 <main+119>: vmovups ymm0,YMMWORD PTR [rip+0x204981] # 0x605290 <s1>

GDB does not yet know how to fetch the YMM register values from Intel SDE. So we have added two commands to GDB to get access to Intel SDE's copy of the YMM registers.

This shows the YMM0 register before the instruction writes to it:

 

(gdb) monitor yreg 0
ymm00: 00000000_00000000_00000000_00000000_00000000_00000000_00000000_4512a000
int8: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 69 18 -96 0
int16: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 17682 -24576
int32: 0 0 0 0 0 0 0 1158848512
int64: 0 0 0 1158848512
uint8: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 69 18 160 0
uint16: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 17682 40960
uint32: 0 0 0 0 0 0 0 1158848512
uint64: 0 0 0 1158848512
float: 0, 0, 0, 0, 0, 0, 0, 2346
double: 0, 0, 0, 5.72547e-315

(gdb) si
0x000000000040090f in main ()

 

And after the single step, we see that the value has changed:

(gdb) monitor yreg 0
ymm00: 4512a000_42f60000_4925aa70_43e40000_47557677_43e2ac3e_45d0b2c5_45056559
int8: 69 18 -96 0 66 -10 0 0 73 37 -86 112 67 -28 0 0 71 85 118 119 67 -30 -84 62 69
-48 -78 -59 69 5 101 89
int16: 17682 -24576 17142 0 18725 -21904 17380 0 18261 30327 17378 -21442 17872 -19771
17669 25945
int32: 1158848512 1123418112 1227205232 1139015680 1196783223 1138928702 1171305157
1157981529
int64: 4977216461181681664 5270806338059108352 5140144804325403710 5030717344109127001
uint8: 69 18 160 0 66 246 0 0 73 37 170 112 67 228 0 0 71 85 118 119 67 226 172 62
69 208 178 197 69 5 101 89
uint16: 17682 40960 17142 0 18725 43632 17380 0 18261 30327 17378 44094 17872 45765
17669 25945
uint32: 1158848512 1123418112 1227205232 1139015680 1196783223 1138928702 1171305157
1157981529
uint64: 4977216461181681664 5270806338059108352 5140144804325403710 5030717344109127001
float: 2346, 123, 678567, 456, 54646.5, 453.346, 6678.35, 2134.33
double: 5.62906e+24, 2.41581e+44, 4.45764e+35, 2.06715e+28
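The per-type rows in the "monitor yreg" output are reinterpretations of the same 256 bits. As a sketch of how the hex row maps to the uint32 and float rows, the code below splits the underscore-separated hex text and unpacks it, reading the groups left to right in the order they are printed. It is an illustration of the number formats only, not of how Intel SDE produces the output.

```python
import struct

def decode_ymm(hex_text):
    """Reinterpret a printed 256-bit register value as uint32 and float lanes."""
    raw = bytes.fromhex(hex_text.replace("_", ""))
    if len(raw) != 32:
        raise ValueError("expected 256 bits (64 hex digits)")
    # The text is printed most-significant digit first, so unpack big-endian.
    return {
        "uint32": list(struct.unpack(">8I", raw)),
        "float": list(struct.unpack(">8f", raw)),
    }

ymm0 = "4512a000_42f60000_4925aa70_43e40000_47557677_43e2ac3e_45d0b2c5_45056559"
lanes = decode_ymm(ymm0)
print(lanes["uint32"])
print(lanes["float"])
```

The first four float lanes decode exactly to 2346, 123, 678567 and 456, matching the "float:" row above.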

 

To print all the YMM registers in hex:

(gdb) monitor yregs
ymm00: 4512a000_42f60000_4925aa70_43e40000_47557677_43e2ac3e_45d0b2c5_45056559
ymm01: 00000000_00000000_00000000_00000000_00000000_00000000_00000000_47559200
ymm02: 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000
ymm03: 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000
ymm04: 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000
ymm05: 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000
ymm06: 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000
ymm07: 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000
ymm08: 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000
ymm09: 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000
ymm10: 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000
ymm11: 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000
ymm12: 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000
ymm13: 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000
ymm14: 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000
ymm15: 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000

 

You can also ask Intel SDE to disassemble code using XED. The following command requests a disassembly of 70 bytes starting at the indicated address:

 

(gdb) monitor xdis 0x40090f 70
XDIS 000000000040090f: AVX C5FCC20D9849200000 vcmpps ymm1, ymm0, ymmword ptr [rip+0x204998], 0x0
XDIS 0000000000400918: AVX C5FC110D60512000 vmovups ymmword ptr [rip+0x205160], ymm1
XDIS 0000000000400920: BASE 33FF xor edi, edi
XDIS 0000000000400922: BASE BE84354000 mov esi, 0x403584
XDIS 0000000000400927: BASE E8340D0000 call 0x401660
XDIS 000000000040092c: BASE 33C0 xor eax, eax
XDIS 000000000040092e: BASE 4889C3 mov rbx, rax
XDIS 0000000000400931: AVX C5FA10049D90526000 vmovss xmm0, dword ptr [rbx*4+0x605290]
XDIS 000000000040093a: AVX C5FA100C9DB0526000 vmovss xmm1, dword ptr [rbx*4+0x6052b0]
XDIS 0000000000400943: BASE E8780E0000 call 0x4017c0
XDIS 0000000000400948: BASE 85C0 test eax, eax
XDIS 000000000040094a: BASE 751B jnz 0x400967
XDIS 000000000040094c: AVX C5FA10049DB0526000 vmovss xmm0, dword ptr [rbx*4+0x6052b0]


Intel AVX/SSE transition checking

A VZEROALL or VZEROUPPER instruction should be inserted between code that uses Intel® SSE and code that uses 256-bit Intel AVX instructions. Intel SDE can check for Intel SSE instructions followed by Intel AVX instructions without an intervening zeroing instruction, and vice versa.

  • Use the "-ast" Pin tool knob to enable the check.

  • Use "-oast filename" to specify an output filename other than "avx-sse-transition.out". When using the "sde" driver, -oast implies -ast.

path-to-kit/sde -oast filename.out -- user-application [args]

Example:

Command:

sde -ast -- mm_256_cmpouunord_ps.opt.vec.exe

Output: in "avx-sse-transition.out"

                      Dynamic      Dynamic
                   AVX to SSE   SSE to AVX   Static             Dynamic
BlockPC            Transition   Transition   Icount Executions   Icount
================ ============ ============ ======== ========== ========
# TID 0
400993                      1            0       16          1       16
4009f2                      6            6        4          6       24
4009da                      7            7        4          7       28
# SUMMARY
# AVX_to_SSE_transition_instances: 14
# SSE_to_AVX_transition_instances: 13
# Dynamic_insts: 147841
# AVX_to_SSE_instances/instruction: 0.0001
# SSE_to_AVX_instances/instruction: 0.0001
# AVX_to_SSE_instances/100instructions: 0.0095
# SSE_to_AVX_instances/100instructions: 0.0088
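
The ratios in the SUMMARY lines appear to be the transition counts divided by the dynamic instruction count. The following C sketch reproduces that arithmetic under this assumption; the function names are hypothetical and are not taken from the Intel SDE sources:

```c
/* Hypothetical helper: transitions per executed instruction.
   Returns 0.0 for an empty run to avoid dividing by zero. */
double transitions_per_inst(unsigned long transitions,
                            unsigned long dynamic_insts)
{
    if (dynamic_insts == 0)
        return 0.0;
    return (double)transitions / (double)dynamic_insts;
}

/* Hypothetical helper: the same ratio scaled to 100 instructions. */
double transitions_per_100_insts(unsigned long transitions,
                                 unsigned long dynamic_insts)
{
    return 100.0 * transitions_per_inst(transitions, dynamic_insts);
}
```

With the values above, transitions_per_100_insts(14, 147841) is about 0.00947 and transitions_per_100_insts(13, 147841) is about 0.00879, which round to the reported 0.0095 and 0.0088.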

In this case, the program counter locations implicated the isnan() calls in the following code:

 


 

source1 = _mm256_loadu_ps(s1);
source2 = _mm256_loadu_ps(s2);
dest = _mm256_cmpunord_ps(source1, source2);
_mm256_storeu_ps((float*) d, dest);
for (i = 0; i < 8; i++) {
    if (isnan(s1[i]) || isnan(s2[i])) {
        e[i] = -1;
    }
    else {
        e[i] = 0;
    }
}

 

Using the xed tool for disassembly:

4009b3: AVX AVX C5FC1005FD381000 vmovups ymm0, ymmword ptr [rip+0x1038fd]
4009bb: DATAXFER BASE 89052B391000 mov dword ptr [rip+0x10392b], eax
4009c1: DATAXFER BASE 89052D391000 mov dword ptr [rip+0x10392d], eax
4009c7: AVX AVX C5FCC20D0839100003 vcmpps ymm1, ymm0, ymmword ptr [rip+0x103908], 0x3
4009d0: AVX AVX C5FC110D20391000 vmovups ymmword ptr [rip+0x103920], ymm1
4009d8: LOGICAL BASE 33C0 xor eax, eax
4009da: AVX AVX C5FA1080B8425000 vmovss xmm0, dword ptr [rax+0x5042b8]
4009e2: LOGICAL BASE 33D2 xor edx, edx
4009e4: SSE SSE 0F2EC0 ucomiss xmm0, xmm0
4009e7: COND_BR BASE 7B05 jnp 0x4009ee
4009e9: DATAXFER BASE BA01000000 mov edx, 0x1
4009ee: LOGICAL BASE 85D2 test edx, edx
4009f0: COND_BR BASE 7518 jnz 0x400a0a
4009f2: AVX AVX C5FA1080D8425000 vmovss xmm0, dword ptr [rax+0x5042d8]
4009fa: LOGICAL BASE 33D2 xor edx, edx
4009fc: SSE SSE 0F2EC0 ucomiss xmm0, xmm0
4009ff: COND_BR BASE 7B05 jnp 0x400a06
4009fe: DATAXFER BASE BA01000000 mov edx, 0x1
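
The disassembly shows VEX-encoded vmovss instructions interleaved with legacy SSE ucomiss instructions after the 256-bit compare. One possible way to follow the recommendation at the top of this section is to issue VZEROUPPER once the 256-bit work is done. The C sketch below is an illustration only: the function cmp_unord8 is hypothetical, it uses _mm256_cmp_ps with the _CMP_UNORD_Q predicate (the 0x3 immediate seen in the vcmpps above) rather than the _mm256_cmpunord_ps call from the sample, and it falls back to a scalar loop when built without AVX so that it compiles anywhere. Whether the transition counts reported by -ast drop to zero also depends on how the compiler encodes the surrounding scalar code.

```c
#include <stddef.h>
#ifdef __AVX__
#include <immintrin.h>
#endif

/* Hypothetical helper, not part of Intel SDE: for 8 lanes, writes -1 to
   d[i] when s1[i] or s2[i] is NaN and 0 otherwise. */
void cmp_unord8(const float *s1, const float *s2, int *d)
{
#ifdef __AVX__
    __m256 a = _mm256_loadu_ps(s1);
    __m256 b = _mm256_loadu_ps(s2);
    /* Unordered compare: a lane becomes all ones if either input is NaN. */
    __m256 r = _mm256_cmp_ps(a, b, _CMP_UNORD_Q);
    _mm256_storeu_si256((__m256i *)d, _mm256_castps_si256(r));
    /* Zero the upper halves of the YMM registers before any legacy
       SSE code runs in the caller. */
    _mm256_zeroupper();
#else
    /* Scalar fallback: x != x is true only when x is NaN. */
    for (size_t i = 0; i < 8; i++)
        d[i] = (s1[i] != s1[i] || s2[i] != s2[i]) ? -1 : 0;
#endif
}
```

Running the rebuilt application under "sde -ast" again is a direct way to check whether the change removed the reported transitions.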


Code Sample: AES-128 Encryption and Decryption Routines

This sample code provides a set of C routines that demonstrate encryption and decryption routines using AES-128 in ECB mode. Read the End User License Agreement before you download the code samples.

 


Frequently Asked Questions

Q: How do I download Intel SDE?
A: Intel SDE is available on the Download Page.

Q: How do I ask questions and get support for Intel SDE?
A: Use the Intel® Software Development Emulator Forum. Intel engineers monitor the forum and are available to answer user questions.

Q: How do I ask questions and get support for Intel® AVX and CPU instructions?
A: Use the Intel® AVX and CPU Instructions Forum. Intel engineers monitor the forum and are available to answer user questions.

Q: What are the system requirements?
A: Intel SDE runs on IA-32 or Intel® 64 processors running the Windows*, Linux*, or OS X* operating systems. OS X* support begins with version 5.31 and covers only Snow Leopard (10.6) and Lion (10.7).

Q: Is there anything special I must do to get Intel SDE to run on OS X*?

A: The first version that supports OS X* is Intel SDE version 5.31. It supports versions 10.6 and 10.7 of OS X*.

Version 5.38 adds support for code signing. The pin executable is signed with an Intel code signing certificate and requires the "system.privilege.taskport" privilege. The OS requires the user to consent to pin using this interface. If you are working in the graphical environment of the OS, you will be prompted for your username and password. If you are working from a terminal without access to the graphical environment, you can do the following:

% security authorize -l -c system.privilege.taskport

Version 5.31 requires special permissions to execute. After unpacking the tar file from the download page, you must do the following to get OS X* to allow it to execute:

 % sudo chgrp procmod kitname/i*/pinbin
 % sudo chmod g+s  kitname/i*/pinbin

where "kitname" is the unpacked directory name. We are working on a way to eliminate this requirement.

Q: What are the CPUID requirements?
A: Pin, and thus Intel SDE, requires a Pentium 4 or later processor.

Q: Can I see symbols on Windows*?
A: Yes, for version 4.29 onwards. For earlier versions, Pin, upon which Intel SDE is built, requires both the IA-32 and Intel® 64 architecture versions of dbghelp.dll version 6.11.1.404 to be placed in the "ia32" and "intel64" subdirectories of the kit directory. You can get dbghelp.dll from Microsoft here: http://www.microsoft.com/whdc/devtools/debugging/default.mspx .

Q: What about running IA-32 architecture applications on Intel® 64 processor platforms?
A: This is supported.

Q: What about precise SSE exception handling in MXCSR?
A: Do not rely on Intel SDE to correctly set the MXCSR exception flags.

Q: What happens when my program dereferences inaccessible memory for emulated instructions?
A: Intel SDE will crash. You can use "sde -trace-execute -- your app" to get a dump of the instructions executed in your task and find the last instruction executed. Then you can use "sde -debugtrace -- your app" to look for the last write to the registers involved in the effective address computation of that instruction. We are working on a better solution.

Q: Does Intel SDE handle cygwin /cygdrive/c paths or symlinks?
A: No. Pin does not handle them, so Intel SDE cannot handle them either.

Q: Where can I learn more about Pin?
A: See http://pintool.org and the Yahoo group "pinheads". The release notes for Pin are available here. They provide additional information and restrictions that apply to Intel SDE, which is built on Pin.

Q: Where can I learn more about mix and debugtrace?
A: The basic sources for debugtrace and mix are available in Pin kits on the Pin website. I've modified them slightly to invoke Intel SDE and to print out the new YMM registers.

Q: What are the licensing terms?
A: See the download page.

Q: I see you are shipping "GPL with runtime exception" runtime libraries. Where can I find more information?
A: In order to comply with the license requirements of the Linux* runtime libraries used by Intel SDE, we provide the following sources:

For version 3.09 and prior:

Local download link for the unmodified gcc 4.2.0 or http://ftp.gnu.org/pub/gnu/gcc/gcc-4.2.0/gcc-4.2.0.tar.bz2

For version 3.88 ... 4.x: Local download link for unmodified gcc 4.5.1, libelf-0.8.5, and libdwarf-20070525.

For version 5.31 and later: Local download link for unmodified gcc-4.6.3, libelf-0.8.13, libdwarf-20111214

Licenses for runtime libraries for Intel SDE on Linux*

Q: Is there a version of GCC available that supports AES and PCLMULQDQ?
A: Yes, GCC 4.4 includes support for these instructions. Snapshots are available here: ftp://gcc.gnu.org/pub/gcc/snapshots/

Q: Is there a version of GCC available that supports Intel AVX?

A: GCC 4.5 supports the initial Intel AVX instructions present on Intel® microarchitecture code name Sandy Bridge.

GCC 4.7.x supports the Intel AVX2, BMI1/2 and RTM instructions.

Q: Is there a version of the Intel® Compiler available that supports Intel® AVX?
A: Yes, the current Intel Compiler supports the Intel AVX instructions. This version also includes support for Intel® SSE4, AES and PCLMULQDQ instructions. To use the post-32nm new instructions for the processor codenamed Ivy Bridge, you must use Intel® Parallel Composer 2011 Update 2 or Intel® Composer XE 2011 Update 2. The compiler version is 12.0.2.x.

 

Q: How can I get Intel SDE to work on Ubuntu 10 or later?

A: There is a known problem on Linux* systems that restrict ptrace attach through the sysctl /proc/sys/kernel/yama/ptrace_scope. In that case Pin cannot use its default (parent) injection mode. To resolve this, run the following echo command as root. (Intel SDE itself does not need to run as root.)

$ echo 0 > /proc/sys/kernel/yama/ptrace_scope
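
Before changing the sysctl, it can help to check its current value. The following C sketch reads the setting and reports whether ptrace attach is unrestricted. The function names are hypothetical and not part of Intel SDE or Pin, and treating a missing file as "unknown" is an assumption (the file is typically absent on kernels without the Yama security module).

```c
#include <stdio.h>

/* Hypothetical helper: read the ptrace_scope value from an open stream.
   Returns the value, or -1 if the stream is NULL or holds no integer. */
int read_ptrace_scope(FILE *f)
{
    int value;
    if (f == NULL || fscanf(f, "%d", &value) != 1)
        return -1;
    return value;
}

/* Hypothetical helper: returns 1 if ptrace attach is unrestricted
   (scope 0), 0 if it is restricted (scope above 0), and -1 if the
   file is missing or unreadable. */
int ptrace_attach_allowed(const char *path)
{
    FILE *f = fopen(path, "r");
    int scope;
    if (f == NULL)
        return -1;
    scope = read_ptrace_scope(f);
    fclose(f);
    if (scope < 0)
        return -1;
    return scope == 0;
}
```

Calling ptrace_attach_allowed("/proc/sys/kernel/yama/ptrace_scope") and getting 0 suggests that the echo command above is needed.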



Primary Technology Contact

Mark Charney
Principal Engineer in Intel's Software and Solutions Group. Mark joined Intel in 2002. Mark works on emulation of new instructions in support of the compiler and architecture teams. Mark also works on decoding and encoding instructions in software in his XED project. XED is used by Pin and other software tools within Intel and is available externally on the Pin web site. Mark has a PhD from Cornell University and a BSE from Princeton University.

 

Refer to our Optimization Notice for more information about performance choices and optimization in Intel software products.