Streaming SIMD Extensions 3 Enabling for the Microsoft .NET* Compiler 2003


Introduction


By James Rose
Sr. Application Engineer
CSD/AET Client Enabling Technology

The Streaming SIMD Extensions 3 instructions (also known as SSE3) add important new capabilities to the Intel® Pentium 4 E processor (code-named Prescott). The Intel® C++ Compiler 8.0 already supports SSE3, but your build environment may still require another compiler, such as Microsoft Visual Studio* 6.0, .NET 2002 or .NET 2003. Fortunately, you can still use SSE3 assembly instructions in optimized functions in your application with either the Microsoft Macro Assembler (MASM) or the freeware Netwide Assembler (NASM). In this paper, I'll describe the SSE3 support offered by MASM and NASM and show how you can convert C/C++ source code to assembly that you can then optimize with SSE3 instructions.


Current Compiler Support for SSE3


Currently, the Intel® C++ Compiler supports SSE3 assembly instructions starting with version 7.0, and version 8.0 adds support for SSE3 intrinsics. Microsoft Visual Studio 2005 (code-named Whidbey) will support SSE3, but, as its name suggests, it is not slated for release until sometime in the middle of 2005. Upgrading to another compiler can also be difficult because of the QA effort involved or other factors. Even if you cannot upgrade to the latest compilers, your application can still use SSE3 if you are willing to port some functions to assembly, optimize them with SSE3 instructions, and assemble them with either MASM or NASM. The next two sections describe these assemblers and the SSE3 support they provide.


Support for SSE3 with the Microsoft Macro Assembler

 
MASM, the Microsoft Macro Assembler* (ml.exe), is a standard tool in Microsoft Visual Studio 6*, .NET 2002* and .NET 2003*. MASM doesn't support SSE3 natively, but you can use a macro include file called 'ia_pni.inc', which defines the SSE3 instructions as macros so that you can use them in functions optimized for SSE3. The file is listed in Appendix A at the end of this document; as noted in its header, it requires assembler version 6.15.8803 or later.
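The SSE3 macros in ia_pni.inc do not add new opcodes to MASM. As the emm_inst macro in Appendix A shows, each one assembles an SSE2 ADDPD instruction with the same operands, which MASM already encodes as 66 0F 58 /r, and then uses ORG to overwrite byte 0 with the SSE3 prefix and byte 2 with the SSE3 opcode, leaving the ModRM byte and any displacement unchanged. The C sketch below models that patching on a byte buffer for register-to-register forms. It is illustrative only: the function names are hypothetical, and it assumes, as the macro does, that the ADDPD encoding begins with the 66 prefix and has no other prefix in front of it.

```c
#include <assert.h>
#include <stddef.h>

/* Prefix and opcode values, taken from the constants in ia_pni.inc. */
enum { PREF_66 = 0x66, PREF_F2 = 0xF2, PREF_F3 = 0xF3 };
enum { OPC_ADDSUB = 0xD0, OPC_MOVDDUP = 0x12, OPC_HADD = 0x7C, OPC_HSUB = 0x7D };

/* ModRM byte for a register-to-register form: mod = 11b, reg = dst, rm = src. */
unsigned char modrm_reg_reg(unsigned dst_xmm, unsigned src_xmm)
{
    assert(dst_xmm < 8 && src_xmm < 8);
    return (unsigned char)(0xC0u | (dst_xmm << 3) | src_xmm);
}

/* Bytes MASM emits for "addpd xmm<dst>, xmm<src>": 66 0F 58 /r. */
size_t encode_addpd_reg(unsigned char out[4], unsigned dst_xmm, unsigned src_xmm)
{
    out[0] = PREF_66;
    out[1] = 0x0F;
    out[2] = 0x58;
    out[3] = modrm_reg_reg(dst_xmm, src_xmm);
    return 4;
}

/* Same effect as emm_inst: replace byte 0 with the prefix and byte 2 with the opcode. */
void emm_patch(unsigned char *insn, size_t len, unsigned char pref, unsigned char op)
{
    assert(len >= 3 && insn[0] == PREF_66 && insn[1] == 0x0F);
    insn[0] = pref;
    insn[2] = op;
}
```

For example, encoding addpd xmm0, xmm1 gives 66 0F 58 C1; patching it with PREF_F2 and OPC_HADD gives F2 0F 7C C1, which is the encoding of haddps xmm0, xmm1.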


Support for SSE3 with Netwide Assembler


NASM is an 80x86 assembler that supports a range of object file formats including:

  • Linux* a.out and ELF
  • COFF
  • Microsoft 16-bit OBJ and Win32*

NASM is also freeware, released under the GNU Lesser General Public License (LGPL). NASM version 0.98.36 and later support SSE3 natively. You may opt to use NASM instead of MASM if you want to target platforms other than Win32, such as Linux, if you already use NASM in your build process, or if you don't use Microsoft compilers.


Basic Source to Assembly Conversion Process

After you have identified functions that are candidates for SSE3 optimization, a straightforward way to convert them to assembly is to have the compiler produce assembly listing files. You can then modify the listings to include SSE3 instructions, clean up extraneous comments and other data, and finally add the new assembly files to the build. The next sections describe in more detail how to convert your C/C++ functions to MASM or NASM assembly.


Source to MASM Assembly Conversion


Here are the basic steps to convert a C or C++ file into MASM assembly code inside the Microsoft Visual Studio .NET 2003 IDE*. (The process for Visual Studio .NET 2002* is nearly identical, and Microsoft Visual Studio C++ 6* differs only in minor menu navigation.)

  1. First, isolate functions that you intend to optimize with SSE3 instructions into a separate C/C++ file (or multiple files as necessary).
  2. Depending on the optimizations that you are targeting, it is usually beneficial to first optimize functions using SSE2 or MMX intrinsics, particularly if you plan on doing SIMD operations. Doing so can help make SSE3 optimization more straightforward after the function has been converted to assembly. Please refer to the Microsoft Visual Studio MSDN documentation included with the Microsoft compiler for more details about optimizing using intrinsics.
  3. Generate assembly output from the compiler for the functions that will be optimized with SSE3 instructions. All recent versions of Microsoft Visual Studio* can output source code in MASM assembly format. To get MASM-compatible assembly, select the file in the IDE that contains the functions to be optimized with SSE3, then select: File->Properties->C/C++->Output Files->Assembler Output->Assembly-Only Listing (/FA). The compiler then writes the listing to a file with the .asm extension.
  4. Clean up the file as desired. The generated assembly typically includes a great deal of branch prediction information, extraneous line numbers and unreferenced labels at the beginning and end of basic blocks. This information can be removed without affecting the functionality of the assembly code. Note that branch labels that are still referenced must not be removed from the assembly file.
  5. Since you will be using SSE3 instructions that MASM doesn't support natively, add the directive 'include ia_pni.inc' at the top of the file and make sure that ia_pni.inc is in a path that MASM can locate. This file is listed in Appendix A at the end of this document.
  6. Modify your code to include SSE3 instructions. Once the custom build step from step 8 is in place, the build log (buildlog.htm) shows any assembly syntax errors or other assemble-time issues.
  7. Comment out the original C/C++ function in the C/C++ file so that the original and the new assembly-optimized function are not both defined. It's a good idea to keep the original source as a reference for the code from which the assembly was generated.
  8. Add the .asm file to the project and configure a custom build step for it:
    • Select File->Properties->Custom Build Step->General
    • Command Line: ml /Zi /Cx /c /coff /Fl"$(IntDir)\$(InputName).lst" /Fo"$(IntDir)\$(InputName).obj" "$(InputPath)"
    • Description: Assembling $(InputName)
    • Outputs: $(IntDir)\$(InputName).obj
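Steps 1, 2 and 7 are easier if you keep a plain C reference for the function and check the data flow you plan to write in SSE3 before you touch the assembly. The sketch below assumes a hypothetical candidate function that computes the dot product of two 4-element float arrays. It keeps the scalar version as the reference and models HADDPS in C: the two low result lanes are pairwise sums from the destination operand, and the two high lanes are pairwise sums from the source operand. Applying it twice to the element-wise products reduces four lanes to one, a common SSE3 horizontal-sum pattern. The function names are illustrative only, and the code sticks to C89 so that it also compiles as C with the older Microsoft compilers.

```c
/* Scalar reference: the original C function, kept for comparison. */
float dot4_ref(const float a[4], const float b[4])
{
    float sum = 0.0f;
    int i;
    for (i = 0; i < 4; ++i)
        sum += a[i] * b[i];
    return sum;
}

/* C model of "haddps dst, src": low lanes from dst pairs, high lanes from src pairs. */
void haddps_model(float dst[4], const float src[4])
{
    float r[4];
    int i;
    r[0] = dst[0] + dst[1];
    r[1] = dst[2] + dst[3];
    r[2] = src[0] + src[1];
    r[3] = src[2] + src[3];
    for (i = 0; i < 4; ++i)   /* copy last, so dst and src may be the same array */
        dst[i] = r[i];
}

/* Planned SSE3 data flow for the same computation. */
float dot4_sse3_model(const float a[4], const float b[4])
{
    float p[4];
    int i;
    for (i = 0; i < 4; ++i)
        p[i] = a[i] * b[i];   /* mulps:  element-wise products        */
    haddps_model(p, p);       /* haddps: {p0+p1, p2+p3, p0+p1, p2+p3} */
    haddps_model(p, p);       /* haddps: full sum in every lane       */
    return p[0];              /* the low lane holds the result        */
}
```

Because the SSE3 sequence adds the products in a different order than the scalar loop, the two results can differ in the last bits for general inputs, so compare the converted function against the reference with a small tolerance.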

For more information about MASM, or if you run into syntax problems, consult the documentation for MASM version 6.1*. Of course, if you would rather write your functions by hand, you can skip the conversion and write them directly in MASM assembly.


Source to NASM Assembly Conversion


In general, you can follow the same conversion steps as for MASM to produce a NASM assembly file. Note, however, that some NASM syntax is different: some directives differ, and others are unnecessary.

Here are the basic steps to convert a C or C++ file into NASM assembly code inside the Microsoft Visual Studio .NET 2003 IDE. (The process for Visual Studio .NET 2002 is nearly identical, and Microsoft Visual Studio C++ 6 differs only in minor menu navigation.)

  1. First, isolate functions that you intend to optimize with SSE3 instructions into a separate C/C++ file (or multiple files as necessary).
  2. Depending on the optimizations that you are targeting, it is usually beneficial to first optimize functions using SSE2 or MMX intrinsics, particularly if you plan on doing SIMD operations. Doing so can help make SSE3 optimization more straightforward after the function has been converted to assembly. Please refer to the Microsoft Visual Studio MSDN documentation included with the Microsoft compiler for more details about optimizing with intrinsics.
  3. Generate assembly output from the compiler for the functions that will contain SSE3 instructions. Microsoft Visual Studio produces MASM-format assembly, which you will translate to NASM syntax in the next step. To get the listing, select the file in the IDE that contains the functions to be optimized with SSE3, then select File->Properties->C/C++->Output Files->Assembler Output->Assembly-Only Listing (/FA).
  4. Clean up the file as desired and convert it to NASM syntax. The assembly code generation process produces a great deal of extra information, such as branch target labels and line-number information, that isn't necessary for the assembly to work. This information can be removed without affecting the functionality of the code, as long as you keep every label that is still referenced. Some MASM constructs are unnecessary in NASM; for example, NASM does not use the PTR keyword, so a MASM operand such as XMMWORD PTR [eax] can usually be written simply as [eax]. In many cases NASM syntax is simpler. For more information, consult the NASM documentation* and the MASM documentation referenced earlier.
  5. Comment out the original C/C++ function in the C/C++ file so that the original and the optimized function are not both defined. It's a good idea to keep the original source as a reference for how the assembly was generated.
  6. Modify your code to include SSE3 instructions. The build log (buildlog.htm) produced when the custom build step runs shows any assembly syntax errors or other assemble-time issues.
  7. To ensure that you have SSE3 support with NASM, download NASM version 0.98.36 or later*. It is helpful to install nasm.exe in the compiler binary directory (the Vc7\bin directory of the Visual Studio .NET 2003 installation, the same place as cl.exe and ml.exe) to avoid many path-related build problems.
  8. Add the .nasm file to the project and configure a custom build step for it:
    • Select File->Properties->Custom Build Step->General
    • Command Line: nasm -f win32 -DPREFIX -o "$(IntDir)\$(InputName).obj" "$(InputPath)"
    • Description: Assembling $(InputName)
    • Outputs: $(IntDir)\$(InputName).obj
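Step 5 recommends commenting out the original C function once the assembly version is in the build. A variation, sketched below, keeps the C version behind a preprocessor switch so that it stays compiled as both a reference and a fallback. The macro name USE_SSE3_ASM, the file name dot4.c and the function dot4 are hypothetical; you would define USE_SSE3_ASM (for example with the compiler's /D option) only in configurations that also link the object file produced by NASM.

```c
/* dot4.c: hypothetical file containing only the function isolated for SSE3. */
#include <assert.h>
#include <stddef.h>

#ifdef USE_SSE3_ASM
/* Implemented in the .nasm file. With the default __cdecl convention on Win32,
   the assembly must export the decorated name, for example "global _dot4". */
float dot4(const float a[4], const float b[4]);
#else
/* Original C implementation, kept as the reference and as a fallback. */
float dot4(const float a[4], const float b[4])
{
    float sum = 0.0f;
    int i;
    assert(a != NULL && b != NULL);
    for (i = 0; i < 4; ++i)
        sum += a[i] * b[i];
    return sum;
}
#endif
```

Callers include the same prototype either way, so switching between the C and assembly versions only changes the build configuration, not the calling code.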

If you run into syntax problems, consult the NASM documentation*. And of course, if you would rather skip the conversion process altogether, you can write the functions directly in NASM assembly.


Summary


Even if you cannot migrate to the latest compilers, you can use SSE3 instructions now with the Visual C++ 6.0, .NET 2002 and .NET 2003 compilers by writing the functions you want to optimize as MASM or NASM assembly files. Don't put off using the capabilities offered by SSE3 just because you don't have the latest compilers: if you are willing to convert those functions to MASM- or NASM-compatible assembly, your application can benefit from SSE3 today with previously released compilers.


Appendix A - SSE3 Macro Definitions for Use with Microsoft Macro Assembler


; ia_pni.inc MASM Macro definitions for Streaming 
; SIMD Extensions 3
;
; THIS SOFTWARE AND DOCUMENTATION IS PROVIDED
; "AS IS" WITH NO WARRANTIES WHATSOEVER, 
; INCLUDING ANY WARRANTY OF MERCHANTABILITY, 
; NON-INFRINGEMENT, FITNESS FOR ANY PARTICULAR 
; PURPOSE, OR ANY WARRANTY OTHERWISE ARISING OUT 
; OF ANY PROPOSAL, SPECIFICATION OR SAMPLE.

; Intel® disclaims all liability, including
; liability for infringement of any proprietary 
; rights, relating to use of information in this 
; software. No license, express or implied, 
; by estoppel or otherwise, to any intellectual 
; property rights is granted herein. Intel 
; retains the right to make changes to its 
; software and documentation at any time, 
; without notice.

; The software vendor remains solely responsible 
; for the design, sale, and functionality of its
; product, including any liability arising from 
; product infringement or product warranty 
; of any kind.

; Copyright (c) 2003 Intel Corporation.
; All rights reserved.


.686P
.xmm

; This macro package requires an assembler version
; 6.15.8803 or later.

; Please use XMMWORD and not DWORD (OWORD does 
; not work) for 128 bit data in Streaming SIMD 
; Extensions 2 instructions. After getting a real
; assembler you will just have to add the line 
; "XMMWORD TEXTEQU <OWORD>"
; to your code.

 

MM2WORD TEXTEQU <QWORD> ; used only by the compiler, obsolete
MMWORD  TEXTEQU <QWORD> ; used only by the compiler, obsolete
XMMWORD TEXTEQU <DWORD> ; 128 bit memory operands for xmm regs inst

opc_addsubpd = 0D0H
opc_addsubps = 0D0H
opc_fisttp16 = 0DFH
opc_fisttp32 = 0DBH
opc_fisttp64 = 0DDH
opc_lddqu    = 0F0H
opc_movddup  = 012H
opc_movshdup = 016H
opc_movsldup = 012H
opc_haddps   = 07CH
opc_haddpd   = 07CH
opc_hsubps   = 07DH
opc_hsubpd   = 07DH

pref_66                = 066H
pref_f2                = 0F2H
pref_f3                = 0F3H


emm_inst macro pref:req, op:req, dst:req, src:req
     local x,y
x:
     addpd           dst, src
y:
     org x
     byte pref
     org x+2
     byte op
     org y
endm

emm_66_op    macro    op:req,    dst:req,    src:req
     emm_inst   pref_66, op, dst, src
endm

emm_f2_op    macro    op:req,    dst:req,    src:req
     emm_inst   pref_f2, op, dst, src
endm

emm_f3_op    macro    op:req,     dst:req,   src:req
     emm_inst   pref_f3, op, dst, src
endm

; 66 0F D0 /r      addsubpd        xmm1,      xmm2/m128
addsubpd     macro   dst:req,      src:req
 emm_66_op     opc_addsubpd,       dst, src
endm

; F2 0F D0 /r      addsubps        xmm1,      xmm2/m128
addsubps     macro   dst:req,      src:req
 emm_f2_op     opc_addsubps,       dst, src
endm

; F2 0F F0 /r      lddqu           xmm,       m128
lddqu        macro   dst:req,      src:req
IF (OPATTR(src) AND 00010100y) ; register or constant
   .ERR <illegal operands. src must be a memory operand!>
ELSE
    emm_f2_op opc_lddqu, dst, src
ENDIF
endm

; F2 0F 12 /r     movddup          xmm1,      xmm2/m64
movddup      macro   dst:req,      src:req
 emm_f2_op   opc_movddup,          dst, src
endm

; F3 0F 16 /r     movshdup         xmm1,      xmm2/m128
movshdup     macro   dst:req,      src:req
 emm_f3_op   opc_movshdup,         dst, src
endm

; F3 0F 12 /r     movsldup         xmm1,      xmm2/m128
movsldup     macro   dst:req,      src:req
 emm_f3_op   opc_movsldup,         dst, src
endm

; F2 0F 7C /r     haddps           xmm1,      xmm2/m128
haddps       macro   dst:req,      src:req
 emm_f2_op   opc_haddps,           dst, src
endm

; 66 0F 7C /r     haddpd           xmm1,      xmm2/m128
haddpd       macro   dst:req,      src:req
 emm_66_op   opc_haddpd,           dst, src
endm

; F2 0F 7D /r     hsubps           xmm1,      xmm2/m128
hsubps       macro   dst:req,      src:req
 emm_f2_op   opc_hsubps,           dst, src
endm

; 66 0F 7D /r     hsubpd           xmm1,      xmm2/m128
hsubpd       macro   dst:req,      src:req
 emm_66_op   opc_hsubpd,           dst, src
endm

; DF /1                   fisttp m16int
; DB /1                   fisttp m32int
; DD /1                   fisttp m64int
fisttp   macro adr:req
         local x, y
IF (OPATTR(adr)) AND 00010100y ; register or const
      .ERR <invalid operand. dst must be an address in memory!>
ELSEIF (TYPE (adr) EQ WORD) OR (TYPE (adr) EQ SWORD)
x:
     fimul    adr              ; DE /1, first byte patched to DF
y:
     org      x
     byte     opc_fisttp16
     org      y
ELSEIF (TYPE (adr) EQ DWORD) OR (TYPE (adr) EQ SDWORD)
x:
     fimul    adr              ; DA /1, first byte patched to DB
y:
     org      x
     byte     opc_fisttp32
     org      y
ELSEIF (TYPE (adr) EQ QWORD)
x:
     fmul     adr              ; DC /1, first byte patched to DD
y:
     org      x
     byte     opc_fisttp64
     org      y
ELSE
     .ERR <invalid operand. dst can be 2,4 or 8 bytes address only!>
ENDIF
endm

; 0F 01 C8    monitor
monitor macro
   byte 0Fh
   byte 01h
   byte 0C8h
endm

; 0F 01 C9    mwait
mwait macro
   byte 0Fh
   byte 01h
   byte 0C9h
endm
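The fisttp macro above relies on the same overwrite-the-opcode idea that emm_inst uses, applied to x87 instructions. FIMUL m16int (DE /1), FIMUL m32int (DA /1) and FMUL m64fp (DC /1) use the same /1 ModRM form as FISTTP m16int (DF /1), m32int (DB /1) and m64int (DD /1), so the macro assembles the placeholder and rewrites only its first byte. The C sketch below models that rewrite together with the OPATTR test that the lddqu and fisttp macros use to reject register and constant operands. The function names are hypothetical; the bit meanings (bit 2 for an immediate value, bit 4 for a register) are the MASM OPATTR bits selected by the 00010100y mask.

```c
#include <stddef.h>

/* OPATTR bits covered by the 00010100y mask used in the lddqu and fisttp macros. */
#define OPATTR_IMMEDIATE 0x04u   /* bit 2: immediate value */
#define OPATTR_REGISTER  0x10u   /* bit 4: register value  */

/* Returns nonzero if the operand passes the macros' "must be memory" check. */
int opattr_is_memory_ok(unsigned opattr)
{
    return (opattr & (OPATTR_IMMEDIATE | OPATTR_REGISTER)) == 0;
}

/* Same effect as the fisttp macro: rewrite the first byte of the x87
   placeholder and keep its ModRM byte (and any displacement).
   Returns 1 on success, 0 where the macro would report .ERR. */
int fisttp_patch(unsigned char *insn, unsigned size_bytes)
{
    unsigned char from, to;
    if (insn == NULL)
        return 0;
    switch (size_bytes) {
    case 2: from = 0xDE; to = 0xDF; break;   /* fimul m16int -> fisttp m16int */
    case 4: from = 0xDA; to = 0xDB; break;   /* fimul m32int -> fisttp m32int */
    case 8: from = 0xDC; to = 0xDD; break;   /* fmul  m64fp  -> fisttp m64int */
    default: return 0;                       /* only 2, 4 or 8 bytes allowed  */
    }
    /* The placeholder must use the /1 form with a memory operand (mod != 11b). */
    if (insn[0] != from || ((insn[1] >> 3) & 7) != 1 || (insn[1] >> 6) == 3)
        return 0;
    insn[0] = to;
    return 1;
}
```

For example, fimul word ptr [eax] assembles to DE 08; rewriting the first byte gives DF 08, which is fisttp word ptr [eax].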

 

