Introduction
By James Rose
Sr. Application Engineer
CSD/AET Client Enabling Technology
The Streaming SIMD Extensions 3 instructions (also known as SSE3) add important new capabilities to the Intel® Pentium 4 E processor (code-named Prescott). Currently SSE3 is supported by the Intel® C++ compiler 8.0, but you may still require a build environment based on other compilers such as Microsoft Visual Studio* 6.0, .Net 2002 or .Net 2003. Fortunately, you can include SSE3 assembly instructions in optimized functions in your application with support from either the Microsoft Macro Assembler (MASM) or the freeware Netwide Assembler (also known as NASM). In this paper, I’ll describe the SSE3 support offered by MASM and NASM and show how you can convert source code to assembly that you can then optimize with SSE3.
Current Compiler Support for SSE3
Currently, SSE3 is supported by the Intel® C++ Compiler version 7.0 or greater for assembly instructions only, with additional support for SSE3 intrinsics and assembly in version 8.0. Microsoft’s release of .Net 2005 (code-named Whidbey) will support SSE3, but as the name indicates, it is slated for release sometime in the middle of 2005. Upgrading to another compiler is sometimes difficult because of the QA effort involved or other factors. Even if you cannot upgrade to the latest compilers, your application can still support SSE3 if you are willing to port some functions to assembly, optimize them using SSE3, and use either MASM or NASM to assemble them. The next two sections describe the MASM and NASM assemblers and the support they provide for SSE3.
Support for SSE3 with the Microsoft Macro Assembler
MASM, the Microsoft Macro Assembler* (ml.exe), is a standard tool in Microsoft Visual Studio 6*, .Net 2002* and .Net 2003*. MASM doesn’t support SSE3 natively, but you can use a macro include file called 'ia_pni.inc', which contains definitions for the SSE3 instructions. This file, included in Appendix A at the end of this document, allows you to use SSE3 assembly instructions in functions optimized for SSE3.
Support for SSE3 with Netwide Assembler
NASM is an 80x86 assembler that supports a range of object file formats including:
- Linux* a.out and ELF
- COFF
- Microsoft 16-bit OBJ and Win32*
NASM is also freeware under the GNU Lesser General Public License, also known as LGPL. NASM version 0.98.36 provides native support for SSE3. You may opt to use NASM instead of MASM if you want to target platforms other than Win32 such as Linux, if you are already using NASM in your build process or if you don’t already use Microsoft compilers.
Basic Source to Assembly Conversion Process
After you have identified functions which are candidates for SSE3 optimization, a straightforward way to convert them to assembly is to produce assembly listing files from the compiler. Once you’ve done that, you can modify them to include SSE3, clean up some of the extraneous comments and other data and finally add the new assembly files to the build. The next sections provide more details about how you can convert your C++ functions to MASM or NASM assembly.
Source to MASM Assembly Conversion
Here are the basic steps to convert a C or C++ file into MASM assembly code inside the Microsoft Visual Studio .Net 2003 IDE*. (Conversion for Visual Studio* 2002 is nearly identical, and there are only minor menu navigation differences for Microsoft Visual Studio C++ 6*):
- First, isolate functions that you intend to optimize with SSE3 instructions into a separate C/C++ file (or multiple files as necessary).
- Depending on the optimizations that you are targeting, it is usually beneficial to first optimize functions using SSE2 or MMX intrinsics, particularly if you plan on doing SIMD operations. Doing so can help make SSE3 optimization more straightforward after the function has been converted to assembly. Please refer to the Microsoft Visual Studio MSDN documentation included with the Microsoft compiler for more details about optimizing using intrinsics.
- Generate assembly output from the compiler for functions that will be optimized to contain SSE3 instructions. All recent versions of Microsoft Visual Studio* include the ability to output source code in MASM assembly format. To get MASM compatible assembly, in the IDE select the file that contains functions to be optimized with SSE3, then select: File->Properties->C++->Output Files->Assembler Output->Assembly-Only Listing (/FA)
- Clean up file as desired. The assembly code generation process typically includes a great deal of branch prediction information, extraneous line numbers and other non-referenced labels at the beginning and end of all basic blocks. This information can be removed without affecting the functionality of the assembly code. Note that referenced branch labels must not be removed from the assembly file.
- Since you will be using SSE3 instructions that aren’t natively supported in the MASM assembler, make sure that you add the directive 'include ia_pni.inc' at the top of the file and make sure that the ia_pni.inc file is in a path that can be located by MASM. This file can be found in Appendix A at the end of this document.
- Modify your code to include SSE3 instructions. The custom build step generates a buildlog.htm file that can be used to determine assembly syntax errors or determine other assemble-time issues.
- Comment out the old C/C++ function from the C/C++ file to avoid duplicate references to the original and new assembly optimized functions. It’s probably a good idea to keep the original source so that you have a reference to the source from which the assembly was generated.
- To add the .asm file to the build:
  - Select File->Properties->Custom Build Step->General
  - Command Line: ml /Zi /Cx /c /coff /Fl$(IntDir)$(InputName).lst /Fo $(IntDir)$(InputName).obj $(InputPath)
  - Description: Assembling $(InputName)
  - Outputs: ."$(IntDir)"$(InputName).obj
For further information about MASM, or if you have syntax problems, consult the documentation for MASM version 6.1*. Of course, if you would rather write your functions by hand, you can also write native MASM assembly directly.
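The isolate, keep-the-original and swap-in steps above might look like the following on the C side. This is only a minimal sketch and is not taken from the original project: the file name, the function `horizontal_sum4` and the macro `USE_ASM_HSUM` are hypothetical names chosen for illustration. Instead of commenting the function out, it uses a preprocessor guard so the C version stays available as a reference and as a fallback build.

```c
/* hsum.c - hypothetical example of a function isolated for SSE3 work. */

#ifdef USE_ASM_HSUM
/* Assumed to be implemented in a separate .asm file that is added to the
   project through the custom build step; the C compiler only sees the
   prototype. */
extern float horizontal_sum4(const float v[4]);
#else
/* Original C version, kept as the reference implementation. The pairwise
   grouping matches what two back-to-back HADDPS instructions on the same
   register would compute in the lowest element. */
float horizontal_sum4(const float v[4])
{
    float lo = v[0] + v[1];   /* first adjacent pair  */
    float hi = v[2] + v[3];   /* second adjacent pair */
    return lo + hi;
}
#endif
```

If the caller is a C++ file, the prototype would also need `extern "C"` so that the name the linker looks for matches the unmangled symbol emitted by the assembler.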
Source to NASM Assembly Conversion
In general, you can follow the same steps as the MASM conversion process to get a NASM assembly file. Note that some NASM syntax is different, and some directives are different or even unnecessary.
Here are the basic steps to convert a C or C++ file into NASM assembly code inside the Microsoft Visual Studio .Net 2003 IDE. (Conversion for Visual Studio 2002 is nearly identical, and there are only minor menu navigation differences for Microsoft Visual Studio C++ 6):
- First, isolate functions that you intend to optimize with SSE3 instructions into a separate C/C++ file (or multiple files as necessary).
- Depending on the optimizations that you are targeting, it is usually beneficial to first optimize functions using SSE2 or MMX intrinsics, particularly if you plan on doing SIMD operations. Doing so can help make SSE3 optimization more straightforward after the function has been converted to assembly. Please refer to the Microsoft Visual Studio MSDN documentation included with the Microsoft compiler for more details about optimizing with intrinsics.
- Generate assembly output from the compiler for functions that will be optimized to contain SSE3 instructions. All recent versions of Microsoft Visual Studio include the ability to output source code in MASM assembly format. To get MASM compatible assembly, in the IDE select the file that contains functions to be optimized with SSE3, then select: File->Properties->C++->Output Files->Assembler Output->Assembly-Only Listing (/FA)
- Clean up file as desired. The assembly code generation process produces a great deal of extra information such as branch targets, line number information, etc. that isn’t necessary for proper function of the assembly. This information can be removed without affecting the functionality of the assembly code. Note that for NASM some of the MASM directives, such as XMMWORD PTR, are unnecessary, and in many cases NASM syntax is simpler. For more information, please consult the MASM documentation* referenced earlier.
- Comment out the old C/C++ function from the C/C++ file to avoid duplicate references to the original and optimized functions. It’s probably a good idea to keep the original source so that you have a reference for how the assembly was generated.
- Modify your code to include SSE3 instructions. The custom build step generates a buildlog.htm file that can be used to determine assembly syntax errors or determine other assemble-time issues.
- To ensure that you have SSE3 support with NASM, download NASM version 0.98.36 or later*. It is helpful to install nasm.exe to the compiler binary directory $(MSVCInstallDir)/vc7/bin (in the same place as cl.exe and ml.exe) to avoid many path related problems.
- Add the .nasm file to the build:
  - Select File->Properties->Custom Build Step->General
  - Command Line: nasm -f win32 -DPREFIX -o "$(IntDir)$(InputName).obj" "$(InputPath)"
  - Description: Assembling $(InputName)
  - Outputs: ."$(IntDir)"$(InputName).obj
If you have syntax problems, consult the NASM documentation*. And of course if you would rather forgo the conversion process itself, direct NASM assembly can also be written.
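When you modify generated assembly by hand to use SSE3 instructions, a plain C model of what an instruction computes can help you check the result against the original function. The sketch below models the arithmetic of two SSE3 instructions, HADDPS and ADDSUBPS, on four packed single-precision values. The helper names are hypothetical, and the models cover only the lane arithmetic; they make no attempt to reproduce rounding-mode or exception-flag behavior.

```c
/* Scalar reference models for two SSE3 instructions (hypothetical helper
   names). dst and src each hold four packed single-precision values,
   with index 0 as the lowest element. */

/* HADDPS dst, src: adds adjacent pairs. The two low results come from
   dst, the two high results come from src. */
void haddps_ref(float dst[4], const float src[4])
{
    float r0 = dst[0] + dst[1];
    float r1 = dst[2] + dst[3];
    float r2 = src[0] + src[1];
    float r3 = src[2] + src[3];
    /* Write back only after all sums are formed, so dst == src is safe. */
    dst[0] = r0;
    dst[1] = r1;
    dst[2] = r2;
    dst[3] = r3;
}

/* ADDSUBPS dst, src: subtracts in the even elements and adds in the
   odd elements. */
void addsubps_ref(float dst[4], const float src[4])
{
    dst[0] -= src[0];
    dst[1] += src[1];
    dst[2] -= src[2];
    dst[3] += src[3];
}
```

A small test harness could call the assembly version and the matching model with the same inputs and compare the four results.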
Summary
Even if you cannot migrate to the latest compilers, you can use SSE3 instructions now with the VC6, .Net 2002 and .Net 2003 compilers using assembly files for MASM or NASM. Don’t put off using the capabilities offered by SSE3 because you don’t have the latest compilers. If you are willing to convert functions that you want to optimize for SSE3 to MASM or NASM compatible assembly you can enjoy the benefits of SSE3 in your application today with previously released compilers.
Appendix A - SSE3 Macro Definitions for Use with Microsoft Macro Assembler
; ia_pni.inc MASM Macro definitions for Streaming
; SIMD Extensions 3
;
; THIS SOFTWARE AND DOCUMENTATION IS PROVIDED
; "AS IS" WITH NO WARRANTIES WHATSOEVER,
; INCLUDING ANY WARRANTY OF MERCHANTABILITY,
; NON-INFRINGEMENT, FITNESS FOR ANY PARTICULAR
; PURPOSE, OR ANY WARRANTY OTHERWISE ARISING OUT
; OF ANY PROPOSAL, SPECIFICATION OR SAMPLE.
; Intel® disclaims all liability, including
; liability for infringement of any proprietary
; rights, relating to use of information in this
; software. No license, express or implied,
; by estoppel or otherwise, to any intellectual
; property rights is granted herein. Intel
; retains the right to make changes to its
; software and documentation at any time,
; without notice.
; The software vendor remains solely responsible
; for the design, sale, and functionality of its
; product, including any liability arising from
; product infringement or product warranty
; of any kind.
; Copyright (c) 2003 Intel Corporation.
; All rights reserved.
.686P
.xmm
; This macro package requires an assembler version
; 6.15.8803 or later.
; Please use XMMWORD and not DWORD (OWORD does
; not work) for 128 bit data in Streaming SIMD
; Extensions 2 instructions. After getting a real
; assembler you will just have to add the line
; "XMMWORD TEXTEQU <OWORD>"
; to your code.
