Emulation of new instructions

Hello and welcome to my blog. This is my first blog posting.

My name is Mark Charney and I work at Intel in Hudson, Massachusetts. Intel has just made available some software that I've been working on for emulating new instructions: Intel® Software Development Emulator, or Intel® SDE for short. Intel SDE emulates the SSE4, AES, and PCLMULQDQ instruction set extensions, as well as the Intel® AVX instructions. Intel SDE runs on Windows* and Linux* and supports both IA-32 architecture and Intel® 64 architecture programs.

Intel SDE is a functional emulator: it is useful for testing that software using these new instructions computes the right answers. Because these instructions do not exist in hardware yet, testing such software requires an emulator. Intel SDE is not meant for predicting performance.

Intel SDE is actually a "Pintool" built upon the Pin dynamic binary instrumentation system. The Pin that comes with Intel SDE uses a special version of XED, the software encoder/decoder that I also develop. While Intel SDE is primarily useful for learning about the new instructions, it also has some features for simple workload analysis. The "mix" tool computes static and dynamic histograms. It can compute histograms by instruction type (ADD, SUB, MUL, etc.), by "iform" (XED's classification of instructions, which includes the operands), or by instruction length.
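The difference between a static and a dynamic histogram can be illustrated with a minimal C sketch. This is not how the mix tool is implemented: the `Histogram` type, the `static_mix`/`dynamic_mix` functions, and the idea of an execution trace given as an array of instruction indices are hypothetical simplifications. A static histogram counts each instruction in the program once; a dynamic histogram counts it every time it executes.

```c
#include <stddef.h>
#include <string.h>

#define MAX_KINDS 16  /* sketch-only limit on distinct mnemonics */

/* Hypothetical histogram keyed by instruction mnemonic (e.g. "ADD"). */
typedef struct {
    const char *mnemonic[MAX_KINDS];
    unsigned long count[MAX_KINDS];
    size_t n;
} Histogram;

/* Count one occurrence of mnemonic m; new mnemonics beyond MAX_KINDS are dropped. */
void hist_add(Histogram *h, const char *m) {
    for (size_t i = 0; i < h->n; i++) {
        if (strcmp(h->mnemonic[i], m) == 0) {
            h->count[i]++;
            return;
        }
    }
    if (h->n < MAX_KINDS) {
        h->mnemonic[h->n] = m;
        h->count[h->n] = 1;
        h->n++;
    }
}

/* Return the count recorded for mnemonic m, or 0 if it never appeared. */
unsigned long hist_get(const Histogram *h, const char *m) {
    for (size_t i = 0; i < h->n; i++)
        if (strcmp(h->mnemonic[i], m) == 0)
            return h->count[i];
    return 0;
}

/* Static mix: every instruction in the program image is counted once. */
void static_mix(const char *const *prog, size_t n_insns, Histogram *h) {
    for (size_t i = 0; i < n_insns; i++)
        hist_add(h, prog[i]);
}

/* Dynamic mix: trace[] lists indices into prog[] in execution order,
   so an instruction inside a loop is counted once per iteration. */
void dynamic_mix(const char *const *prog, const size_t *trace,
                 size_t n_exec, Histogram *h) {
    for (size_t i = 0; i < n_exec; i++)
        hist_add(h, prog[trace[i]]);
}
```

For a loop body of ADD, SUB, JNZ that runs three times after a single MOV, the static histogram reports one ADD while the dynamic histogram reports three. Keying by iform or by instruction length instead of mnemonic would only change what string (or number) is used as the key.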

Intel SDE is fairly speedy. I haven't actually measured it, because it was so much faster (over 100x) than the other emulator we have been using that I'm not getting any complaints internally. We routinely run SPEC2006 under Intel SDE with the reference inputs. Most of the inputs run in several hours, while a few of the longer-running inputs take about a day. Emulation performance is tricky to measure, because each emulated instruction requires a different amount of work and each application is different. I could probably measure the slowdown relative to a version of SPEC2006 that uses only native instructions. Intel SDE is faster than our previous "trap-and-emulate" emulator mainly because it does not rely on the illegal-instruction exception, which costs thousands of cycles to dispatch to and return from the emulation routines. Because Intel SDE is built upon Pin, it can JIT-translate the original program and branch directly to the emulation routines, avoiding that exception overhead.
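The cost argument can be summarized with a simplified model. This is a sketch, not a measurement: the symbols are introduced here only for illustration, the model ignores JIT translation time and Pin's own overhead, and it assumes the emulation routine does the same work in both designs, so that work cancels out.

```latex
S = \frac{T_{\mathrm{SDE}}}{T_{\mathrm{native}}}
\qquad
T_{\mathrm{trap}} - T_{\mathrm{SDE}} \approx N_{\mathrm{emu}}\,\bigl(C_{\mathrm{exc}} - C_{\mathrm{call}}\bigr)
```

Here S is the slowdown relative to a native-only build, T is run time in cycles, N_emu is the number of dynamically executed emulated instructions, C_exc is the cost of taking the illegal-instruction exception and dispatching to and returning from the emulation routine, and C_call is the cost of branching to that routine from JIT-translated code. Since C_exc is in the thousands of cycles, the savings grow linearly with the number of emulated instructions executed.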

Right now, I know of several ways to write programs that use the new instructions. If you want to use the SSE4, AES and PCLMULQDQ instructions, you can use the Intel® Compiler. The Intel Compiler supporting Intel AVX is expected to be released in the first quarter of 2009. GCC 4.3 supports SSE4. There is also a version of GCC that supports AES and PCLMULQDQ, available in the svn (Subversion) repository svn://gcc.gnu.org/svn/gcc/branches/ix86/gcc-4_3-branch . GCC for Intel AVX is under development as well: svn://gcc.gnu.org/svn/gcc/branches/ix86/avx. GNU binutils, which includes the "gas" assembler, is available with support for AES, PCLMULQDQ and Intel AVX. The YASM and NASM assemblers are also available.
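As a small illustration of the kind of program one might test under Intel SDE, here is a hedged C sketch that uses the SSE4.1 intrinsic `_mm_mullo_epi32` (the PMULLD instruction) from `<smmintrin.h>`. The function name `mul4_i32` is hypothetical. The SSE4.1 path is compiled only when the compiler targets SSE4.1 (for example `gcc -msse4.1`, which defines `__SSE4_1__`); otherwise a scalar fallback computes the same result.

```c
#include <stdint.h>
#if defined(__SSE4_1__)
#include <smmintrin.h>  /* SSE4.1 intrinsics, e.g. _mm_mullo_epi32 (PMULLD) */
#endif

/* Multiply four pairs of 32-bit integers, keeping the low 32 bits
   of each product (the semantics of PMULLD). */
void mul4_i32(const int32_t *a, const int32_t *b, int32_t *out) {
#if defined(__SSE4_1__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* PMULLD is an SSE4.1 instruction; an emulator can run it on CPUs without SSE4.1. */
    _mm_storeu_si128((__m128i *)out, _mm_mullo_epi32(va, vb));
#else
    /* Scalar fallback: unsigned multiply wraps modulo 2^32 like PMULLD. */
    for (int i = 0; i < 4; i++)
        out[i] = (int32_t)((uint32_t)a[i] * (uint32_t)b[i]);
#endif
}
```

Built with `gcc -msse4.1`, a program calling this function would contain PMULLD and could then be run under Intel SDE on a machine without SSE4.1 (SDE is typically invoked as `sde -- ./program`; check the documentation of your SDE version for the exact options).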

If anyone has questions about this or suggestions for something they'd like me to write about, please post a comment. I'd like to hear about what is important to you. There are so many aspects of this that I'd like to describe in future postings:

    • How it works

    • Isolation issues

    • Debugging

    • Advanced use options

    • Program checkers



Also if you have software questions you can post them to the Intel® AVX and CPU forum at:
/en-us/forums/intel-avx-and-cpu-instructions/


Comments

Will AVX support be automatic in the Intel C and Fortran compilers? That is, if you set the correct command line options in the compile command, will the compiler automatically AVXify the executable, the way it currently does automatic SSE* vectorization?


(reposting; 1st one didn't show up)

The cpuid config file is a good suggestion. We use a file with 6 columns (two inputs, 4 outputs) internally. I will put this on the list but cannot commit to implementing it ASAP.
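A table with two input columns and four output columns can be sketched as a simple lookup. This is a hypothetical illustration, not the internal file format: I'm assuming the two inputs are the leaf (EAX) and subleaf (ECX) values passed to CPUID, and the outputs are the returned EAX, EBX, ECX and EDX. The names `CpuidRow` and `cpuid_lookup` are made up for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of a hypothetical 6-column CPUID table:
   two inputs (leaf in EAX, subleaf in ECX) and four outputs. */
typedef struct {
    uint32_t in_eax, in_ecx;
    uint32_t eax, ebx, ecx, edx;
} CpuidRow;

/* Look up the emulated CPUID result for (leaf, subleaf).
   Returns 1 and copies the matching row into *out, or 0 if no row matches. */
int cpuid_lookup(const CpuidRow *table, size_t n,
                 uint32_t leaf, uint32_t subleaf, CpuidRow *out) {
    for (size_t i = 0; i < n; i++) {
        if (table[i].in_eax == leaf && table[i].in_ecx == subleaf) {
            *out = table[i];
            return 1;
        }
    }
    return 0;
}
```

Leaves that ignore ECX could simply be stored with a subleaf of 0; a real tool would also need a policy for leaves missing from the table.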

Can you skip the warnings from chip-check with a post-processing script?

The -no-shared-libs knob only affects the mix histogramming. It should not cause crashes. Send me the details of that crash if you want.

Thanks for the kind words about the emulator. Glad it is helping.

Regards,
Mark


Hi Mark,

I want to emulate older processors to check for illegal instructions, and I am finding it tricky to do so using the SDE CPUID options. For example, for a Pentium III I am using:

-core2_cpuid_baseline 1 -sse2_cpuid 0 -sse3_cpuid 0 -sse41_cpuid 0 -leaf1_cpuid_family 7 -chip_check PENTIUM3

Setting family = 7 is my way of preventing the Intel compiler 11.1 from dispatching to SSE2 code (just setting sse2_cpuid to 0 doesn't seem to be enough).

I'm 90% sure I could be setting these options better, but it strikes me that it might be useful to make CPUID emulation completely configurable. For example, what about an option like -cpuid_file file.txt where the utility reads a text file compatible with the CPU-Z format, e.g. containing lines like these for a PIII:

CPUID
0x00000000 0x00000003 0x756E6547 0x6C65746E 0x49656E69
0x00000001 0x00000673 0x00000000 0x00000000 0x0387F9FF
0x00000002 0x03020101 0x00000000 0x00000000 0x0C040843
0x00000003 0x00000000 0x00000000 0x11814197 0x0001C236

Just an idea, but it would be incredibly useful for developers like me who are trying to track down illegal-instruction bug reports on older machines that we don't have in the lab.
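To make the proposed format concrete, here is a minimal C sketch of how a tool might read such lines and decode the two leaves shown. It is illustrative only, not an existing SDE feature: `parse_cpuid_line`, `cpuid_vendor`, `decode_leaf1` and `Leaf1Info` are hypothetical names, and the column order is assumed to be leaf, EAX, EBX, ECX, EDX. The decoding follows the standard CPUID layout: the leaf 0 vendor string is EBX, EDX, ECX (low byte first); leaf 1 EAX holds stepping, model and family; leaf 1 EDX bit 25 is SSE and bit 26 is SSE2.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Parse one line "leaf EAX EBX ECX EDX" (hex, optional 0x prefix).
   Returns 1 on success, 0 if fewer than five values were read. */
int parse_cpuid_line(const char *line, uint32_t r[5]) {
    unsigned int v[5];
    if (sscanf(line, "%x %x %x %x %x", &v[0], &v[1], &v[2], &v[3], &v[4]) != 5)
        return 0;
    for (int i = 0; i < 5; i++)
        r[i] = v[i];
    return 1;
}

/* Leaf 0: the vendor string is EBX, EDX, ECX, four ASCII bytes each, low byte first. */
void cpuid_vendor(uint32_t ebx, uint32_t ecx, uint32_t edx, char out[13]) {
    uint32_t regs[3] = {ebx, edx, ecx};
    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xFF);
    out[12] = '\0';
}

typedef struct {
    unsigned family, model, stepping;
    int sse, sse2;
} Leaf1Info;

/* Leaf 1: decode family/model/stepping from EAX and SSE/SSE2 from EDX. */
Leaf1Info decode_leaf1(uint32_t eax, uint32_t edx) {
    Leaf1Info info;
    unsigned base_family = (eax >> 8) & 0xF;
    unsigned base_model = (eax >> 4) & 0xF;
    info.stepping = eax & 0xF;
    /* Extended family is added only when the base family is 0xF. */
    info.family = (base_family == 0xF) ? base_family + ((eax >> 20) & 0xFF)
                                       : base_family;
    /* Extended model is prepended for families 6 and 0xF. */
    info.model = (base_family == 0x6 || base_family == 0xF)
                     ? (((eax >> 16) & 0xF) << 4) + base_model
                     : base_model;
    info.sse = (int)((edx >> 25) & 1);
    info.sse2 = (int)((edx >> 26) & 1);
    return info;
}
```

Applied to the PIII lines above, this yields "GenuineIntel", family 6, model 7, stepping 3, with SSE present and SSE2 absent, which is the combination a compiler dispatcher would see on that machine.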

A second comment on chip-check: it would be useful to be able to filter chip-check errors to just certain modules. On my emulated PIII I am seeing SSE2 used within system and GPU driver DLLs and I want to ignore these errors. It would be enough to print the guilty module name in each line of chip-check's output. -no_shared_libs probably is no good because I want to check some DLLs and anyway, adding this option causes the tool to crash for me.

Finally, it would be nice if you could find a way to allow users to set up configurations for non-Intel CPUs in chip-check.

Many compliments and thanks for this tool; it is a superb emulator.


Hi Chris,

Currently we do not release the libraries/headers for making SDE-enabled pintools. SDE is itself a pintool, and it is sometimes a little tricky to compose distinct pintools. For example, SDE uses Pin virtual registers, so if it is composed with another pintool that also uses Pin virtual registers, both tools must use the proper Pin API for allocating them.

Was there a specific feature you were looking for that might be generally useful?

(Sorry for the delay in responding. I did not get notified about your comment.)

Regards,
Mark



Mark, great to see that you have posted an update to the Software Development Emulator:
http://software.intel.com/en-us/articles/intel-software-development-emulator/
What are the big benefits to the update?


Hi Mukesh,

The license terms for the emulator are available on the download page.

For GCC, the latest binutils 2.19.51.0.1 was released to support the latest AVX architecture specification. It is available at this site: http://www.kernel.org/pub/linux/devel/binutils. GCC 4.4 revision 143117 supports the same AVX architecture specification.

The first version of the Intel Compiler to support AVX will be available in early 2009.

Regards,
Mark


Hi.

I would be proud to explore AVX behaviour with some scientific applications. I have a few more queries:

(a) Could I get an evaluation version of the Intel AVX simulator? If yes, for how many days would the evaluation version be valid?

(b) Can I simulate AVX with the GNU compiler toolchain (GCC 4.4, Binutils 2.18.50.0.9, and Glibc 2.7)?

(c) I have the Intel Compilers (v10.0); can I get Intel AVX library support with this v10.0 compiler toolchain?
We also have a Clovertown processor from Intel.

Looking forward.

~BR
Mukesh