Debugging SSE/SSE2 ?

Hi,

Too many times in the past I've gotten bug reports from users with systems without SSE2 support, because I had unknowingly used an SSE2 instruction somewhere.

Problem is:
-my compiler (Delphi) can't tell me where I've used an SSE2 instruction
-I can't find anything in my BIOS to disable SSE2 (apparently some have such features?)
-it's hard to rely on users for debugging things efficiently (making sure they could unzip files properly, etc)
-I don't have a time machine to go back to prehistory where non-SSE2 systems still existed

Does anyone know a software solution to debug this efficiently?

My #1 suspect is ANDNPS. I hesitated to use it, as AMD's docs say it's SSE2, but pretty much everywhere on the net (even Wikipedia) it's said to be SSE1. ANDPS and (X)ORPS are all SSE1.
I swear that "AMD64 Architecture Programmer's Manual Volume 4: 128-Bit Media Instructions" claims it's SSE2.

However, I just got rid of it and users still report that it doesn't work. Is ANDNPS really SSE2? (& if so, is it another useless instruction released in the same pack as PANDN, which does the same thing?)

Does Intel have a reliable list of instruction sets somewhere?

Edit: I found "Intel 64 and IA-32 Architectures Software Developer's Manual Volume 1: Basic Architecture", which says that ANDNPS is an SSE1 instruction.
But there is also this website that lists it as SSE2: http://softpixel.com/~cwright/programming/simd/sse2.php

You've found the authoritative documentation on Intel CPUs. Other sites would be better suited to discussions about Delphi or specific instruction groups supported by non-Intel CPUs of long ago.
There's a companion forum about instruction sets, but none on historical questions about specific CPUs. As far as Intel CPUs are concerned, since the first CPU with SSE was the Pentium III, any instruction supported by that CPU is SSE. It was simply impractical to support CPUs which reported SSE support without supporting all of those instructions, or later models which reported SSE2 support but were missing a few instructions.
It has been several years since Intel software attempted full support for P-III, let alone CPUs with partial SSE or SSE2 support; those are treated as non-SSE, under the ia32 instruction set compiler option. It seems that would be what you meant by "reliable." If the documentation you seek wasn't provided by the maker of the CPUs in question, you must understand that it's impossible on several levels for someone else to claim authority to fill the need you perceive.
Although Intel compilers have supported automatic selection of instruction sets at run time for nearly as long as the period you are discussing, that still isn't reliable to the extent that anyone is able to perform sufficient testing to assure it will work on all customer CPU types. If Delphi support forums can't tell you whether equivalent functionality is supported there, it seems the "reliable" method is to avoid such dependencies as much as possible.
Yes, there are instances where it's preferable on a current CPU to avoid certain instructions which might have been desirable on some past CPU. For example, compilers for SSE4.2 have cut back on usage of SSSE3. Nevertheless, it's assumed that any CPU which claims support for some instruction set can handle all of its instructions.

There is an emulator, SDE (http://software.intel.com/en-us/articles/intel-software-development-emulator/), that you can use to find out which SSE instructions are in the application. It gives a very good histogram of all ISA instructions. You are suspicious about just one instruction, but as soon as you have fixed that one you may hit another. With this tool you will know how many SSE1, SSE2 or SSSE3 instructions are in your application. If you play with some of the switches you may be able to get more information on the application and these instructions.
To get the instruction mix, run the following command:
sde -mix --

It will create a text file (I believe mix.out) in your directory, which you can then browse.

Just adding one more thing to my answer: after you find the piece of code using the SSE instruction, please put a CPUID check around that code/function to make sure it does not execute on processors that do not support that instruction set, so that your application doesn't crash.
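For what it's worth, the CPUID check suggested here could look roughly like the C sketch below. The thread is about hand-coded assembler in a Delphi application, so take it as an illustration of the logic only. It assumes GCC or Clang, whose `<cpuid.h>` provides `__get_cpuid`. The names `process_scalar`, `process_sse2` and `select_process` are hypothetical, and the "SSE2" routine is a placeholder with no real SSE2 code in it. The feature flags used are EDX bit 25 (SSE) and EDX bit 26 (SSE2) of CPUID leaf 1.

```c
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang helper; other compilers need their own intrinsic */
#endif

/* CPUID leaf 1 feature flags, reported in EDX. */
#define CPUID_EDX_SSE  (1u << 25)
#define CPUID_EDX_SSE2 (1u << 26)

/* Pure decoders, so the bit logic can be checked without running CPUID. */
static inline int edx_has_sse(uint32_t edx)  { return (edx & CPUID_EDX_SSE)  != 0; }
static inline int edx_has_sse2(uint32_t edx) { return (edx & CPUID_EDX_SSE2) != 0; }

/* Returns EDX of CPUID leaf 1, or 0 (no features) if it cannot be queried. */
static inline uint32_t cpuid_leaf1_edx(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return edx;
#endif
    return 0;
}

/* Hypothetical routines: both halve every sample. In a real application the
   second one would contain the hand-written SSE2 code (e.g. MOVD with XMM). */
typedef void (*process_fn)(float *buf, int n);

static inline void process_scalar(float *buf, int n)
{
    for (int i = 0; i < n; i++)
        buf[i] *= 0.5f;
}

static inline void process_sse2(float *buf, int n)
{
    /* Placeholder body: stands in for the SSE2 implementation. */
    for (int i = 0; i < n; i++)
        buf[i] *= 0.5f;
}

/* Pick the implementation once, from the feature flags. */
static inline process_fn select_process(uint32_t edx)
{
    return edx_has_sse2(edx) ? process_sse2 : process_scalar;
}
```

Typical use would be to call `select_process(cpuid_leaf1_edx())` once at startup and keep the returned pointer, so the hot path never re-runs CPUID. Forcing the argument to 0 is also a cheap way to exercise the non-SSE2 path on a modern machine.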

It has been several years since Intel software attempted full support for P-III, let alone CPUs with partial SSE or SSE2 support; those are treated as non-SSE, under the ia32 instruction set compiler option.

If Delphi support forums can't tell you whether equivalent functionality is supported there

Mmh, do you mean that there are known systems in history that report SSE or SSE2 support, but without supporting the whole SSE or SSE2 instruction set? So one would have to check a black list of systems, instead of using the CPU's support flags?

Also, it's not really related to Delphi. Delphi's compiler can't even use SSE by itself (it's a prehistoric compiler that still inserts FWAITs everywhere, what would you expect..), I'm only talking about hand-coded assembler in a Delphi application, where the programmer is responsible for the proper use of instructions depending on what the CPU says it supports.

There is an emulator SDE

Thanks, will give a try asap.

& if I find nothing, my next suspect will be the IPP libraries, so I will try to tell the libraries to switch to the functions for only SSE1 support, it may be a slightly different result from an IPP function that's causing troubles.

Quoting - gol

& if I find nothing, my next suspect will be the IPP libraries, so I will try to tell the libraries to switch to the functions for only SSE1 support, it may be a slightly different result from an IPP function that's causing troubles.

It should be possible, running under a debugger on the problematic CPU, to determine whether the fault occurs within an IPP function. SDE -omix may also show you which function executed the instruction in question. Are you running a version of IPP which attempts to distinguish SSE from SSE2 platforms at run time? I'd be concerned about its ability to do so. I would expect SDE to take the latest instruction set path in IPP, unless you hack the choice.

Quoting - tim18

Are you running a version of IPP which attempts to distinguish SSE from SSE2 platforms at run time?

I simply let IPP decide on its own. But as I'm using a Delphi import I can't test right now by telling IPP to pick functions for a specific target, I will see later.

I don't really suspect a bug (there aren't that many in the IPP library in general), but a difference in precision. Where I use low-precision IPP functions, on a CPU without SSE or SSE2 it will revert to the FPU, thus giving more precision, and maybe producing results I wasn't expecting.

I found the offending instruction, it was MOVD.

That's one weird instruction, I suppose that no one cares about it since it's the same as MOVSS. It also strangely doesn't appear in most SSE & SSE2 instruction lists I've checked. In fact, in my Intel doc that lists all SSE/SSE2 instructions, MOVD is only mentioned as an MMX one!
My AMD docs do list it as an MMX and SSE2 one (when used with XMM registers).

My only reason to use MOVD was the lack of documentation about SSE's FTZ/DAZ feature. It's undocumented, so one of those "free to be interpreted as the CPU wants". Meaning that, while right now MOVSS does not care about FTZ/DAZ, and thus is suitable to move *integers*, a future CPU may interpret FTZ/DAZ as working on MOVSS, and get away with it. I really don't understand the lack of documentation here :(
(it's also often advised not to mix different types)

ANDNPS however appears to be SSE1, unlike what AMD says.

COOL. Good job. But I still don't understand why you mentioned that the Intel doc does not list it as an SSE or SSE2 instruction. I am looking at the Intel manual online (http://www.intel.com/Assets/PDF/manual/253666.pdf) and it says that the CPU must support SSE2 to execute the XMM form, otherwise the instruction will #UD. Page 3-678.
Neither MOVD nor MOVSS raises SIMD floating-point exceptions.

My bad, I was reading an older Intel doc that wasn't actually listing SSE2 instructions.

Nice PDF btw, more useful than what I had found so far.

Quoting - Brijender Bharti
Neither MOVD nor MOVSS raises SIMD floating-point exceptions.

Is that a guarantee that they will never be affected by FTZ or DAZ btw (I suppose this is very unlikely, but who knows)? Or that there won't be an extra delay for mixing types (this is more likely, as I've heard of existing delays)?
(I remember I once had a similar discussion, maybe on this very forum, about MOVLPS vs MOVQ btw)

Edit: Reading that PDF, it looks by the definition of FTZ that it shouldn't apply to MOVSS. DAZ however could be interpreted differently. Anyway, I know it does not in practice, it's only to be on the safe side.
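For anyone who wants to check the "in practice" part on their own machine, here is a rough C sketch. It sets FTZ and DAZ in MXCSR, pushes a raw 32-bit pattern through a MOVSS load and store (via the `_mm_load_ss` / `_mm_store_ss` intrinsics from `<xmmintrin.h>`), and returns the bits that come out. The function name `roundtrip_bits_with_ftz_daz` is made up for this example. It assumes a CPU that supports the DAZ bit (writing an unsupported MXCSR bit faults on CPUs without it), and the compiler is free to emit a different move instruction for the intrinsics, so it shows observed behaviour only, not an architectural guarantee.

```c
#include <stdint.h>
#include <string.h>

#if defined(__SSE__)
#include <xmmintrin.h>   /* _mm_getcsr, _mm_setcsr, _mm_load_ss, _mm_store_ss */
#endif

#define MXCSR_FTZ 0x8000u   /* bit 15: flush-to-zero */
#define MXCSR_DAZ 0x0040u   /* bit 6: denormals-are-zero (assumed supported) */

/* Moves a raw 32-bit pattern through an XMM register with FTZ and DAZ
   enabled and returns the pattern read back. */
static inline uint32_t roundtrip_bits_with_ftz_daz(uint32_t bits)
{
#if defined(__SSE__)
    unsigned int saved = _mm_getcsr();
    float in, out;
    uint32_t result;

    memcpy(&in, &bits, sizeof in);              /* no FP arithmetic involved */
    _mm_setcsr(saved | MXCSR_FTZ | MXCSR_DAZ);  /* enable both modes */
    __m128 v = _mm_load_ss(&in);                /* nominally MOVSS xmm, m32 */
    _mm_store_ss(&out, v);                      /* nominally MOVSS m32, xmm */
    _mm_setcsr(saved);                          /* restore caller's MXCSR */
    memcpy(&result, &out, sizeof result);
    return result;
#else
    /* Build without SSE: nothing to demonstrate, return the input unchanged. */
    return bits;
#endif
}
```

Feeding it a denormal pattern such as 0x00000001 (or an arbitrary integer) and comparing against the return value tells you whether the move touched the data. A pure data move like this performs no arithmetic, which is why FTZ/DAZ are not expected to matter here.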
