what is "mov DWORD PTR [ebp-4], small constant"


The compiler is generating a LOT of those instructions, usually many of them in a row with different small constants.

Why is it doing that, and how do I make it not do that?

Background:

I'm trying to move a large application from version 7.1 of the compiler to version 9.0. The .exe and each .dll are more than twice as large when built with 9.0 as they were with 7.1. Performance varies by task, but the 9.0 build is typically 30% slower than the 7.1 build.

I tried using VTune to find the problem with the 9.0-built exe, but VTune always crashes with a buffer overflow when I try to drill down from the .exe into the list of subroutines. (With the 7.1 build I've seen similar VTune crashes on occasion, but with the 9.0 build VTune crashes every time.)

I tried generating asm output (/FAcs switch) and examining it to see what is wrong. That is where I see the massive number of instructions that are the topic of this question:

mov DWORD PTR [ebp-4], 0
followed directly by
mov DWORD PTR [ebp-4], 1
mov DWORD PTR [ebp-4], 2
etc.
Not always in sequence by value, but usually.

I don't even know for sure that this is a significant part of the 30% slow down I'm trying to fix. But looking around the code it seems to be around half the total generated instructions, so I expect it is a big part of the .exe being over twice as large.


After more testing, I have more of a guess of what's going on, but not yet enough info.

The switches /GX or /EHs cause those instructions to be added.

I'm not yet clear on the distinction between /EHs, /EHa, and neither. I think /GX means the same as /EHs (in any case, either one adds these instructions).

My application is heavily templated. The code is full of apparent calls to constructors and destructors and other functions, which turn out to be nothing once the compiler knows the actual types involved.

The 7.1 compiler almost always optimized away all the layers of such things (a call to a call to ... to nothing). I expect the 9.0 compiler also optimizes away all the layers of calls (as well as the nothing at the bottom). But I think it adds code to leave state information for the exception handler, telling it which empty objects without destructors would need to be destroyed if an exception occurred during each chunk of code that has been completely optimized away.

If I'm right, the compiler is doing three absurd things:
1) Keeping track of whether a destructor needs to be called, when that destructor optimizes into zero actual code, and doesn't actually get called anyway.
2) Tracking the state in case of exception during empty constructors and/or other empty code. If the generated code is zero instructions long, it can't throw an exception.
3) After optimizing away all the apparent code, not seeing that almost every "mov DWORD PTR [ebp-4], n" is made completely redundant by the immediately following "mov DWORD PTR [ebp-4], m".
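
To make the guess above concrete, here is a minimal C++ sketch. It is not taken from the application; Tracer, may_throw and unwind_demo are hypothetical names. It assumes the store to [ebp-4] is the compiler's exception-handling state index, which is how table-driven x86 unwinding commonly records which locals are currently live. The comments mark where such a store would be expected. With destructors that do real work the bookkeeping is needed; the complaint above is that it appears to survive even when the destructors are empty.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical tracer: records construction and destruction order in a log.
struct Tracer {
    std::string& log;
    char id;
    Tracer(std::string& l, char c) : log(l), id(c) { log += '+'; log += id; }
    ~Tracer() { log += '-'; log += id; }
};

// Hypothetical callee that may throw.
void may_throw(bool fail) {
    if (fail) throw std::runtime_error("boom");
}

// Each "state" comment marks a point where a state-index store
// (the "mov DWORD PTR [ebp-4], n" pattern) would be expected.
std::string unwind_demo(bool fail_early) {
    std::string log;
    try {
        Tracer a(log, 'a');      // state: a is live
        may_throw(fail_early);   // a throw here must destroy only a
        Tracer b(log, 'b');      // state: a and b are live
        may_throw(!fail_early);  // a throw here must destroy b, then a
    } catch (const std::runtime_error&) {
        log += '!';              // reached after unwinding has finished
    }
    return log;
}
```

Calling unwind_demo(true) yields "+a-a!" and unwind_demo(false) yields "+a+b-b-a!", so the unwinder really does need to know how far construction got. If Tracer had an empty constructor and destructor, neither state would matter, which is why back-to-back stores with nothing between them look redundant.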

Is there any way to make the compiler smarter about such things?

Of course, I'm also investigating whether no /EH switch or /EHa is a solution. Under 7.1 the application needed /GX for correct operation, and /GX did not seem to do the massive harm that it does under 9.0. It will be a long process (compile and test time) for me to find out whether no /EH switch or /EHa gives correct operation and whether it fixes the speed problem of 9.0. So far it looks like using no /EH switch fixes much of the size problem of 9.0.

We also have a Linux build of the same application, which I'd also like to move to the newer compiler, preferably improving rather than wrecking its performance. I expect exception handling is different enough that whatever I figure out for the Windows version won't tell me a thing about what to do with the Linux version.

Things are moving from bad to worse:

With none of the switches (/GX, /EHs or /EHa) I can't build the application at all (which makes sense, since some of it does use C++ exceptions).

With /EHa, I see all the redundant "mov DWORD PTR [ebp-4],n" instructions I saw with /EHs, but something further (that I haven't identified) is wrong.

The .exe built with /EHa is nearly twice the size of the .exe built with /EHs and is significantly slower. (The /EHa one is four times the size of the one built with the 7.1 compiler with /GX).

I expect I could get rid of some of this garbage by adding empty throw() modifiers to the declarations of thousands of functions.

However, there are so many of these problem mov's in a row in the places I've looked that I doubt the problem is limited to functions that actually have declarations.

The templating includes many levels of declarations such as:

typedef some_template:: value_type;

(which gives you a name in the current templated scope for the type of the underlying objects being manipulated).

The compiler seems to stack up those declarations to construct absurdly long type names. The actual type is usually something simple such as a pointer, a double or a std::size_t. But the compiler carries around absurd type names for them.
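
As an illustration of the layering described above (a sketch only; View, Adapter, Elem and same_type are hypothetical names, not the application's), each layer re-exports value_type from the layer below, so the spelled-out name grows with every level even though the underlying type is just double:

```cpp
#include <cassert>
#include <vector>

// Hypothetical bottom layer: exposes the container's element type.
template <class Container>
struct View {
    typedef typename Container::value_type value_type;
};

// Hypothetical wrapper layer: forwards the element type unchanged.
template <class Inner>
struct Adapter {
    typedef typename Inner::value_type value_type;
};

// The written name is long, but it resolves to plain double.
typedef Adapter<Adapter<View<std::vector<double> > > >::value_type Elem;

// Small pre-C++11 type-equality helper used to check the claim.
template <class A, class B> struct same_type { static const bool value = false; };
template <class A> struct same_type<A, A> { static const bool value = true; };

bool elem_is_double() { return same_type<Elem, double>::value; }
```

Here elem_is_double() returns true, so construction and destruction of an Elem are no-ops, and nothing about the type itself should require exception bookkeeping.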

The only way I can explain so many extra mov's is if the compiler carries around info about the default construction and destruction of all of those within each template instantiation, and only realizes they have no construction and destruction AFTER it has generated extra code for exception handling around that construction and destruction.

For our own trivial types, I could add throw() modifiers to the empty constructors and other empty methods. But I can't do anything about the built-in behavior of pointers, doubles and std::size_t.
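
For reference, a minimal sketch of what annotating one of our trivial types would look like. Handle and the NOTHROW macro are hypothetical; the macro exists only so the same text compiles both with older compilers, which spell it throw(), and with newer ones, which spell it noexcept. Whether this actually removes the extra mov's under 9.0 is untested.

```cpp
#include <cassert>

// Pick the spelling the compiler understands (assumption: __cplusplus is
// reported accurately by the compiler in use).
#if __cplusplus >= 201103L
#  define NOTHROW noexcept
#else
#  define NOTHROW throw()
#endif

// Hypothetical trivial wrapper whose members promise not to throw.
struct Handle {
    double* p;
    Handle() NOTHROW : p(0) {}
    ~Handle() NOTHROW {}
    double* get() const NOTHROW { return p; }
};

// A caller that uses only non-throwing operations can make the same promise.
bool handle_is_empty() NOTHROW {
    Handle h;
    return h.get() == 0;
}
```

The intent is that a compiler which trusts the annotation has no reason to record a new exception state around calls to these members, since none of them can unwind through the caller.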

Nothing in the generated asm code tells you which zero length function has been inlined and optimized out between each of the lines generated for exception protection, so I'm just guessing blindly.

Message Edited by john_fine on 03-19-2006 05:26 AM
