Cross-compiling for IA32 on Windows 7 64-bit to avoid "out of memory"

Posted by fgp.phlo.org

Hi

I'm trying to build a heavily templated application for IA32 with the Intel C++ Compiler for Windows, version 12.1.5.344, running on 64-bit Windows 7. Unfortunately, the IA32-targeting icl.exe (and mcpcom.exe) seem to be 32-bit binaries, and the compiler errors out after trying to allocate more than 4GB (which is obviously impossible for a 32-bit binary).

Is there a 64-bit version of the Intel C++ compiler available which is able to target IA32? It seems that currently only the reverse is supported, i.e. an IA32-binary which produces code for Intel64. Can I somehow convince Intel64/icl.exe to produce code for IA32?

I know that the Linux version of the Intel C++ Compiler *does* support that kind of cross-compiling, but that doesn't help since I need to target Windows, not Linux. Unless there's a way to cross-compile on Linux for Windows, of course...

If there's no support for that kind of cross-compiling, are there any compiler flags which I might use to conserve memory, apart from disabling inlining? (My app absolutely depends on inlining for performance. There are a lot of functions which compile to a single SSE instruction). I'm already using /Qip-, which seems to help a bit, but maybe there are others...

best regards,
Florian Pflug

Posted by jimdempseyatthecove

Florian,

I will defer the issue of using a 64-bit compiler to produce a 32-bit app to the Intel readers of this post. (This would not be an unreasonable request.)

I suspect that you have a

  #include "SingleHeaderThatBringsInAllTemplates.h"

Consider fragmenting large templates into smaller functional units, then including only the templates necessary for the current .cpp file.

Jim Dempsey

www.quickthreadprogramming.com
Posted by Sergey Kostrov

Hi Florian,

Quoting fgp.phlo.org ...I'm trying to build a heavily templated application for IA32 with the Intel C++ Compiler for Windows, version 12.1.5.344, running on 64-bit Windows 7. Unfortunately, the IA32-targeting icl.exe (and mcpcom.exe) seem to be 32-bit binaries, and the compiler errors out after trying to allocate more than 4GB (which is obviously impossible for a 32-bit binary).

[SergeyK] That is correct. A regular Win32 application cannot allocate more than 2GB of memory. The best allocation number that I was able to get is ~1.99GB with a MinGW C++ compiler.

A non-regular Win32 application that uses Address Windowing Extensions ( AWE / a technology from Microsoft ) could allocate more than 2GB of memory.

Are these errors from the Intel C++ compiler or from your application?

Best regards,
Sergey

Posted by Sergey Kostrov

It looks like the spam continues. Please take a look at a previous post [ on Thu, 09/20/2012 - 22:35 ].

Posted by HLW S.

From the Intel C++ compiler. That's why I was looking for a compiler which targets IA-32, yet is itself a 64-bit application.

I've by now discovered that running the IA-32 compiler on 64-bit Windows 7 helps a bit. The compiler still can't allocate more than 4GB, of course, but at least it can get about 3.8GB. Probably because there's no need for the kernel address space to lie within the first 4GB of memory if the kernel itself runs in 64-bit mode. (Dunno why it still reserves ~200MB, but my guess is that it's a DMA zone for legacy PCI hardware which cannot address more than 4GB)

Posted by Jennifer J. (Intel)

The Intel C++ compiler for IA32 is built with "/LARGEADDRESSAWARE", so it can get close to 4GB on an x64 OS.
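A minimal sketch of how that flag can be checked on any executable, assuming the documented PE/COFF layout: the offset of the "PE\0\0" signature is stored at 0x3C, and the 16-bit Characteristics field sits 22 bytes after the signature, with bit 0x0020 meaning "large address aware". The function and helper names are hypothetical, and the image is taken as an in-memory byte buffer rather than read from disk.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bit in the COFF "Characteristics" field that /LARGEADDRESSAWARE sets.
constexpr std::uint16_t kLargeAddressAware = 0x0020;

// Reads a little-endian integer of `bytes` bytes starting at offset `pos`.
static std::uint32_t read_le(const std::vector<std::uint8_t>& img,
                             std::size_t pos, std::size_t bytes) {
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < bytes; ++i)
        v |= static_cast<std::uint32_t>(img[pos + i]) << (8 * i);
    return v;
}

// Returns true if `img` looks like a PE image whose header has the flag set.
bool is_large_address_aware(const std::vector<std::uint8_t>& img) {
    // DOS header: "MZ" magic, and e_lfanew (offset of the PE signature) at 0x3C.
    if (img.size() < 0x40 || img[0] != 'M' || img[1] != 'Z') return false;
    const std::size_t pe = read_le(img, 0x3C, 4);
    // Need the 4-byte signature plus the 20-byte COFF file header.
    if (pe > img.size() || img.size() - pe < 24) return false;
    if (img[pe] != 'P' || img[pe + 1] != 'E' || img[pe + 2] != 0 || img[pe + 3] != 0)
        return false;
    // Characteristics is the last 2 bytes of the COFF header: signature + 18.
    const std::uint32_t characteristics = read_le(img, pe + 22, 2);
    return (characteristics & kLargeAddressAware) != 0;
}
```

To inspect a real binary, the bytes would come from reading the file (for example icl.exe) into the vector first. `dumpbin /headers` should report the same bit as "Application can handle large (>2GB) addresses".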

It seems your case is very extreme, or maybe there is a compiler bug. Does your code build successfully with MSVC?

Jennifer

Posted by HLW S.

It builds successfully with both GCC and Clang on Linux and Mac OS X. It doesn't build with MSVC due to MSVC's poor support for SSE vectors as member variables. Which, BTW, is the reason I turned to Intel's C++ Compiler in the first place.

I've meanwhile managed to get ICC to compile the thing by using explicit template instantiation. My code contains about 30 or so instantiations of the same templated code, adapted via template arguments for slightly different use cases. With the help of explicit template instantiation, some preprocessor magic and a rather complex build script, I now compile each instantiation separately, which drives memory usage down to a couple of hundred MB. The cost is a huge increase in conceptual complexity - keeping track of all required instantiations of these templates manually really isn't fun :-(
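A minimal sketch of that approach, not the actual code from my project: the `Kernel` template, its parameters and the file names in the comments are hypothetical stand-ins, and everything is shown in one listing although each marked part would live in its own file. It assumes the compiler accepts C++11 `extern template` declarations.

```cpp
#include <cstddef>

// --- kernel.h (hypothetical): declaration only, seen by every user ---
template <typename Scalar, int Unroll>
struct Kernel {
    static Scalar sum(const Scalar* data, std::size_t n);
};

// Explicit instantiation declarations: tell every translation unit that
// these specializations are compiled elsewhere and must not be
// instantiated implicitly.
extern template struct Kernel<float, 4>;
extern template struct Kernel<double, 2>;

// --- kernel_impl.h (hypothetical): the definition, included only by the
// --- instantiation units below ---
template <typename Scalar, int Unroll>
Scalar Kernel<Scalar, Unroll>::sum(const Scalar* data, std::size_t n) {
    Scalar acc = Scalar(0);
    for (std::size_t i = 0; i < n; ++i)
        acc += data[i];
    return acc;
}

// --- kernel_float4.cpp (hypothetical): exactly one instantiation ---
template struct Kernel<float, 4>;

// --- kernel_double2.cpp (hypothetical): exactly one instantiation ---
template struct Kernel<double, 2>;
```

Because each instantiation unit expands the template for a single set of arguments, the compiler only has to hold one expansion in memory at a time. The price is the list of `extern template` lines, which has to be kept in sync with the instantiation files by hand.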

On the upside, I can now compile selected parts with /Qinline-forceinline enabled, which seems to bring about another 10% performance.
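For comparison, forcing can also be requested per function in the source instead of per file on the command line. The following is only a sketch: the `FORCE_INLINE` macro and the `madd` wrapper are hypothetical names, and the macro assumes that `__forceinline` is available wherever `_MSC_VER` is defined (MSVC and icl on Windows) and that `always_inline` is available wherever `__GNUC__` is defined.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_mul_ps, _mm_add_ps

// Hypothetical portability macro for "inline this even if the heuristics say no".
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline
#endif

// A tiny wrapper of the kind that should collapse into a couple of SSE
// instructions once it is inlined into its caller.
FORCE_INLINE __m128 madd(__m128 a, __m128 b, __m128 c) {
    return _mm_add_ps(_mm_mul_ps(a, b), c);  // a * b + c, lane by lane
}
```

Marking only the hot wrappers this way keeps forced inlining away from the rest of the code, which may matter when compiler memory usage is already tight.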

In conclusion, my problem is solved for now, but given how common 64-bit OSes are nowadays, it still seems silly to have to optimize for compiler memory usage. So, @Intel: Please consider making 64-bit builds of your IA-32 targeting compilers available.

Posted by Sergey Kostrov

>>...I've by now discovered that running the IA-32 compiler on 64-bit Windows 7 helps a bit. The compiler still can't allocate more than 4GB,
>>of course, but at least it can get about 3.8GB.

A Win32 application without Microsoft's AWE cannot allocate more than 2GB of memory. This is by design, and it is simply impossible to allocate 3.8GB for a 32-bit application ( the regular case ). In the non-regular case, a 32-bit Microsoft operating system must support AWE, and that option is only supported in server editions. I've done lots of testing on 32-bit platforms, and the maximum amount of memory my test application was able to allocate is about 1.9GB. It also depends on the C++ compiler a developer uses and the complexity of the test application.

>>...Please consider making 64-bit builds of your IA-32 targetting compilers available.

That's a good proposal but I don't think Intel will do it. It was briefly discussed that two different versions of the Intel C++ compiler have to be used in order to build 32-bit or 64-bit applications. Microsoft, GCC, MinGW, etc. follow the same path. That is, different C++ compilers for different platforms.

Posted by Sergey Kostrov

>>...so it can get close to 4GB on a x64 OS.
>>
>>It seems your case is very extreme...

I think an extreme case is when an application uses more than 1TB of memory. Amounts of memory like 4GB or 8GB are no longer considered unique or extreme. Also, this is a quote from MSDN:

...64-bit Windows supports up to 1 terabyte of physical memory with 8 terabytes of address space for each process...

Posted by Sergey Kostrov

>>... It doesn't build with MSVC due to MSVC's poor support for SSE vectors as member variables...

That looks very strange. Could you provide an isolated example that shows the problem?

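Posted by HLW S.

Yup, try this

#include <emmintrin.h>  // SSE2: __m128i, _mm_add_epi32

struct T {
  __m128i data;  // gives T a 16-byte alignment requirement
};

// Both parameters are passed by value, which is what MSVC rejects.
T add(T a, T b) {
  const T result = { _mm_add_epi32(a.data, b.data) };
  return result;
}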

MSVC complains that T may not be passed by value, since it requires alignment of > 8 bytes (it requires 16-byte alignment to fulfill __m128i's alignment requirements). No other compiler I tried has the slightest problem with this.
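A minimal sketch that makes the alignment requirement explicit and sidesteps the by-value parameters, assuming a compiler with C++11 `alignof` and `static_assert`. The name `Vec4i` is a hypothetical stand-in for `T`.

```cpp
#include <emmintrin.h>  // SSE2: __m128i, _mm_add_epi32

struct Vec4i {
    __m128i data;
};

// The struct inherits the 16-byte alignment of its __m128i member.
static_assert(alignof(Vec4i) == 16, "Vec4i must be 16-byte aligned");

// Taking the operands by const reference means no over-aligned object has
// to be copied into the argument area; only addresses are passed.
inline Vec4i add(const Vec4i& a, const Vec4i& b) {
    const Vec4i result = { _mm_add_epi32(a.data, b.data) };
    return result;
}
```

Whether the reference version is as fast as the by-value one depends on the call being inlined; if it is not, the operands have to live in memory so that their addresses can be passed.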

Posted by Jennifer J. (Intel)

Quote:

HLW S. wrote:

So, @Intel: Please consider making 64-bit builds of your IA-32 targetting compilers available.

This is a big feature request. Please file a ticket at Intel Premier Support (https://premier.intel.com/) as well.

Jennifer

Posted by Sergey Kostrov

>>...The Intel C++ for ia32 is built with "/LARGEADDRESSAWARE". So it can get close to 4GB on a x64 OS...

Even if that option could be used in a 32-bit VS project, it does not resolve the 2GB limitation for a regular 32-bit application on a 32-bit Windows platform that does not support AWE.

Posted by Sergey Kostrov

>>struct T {
>> __m128i data;
>>};

Please take into account that 'T' is a reserved word and it is used in C++ templates. Thanks for the test case, I'll try it.

Best regards,
Sergey

Posted by HLW S.

Quote:

Sergey Kostrov wrote:
Even if that option could be used in a 32-bit VS project it does not resolve a problem of 2GB limitation for a regular 32-bit application on
a 32-bit Windows platform that does not support AWE

True, but irrelevant. This is not about arbitrary applications, it's about one very specific application, namely the Intel C++ Compiler.

Quote:

Sergey Kostrov wrote:
Please take into account that 'T' is a reserved word and it is used in C++ templates. Thanks for the test-case and I'll try it.

That is wrong. 'T' is not, and never was, a reserved word. It's a common name for template parameters, but there's nothing special about it.

I consider my question to be answered. What I hoped for doesn't seem to exist, and I've found a workaround.

Posted by Sergey Kostrov

Quote:

HLW S. wrote:

Yup, try this

struct T {
  __m128i data;
};

T add(T a, T b) {
  const T result = { _mm_add_epi32(a.data, b.data) };
  return result;
}

MSVC complains that T may not be passed by value, since it requires alignment of > 8 bytes (it requires 16-byte alignment to fullfill __m128i's alignment requirements). No other compiler I tried has the slightest problem with this.

There was a declaration error; try this instead:

typedef struct tagUserT
{
    __m128i m_Data;
} UserT;

// Both operands are taken by reference, not by value.
const UserT AddUserData( UserT &a, UserT &b );

const UserT AddUserData( UserT &a, UserT &b )
{
    UserT ut;

    ut.m_Data = _mm_add_epi32( a.m_Data, b.m_Data );

    return ( UserT )ut;
}

I compiled it with the MS C++ compiler of VS 2005.

Posted by Sergey Kostrov

>>...No other compiler I tried has the slightest problem with this.

What C++ compilers did you try?

Posted by Sergey Kostrov

>>...MSVC complains that T may not be passed by value...

I had a different compilation error with your unmodified test case:

C2719 - The align __declspec modifier is not permitted on function parameters.

Posted by Sergey Kostrov

This is a short follow-up. Here is a test case:

     ...
     UserT utA = { 1 };
     UserT utB = { 2 };
     UserT utC = { 0 };
     ...
     utC = AddUserData( utA, utB );
     ...

Verified with Intel, Microsoft and MinGW C/C++ compilers.

Posted by HLW S.

Quote:

Sergey Kostrov wrote:
const UserT AddUserData( UserT &a, UserT &b );

Yup, if you pass by reference it works. The point is that MSVC complains if you pass by value.

Thanks for your tests, though.

Posted by Jennifer J. (Intel)

Hello Florian Pflug,
I've sent you a private message regarding the request for a native x64 compiler for IA32 apps; please respond.
Or if you could submit it to Intel Premier Support, that would be great. Just let me know the ticket number.

Thank you.
Jennifer

Posted by Jennifer J. (Intel)

Quote:

fgp.phlo.org wrote:

errors out after trying to allocate more than 4GB (which is obviously impossible for a 32-bit binary).

Florian Pflug

There is a big improvement in memory usage under /Qipo in the next major release. If you're interested in our beta program, let me know by private post.

The feature request for a cross-compiler that generates IA32 code from an x64 compiler has not been granted; I have not received a valid test case for it so far.

Jennifer
