Intel C++ compiler and Visual C++

Say I use Visual C++ to develop a C++/CLI application where I mix native C++ code with code compiled with the /clr switch.

Can I still benefit from complementing Visual C++ with the Intel compiler?

Is it even possible, or must the whole application be native C++ to use the Intel compiler?

Thank you.

Yes, you can use the Intel C++ Compiler for the native part of the program, because Intel C++ is compatible with VC. With this combined solution, you'll get the best performance in the native code.

So in that situation the compilers will take turns during the compilation? (Well, I would rather make the whole application native but in this way I can make use of the new WPF GUI).

I'm aiming at Visual Studio 2008. When can a corresponding Intel compiler be expected?

I forgot to ask whether this mixed compiler set-up will result in some limitation I should know about?

There's no limitation with the mix-match. But if you use OpenMP code, you should wait for the 10.1 compiler, which will be released very shortly.

About VS2008 support, if you need the IDE integration module, you need to wait a little longer. The command line support will come soon in 10.1.

Please check back and I'll post the news here when 10.1 is available.

I should say that the Intel C++ Compiler 10.0 supports OpenMP as well. But you should use only one OpenMP run-time library (RTL).

Well, I guess Visual Studio 2008 has to arrive first doesn't it. -:)

Anyway, if I get it right, only the Professional version of Visual Studio 2005 supports OpenMP? Can I use some version of Visual Studio that doesn't support OpenMP and then instead add OpenMP support by way of the Intel compiler?

I haven't bothered much before, but when using Visual Studio I get code that will run on both AMD and Intel processors. This means the generated code is restricted to some common-denominator instruction set. I guess at least some aggressive optimizations involve using Intel-only instructions that are not available on AMD. If I don't want this, because I want one executable for both processor types, will some, many or maybe even most of the advantage of using the Intel compiler be lost, leaving no real advantage over the Visual Studio compiler? In short, will any meaningful usage of the Intel compiler render the resulting executables Intel-only?

Hello,

You can easily control which IA-32 processors you want to support with the /Qx compiler switch. In many cases /QxW will be sufficient, and it generates code that runs on most Intel and AMD processors. I suspect that the Intel Compiler is so good that it may actually be partly responsible for the excellent SPEC2006 results, and I would not be surprised if the SPEC2006 results could be somewhat improved for AMD if they used the Intel C++ Compiler.....

:-)

Best Regards,

Lars Petter Endresen

I understand the latest MSVC++ is available for download on msdn2.microsoft.com. It makes the "blend" model CPU architecture the default. "blend" is suited for AMD Barcelona and is not badly mis-matched with Intel CPUs. The OpenMP of that compiler works with the Intel 10.1 compiler compatibility library, for which Intel claims improved performance.

Exactly!

In Visual Studio 2008 the "blend" model is the only choice - the /arch:SSEx options are disabled altogether. Thus, to take advantage of SSEx instructions without writing every SSEx instruction yourself, it is recommended to use Intel C++ Compiler with Visual Studio 2008.

Best Regards,

Lars Petter Endresen

http://blogs.msdn.com/vcblog/archive/2007/10/18/new-intrinsic-support-in-visual-studio-2008.aspx

"...Access to the SSE3 or SSE4 instructions is provided, but the compiler does not provide any mechanism to automatically take advantage of new instructions (as was done with previous /arch:SSEx switches)..."

Well, I know it depends on the application, but is complementing Visual C++ with Intel C++ likely to result in a faster program in "blend" mode? My application isn't written yet, but there will be lots of floating point number crunching, multithreading to utilize multiple cores and DirectX 10 graphics.

I'm just trying to get an approximate feeling for whether the usage of Intel C++ is likely to improve the performance of my program. I guess it's also relevant whether I use the Intel libraries (like Math Kernel and Threading Building Blocks). Maybe there's some benchmark available that can guide me?

Hello,

In many cases "blend" mode may be too defensive: you may not be able to take advantage of any instructions introduced after the Pentium. Thus, I would instead suggest building a separate DLL for each instruction set, typically x86 ("blend"), SSE2 and SSE4. Then you can load the proper DLL at run time based on the CPUID instruction. This way you take maximum advantage of the hardware present, while the code still works on all computers.

In some cases, an application that uses SSE2 can be up to 10x faster than "blend". See this recent thread, where the SVML code (using SSE2) actually is 6x faster than the "blend" code. A similar case is seen here.

Best Regards,

Lars Petter Endresen
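The DLL-per-instruction-set idea described above could be sketched roughly as follows. This is only an illustration: the DLL file names and the `SimdLevel`, `select_level` and `dll_for` names are hypothetical, not anything shipped by Intel or Microsoft. The selection logic takes the ECX and EDX values returned by CPUID leaf 1 (SSE2 is EDX bit 26, SSE4.1 is ECX bit 19), so it can be exercised without running CPUID itself.

```cpp
#include <cstdint>
#include <string>

// Hypothetical tiers matching the three builds suggested in the post.
enum class SimdLevel { X86, SSE2, SSE41 };

// Pick a tier from the feature bits of CPUID leaf 1.
// EDX bit 26 reports SSE2, ECX bit 19 reports SSE4.1.
SimdLevel select_level(std::uint32_t ecx, std::uint32_t edx) {
    const bool has_sse2  = ((edx >> 26) & 1u) != 0;
    const bool has_sse41 = ((ecx >> 19) & 1u) != 0;
    if (has_sse2 && has_sse41) return SimdLevel::SSE41;
    if (has_sse2) return SimdLevel::SSE2;
    return SimdLevel::X86;  // safe fallback for every CPU
}

// Map a tier to a DLL name (the names are made up for this sketch).
std::string dll_for(SimdLevel level) {
    switch (level) {
        case SimdLevel::SSE41: return "kernels_sse4.dll";
        case SimdLevel::SSE2:  return "kernels_sse2.dll";
        case SimdLevel::X86:   break;
    }
    return "kernels_x86.dll";
}
```

On Windows the register values would typically come from the `__cpuid` intrinsic in `<intrin.h>`, and the chosen DLL would then be loaded with `LoadLibrary` and its entry points resolved with `GetProcAddress`. Keeping the decision in a pure function makes it easy to test each tier on a single development machine.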

I'm trying to get my own benchmarks run with the new Microsoft in addition to the current Intel compiler. Unfortunately, the benchmarks are based on open source code but not with GPL license. In the absence of any statement from the authors or their employers, I haven't got permission to post them.
Intel claims better OpenMP performance when using the ICL 10.1 library, even when compiling with VC++ 8.0. MKL 10.0 also will support the Microsoft compatible OpenMP libraries.
Claims for improved performance of VC++ 8 seem aimed more at past Intel machines, such as Pentium D, than at current Core CPUs. "blend" avoids the worst problems of the current default /favor:AMD switch, which is bad for Barcelona as well as Intel. If there is no vectorization, ICL would still have a big advantage where that applies.
You may want to observe that MSVC defaults to /fp:precise (which should always observe the C and C++ standards), while ICL defaults to /fp:fast.
I'm trying to read Microsoft web pages, but they are nearly illegible in FireFox.
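To see why the floating-point model default matters when comparing the two compilers, here is a small sketch (the function names are made up for illustration). A "fast" model allows the compiler to reassociate additions, and floating-point addition is not associative, so the same source expression can yield different results under the two settings.

```cpp
// Two groupings of the same three-term sum. Under a strict ("precise")
// model the compiler must keep the grouping written in the source; under
// a "fast" model it may regroup, for example to vectorize a reduction.
double sum_left(double a, double b, double c)  { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }
```

With a = 1e20, b = -1e20 and c = 1.0, the first grouping gives 1.0, while the second gives 0.0 because 1.0 is too small to survive being added to -1e20. When benchmarking ICL against MSVC it is therefore worth setting the floating-point model explicitly on both, so that the comparison is like for like.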

@Lars Petter Endresen

Thank you for your help, but my main concern right now is trying to establish under what circumstances it pays off to complement Visual C++ with Intel C++?

If I want to generate code for Intel instruction sets not supported by Visual C++ then of course I must have the Intel compiler.

But say I just want to support the "blend" instruction set which I take to mean the smallest common denominator of very modern (later than 2006) AMD and Intel processors. Does it still pay off to add on the Intel compiler or will Visual C++ alone do fine?

> (later than 2006) AMD and Intel processors

Hello,

Thank you for being more specific about what kind of processors you intend to support. I wonder if there are any processors after 2006 that do not support SSE2? According to Wikipedia, this does not seem to be the case: all processors after 2006 (both Intel and AMD) support SSE2. Then it would make sense to require SSE2, as the "blend" option usually means compatibility back to the Intel Pentium that was introduced in 1993. If you can require SSE2, then Intel C++ may usually be the best choice, in particular in applications that are computationally demanding.

The "blend" option means x87 floating point instructions, see this link: "Visual Studio defaults to use x87 for floating-point math. You should use at least the /arch:SSE2 option for Athlon 64 processors and later." However, the drawback of the /arch:SSE2 option in Visual Studio is that it does not lead to any vector instructions; it uses the scalar SSE2 instructions only. Intel C++, on the other hand, generates code that runs very well on any CPU released after 2006, taking advantage of the vector SSE2 instructions, simply by enabling SSE2 with the /QxW compiler switch.

In addition to instruction set differences, there are a number of other differences too, like memcpy, OpenMP and so on. My experience is that it may be advisable to use Intel VTune, Intel C++ and Microsoft C++ together in Visual Studio to obtain the required performance. With VTune one can easily determine which parts of an application must be tuned and compiled with Intel C++.

Best Regards,

Lars Petter Endresen
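The difference between scalar and vector SSE2 mentioned above can be illustrated with a hand-written sketch. It is not what either compiler emits; the function names and the `EXAMPLE_HAVE_SSE2` macro are hypothetical. The first function is the kind of plain loop an auto-vectorizer targets; the second spells out the packed form using the SSE2 intrinsics from `<emmintrin.h>`, processing two doubles per instruction, and falls back to the plain loop when SSE2 is not available at compile time.

```cpp
#include <cstddef>

// Detect SSE2 at compile time (GCC/Clang and MSVC macros).
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define EXAMPLE_HAVE_SSE2 1
#else
#define EXAMPLE_HAVE_SSE2 0
#endif

// Plain loop: one addition per element unless the compiler vectorizes it.
void add_scalar(const double* a, const double* b, double* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

// Packed form: two doubles per SSE2 instruction, scalar loop for the tail.
void add_packed(const double* a, const double* b, double* out, std::size_t n) {
    std::size_t i = 0;
#if EXAMPLE_HAVE_SSE2
    for (; i + 2 <= n; i += 2) {
        const __m128d va = _mm_loadu_pd(a + i);  // unaligned load of a[i], a[i+1]
        const __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
    }
#endif
    for (; i < n; ++i) out[i] = a[i] + b[i];     // remaining elements
}
```

Both functions produce identical results for element-wise addition. The point of auto-vectorization is that the compiler generates something like the second form from the first, so the source stays portable.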

According to my understanding, we were discussing VS2008 beta, where the /arch switch has been removed. A non-Microsoft compiler is still required to support auto-vectorization. I don't expect VS2008 to support my Pentium-M well, either for vectorizable or non-vectorizable code, so it seems aimed at a fairly narrow range.
With Intel C++ from 10.0, an architecture switch is not required to get auto-vectorization in the 64-bit compiler (/QxW is the default), so you can expect a performance increase over Microsoft without changing command line switch.

I've come to this conclusion:

In an application with lots of floating point calculations you would want the sse2 (and even sse3) extension for top performance. This is also what recent AMD and Intel processors support (after 2006) so it's a reasonable "blend" level. Intel C++ makes better automatic use of this extension so it's worthwhile to complement Visual C++ with it.

Is this correct?

When it comes to my multithreading needs, they seem to be best covered by the Threading Building Blocks, which don't require any special compiler.

Thank you all for your patience!

I believe SSE3 has the most benefit in vectorized C99 complex applications. As Microsoft C 9.0 hasn't adopted auto-vectorization, there isn't sufficient reason for it to include an SSE3 code generation option.
Current Intel compilers generate SSE3 in some situations where it is not beneficial on Intel CPUs, so that bit may change in future ICL.
Microsoft has given OpenMP a big vote of confidence by adding support for it in their next compiler. As Jennifer mentioned earlier, ICL 10.1 includes an OpenMP library with the new Microsoft calling conventions. So, you have a choice between TBB and OpenMP, according to your requirements.
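For readers weighing OpenMP against TBB, a minimal OpenMP sketch may help show what compiler support means in practice. The function name is made up for illustration. The pragma only takes effect when OpenMP is enabled (for example /Qopenmp with ICL or /openmp with MSVC); without it the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <vector>

// Sum of squares with an OpenMP reduction. A signed loop index is used
// because older OpenMP versions require one.
double sum_of_squares(const std::vector<double>& v) {
    double total = 0.0;
    const long n = static_cast<long>(v.size());
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; ++i) {
        total += v[i] * v[i];
    }
    return total;
}
```

The reduction clause gives each thread a private partial sum and combines them at the end. For general floating-point data the combination order can change the last bits of the result, which is one more reason to link only a single OpenMP run-time library into a mixed Intel/Microsoft build.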

Here's what I found by working with VC9 express beta (changing the paths in the Intel compiler .bat and .cfg files to use the VC9 equivalents)
No libc support, only libcmt. The 32-bit Intel compilers must be used with options which imply MT (such as /Qopenmp), in order to do mixed Intel/MS builds.
A few major performance changes from VC8 (referring to Levine-Callahan-Dongarra netlib vectors benchmark, translated to C or C++):
s116 transform(...multiplies()) improved to 30% of ICL performance
s161 improved to 70% of ICL performance
s318, s3113 max abs value, improved, still way below ICL performance
s431 transform(...plus()) improved to 40% of ICL performance
vpv improved to 50% of ICL performance
vtv improved to 80% of ICL performance

The following cases show a big loss due to switch from x87 on VC8 to SSE2 non-vector code in VC9:
s176, s313, inner_product()
s255 recursive sum
s312 accumulate(...multiplies())
s3112 partial_sum()
s4115, s4116 indexed sum
vsumr accumulate, vdotr inner_product
The STL inner_product and accumulate are vectorized to excellent effect by ICL, but SSE2 without vectorization is slow for these operations, compared to x87.

I doubt VC9 was intended for the intensive vectorizable float data type operations. It has improved in some instances where performance of VC8 was inferior to ICL and gcc, but it didn't catch up to vectorized performance. For those who prefer SSE2 to x87 code regardless of performance, it does look like a gain.
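For reference, the STL reductions named in the benchmark list above (inner_product, accumulate, accumulate with multiplies) look like this in source form. The wrapper names are made up for illustration; the benchmark code itself is not reproduced here.

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Dot product, as in the vdotr case. y must have at least x.size() elements.
double dot_product(const std::vector<double>& x, const std::vector<double>& y) {
    return std::inner_product(x.begin(), x.end(), y.begin(), 0.0);
}

// Plain sum, as in the vsumr case.
double sum_all(const std::vector<double>& x) {
    return std::accumulate(x.begin(), x.end(), 0.0);
}

// Product reduction, as in the s312 case.
double product_all(const std::vector<double>& x) {
    return std::accumulate(x.begin(), x.end(), 1.0, std::multiplies<double>());
}
```

Each of these is a serial dependency chain as written, so vectorizing them means reordering the additions or multiplications. That is why a compiler that does not vectorize gains little from switching to SSE2 here.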

Intel C++ 10.1 is available for downloading now. It supports VS2008 beta2 from command line only. You need to manually update the icvars.bat and icl.cfg to work with VS2008.

See this posting http://software.intel.com/en-us/forums//topic/56129 for more features of 10.1.

"In an application with lots of floating point calculations you would want the sse2 (and even sse3) extension for top performance. "

Hello,

Not only floating point but also integer benefits from SSE2 and later SSEx - Intel C++ Compiler may be a particularly good choice for loops involving short integer types like unsigned char and unsigned short.

Best Regards,

Lars Petter Endresen
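As an illustration of the short-integer loops mentioned above, here is a sketch of a saturating add over unsigned char data (the function name is made up). SSE2 has a packed saturating byte add that handles 16 elements per instruction, so loops of this shape are good candidates for auto-vectorization. Whether a particular compiler actually vectorizes it should be checked in its vectorization report.

```cpp
#include <cstddef>
#include <cstdint>

// Element-wise saturating add: results above 255 are clamped to 255.
void saturating_add_u8(const std::uint8_t* a, const std::uint8_t* b,
                       std::uint8_t* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        const unsigned sum = static_cast<unsigned>(a[i]) + static_cast<unsigned>(b[i]);
        out[i] = static_cast<std::uint8_t>(sum > 255u ? 255u : sum);
    }
}
```

This kind of operation is common in image processing, for example when brightening 8-bit pixels without wrap-around.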
