First, let me begin by saying that compiler switches mainly target specific processors, not operating systems. With the recent launch of the 3rd Generation Intel Core Processors (code-named Ivy Bridge), developers may be looking for information on any new switches developed specifically to support, and run best on, our latest architecture. This blog will include the following:
- Getting to "Optimal Performance"
- What is new for the Intel 3rd Generation Core Processor Family
- Compiling for the Intel 2nd Generation Core Processor Family
- More Resources (URLs will be included at the end of the blog)
Getting to "Optimal Performance"
If you are new to performance optimization, there are different levels of tuning you can do. The most basic level is testing your software with different compiler switches. If you are happy with the performance, you can stop there. Some applications will require finer tuning, possibly including code changes and more involved use of compiler intrinsics. To get to this level of tuning you need to be able to characterize your application: is it memory intensive, CPU intensive, I/O intensive, or graphics intensive, and where are the "Hot Spots"?
Many applications do have these so-called "Hot Spots", and in order to obtain peak performance, developers need to find them so that they can decide how to optimize their code. We find that there is often no single "generic" set of compiler switches that gives the best overall performance: an option that works well for some portions of an app may actually hinder other parts, depending on the application's particular characteristics.
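When comparing builds made with different switches, even a coarse timer around a suspected hot spot can give a first impression before you reach for a profiler. The following is only a minimal sketch: `sample_workload` and `seconds_per_call` are hypothetical names made up for this illustration, and `clock()` measures CPU time at fairly low resolution, so treat the numbers as rough.

```c
#include <time.h>

/* Hypothetical stand-in for a suspected hot spot: sums a simple series. */
double sample_workload(void)
{
    double sum = 0.0;
    for (int i = 1; i <= 1000000; ++i) {
        sum += 1.0 / (double)i;
    }
    return sum;
}

/* Returns the average CPU seconds per call of fn over reps calls,
   or -1.0 if reps is not positive. Results are accumulated into a
   volatile sink so the calls are not simply discarded. */
double seconds_per_call(double (*fn)(void), int reps)
{
    volatile double sink = 0.0;
    if (reps <= 0) {
        return -1.0;
    }
    clock_t start = clock();
    for (int i = 0; i < reps; ++i) {
        sink += fn();
    }
    clock_t stop = clock();
    (void)sink;
    return ((double)(stop - start) / CLOCKS_PER_SEC) / reps;
}
```

Building the same file once per set of switches and comparing `seconds_per_call(sample_workload, 100)` across the builds is one simple way to see whether a switch helps a particular region.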
While this blog will not cover how to characterize your application, you can take a look at what Intel offers for software profilers:
- Intel VTune Amplifier XE: a powerful threading and performance optimization tool for C/C++, .NET, and Fortran developers who need to understand an application's serial and parallel behavior to improve performance and scalability.
- Intel GPA: a set of powerful graphics and gaming analysis tools that are designed to work the way game developers do, saving valuable optimization time by quickly providing actionable data to help developers find performance opportunities from the system level down to the individual draw call.
- Intel Trace Analyzer and Collector: a powerful cluster profiler for understanding MPI application behavior, quickly finding bottlenecks and achieving high performance for parallel cluster applications. It supports Intel architecture-based cluster systems, is compatible with current standards, and includes trace file comparison, counter data displays and an MPI correctness checking library.
Compiling for the Intel 3rd Generation Core Processor Family
For the most part, optimizing for the 3rd Generation Core Processor Family is the same as optimizing for the 2nd Generation Core Processor Family (code-named Sandy Bridge).
- The instruction set differences are very small unless you need auto-generated float16 to float32 conversions, which is a very specialized case.
- There are no new header files; intrinsics, including RDRAND, are still in immintrin.h.
- AVX optimizations: From a micro-architecture point of view, we talk about optimizing for Intel AVX, rather than for a particular processor.
- You can download the Intel Compilers here:
The new Compiler Suites will install on the latest versions of Windows* and integrate with Visual Studio 2012 (Professional Edition and above); however, they currently only support developing applications for the Desktop. The compatibility switch for Visual Studio 2012 is /Qvc11. For Visual Studio 2012 Express edition, command-line support is still available. URLs with further information are included below.
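To illustrate the point above that RDRAND is reached through immintrin.h like the other intrinsics, here is a minimal sketch that wraps the `_rdrand32_step` intrinsic in a bounded retry loop. The function name `rdrand32_retry` is hypothetical. The sketch also assumes the compiler defines the `__RDRND__` macro when RDRAND code generation is enabled (as GCC and Clang do with -mrdrnd); if your compiler signals this differently, adjust the guard.

```c
#include <stddef.h>

#if defined(__RDRND__)
#include <immintrin.h>   /* declares _rdrand32_step */
#endif

/* Tries up to max_tries times to fetch a 32-bit hardware random value.
   Returns 1 and stores the value in *out on success, 0 otherwise
   (including when the build does not enable RDRAND). */
int rdrand32_retry(unsigned int *out, int max_tries)
{
    if (out == NULL) {
        return 0;
    }
#if defined(__RDRND__)
    for (int i = 0; i < max_tries; ++i) {
        /* _rdrand32_step returns 1 when a random value was delivered. */
        if (_rdrand32_step(out)) {
            return 1;
        }
    }
#else
    (void)max_tries;   /* assumption: no RDRAND support in this build */
#endif
    return 0;
}
```

The instruction can occasionally report that no value is available, which is why the call sits in a loop. A caller would typically pass a small retry count and fall back to another random source when the function returns 0.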
New with 3rd Generation Core Processors:
- /QxCORE-AVX-I (If you need autogeneration of float16 to float32 conversions)
- New RDRAND instruction for generating 16-, 32-, and 64-bit random integers
- Two new instructions, VCVTPS2PH and VCVTPH2PS, for converting between the 16-bit floating-point data type and the single-precision floating-point data type
- New instructions RDFSBASE, RDGSBASE, WRFSBASE, and WRGSBASE that let programs read and write the FS base and GS base registers
- New processor targeting cpuid "CORE_3RD_GEN_AVX" for use with manual processor dispatch in __declspec(cpu_dispatch(CORE_3RD_GEN_AVX))
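To make the float16 conversion in the list above more concrete, here is a portable scalar sketch of what a half-to-single conversion computes for one element. It is a reference illustration only, not how you would use the hardware: in real code the compiler generates VCVTPH2PS for you, or you call intrinsics such as `_mm_cvtph_ps` from immintrin.h. The name `half_to_float` is hypothetical, and NaN payloads are carried over as-is, which may differ from how the instruction treats signaling NaNs.

```c
#include <stdint.h>
#include <string.h>

/* Converts one IEEE 754 half-precision value (given as raw bits) to float. */
float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (uint32_t)(h >> 10) & 0x1Fu;
    uint32_t man  = (uint32_t)h & 0x03FFu;
    uint32_t bits;

    if (exp == 0x1Fu) {
        /* Infinity or NaN: all-ones exponent, mantissa carried over. */
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {
        /* Normal number: rebias the exponent from 15 to 127. */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man != 0) {
        /* Subnormal half: shift until the implicit leading bit appears. */
        uint32_t e = 113u;   /* 127 - 15 + 1 */
        while ((man & 0x0400u) == 0) {
            man <<= 1;
            --e;
        }
        bits = sign | (e << 23) | ((man & 0x03FFu) << 13);
    } else {
        bits = sign;   /* signed zero */
    }

    float f;
    memcpy(&f, &bits, sizeof f);   /* reinterpret the bit pattern as float */
    return f;
}
```

Every half-precision value is exactly representable in single precision, so this direction never rounds. The opposite direction (VCVTPS2PH) does round, which is why that instruction takes a rounding-control operand.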
2nd Generation Core Processor
- /Qvc11 (Compatibility flag for Visual Studio 2012; for IDE integration, Visual Studio 2012 Professional Edition and above is required)
- /QxAVX (also works great on 3rd Generation Core Processors): the final binary runs on both 2nd and 3rd Generation Core Processors.
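As a small illustration of code that benefits from /QxAVX, the sketch below adds two float arrays using 256-bit AVX intrinsics when the build enables AVX and a plain scalar loop otherwise. The function name `add_floats` is hypothetical, and the sketch assumes the compiler defines the `__AVX__` macro when AVX code generation is switched on. In many cases the compiler will auto-vectorize a loop like this on its own; the explicit intrinsics are shown only to make the AVX path visible.

```c
#include <stddef.h>

#if defined(__AVX__)
#include <immintrin.h>   /* declares __m256 and the _mm256_* intrinsics */
#endif

/* Stores a[i] + b[i] into out[i] for n floats. */
void add_floats(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__AVX__)
    /* Process eight floats per iteration with 256-bit AVX registers.
       Unaligned loads and stores are used so callers need no special
       alignment. */
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
#endif
    /* Scalar loop handles the tail, or everything when AVX is not enabled. */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Built with /QxAVX, the vector loop is compiled in and the binary runs on both 2nd and 3rd Generation Core Processors; built without it, the same source still compiles and uses only the scalar loop.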
Finding an app's optimal performance can mean different things to different developers and for different apps. For some apps, simply experimenting with certain compiler switches is all that is needed or desired. Other apps may require a deeper understanding of the app's performance and its associated bottlenecks. For those apps, this blog also points to the Intel profiling tools. The primary focus of this blog was to introduce the new compiler switches and instructions available for the Intel 3rd Generation Core Processors and to point out that the optimizations for the Intel 2nd Generation Core Processor Family are still highly relevant.
More Resources
- How to Compile for Intel® AVX
- Which applications are most likely to benefit from recompilation for Intel® Advanced Vector Extensions (Intel® AVX)?
- Intel® Compiler Options for Intel® SSE and Intel® AVX generation (SSE2, SSE3, SSE3_ATOM, SSSE3, SSE4.1, SSE4.2, AVX, AVX2) and processor-specific optimizations
- Quick-Reference Guide to Optimization with Intel® Compilers version 12 For IA-32 processors and Intel® 64 processors
- Intel® 64 and IA-32 Architectures Optimization Reference Manual