Intel® C and C++ Compilers

Leadership application performance

  • Rich set of components to efficiently implement higher-level, task-based parallelism
  • Future-proof applications to tap multicore and many-core power
  • Compatible with multiple compilers and portable to various operating systems

Performance without compromise

  • Industry-leading performance on Intel and compatible processors
  • Extensive optimizations for the latest Intel processors, including Intel® Xeon Phi™ coprocessor
  • Scale forward with support for multi-core, many-core, and multiprocessor systems via OpenMP, automatic parallelization, and Intel Xeon Phi coprocessor support
  • Patented automatic CPU dispatch feature gets you code optimized for the current running processor: it runs code paths optimized for specified processors, identified at application runtime.
  • Intel® Performance Guide provides suggestions for improving performance in your Windows* applications.

Broad support for current and previous C and C++ standards, plus popular extensions

  • Full C++11 language support and most of C99. For details on C++11 support, see http://software.intel.com/en-us/articles/c0x-features-supported-by-intel-c-compiler
  • Extensive OpenMP 4.0* support

Faster, more scalable applications with advanced parallel models and libraries

Intel provides a variety of scalable, easy-to-use parallel models. These highly abstracted models and libraries simplify adding both task and vector parallelism. The end result is faster, more scalable applications running on multi-core and many-core architectures.

Intel® Cilk™ Plus (included with Intel C++ compiler)

  • Simplifies adding parallelism for performance with only three keywords (see the example below)
  • Scale for the future with a runtime system that operates smoothly on systems with hundreds of cores
  • Vectorized and threaded for highest performance on all Intel and compatible processors
  • Sample code, contributed libraries, open specifications, and other information are available from the Cilk Plus community
  • Included with the Intel C++ compiler, available in the GCC 4.9 development branch (with -fcilkplus, and the caveat that cilk_for is not yet supported), and in a Clang*/LLVM* project at http://cilkplus.github.io/
  • More information
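
A minimal sketch of the three Cilk Plus keywords (cilk_spawn, cilk_sync, cilk_for). This example is not from the original page; the recursive fib function and the loop bound N are illustrative choices:

    #include <cilk/cilk.h>   // cilk_spawn, cilk_sync, cilk_for
    #include <cstdio>

    // Naive recursive Fibonacci: the spawned call may run in parallel with the continuation.
    long fib(int n) {
        if (n < 2) return n;
        long x = cilk_spawn fib(n - 1);  // child strand may execute concurrently
        long y = fib(n - 2);             // parent strand continues meanwhile
        cilk_sync;                       // wait for the spawned call to finish
        return x + y;
    }

    int main() {
        const int N = 1000;
        double a[N];
        cilk_for (int i = 0; i < N; ++i)   // iterations are distributed across worker threads
            a[i] = i * 0.5;
        std::printf("fib(30) = %ld, a[%d] = %f\n", fib(30), N - 1, a[N - 1]);
        return 0;
    }

Built with the Intel C++ compiler this needs no extra options; with the GCC development branch mentioned above, -fcilkplus is required.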

OpenMP 4.0 (included with Intel C++ compiler)

  • Support for most of the new features in the OpenMP* 4.0 API Specification (user-defined reductions not yet supported); a short example follows this list
  • Support for C, C++, and Fortran OpenMP programs on Windows*, Linux*, and OS X*
  • Complete support for industry-standard OpenMP pragmas and directives in the OpenMP 3.1 API Specification
  • Intel-specific extensions to optimize performance and verify intended functionality
  • Intel compiler OpenMP libraries are object-level compatible with Microsoft Visual C++* on Windows and GCC on Linux*
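
A minimal sketch combining OpenMP threading with the OpenMP 4.0 simd construct. This example is not from the original page; the array size and the saxpy-style kernel are illustrative choices:

    #include <omp.h>
    #include <cstdio>

    int main() {
        const int N = 1 << 20;
        static float x[N], y[N];          // static keeps the large arrays off the stack
        const float a = 2.0f;

        // OpenMP 4.0 composite construct: threads across cores, SIMD lanes within each thread.
        #pragma omp parallel for simd
        for (int i = 0; i < N; ++i)
            y[i] = a * x[i] + y[i];

        std::printf("max threads: %d\n", omp_get_max_threads());
        return 0;
    }

With the Intel compiler this is typically built with the -qopenmp option (older releases use -openmp).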

Intel® Math Kernel Library

  • Vectorized and threaded for highest performance using de facto standard APIs for simple code integration (see the example below)
  • C, C++ and Fortran compiler-compatible with royalty-free licensing for low-cost deployment
  • More information
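
A minimal sketch calling Intel MKL through its standard CBLAS interface. This example is not from the original page; the matrix size and the use of cblas_dgemm are illustrative choices:

    #include <mkl.h>        // MKL's CBLAS interface
    #include <vector>
    #include <cstdio>

    int main() {
        const int n = 512;                                   // square matrices for simplicity
        std::vector<double> A(n * n, 1.0), B(n * n, 2.0), C(n * n, 0.0);

        // C = 1.0 * A * B + 0.0 * C, row-major storage, no transposes.
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    n, n, n, 1.0, A.data(), n, B.data(), n, 0.0, C.data(), n);

        std::printf("C[0] = %f\n", C[0]);                    // expect 2.0 * n = 1024
        return 0;
    }

With the Intel compiler the library is typically pulled in with the -mkl link option.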

Intel® Integrated Performance Primitives

  • Performance: pre-optimized building blocks for compute-intensive tasks (see the example below)
  • A consistent set of APIs that support multiple operating systems and architectures
    • Windows*, Linux*, Android*, and OS X*
    • Intel® Quark™, Intel® Atom™, Intel® Core™, Intel® Xeon®, and Intel® Xeon Phi™ processors
  • More information
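
A minimal sketch using one IPP signal-processing primitive. This example is not from the original page; the vector length and the choice of ippsAdd_32f are illustrative assumptions:

    #include <ipp.h>        // IPP core and signal-processing APIs
    #include <cstdio>

    int main() {
        ippInit();                                   // enable dispatching to the best code path for this CPU
        const int len = 8;
        Ipp32f a[len] = {1, 2, 3, 4, 5, 6, 7, 8};
        Ipp32f b[len] = {8, 7, 6, 5, 4, 3, 2, 1};
        Ipp32f c[len];

        IppStatus st = ippsAdd_32f(a, b, c, len);    // element-wise c[i] = a[i] + b[i]
        std::printf("status=%d, c[0]=%f\n", (int)st, c[0]);
        return 0;
    }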

Intel® Threading Building Blocks

  • Rich set of components to efficiently implement higher-level, task-based parallelism (see the example below)
  • Compatible with multiple compilers and portable to various operating systems
  • More information
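
A minimal sketch of a TBB parallel_for over a blocked range. This example is not from the original page; the vector size and the doubling kernel are illustrative choices:

    #include <tbb/parallel_for.h>
    #include <tbb/blocked_range.h>
    #include <vector>
    #include <cstdio>

    int main() {
        std::vector<float> data(1 << 20, 1.0f);

        // TBB recursively splits the range into chunks and processes them as parallel tasks.
        tbb::parallel_for(tbb::blocked_range<size_t>(0, data.size()),
                          [&](const tbb::blocked_range<size_t>& r) {
                              for (size_t i = r.begin(); i != r.end(); ++i)
                                  data[i] *= 2.0f;
                          });

        std::printf("data[0] = %f\n", data[0]);
        return 0;
    }

Typically linked with -ltbb.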

Intel® Media SDK 2014 for Clients

  • A cross-platform API for developing consumer and professional media applications.
  • Intel® Quick Sync Video: Hardware-accelerated video encoding, decoding, and transcoding.
  • Development Efficiency: Code once now and see it work on tomorrow's platforms.
  • More information

A drop-in addition for C and C++ development

  • Windows*
    • Develop, build, debug and run from the familiar Visual Studio IDE
    • Works with Microsoft Visual Studio* 2008, 2010, 2012 and 2013
    • Source and binary compatible with Visual C++*
  • Linux*
    • Develop, build, debug and run using Eclipse* IDE interface or command line
    • Source and binary compatible with GCC
  • OS X*
    • Develop, build, debug and run from the familiar Xcode* IDE
    • Works with Xcode 4.6, 5.0 and 5.1
    • Source and binary compatible with LLVM-GCC and Clang* tool chains
  • 32-bit and 64-bit development included

  Screenshot callouts (Visual Studio integration): (1) project and source views, (2) C/C++-aware text editor, (3) C/C++ debugging, (4) call-stack information, (5) breakpoints set on source lines in the IDE.

Outstanding support

One year of support is included with purchase, giving you access to all product updates and new versions released in the support period, plus access to Intel Premier Support. There is also a very active user forum for help from experienced users and Intel engineers.

  • Videos on Getting Started with Intel® C++ Compiler
  • Vectorization Essentials
  • Performance Essentials with OpenMP 4.0 Vectorization
  • View slides

Register for future Webinars


Previously recorded Webinars:

  • Update Now: What’s New in Intel® Compilers and Libraries
  • Performance essentials using OpenMP* 4.0 vectorization with C/C++
  • Intel® Cilk™ Plus Array Notation - Technology and Case Study Beta
  • OpenMP 4.0 for SIMD and Affinity Features with Intel® Xeon® Processors and Intel® Xeon Phi™ Coprocessor
  • Introduction to Vectorization using Intel® Cilk™ Plus Extensions
  • Optimizing and Compilation for Intel® Xeon Phi™ Coprocessor

More Tech Articles

Mapping of Intel® MPI Library versions to bundle suites
By Gergana Slavova (Intel), posted 08/28/2014
Introduction: Mapping the Intel® MPI Library numbers to specific suites and update versions Intel® Parallel Studio XE 2015 Update 1 Cluster Edition (released 26 November 2014) Intel® MPI Library 5.0 Intel® Registration Center Activation Date (yr.mo.day) Windows Version / build Linu...
Getting Started with Intel® Integrated Native Developer Experience
By Egor C., posted 07/18/2014
Intel® Integrated Native Developer Experience (Intel® INDE) is a powerful cross-platform framework for creating applications for Android* and Windows* devices. Intel INDE can be integrated into popular IDEs and provides a complete and consistent set of C++/Java* tools, libraries, and samples for ...
How to use Intel Cilk Plus extension, to speed up your Android application with threading features
By shenghong-geng (Intel), posted 05/28/2014
Intel Cilk Plus is an important language extension of Intel Compiler, to help you implement multiple-threading easily and fast, to improve your application's performance on multi-core systems. While more and more cores on Android devices, it is also more important to efficiently use the multi-cor...
Selective Use of gatherhint/scatterhint Instructions
By Rakesh Krishnaiyer (Intel), posted 02/20/2014
Compiler Methodology for Intel® MIC Architecture Selective Use of gatherhint/scatterhint Instructions Overview The -qopt-gather-scatter-unroll=<N> compiler option can be used to generate gatherhint/scatterhint instructions supported by the coprocessor.  This is useful if your code is doi...

You can reply to any of the forum topics below by clicking on the title. Please do not include private information such as your email address or product serial number in your posts. If you need to share private information with an Intel employee, they can start a private thread for you.



Optimization problems with std::array maybe due to RVO
By velvia2
Hi, I discovered a source of slowdown in my program, due to the usage of std::array. To gain a better understanding of what was going on, I have used my own implementation of std::array, and the slowdown disappeared. Unfortunately, I can't show you the program I am working on due to some non disclosure agreement. But I've managed to track down the problem and put it in a simple file. When you compile the file shown below with icpc -c -std=c++11 -Ofast -xHost -ansi-alias -qopt-report=2 ode.cpp -o ode.o and look at the optimization report, the generated function is ode(StaticVector<double, 2UL> *, const StaticVector<double, 2UL> &). But if you comment out what is in between the "culprit" comments, all the constructors and assignements, the function does have a new signature ode(const StaticVector<double, 2UL> &). My guess is that it has to do with return value optimization. Could you please explain me why we have such a difference? Best regards, Francois #include ...
Right Intel C++ compiler for 64-bit machine
By polispip4
Hello, I just installed the Intel® Parallel Studio XE Professional Edition for C++ Linux* (evaluation version) for my Ubuntu 14.04.2 machine running on a "Intel Corporation Xeon E3-1200 Processor Family" processor. I don't understand if this is a 32-bit compiler or not. I'm worried about this since during the installation a message reporting the absence of some 32-bit libraries appears on the console.  Since I'm interested to have the best performances (running time) of my code, does a 64-bit specific version exist for my machine? Thank you
error: class "__m128" has no suitable assignment operator
By Matt S.3
This code  #include <xmmintrin.h>   volatile __m128 a, b;   void test(void) {      a = b; }   produces this error $ /opt/intel/composerxe/bin/icpc -c test.cc test.cc(7): error: class "__m128" has no suitable assignment operator        a = b;            ^   compilation aborted for test.cc (code 2)   when compiled with icpc.  There is no error if the variables are not volatile.  There is no error with icc or gcc or g++. Any suggestion on how to compile it with icpc?    
'Segmentation violation signal raised' when xiar runs
By qpalz0
I am trying to compile chromium 41.0.2272.64 (64 bit) using icc on Linux. icpc --version shows icpc (ICC) 15.0.2 20150121 Copyright (C) 1985-2015 Intel Corporation.  All rights reserved. I compile the whole thing with -ipo option on. It compiles obj/content/browser/gpu/content_browser.gpu_process_host.o as usual with the following command (some unrelated stuff is skipped, and I know that some options shown below are not supported by icc or duplicated): icpc ... -fstack-protector --param=ssp-buffer-size=4 -pthread -fno-strict-aliasing -Wall -Wno-unused-parameter -Wno-missing-field-initializers -fvisibility=hidden -pipe -fPIC -Wno-unused-local-typedefs -pthread ... -m64 -march=x86-64 -O2 -fno-ident -fdata-sections -ffunction-sections -funwind-tables -O2 -march=native -ipo -no-prec-div -ansi-alias -parallel -fno-exceptions -fno-rtti -fno-threadsafe-statics -fvisibility-inlines-hidden -Wsign-compare -std=gnu++11 -Wno-narrowing -Wno-literal-suffix -c ../../content/browser/gpu/gpu_p...
Problem compiling with armadillo
By Daniel H1
Hello all, I'm using aramadillo (http://arma.sourceforge.net) to elegantly manipulate arrays. All was working well till the last version (4.650.2). Now it fails compiling with icpc (15.0.1, Linux) pretending there is a resolution problem.  The snippet code still compile fine with g++(4.9.2) and also clang (3.5) showing no error or warning. I've filed a bug to armadillo team, but they told me it is an Intel issue. I would be happy if any solution exists to solve this issue as I both need using armadillo and the intel compiler for speed and efficency. Thanks in advance for any answer. Daniel Here is the code: #define ARMA_DONT_USE_WRAPPER #define ARMA_DONT_USE_HDF5 #define ARMA_DONT_USE_BLAS #include <armadillo> using namespace std; using namespace arma; #define LEN 50 int main() { mat::fixed<LEN,9> beta; vec::fixed<LEN> alpha; mat::fixed<LEN,9> ash1; ash1=repmat(alpha,1,9)-beta; return(0); }and the compilation error: ~ $ icpc -I armadillo-4.650.2/i...
A possible bug found in ICC compiler with inline ASM
By mengke e.5
I found a bug when using the inline ASM of Intel Parallel Studio XE 2015 Update 2 Composer Edition for C++ Windows. Since I'm not very familiar with inline ASM, I'm not sure if it is a bug. #include <iostream> using namespace std; __forceinline void DoNothingWithMemory( float*const copyByValue ) { float* copyByValueAgain = copyByValue ; /* As you see, the two var in this function, "copyByValueAgain" and "copyByValue", are copied by value. Therefore, even if the code block of inline asm below changes one of them, there's nothing to do with the var "p" in the main function. However, the fact is, the var "p" in the main function IS changed after executing the code block of inline asm below! */ __asm__ __volatile__( "lea 4(%0),%0;" //In fact, nothing is done here. It's just a "lea", and has nothing to do with memory or pointer aliasing! : :"r"( copyByValueAgain ) : ); } int main() { float a; //just a place hol...
"internal error: backend signal" when compilng "DRMAA for PBS"
By Eric R.5
Hello, On two separate systems I've attempted to compile "DRMAA for PBS" (found here). It successfully compiles with gcc and fails with icc resulting in the "interal error: backend signal". From what I can tell from searching in previous forums that error message is highly problematic and should be reported. This error has occurred on the 15.0.1 version on the Intel Compiler. It was compiled under linux. The build was a simple "configure" followed by a "make". The problematic file and resulting error is displayed below: icc -DHAVE_CONFIG_H -I. -I.. -I/opt/torque/include -I/opt/torque/include/torque -I../drmaa_utils -fPIC -D_REENTRANT -D_THREAD_SAFE -DNDEBUG -D_GNU_SOURCE -DCONFDIR=~/opt/pbs-drmaa-1.0.18/etc -Wall -W -Wno-unused-parameter -Wno-format-zero-length -pedantic -ansi -g -O2 -pthread -MT drmaa.lo -MD -MP -MF .deps/drmaa.Tpo -c drmaa.c  -fPIC -DPIC -o .libs/drmaa.o (0): internal error: backend signals Thanks for your time in advance, - Eric
[OS X] a tuple of tuples
By t.ueshiba@aist.go.jp1
The following code contains a std::tuple of tuples. It can be successfully compiled with c++ compiler provided by Apple but fails with icpc-15.0.2.132 under OS X 10.10(Yosemite). #include <tuple>   int main() {     using namespace    std;       tuple<tuple<int, float>, tuple<long, double> > x;       return 0; }


Timing and benchmarking reproducebility
By erling_andersen0
This is not really a cilk question but I will try anyway because I think you INTEL must have a lot experience.  I have read the section https://software.intel.com/en-us/node/522641 but it is a bit vague. How do you get reliable timing results when benchmarking cilk programs? Do you use a particular OS in a particular setup for instance? My experience on Windows is times can varies a lot for even for single threaded runs when the same program is run different points times. A 10% difference can easily be measurement error. Our experience seems to indicate Linux is not much better. Btw I have disabled hyperthreading and is using a server and not a laptop. I am only user and shut down unneeded applications before running my test. Maybe I should compute averages and variance of run times and apply statistical tests to the results. And use that to conclude about performance.  
cilkview and gcc 4.9 cilkplus branch on 64 bit linux
By leoferres1
Hello, I've downloaded the latest (I think) version fo cilkview from the cilkplus website. For some reason it is not working (even if I try to run it from the same folder, it doesn't find the program...) I'm thinking it has to do with some 32 vs 64-bit library. I exported the lib64 to no avail. Is there any kind of support for cilkview? If not, will you guys just release the source, so we can patch it? This would be great. Same with cilkprof. Thanks for your time.
_cilkrts_hyperobject_dealloc is expensive
By erling_andersen12
Intel Vtune shows that  _cilkrts_hyperobject_dealloc is a very expensive operation. Is it because I spawning too much? I have fairly deep recursion so the stack grows quite large. Maybe that is the issue.      
Build info
By erling_andersen4
I am newcomer to cilk but like it. Now I have been looking for build and linking instructions e.g which libs should I link with. Now the section https://software.intel.com/en-us/node/522585 is called  Build, Run and Debug an Intel(R) Cilk(TM) Plus Program   but it has NO info about building.  
Is it possible to me to improve a simple O(n) algorithm with Cilk Plus?
By Roní G.2
Hi everybody! I have a simple algorithm to print the very first unique ASCII character from a stream. In the worst case the algorithm scans through the stream twice, doing some comparisons and increments. So, roughly, my algorithm is O(n). The algorithm0 do two main things: It scans all the characters through the stream, counting the number of their occurrences with the help of an array. The character itself is used as the index of this array, since ASCII characters are numbers from 0 to 127 (or 255, if we consider its extended version); After having filled the array that relates the characters in the stream with how many times they have appeared in it, the algorithm scans through the array, again, but this time it checks if the number of occurrence is equal to 1: if it is, then this is the first unique character; if not, it continues the search until the end of the stream is reached. The first task runs1 in linear time: a1n + b1. So as the second task: a2n + b2. Roughly, the whol...
vectorizing with an inline function?
By rnickb1
I attached two code files mandel1.cpp and mandel2.cpp. mandel1.cpp has a loop with all the code in the body mandel2.cpp has equivalent code but instead of having the code in the body it calls an inline function Compiling with intel c++ compiler 15 with "icc  -O3 -fp-model fast=2 -xCORE-AVX2 -fma -c -S", I can vectorize mandel1.cpp but not mandel2.cpp. Is there I way I can vectorize mandel2.cpp and still have a separate function? It seems like the optimizer ought to just be able to inline and then apply the vectorization if it can vectorize mandel1.cpp. I tried using the "vector" attribute, but it doesn't look like it works with struct/class arguments.
How to compile cilk plus runtime source with Intel® C++ Composer XE 2013
By Yaqiong P.1
Dear all, I want to compile cilk plus runtime source with Intel® C++ Composer XE 2013. I build the cilk plus runtime according to the directions in the "readme" file (libtoolize; aclocal; automake --add-missing; autoconf; ./configure; make; make install). But in this way, gcc is used by default. Please, could somebody give me some guidelines in order to compile cilk plus runtime source with Intel® C++ Composer XE 2013?  Thanks a lot for your help. Best Regards, Yaqiong Peng
cilk_spawn skips function calls (?)
By Christos T.7
Hello there. I'm a student and i'm trying some experiments with CilkPlus of icc 15. I'm using Ubuntu 12.04 with x64 Intel Processor. The following code is an implementation of a radix sorting algorithm of an octree using points' morton codes. The problem is that it seems that though cilk decides not to spawn a new thread in one of the 8 recursive calls, it also skips calling the function serially. This results in producing a non-complete sorted index vector, whose size is less than the original index vector's size and thus it doesn't apply sorting to all points. This is not happening if i implement serially the bin splitting and i apply cilk_for only to the recursive calls. Can you explain to me what's happening? Is there an alternative implementation? Should i correct something? #include <cstdlib> #include <cstdio> #include <vector> #include <cilk/cilk.h> #include <cilk/reducer_vector.h> #define MAXBINS 8 typedef std::vec...