Intel® C and C++ Compilers

Leadership application performance

  • Rich set of components to efficiently implement higher-level, task-based parallelism
  • Future-proof applications to tap multicore and many-core power
  • Compatible with multiple compilers and portable to various operating systems

Performance without compromise

  • Industry-leading performance on Intel and compatible processors
  • Extensive optimizations for the latest Intel processors, including the Intel® Xeon Phi™ coprocessor
  • Scales forward to multicore, many-core, and multiprocessor systems with OpenMP, automatic parallelization, and Intel Xeon Phi coprocessor support
  • Patented automatic CPU dispatch feature runs code optimized for the specific processor identified at application runtime
  • Intel® Performance Guide provides suggestions for improving performance in your Windows* applications

Broad support for current and previous C and C++ standards, plus popular extensions

  • Language support: full C++11 and most of C99
  • Extensive OpenMP 4.0* support

Faster, more scalable applications with advanced parallel models and libraries

Intel provides a variety of scalable, easy-to-use parallel models. These highly abstracted models and libraries simplify adding both task and vector parallelism. The result is faster, more scalable applications running on multicore and many-core architectures.

Intel® Cilk™ Plus (included with Intel C++ compiler)

  • Simplifies adding parallelism for performance with only three keywords
  • Scales for the future: the runtime system operates smoothly on systems with hundreds of cores
  • Vectorized and threaded for highest performance on all Intel and compatible processors
  • Sample code, contributed libraries, open specifications, and other information are available from the Cilk Plus community
  • Included with the Intel C++ compiler and available in the GCC 4.9 development branch (with -fcilkplus); note that cilk_for is not yet supported in the Clang*/LLVM* project
  • More information

OpenMP 4.0 (included with Intel C++ compiler)

  • Support for most of the new features in the OpenMP* 4.0 API Specification (user-defined reductions not yet supported)
  • Support for C, C++, and Fortran OpenMP programs on Windows*, Linux*, and OS X*
  • Complete support for industry-standard OpenMP pragmas and directives in the OpenMP 3.1 API Specification
  • Intel-specific extensions to optimize performance and verify intended functionality
  • Intel compiler OpenMP libraries are object-level compatible with Microsoft Visual C++* on Windows and GCC on Linux*

Intel® Math Kernel Library

  • Vectorized and threaded for highest performance using de facto standard APIs for simple code integration
  • Compatible with C, C++, and Fortran compilers, with royalty-free licensing for low-cost deployment
  • More information

Intel® Integrated Performance Primitives

  • Performance: Pre-optimized building blocks for compute-intensive tasks
  • A consistent set of APIs that support multiple operating systems and architectures
    • Windows*, Linux*, Android*, and OS X*
    • Intel® Quark™, Intel® Atom™, Intel® Core™, Intel® Xeon®, and Intel® Xeon Phi™ processors
  • More information

Intel® Threading Building Blocks

  • Rich set of components to efficiently implement higher-level, task-based parallelism
  • Compatible with multiple compilers and portable to various operating systems
  • More information

Intel® Media SDK 2014 for Clients

  • A cross-platform API for developing consumer and professional media applications.
  • Intel® Quick Sync Video: Hardware-accelerated video encoding, decoding, and transcoding.
  • Development Efficiency: Code once now and see it work on tomorrow's platforms.
  • More information

A drop-in addition for C and C++ development

  • Windows*
    • Develop, build, debug and run from the familiar Visual Studio IDE
    • Works with Microsoft Visual Studio* 2008, 2010, 2012 and 2013
    • Source and binary compatible with Visual C++*
  • Linux*
    • Develop, build, debug and run using Eclipse* IDE interface or command line
    • Source and binary compatible with GCC
  • OS X*
    • Develop, build, debug and run from the familiar Xcode* IDE
    • Works with Xcode 4.6, 5.0 and 5.1
    • Source and binary compatible with LLVM-GCC and Clang* tool chains
  • 32-bit and 64-bit development included

  1. Project and source in Visual Studio
  2. C/C++ aware text editor
  3. Debug C/C++ code
  4. Call Stack information
  5. Set breakpoints at specific source lines in the IDE

Outstanding support

One year of support is included with purchase, giving you access to all product updates and new versions released in the support period, plus access to Intel® Premier Support. There is also a very active user forum for help from experienced users and Intel engineers.

  • Videos on Getting Started with Intel® C++ Compiler
  • Vectorization Essentials
  • Performance Essentials with OpenMP 4.0 Vectorization
  • View slides

Register for future Webinars

Previously recorded Webinars:

  • Update Now: What’s New in Intel® Compilers and Libraries
  • Performance essentials using OpenMP* 4.0 vectorization with C/C++
  • Intel® Cilk™ Plus Array Notation - Technology and Case Study Beta
  • OpenMP 4.0 for SIMD and Affinity Features with Intel® Xeon® Processors and Intel® Xeon Phi™ Coprocessor
  • Introduction to Vectorization using Intel® Cilk™ Plus Extensions
  • Optimizing and Compilation for Intel® Xeon Phi™ Coprocessor

Featured Articles

No content found

More Tech Articles

Resolving problem when building HDF5* with Intel® compiler 14.0
By Yolanda Chen (Intel), published 11/12/2013
Introduction: When building the latest HDF5* with Intel® compiler 14.0, a segmentation fault occurs when running "make check". This article provides a solution to this issue. The information in this article assumes you already understand how to build HDF5* with Intel compilers by readin…
Getting Started with Intel® Composer XE 2013, New User Compiler Basics
By AmandaS (Intel), published 11/07/2013
Compiler Methodology for Intel® MIC Architecture Getting Started with Intel® Composer XE 2013, New User Compiler Basics Overview Modern compilers can be invoked with hundreds of options. From these, what are the essential set of options needed by the typical application programmer? This chapter has…
Memory Allocation and First-Touch
By AmandaS (Intel), published 11/07/2013
Compiler Methodology for Intel® MIC Architecture Memory Allocation and First-Touch Memory allocation is expensive on the coprocessor compared to the Intel® Xeon processor so it is prudent to reuse already-allocated memory wherever possible. For example, if a function gets called repeatedly (say i…
Overview of Vectorization Reports and the -vec-report6 Option
By Ronald W Green (Intel), published 11/07/2013
Compiler Methodology for Intel® MIC Architecture Overview of Vectorization Reports and the -vec-report6 Option Note: This article applies to Intel Compiler version 14.X and earlier. With version 15.0, the four optimization report options (-opt-report, -vec-report, -openmp-report, and -par-repo…

Supplemental Documentation

No content found

You can reply to any of the forum topics below by clicking on the title. Please do not include private information such as your email address or product serial number in your posts. If you need to share private information with an Intel employee, they can start a private thread for you.


Optimization problems with std::array maybe due to RVO
By velvia
Hi, I discovered a source of slowdown in my program, due to the usage of std::array. To gain a better understanding of what was going on, I have used my own implementation of std::array, and the slowdown disappeared. Unfortunately, I can't show you the program I am working on due to some non disclosure agreement. But I've managed to track down the problem and put it in a simple file. When you compile the file shown below with icpc -c -std=c++11 -Ofast -xHost -ansi-alias -qopt-report=2 ode.cpp -o ode.oand look at the optimization report, the generated function is ode(StaticVector<double, 2UL> *, const StaticVector<double, 2UL> &)But if you comment out what is in between the "culprit" comments, all the constructors and assignements, the function does have a new signature ode(const StaticVector<double, 2UL> &)My guess is that it has to do with return value optimization. Could you please explain me why we have such a difference? Best regards, Francois #include &l…
Right Intel C++ compiler for 64-bit machine
By polispip
Hello, I just installed the Intel® Parallel Studio XE Professional Edition for C++ Linux* (evaluation version) for my Ubuntu 14.04.2 machine running on a "Intel Corporation Xeon E3-1200 Processor Family" processor. I don't understand if this is a 32-bit compiler or not. I'm worried about this since during the installation a message reporting the absence of some 32-bit libraries appears on the console.  Since I'm interested to have the best performances (running time) of my code, does a 64-bit specific version exist for my machine? Thank you
error: class "__m128" has no suitable assignment operator
By Matt S.
This code  #include <xmmintrin.h>   volatile __m128 a, b;   void test(void) {      a = b; }   produces this error $ /opt/intel/composerxe/bin/icpc -c error: class "__m128" has no suitable assignment operator        a = b;            ^   compilation aborted for (code 2)   when compiled with icpc.  There is no error if the variables are not volatile.  There is no error with icc or gcc or g++. Any suggestion on how to compile it with icpc?    
'Segmentation violation signal raised' when xiar runs
By qpalz
I am trying to compile chromium 41.0.2272.64 (64 bit) using icc on Linux. icpc --version shows icpc (ICC) 15.0.2 20150121 Copyright (C) 1985-2015 Intel Corporation.  All rights reserved. I compile the whole thing with -ipo option on. It compiles obj/content/browser/gpu/content_browser.gpu_process_host.o as usual with the following command (some unrelated stuff is skipped, and I know that some options shown below are not supported by icc or duplicated): icpc ... -fstack-protector --param=ssp-buffer-size=4 -pthread -fno-strict-aliasing -Wall -Wno-unused-parameter -Wno-missing-field-initializers -fvisibility=hidden -pipe -fPIC -Wno-unused-local-typedefs -pthread ... -m64 -march=x86-64 -O2 -fno-ident -fdata-sections -ffunction-sections -funwind-tables -O2 -march=native -ipo -no-prec-div -ansi-alias -parallel -fno-exceptions -fno-rtti -fno-threadsafe-statics -fvisibility-inlines-hidden -Wsign-compare -std=gnu++11 -Wno-narrowing -Wno-literal-suffix -c ../../content/browser/gpu/gpu_pro…
Problem compiling with armadillo
By Daniel H
Hello all, I'm using aramadillo ( to elegantly manipulate arrays. All was working well till the last version (4.650.2). Now it fails compiling with icpc (15.0.1, Linux) pretending there is a resolution problem.  The snippet code still compile fine with g++(4.9.2) and also clang (3.5) showing no error or warning. I've filed a bug to armadillo team, but they told me it is an Intel issue. I would be happy if any solution exists to solve this issue as I both need using armadillo and the intel compiler for speed and efficency. Thanks in advance for any answer. Daniel Here is the code: #define ARMA_DONT_USE_WRAPPER #define ARMA_DONT_USE_HDF5 #define ARMA_DONT_USE_BLAS #include <armadillo> using namespace std; using namespace arma; #define LEN 50 int main() { mat::fixed<LEN,9> beta; vec::fixed<LEN> alpha; mat::fixed<LEN,9> ash1; ash1=repmat(alpha,1,9)-beta; return(0); }and the compilation error: ~ $ icpc -I armadillo-4.650.2/inc…
A possible bug found in ICC compiler with inline ASM
By mengke e.
I found a bug when using the inline ASM of Intel Parallel Studio XE 2015 Update 2 Composer Edition for C++ Windows. Since I'm not very familiar with inline ASM, I'm not sure if it is a bug. #include <iostream> using namespace std; __forceinline void DoNothingWithMemory( float*const copyByValue ) { float* copyByValueAgain = copyByValue ; /* As you see, the two var in this function, "copyByValueAgain" and "copyByValue", are copied by value. Therefore, even if the code block of inline asm below changes one of them, there's nothing to do with the var "p" in the main function. However, the fact is, the var "p" in the main function IS changed after executing the code block of inline asm below! */ __asm__ __volatile__( "lea 4(%0),%0;" //In fact, nothing is done here. It's just a "lea", and has nothing to do with memory or pointer aliasing! : :"r"( copyByValueAgain ) : ); } int main() { float a; //just a place holde…
"internal error: backend signal" when compilng "DRMAA for PBS"
By Eric R.
Hello, On two separate systems I've attempted to compile "DRMAA for PBS" (found here). It successfully compiles with gcc and fails with icc resulting in the "interal error: backend signal". From what I can tell from searching in previous forums that error message is highly problematic and should be reported. This error has occurred on the 15.0.1 version on the Intel Compiler. It was compiled under linux. The build was a simple "configure" followed by a "make". The problematic file and resulting error is displayed below: icc -DHAVE_CONFIG_H -I. -I.. -I/opt/torque/include -I/opt/torque/include/torque -I../drmaa_utils -fPIC -D_REENTRANT -D_THREAD_SAFE -DNDEBUG -D_GNU_SOURCE -DCONFDIR=~/opt/pbs-drmaa-1.0.18/etc -Wall -W -Wno-unused-parameter -Wno-format-zero-length -pedantic -ansi -g -O2 -pthread -MT drmaa.lo -MD -MP -MF .deps/drmaa.Tpo -c drmaa.c  -fPIC -DPIC -o .libs/drmaa.o (0): internal error: backend signals Thanks for your time in advance, - Eric
[OS X] a tuple of tuples
By t.ueshiba@aist.go.jp
The following code contains a std::tuple of tuples. It can be successfully compiled with c++ compiler provided by Apple but fails with icpc- under OS X 10.10(Yosemite). #include <tuple>   int main() {     using namespace    std;       tuple<tuple<int, float>, tuple<long, double> > x;       return 0; }



Less performance on 16 core than on 4 ?!
By sdfsadfasdf s.
Hi there, I evaluated my cilk application using "taskset -c 0-(x-1) MYPROGRAM" to analyze scaling behavior. I was very surprised to see that the performance increases up to a certain number of cores but decreases afterwards. For 2 cores I gain a speedup of 1.85; for 4, 3.15; for 8, 4.34 - but with 12 cores the performance drops to a speedup close to that of 2 cores (1.99). 16 cores perform slightly better (2.11). How is such behaviour possible? Either an idle thread can steal work or it can't?! - or may the work packets be too coarse-grained, so that the stealing overhead destroys the performance with too many cores in use?!
Exception when run project at debug mode using cilk_for
By Tam N.
Dear all, I have used cilk_plus to make parallel processing into my source code with visual studio 2008 IDE. But when I build it at debug mode, the project throw an exception below: "Run-Time Check Failure #0 - The value of ESP was not properly saved across a function call. This is usually a result of calling a function pointer declared with a different calling convention" How can  I resolve it to make debug mode operated ? Thanks of all, Tam Nguyen  
Cilk Tools error while loading shared libraries
By Nicholas N.
I have successfully compiled cilkplus for gcc (4.8 branch) on Ubuntu 14.04 LTS and compiled the example program fib on the cilkplus website.  I would like to run cilkview and cilkscreen on it, and so I downloaded cilk tools from the website as well.  However, when I try to run cilkview, I get the following error: Cilkview: Generating scalability data -t: error while loading shared libraries: -t: cannot open shared object file: No such file or directory I've tried changing the environment variables $LIBRARY_PATH and $LD_LIBRARY_PATH to point to the libraries in the cilk tools directory, but I still come up with the same error.  I also noticed that on the cilk tools downloads, for linux there is an extra set of libraries (libelf and libdwarf), which I have also installed on my system.  I tried looking at the depenencies for cilkview, but I couldn't find anything unusual with those.  Here is the output: $ ldd cilkview =>  (0xf7735000) => /lib32/lib…
Cilk_for returns wrong data in array.
By Đặng P.
Hello everyone. I am new to multi threading programming. Recently, i have a project, which i apply cilk_for into it. Here is the code: void myfunction(short *myarray) { m128i *array = (m128i*) myarray cilk_for(int i=0; i<N_LOOP1; i++) { for(int z = 0; z<N_LOOP2; z+=8) { array[z] = _mm_and_si128(array[z],mym128i); array[z+1] = _mm_and_si128(array[z+1],mym128i); array[z+2] = _mm_and_si128(array[z+2],mym128i); array[z+3] = _mm_and_si128(array[z+3],mym128i); array[z+4] = _mm_and_si128(array[z+4],mym128i); array[z+5] = _mm_and_si128(array[z+5],mym128i); array[z+6] = _mm_and_si128(array[z+6],mym128i); array[z+7] = _mm_and_si128(array[z+7],mym128i); array+=8; } } }After the above code ran, ridiculous thing happens. The data in array isn't updated correctly. For example, if i have an array with 1000 elements, there is a chance that the array will be updated correctly (1000 …
intel cilk plus cilkscreen and tbb/scalable_allocator
By pitsianis
Dear friends, the following simple code seems to run just fine, however, cilkscreen is shouting "Race condition"! Shall I trust it? Or it is just false sharing? So, what scalable memory allocator is fast and thread safe to use with intel cilk plus?   #include <cilk/cilk.h> #include "tbb/scalable_allocator.h" char * array[10000000]; int main(int argc, char **argv) { cilk_for (int i = 0; i < 10000000; i++) { array[i] = (char *) scalable_malloc(1); } cilk_for (int i = 0; i < 10000000; i++) { scalable_free(array[i]); } return 0; } I compile it with icc -lcilkrts -ltbbmalloc -o example -O3 -std=c99 example.cbut  $ /usr/pkg/intel/bin/cilkscreen ./example Cilkscreen Race Detector V2.0.0, Build 3566 Race condition on location 0x7fc83fd4ae90 write access at 0x7fc83fb0fd5c: (/tmp/tbb.MXm12595/1.0/build/fxtcarvm024icc13_0_64_gcc4_6_cpp11_release/../../src/tbbmalloc/tbbmalloc_internal.h:913, rml::internal::TLSKey::createTLS+0xec) read access at 0x7fc83fb0de…
Internal compiler error 010101_239
By martin.toeltsch@symena.com
Hi guys, I condensed our project down to a piece of code that lets you reproduce the following issue. When I compile this in Release configuration (Debug works), I get this compiler error: 1>------ Build started: Project: ng-gtest, Configuration: Release x64 ------ 1> CilkTest.cpp 1>" : error : 010101_239 1> 1> compilation aborted for General\CilkTest.cpp (code 4) ========== Build: 0 succeeded, 1 failed, 3 up-to-date, 0 skipped ========== This is our compiler: Intel(R) C++ Intel(R) 64 Compiler XE for Intel(R) 64, version 14.0.3 Package ID: w_ccompxe_2013_sp1.3.202 OS: Windows 7, x64. This is the code: #include <math.h> const int VecSize = 8; const short* acdata; const short* lowdata; const unsigned short* meas_data; const unsigned short* rdval; short trident[2 * VecSize]; short speed[2 * VecSize]; float spdfact[VecSize]; float spdfact2[VecSize]; float tdat[VecSize]; float array1[VecSize]; float array2[VecSize]; const float *input_01; const float *input_…
Efficient prefix scan library in Cilk Plus and accessible from C?
By pitsianis
Is there any efficient prefix scan library for Cilk Plus accessible from C? I was not able to find any and my implementation can hardly compete with the sequential version :-) An interface similar to the reducers will work nicely. Thank you.
Cilk worker scheduling
By Haris R.
Hello, I would like to understand better how Cilk scheduling works.  I am not sure how to phrase this question so I give it my best. I have downloaded the latest Intel Cilk runtime release (cilkplus-rtl-003365 -  released 3-May-2013). I use the classical Fibonacci example in Cilk. I wanted to know on what CPU core each worker executes. To Fibonacci example, I added a function that checks CPU affinity for every worker as in here: “printf” is located in “int fib(int n)” of the Fibonacci sample code. I get WORKER ID using “__cilkrts_get_worker_number()” While the program runs, I print each WORKER's ID and the CPU core affinity of each worker.  However, the result surprises me. I expected that some of the workers would run on different CPU cores but it seems that all workers are running on the same exact CPU core.  For example, I get this for every “printf” when running “./fib 30”: ***** WORKER ID: 0 on CPU core: 7 ***** ***** WOR…