Intel® C and C++ Compilers

Leadership application performance

  • Advanced optimization and multithreading capabilities
  • Utilizes the latest Intel® AVX, AVX2 and AVX-512 instructions
  • Compatible with leading compilers and development environments

Interested in the Intel® C++ compiler for Android*? Click here.

C/C++ only:
From $699
Buy Now
Or Download a Free 30-Day Evaluation Version

C/C++ & Fortran:
From $1,199
Buy Now
Or Download a Free 30-Day Evaluation Version

Advanced Performance Features

  • High-Performance Parallel Optimizer (HPO) offers an improved ability to analyze, optimize, and parallelize more loop nests. This revolutionary capability combines vectorization, parallelization, and loop transformations into a single pass that is faster, more effective, and more reliable than prior discrete phases.
  • Automatic Vectorizer analyzes loops and determines when it is safe and effective to execute several iterations of the loop in parallel. Vectorization and auto-parallelization have been enhanced for broader applicability and improved application performance. The vectorizer also offers insights into your code when you use the guided auto-parallelization (GAP) feature. In addition, SIMD pragmas can be used for added user control of vectorization (see the sketch after this list).
  • Guided Auto-Parallelization (GAP) is a unique capability in both Intel C++ and Intel Fortran compilers that suggests ways to improve auto-vectorization as well as auto-parallelization and data transformation. When used, GAP builds a report that may include suggestions for source code changes, use of pragmas, or use of specific compiler options. This is a powerful tool that can help you extend the auto-vectorization and auto-parallelism capabilities of the compiler.
  • Interprocedural Optimization (IPO) is a simple switch setting that can speed application performance. It can dramatically improve the performance of small and medium-sized functions that are used frequently, especially in programs that contain calls within loops. It speeds application performance by inlining your code – a process that logically 'lines up' all the components of your application to speed execution.

  • Loop Profiler is part of the compiler and can be used to generate low-overhead loop and function profiles that show hotspots and where to introduce threads.
  • Profile-Guided Optimization (PGO) is a multi-step process that optimizes application performance based on user workload. It improves application performance by reducing instruction-cache thrashing, reorganizing code layout, shrinking code size, and reducing branch mispredictions. PGO uses actual user workloads to understand how your application logic is exercised in practice. It then organizes your application according to those patterns to speed execution.

  • OpenMP* 3.1 is supported to help simplify pragma-based development of parallelism in your C and C++ applications.
  • Videos on Getting Started with Intel® C++ Compiler
  • Vectorization Essentials
  • Performance Essentials with OpenMP 4.0 Vectorization
  • View slides
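
The short sketch below ties the features above to source code: one loop is annotated with the Intel SIMD pragma for vectorization and a second loop is threaded with an OpenMP pragma. The kernel, array sizes, and the build line in the comment are illustrative assumptions, not options mandated by the product documentation.

    /* Illustrative sketch only: a trivially vectorizable and parallelizable kernel.
     * Assumed Linux build line: icc -O2 -openmp -vec-report2 saxpy_sketch.c */
    #include <stdio.h>

    #define N 100000

    static float a[N], b[N], c[N];

    int main(void)
    {
        for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f; }

        /* SIMD pragma: ask the compiler to vectorize this loop.
           With OpenMP 4.0, #pragma omp simd is the portable spelling. */
    #pragma simd
        for (int i = 0; i < N; i++)
            c[i] = 2.0f * a[i] + b[i];

        /* OpenMP worksharing: split the iterations across threads. */
    #pragma omp parallel for
        for (int i = 0; i < N; i++)
            c[i] += a[i] * b[i];

        printf("c[N-1] = %f\n", c[N - 1]);
        return 0;
    }

For the GAP, IPO, and PGO features described above, the usual Intel compiler switches (assuming a Linux icc command line) are -guide to request a GAP report, -ipo for interprocedural optimization, and, for PGO, a three-step flow: compile with -prof-gen, run the instrumented binary on a representative workload, then recompile with -prof-use.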

Register for future Webinars


Previously recorded Webinars:

  • OpenMP 4.0 for SIMD and Affinity Features with Intel® Xeon® Processors and Intel® Xeon Phi™ Coprocessor
  • Introduction to Vectorization using Intel® Cilk™ Plus Extensions
  • Optimization and Compilation for Intel® Xeon Phi™ Coprocessor

Featured Articles

No content was found

More Tech Articles

OpenMP Related Tips
By AmandaS (Intel), posted 11/25/2013
OpenMP* Loop Collapse Directive: Use the OpenMP collapse clause (sketched below) to increase the total number of iterations that will be partitioned across the available number of OMP threads by reducing the granularity of the work to be done by e...
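As a rough illustration of the collapse clause this excerpt describes, the fragment below fuses two nested loops into a single iteration space that OpenMP then partitions across threads; the bounds, arrays, and scale factor are made up for the example.

    /* Illustrative sketch of the OpenMP collapse clause (values assumed). */
    #include <stdio.h>

    #define NY 8
    #define NX 1024

    static double in[NY][NX], out[NY][NX];

    int main(void)
    {
        /* collapse(2) merges the j and i loops into one NY*NX iteration space,
           so work can be split across more threads than the NY outer
           iterations alone would allow. */
    #pragma omp parallel for collapse(2)
        for (int j = 0; j < NY; j++)
            for (int i = 0; i < NX; i++)
                out[j][i] = in[j][i] * 2.0;

        printf("out[0][0] = %f\n", out[0][0]);
        return 0;
    }
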
Resolving problem when building HDF5* with Intel® compiler 14.0
By Yolanda Chen (Intel), posted 11/12/2013
Introduction: When building the latest HDF5* with Intel® compiler 14.0, a segmentation fault occurs when running "make check". This article provides a solution to this issue. The information in this article assumes you already understand how to build HDF5* with Intel compilers by read...
Getting Started with Intel® Composer XE 2013, New User Compiler Basics
By AmandaS (Intel), posted 11/07/2013
Overview: Modern compilers can be invoked with hundreds of options. From these, what is the essential set of options needed by the typical application programmer? This chapter h...
Memory Allocation and First-Touch
By AmandaS (Intel), posted 11/07/2013
Memory allocation is expensive on the coprocessor compared to Xeon, so it is prudent to reuse already-allocated memory wherever possible. For example, if a function gets called repeatedly (say, inside a loop), and ...
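A hedged sketch of the reuse pattern this excerpt points at: the scratch buffer is allocated once by the caller and reused across calls, instead of being allocated and freed inside the repeatedly called function. The buffer size, function names, and call pattern are assumptions made for the illustration.

    /* Illustrative only: avoid repeated allocation inside a hot call path. */
    #include <stdio.h>
    #include <stdlib.h>

    #define N (1 << 20)

    /* Anti-pattern: allocates and frees a scratch buffer on every call. */
    static void compute_slow(const double *x, double *y)
    {
        double *tmp = malloc(N * sizeof *tmp);
        for (int i = 0; i < N; i++) tmp[i] = 2.0 * x[i];
        for (int i = 0; i < N; i++) y[i]   = tmp[i] + 1.0;
        free(tmp);
    }

    /* Preferred: the caller allocates the scratch buffer once and reuses it. */
    static void compute_fast(const double *x, double *y, double *tmp)
    {
        for (int i = 0; i < N; i++) tmp[i] = 2.0 * x[i];
        for (int i = 0; i < N; i++) y[i]   = tmp[i] + 1.0;
    }

    int main(void)
    {
        double *x = malloc(N * sizeof *x), *y = malloc(N * sizeof *y);
        double *scratch = malloc(N * sizeof *scratch);   /* allocated once */
        for (int i = 0; i < N; i++) x[i] = (double)i;

        compute_slow(x, y);                 /* same result, pays for malloc/free */
        for (int iter = 0; iter < 10; iter++)
            compute_fast(x, y, scratch);    /* buffer reused on every call */

        printf("y[1] = %f\n", y[1]);
        free(x); free(y); free(scratch);
        return 0;
    }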


Supplemental Documentation

No content was found

You can reply to any of the forum topics below by clicking on the title. Please do not include private information such as your email address or product serial number in your posts. If you need to share private information with an Intel employee, they can start a private thread for you.

A member with in-class initializer must be const. What?
By dnesteruk1
I have explicitly enabled C++0x support in my project (latest C++ composer version) and I'm getting the above error. What is going on? The code I'm using is static bimap<string, Type> typeNames = createTypeNames(); It doesn't seem like this works, and neither does inline initialization (i.e., typeNames = { { "Foo", Type::abc }}).
Some projects can't compile when using C++ Composer XE 2015 beta
By pkshan4
Hello, I tried the newest Intel C++ Composer XE 2015 Pre-Release beta (2015.0.0.30). I can successfully compile most projects without issue, but I got an error on some projects: 3>xilink: : error #10037: could not find 'llvm_com' 3>xilink: : error #10014: problem during multi-file optimization compilation (code -1) 3>xilink: : error #10014: problem during multi-file optimization compilation (code -1) Did I miss something? My IDE is VS2013. Thanks
Mapping between intrinsics and assembly code
By Joe M.
I'm using unaligned and aligned load intrinsics in my code and ICC does not behave as I expect it to. If this is expected behavior, can somebody educate me on why? The fundamental problem is that I expect aligned load intrinsics to generate aligned instructions while unaligned intrinsics generate unaligned instructions. However, what I see is that, depending on compiler flags, aligned load intrinsics sometimes generate unaligned instructions. I've attached a snippet of code to demonstrate. I realize this code will segfault when run; the point is just to compile it and look at the generated assembly code. There are 3 cases I experimented with (comments in the code give the compiler version and detailed compile arguments): gcc - the GNU compiler behaves as expected, meaning that aligned load intrinsics map to aligned load instructions. icc with no "-m" argument - this works exactly like gcc; aligned loads map to aligned instructions. icc with the '-mavx' argument. (Note gcc requires this argument t...
OpenMP 4.0 target directives
By Alexander G.
I tried to get an OpenMP 4.0 test case to run where the code calls a user library that is compiled for host and MIC (-mmic). The host library is in the current directory (pwd), the MIC lib is in pwd/mic, and I set LD_LIBRARY_PATH to pwd and MIC_LD_LIBRARY_PATH to pwd/mic. However the execution fails with "On the sink, dlopen() returned NULL. The result of dlerror() is "/tmp/coi_procs/1/31669/load_lib/iccoutysEa1j: undefined symbol: foo" The code looks like this: #include <omp.h> #include <stdio.h> #include <unistd.h> #include <stdlib.h> #include "foo.h" #pragma omp declare target int foo(); #pragma omp end declare target int main() { /* start user code */ #pragma omp target { foo(); } return 0; } The command line is: icc -shared -fPIC -o libhostlib.so hostlib.c icc -mmic -shared -fPIC -o mic/libhostlib.so hostlib.c icc -openmp -o test_mic.o -c test_mic.c icc -openmp -L. -lhostlib -o test_mic test_mic.o Why isn't this working?
AVX is slower than serial execution?
By zhang y.
I wrote a simple program and built it with icpc to examine the performance of AVX on my machine. The code snippet is as follows: #define T 2000000 #define X 16 #define Y 16 #define Z 16 for(int t=0;t<T;t++) for(int k=0;k<Z;k++) for(int j=0;j<Y;j++) for(int i=0;i<X;i++) A[k][j][i]=B[k][j][i]+C[k][j][i]; The configuration is as follows: icpc version 13.1.0 (gcc version 4.6.1 compatibility), FFLAGS="-O3 -xhost", Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, Red Hat Enterprise Linux Server release 6.3 (Santiago). The experiment results are as follows: with 2000000 iterations at size 12*12*12, serial takes 1.09918 s and AVX 1.71405 s; with 2000000 iterations at size 16*16*16, serial 2.58384 s and AVX 4.01935 s; with 200000 iterations at size 32*32*32, serial 2.99971 s and AVX 5.18318 s. As the table shows, the AVX version always costs more time than the serial version. Does somebody know why? Thanks in advance!!!
ICC benchmarking and ICC vs GCC
By Mourad B.
Hi, as you can see I am new to this forum. I have a project comparing ICC vs GCC and I am looking for the best way to start it. Any links/tutorials/documentation are more than welcome. I am thinking of doing some benchmarking with both compilers using SPECInt or any other type of workloads/programs. Thanks for your help. -Lx
OpenMP / memory saturation
By Sean G.
Hi, I am working on an application which I'm pretty sure is memory bound.  I tried doing some simple OpenMP, but there was no speedup, which seems to confirm that the kernel is indeed memory bound. However, if Intel's newer architectures really look like this: http://software.intel.com/sites/default/files/m/d/4/1/d/8/5-3-figure-1.gif shouldn't I be able to try to pin one thread somewhere on the second four cores to get increased memory bandwidth? It seems like pinning a thread to a core might take some work, so I wanted to see if this makes sense before I tried it. Thanks
How to uninstall Intel® C++ STM Compiler Prototype Edition 3.0?
By arpitj3
Hi, I have installed Intel® C++ STM Compiler Prototype Edition 3.0. What is the procedure to uninstall it? I could not find any uninstall script. Thanks!


Floating Point ABI
By Nick T.
Hello I noticed in the latest CilkPlus ABI specification (https://www.cilkplus.org/sites/default/files/open_specifications/CilkPlu...), it says that the caller to the library must set the floating point flags (top of page 8). This is what the LLVM implementation of CilkPlus and its runtime do, but the current Intel version of the run-time has the code to save the floating point status registers that is in LLVM's code generator and not the runtime from the LLVM repository. Please could you tell me whether: a) The floating point status flags should be set/saved by the caller b) The floating point status flags should be set/saved by the callee c) There's something I've overlooked The ABI says: "/** * Architecture - specific floating point state. mxcsr and fpcsr should be * set when CILK_SETJMP is called in client code. Note that the Win64 * jmpbuf for the Intel64 architecture already contains this information * so there is no need to use these fields on that OS/architecture. */" T...
How can I parallelize an implicit loop?
By Zvi Danovich (Intel)
I have a loop that, inside its body, runs a function with an array member (dependent on the loop index) as an argument and returns one value. I can parallelize this loop by using the cilk_for() operator instead of a regular for() - it is simple and works well. This is explicit parallelization. Instead of an explicit loop instruction I can use an Array Notation construction (as shown below) - an implicit loop. My routine is relatively long and complex, and has Array Notation constructions inside, so it cannot be declared as a vector (elemental) one. When I use the implicit loop it is not parallelized, and the run time increases substantially. float foo(float f_in) { float f_result; // LONG computation containing CILK+ Array Notation operations return f_result; } int main() { float af_in[n], af_out[n]; // Explicit parallelized loop cilk_for(int i=0; i<n; i++) af_out[i] = foo(af_in[i]); // Implicit non-parallelized l...
Patches or configure options to build the trunk on arm
By Karim C.
Hello, I want to build the trunk on an embedded system supporting ARMv7 instructions. The build completed without errors, but cilk/cilk.h and libcilkrts weren't built. I checked the patches available on the internet; they do support non-x86 architectures, but I think only i386, not ARM. Are there other patches or config options to add while building so that I get those libraries along with the build? Regards
Array of Reducers - Possible in C?
By Detector1
I was wondering if it is possible to create an array of reducers in C? I already read the documentation, but they use always only one reducer. However, how do I use Cilk reducers for an array with int or double values? Can you give  me a short example? Thanks in advance.
Try #3 - Using Intel Toolset instead of GCC
By Robert M.
After my attempts to use the Cilk-enabled gcc were not successful (http://software.intel.com/en-us/forums/topic/500669 and http://software.intel.com/en-us/forums/topic/405676), I am trying Intel C++ Studio XE now (non-commercial student license). After a ("Successful") install I rebooted the system. My reward for downloading gigs of data and installing for about 1 day was the message "vtts_cpuevents_init[cpu1]: all fixed counters are broken". Cilk really starts to annoy me. Is it actually so hard to get this to work? Or do I have bad luck?! What more can I do?!
Erroneous use of __sec_implicit_index()?
By Zvi Danovich (Intel)
I want to pass the index of an array to a routine that uses it internally to compute and return a result, which should be assigned to the output array member. I mean the following: for(int i=0; i<I_max; i++) a_output[i] = foo(i, a1, a2, ...); Instead of this loop I tried to use a CILK+ Array Notation construction: a_output[0:I_max] = foo(__sec_implicit_index(0), a1, a2, ...); This line causes the error (compiler Intel C++ Compiler XE 14.0): 1> CilkArrNot_test.cpp 1>D:\CilkArrNot_test\CilkArrNot_test\CilkArrNot_test.cpp(37): warning #18024: implicit index must be used in an array section context 1>" : error : ** segmentation violation signal raised ** What is the problem in the code line above? Maybe it is a bad idea to use __sec_implicit_index(0) for such a goal? What is the right way to collapse the given loop into an array section operation?
Problems using gcc with cilkplus binary
By Robert M.
Hello, I tried the binary from August 2013 from https://www.cilkplus.org/download-0#gcc-development-branch on my Ubuntu 13.4 x86 VM. I extracted the files to ~/cilkplus-4_8-install, gave execution permission to all files in the /bin subfolder, and executed export LD_LIBRARY_PATH=~/cilkplus-4_8-install/lib:~/cilkplus-4_8-install/lib64:$LD_LIBRARY_PATH and export LIBRARY_PATH=~/cilkplus-4_8-install/lib:~/cilkplus-4_8-install/lib64:$LIBRARY_PATH, but not even ~/cilkplus-4_8-install/bin/gcc --version or ~/cilkplus-4_8-install/bin/g++ --version works. g++ simply does nothing; for gcc I get the error message "cannot execute binary file". What's wrong here? Thank you
cross compile cilk gcc form arm-linux
By Karim C.
I have successfully compiled Cilk gcc for a host computer, but I would like to create binaries to compile my Cilk gcc code for an arm-linux target board. I followed the general procedure used to cross-compile gcc-4.8.2, but it shows errors related to some defined Cilk variables that are not in scope. These variables are in the file target-hooks-def.h that is generated in the object directory during the build (#define TARGET_CILKPLUS_BUILTIN_ITT_NOTIFY_SECTION_NAME default_itt_notify_section_name). If there is a link, a special procedure, or a guideline I could follow, I would really appreciate it.


Integration into Microsoft Visual Studio*, Compatibility with the GNU* Tool Chain


The Intel compilers that are part of Intel® Composer XE 2013 for Windows integrate into Microsoft Visual Studio* 2008-2013. This means all parts of Intel Composer XE 2013 – Intel C++, Intel Fortran and the Performance Libraries – are usable through Visual Studio, preserving your knowledge of and investment in Visual Studio. Intel C++ compilers are also source and binary compatible with Microsoft Visual C++, which makes it easier to switch to Intel compilers or to use them for the performance-sensitive parts of your application while continuing to use Visual C++ for other parts. It's a similar story on Linux: compilers in Composer XE products for Linux are compatible with the GNU tool chain and are source and binary compatible with gcc.

A choice of suites that include the Intel C or C++ compiler(s):

Software Development Suite

Includes the following components for development on Windows*, Linux*, and OS X*:

Build, Debug and Tune**

Intel® Cluster Studio XE
  • Windows*: Intel® C++ Composer XE, Intel® Visual Fortran Composer XE, Intel® Trace Analyzer and Collector, Intel® MPI Library, Intel® VTune™ Amplifier XE, Intel® Inspector XE, Intel® Advisor XE
  • Linux*: Intel® C++ Composer XE, Intel® Fortran Composer XE, Intel® Trace Analyzer and Collector, Intel® MPI Library, Intel® VTune™ Amplifier XE, Intel® Inspector XE, Intel® Advisor XE
  • OS X*: (none)

Intel® Parallel Studio XE
  • Windows*: Intel® Fortran Composer XE, Intel® VTune™ Amplifier XE, Intel® Inspector XE, Intel® Advisor XE
  • Linux*: Intel® Fortran Composer XE, Intel® VTune™ Amplifier XE, Intel® Inspector XE, Intel® Advisor XE
  • OS X*: (none)

Intel® C++ Studio XE
  • Windows*: Intel® C++ Composer XE, Intel® VTune™ Amplifier XE, Intel® Inspector XE, Intel® Advisor XE
  • Linux*: Intel® C++ Composer XE, Intel® VTune™ Amplifier XE, Intel® Inspector XE, Intel® Advisor XE
  • OS X*: (none)

Intel® Cluster Studio
  • Windows*: Intel® Composer XE, Intel® Trace Analyzer and Collector, Intel® MPI Library, Intel® MPI Benchmarks
  • Linux*: Intel® Composer XE, Intel® Trace Analyzer and Collector, Intel® MPI Library, Intel® MPI Benchmarks
  • OS X*: (none)

Build***

Intel® Composer XE
  • Windows*: Intel® Fortran Compiler, Intel® C++ Compiler, Intel® Math Kernel Library, Intel® Threading Building Blocks, Intel® Integrated Performance Primitives
  • Linux*: Intel® Fortran Compiler, Intel® C++ Compiler, Intel® Math Kernel Library, Intel® Threading Building Blocks, Intel® Integrated Performance Primitives
  • OS X*: (none)

Intel® C++ Composer XE
  • Windows*: Intel® C++ Compiler, Intel® Math Kernel Library, Intel® Threading Building Blocks, Intel® Integrated Performance Primitives
  • Linux*: Intel® C++ Compiler, Intel® Math Kernel Library, Intel® Threading Building Blocks, Intel® Integrated Performance Primitives
  • OS X*: Intel® C++ Compiler, Intel® Math Kernel Library, Intel® Threading Building Blocks, Intel® Integrated Performance Primitives

** Build, Debug and Tune Suites include C/C++ and/or Fortran compiler(s), libraries, threading assistant, error checking and performance profiler
*** Build Suites include C/C++ and/or Fortran compiler(s) and libraries.