Intel® C and C++ Compilers

Leadership application performance

  • Advanced optimization and multithreading capabilities
  • Utilizes the latest Intel® AVX, AVX2 and AVX-512 instructions
  • Compatible with leading compilers and development environments

An Intel® C++ compiler for Android* is also available.

C/C++ only:
From $699
Buy Now
Or Download a Free 30-Day Evaluation Version

C/C++ & Fortran:
From $1,199
Buy Now
Or Download a Free 30-Day Evaluation Version

Advanced Performance Features

  • High-Performance Parallel Optimizer (HPO) offers an improved ability to analyze, optimize, and parallelize more loop nests. This revolutionary capability combines vectorization, parallelization, and loop transformations into a single pass that is faster, more effective, and more reliable than prior discrete phases.
  • Automatic Vectorizer analyzes loops and determines when it is safe and effective to execute several iterations of the loop in parallel. Vectorization and auto-parallelization have been enhanced for broader applicability and improved application performance. The vectorizer also offers insights into your code through the Guided Auto-Parallelization (GAP) feature. In addition, SIMD pragmas can be used for added user control of vectorization.
  • Guided Auto-Parallelization (GAP) is a unique capability in both Intel C++ and Intel Fortran compilers that suggests ways to improve auto-vectorization as well as auto-parallelization and data transformation. When used, GAP builds a report that may include suggestions for source code changes, use of pragmas, or use of specific compiler options. This is a powerful tool that can help you extend the auto-vectorization and auto-parallelism capabilities of the compiler.
  • Interprocedural Optimization (IPO) is a simple compiler switch that can speed application performance. It can dramatically improve the performance of small or medium-sized functions that are used frequently, especially in programs that contain calls within loops. It speeds application performance by inlining your code – a process that logically 'lines up' all the components of your application to speed execution.

  • Loop Profiler is part of the compiler and can be used to generate low-overhead loop and function profiling that shows hotspots and where to introduce threads.
  • Profile-Guided Optimization (PGO) is a multi-step process that optimizes application performance based on user workloads. It improves performance by reducing instruction-cache thrashing, reorganizing code layout, shrinking code size, and reducing branch mispredictions. PGO uses actual user workloads to understand how your application logic is exercised, then organizes the application according to those patterns to speed execution.

  • OpenMP* 3.1 is supported to help simplify pragma-based development of parallelism in your C and C++ applications.
  • Videos on Getting Started with Intel® C++ Compiler
  • Vectorization Essentials
  • Performance Essentials with OpenMP 4.0 Vectorization
  • View slides

Register for future Webinars


Previously recorded Webinars:

  • OpenMP 4.0 for SIMD and Affinity Features with Intel® Xeon® Processors and Intel® Xeon Phi™ Coprocessor
  • Introduction to Vectorization using Intel® Cilk™ Plus Extensions
  • Optimizing and Compilation for Intel® Xeon Phi™ Coprocessor


More Tech Articles

Overview of Vectorization Reports and new vec-report6
By Ronald W Green (Intel), posted 11/07/2013
Overview of vectorization report Compiler Methodology for Intel® MIC Architecture Vectorization Essentials, Vectorization and Optimization Reports, Overview of vectorization reports and new vec-report6 Existing –vec-report levels (0 to 5) controls emission of the following vectorization report ...
Efficient Parallelization
By Ronald W Green (Intel), posted 11/07/2013
Document   Compiler Methodology for Intel® MIC Architecture Efficient Parallelization Overview This chapter covers topics in parallelization. There are links to various parallelization methods and resources along with tips and techniques for getting optimal parallel performance. Goals ...
Cache Blocking Techniques
By AmandaS (Intel), posted 11/07/2013
Compiler Methodology for Intel® MIC Architecture Cache Blocking Techniques Overview An important class of algorithmic changes involves blocking data structures to fit in cache. By organizing data memory accesses, one can load the cache with a small subset of a much larger data set. The idea is...
New User Compiler Basic Usage
By Ronald W Green (Intel), posted 10/17/2013
Compiler Methodology for Intel® MIC Architecture New User Compiler Basic Usage This chapter is intended for users that are new users of Intel compilers or are not very familiar with common compiler options used to control optimization, vectorization, and floating point calculations.  It is impo...


You can reply to any of the forum topics below by clicking on the title. Please do not include private information such as your email address or product serial number in your posts. If you need to share private information with an Intel employee, they can start a private thread for you.



Error: A license for BetaSTMCCompL could not be obtained.
By arpitj4
Hi, I have installed Intel® C++ Composer XE 2013 for Linux and Intel® C++ STM Compiler Prototype Edition 4.0 with a non-commercial/student license. When I try to compile using icc, I get the following error. Can anyone help me resolving the license issues? ============================================= ... (COMPILE)  build/library/common/debug.c Error: A license for BetaSTMCCompL could not be obtained. Your license has expired. License file(s) used were (in this order):     1.  Trusted Storage     2.  /home/arpit/intel/licenses     3.  /opt/intel/licenses/l_780096982_CMR4LJM9.lic     4.  /opt/intel/licenses/l_CMR4LJM9.lic     5.  /intel/licenses     6.  /Users/Shared/Library/Application Support/Intel/Licenses     7.  /home/arpit/intel/bin/intel64/l_cpp_32e_cd.lic Please visit http://www.intel.com/software/products to obtain license renewal information. icc: error #10052: could not checkout FLEXlm license scons: *** [build/library/common/debug.o] Error 1 scons: building termi...
Optimization icpc 13 vs icpc 14
By Sergio S. (5 replies)
Currently I am working on a simulation project which I develop in my own pc (Chakra linux, intel core i5). When I compile using icpc v 14.0.2 I get really nice results in terms of speed. The problem is that when I compile the code in a cluster (CentOs,  Xeon E7-8837 ) using icpc v 13.1.1 the execution time doubles. I have really no clue on what can be the cause of such behavior as my code is relatively simple. I would like to know what could be the possible causes and how to check them. Also If further information is needed please let me know so I can update the post.
internal error with -fno-rounding-math
By tp (5 replies)
$ icc -fno-rounding-math icc: internal error: Assertion failed (shared/driver/options.c, line 1684) $ icc --version icc (ICC) 14.0.2 20140120
Issues building MPI
By Ouissem B. (2 replies)
Hi all, I recently installed both Intel Fortran and Intel C++ compilers for Linux on my system (Ubuntu 12.04 LTS with IA32). When trying to install MPICH, I'm unable to configure the installation, and I get this error : checking for gcc... icc checking whether the C compiler works... no configure: error: in `/home/ouissem/Downloads/openmpi-1.6.5': configure: error: C compiler cannot create executables See `config.log' for more details The same procedure works fine with GNU compilers. The icc and icpc compilers are well recognized by the system, but unable to built any executable. Can some one help me with this issue. Any suggestion would be helpful. Thank you very much for your help.
Vectorization Issue with loop iterations
By Pramod K. (11 replies)
Hi All, I am trying to compile following sample kernel with Intel (ICC) 14.0.0 20130728 (or version > 12 ). I see strange behaviour with vectorization. I have following questions: If I change _iml variable type to int instead of long int, compiler doesn't vectorize the code. If I see vectorization report with  -vec-report3, I see large report with ANTI and FLOW dependencies which seems correct.  But I didn't understand what compiler does to vectorize when I change loop iteration variable type to long int. Below example is auto-generated kernel from domain specific language. We have large array and we process 18 elements of array for every iteration (say those 18 elements represent a particle). So iterations are independent. But this memory layout looks similar to AoS (arrya of struct with 18 elements). AoS is not good for vectorization, I want to understant how Intel compiler vectorize this code. compute() function is actual compute kernel that  I want to vectorize. Please follow...
Templated user-defined type conversion to abstract reference type
By Pavel J. (1 reply)
I have the following (very simplified) "container" class to store objects of "any" type in shared pointers: class container { public: template<typename T> container(const boost::shared_ptr<T> &rhs) : m_content(rhs) { } template<typename T> operator T const & () const { return get<T>(); } template<typename T> T const & get() const { return *boost::any_cast< boost::shared_ptr<T> >(m_content); } private: boost::any m_content; };If I store an object of some type in the container, I would like to get the reference simply by a user-defined conversion which would allow to do something like this: boost::shared_ptr<some_type> x(new some_type); container c_x = x; // Two methods of getting the reference to the instance stored // in c_x are possible: // 1) using a user-defined conversion: const so...
Conditional compilation bug or feature ?
By emmanuel.attia (5 replies)
Hi, I have a software that has 2 versions of an implemented algorithm, so it used different versions of the same class (in my case its SSE vs AVX, but really, it's not relevant) in 2 different compiler units, and suprisingly the linker take the initiative of taking an arbitrary one. Is it a bug (that I reproduced as well in Microsoft and GCC compiler) or a feature (if that so can someone help me finding where this behavior is specified) ? Here is how to reproduce the problem: toto.h #include <stdio.h> #ifdef FLAG class foo { public: void bar() { printf("FLAG\n"); } }; #else class foo { public: void bar() { printf("NO FLAG\n"); } }; #endif #ifdef FLAG void call_foobar_FLAG() #else void call_foobar_NOFLAG() #endif { foo().bar(); } toto_flag.cpp #define FLAG #include "toto.h" toto_noflag.cpp #include "toto.h" main.cpp void call_foobar_FLAG(); void call_foobar_NOFLAG(); int main(int argc, char ** argv) {     call_foobar_FLAG();...
Optimization features.
By Even E. (11 replies)
Hello, I'd like to talk about two weird things in the optimization process of the compiler. #1 : sqrtsd seems preferred over sqrtpd... (just did a 30% performance boost by forcing the use of 2 sqrtpd instead of the 4 sqrtsd previously generated in my code). I know it's not often that sqrt is used sequentially so I won't mind if that feature doesn't appear. However here comes #2. #2 : in an intrinsics based function the result must be _mm_store..'d to be returned so I expected some RVO to appear in the assembly in the case the returned value is to be loaded in an xmm just after that. Instead there is a pair of store/load to/from an unused local stack variable that could be simplified by a mov between 2 xmm registers. I find that a bit strange being used to see ICC getting rid of everything it can, making dead code elimination a hell to avoid for performance tests btw :) Note : the code is compiled with the Ox flag by the last ICC integrated into MSVC2013. Thank you in advance for you...




Some bugs in cilkplus-gcc
By Niklas B. (4 replies)
Hello, I'm using the cilkplus GCC extensions and ran into a few smallish bugs. All of them are reproducible with the most recent cilkplus-gcc 'gcc version 4.9.0 20130520 (experimental) (GCC)' $ g++ -v Using built-in specs. COLLECT_GCC=g++ COLLECT_LTO_WRAPPER=/usr0/home/nbaumstark/cilkplus-install/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.9.0/lto-wrapper Target: x86_64-unknown-linux-gnu Configured with: /home/nbaumstark/cilkplus-gcc/configure --prefix=/home/nbaumstark/cilkplus-install --enable-languages=c,c++ --disable-multilib Thread model: posix gcc version 4.9.0 20130520 (experimental) (GCC) 1. cilk_for pollutes local namespace in template functions. I think I saw a report of this one somewhere, but can't find it anymore and it's not been fixed: template <typename T> void test() {   int test = 0;   cilk_for (int i = 0; i < 10; ++i) test += i;   cilk_for (int i = 0; i < 10; ++i) test += i; } int main() {   test<int>(); } Output: $...
Memory Limitiation into Cilk ?
By Markus G. (12 replies)
I just want to use Cilk Plus with implicit shared memory , but i am getting this error : HOST--ERROR:myoiExPLExtendVSM: VSM size exceeds the limitation (4294967296) now! HOST--ERROR:myoiExMalloc:662 Fail to get a new memory chunk! HOST--ERROR:myoArenaMalloc1: Fail to get free memory space! HOST--ERROR:myoArenaAlignedMalloc1: No enough memory space! there is still enough space, it's also works with OpenMp and TBB but the program crashes after start with this comment, are there any paramenters to use bigger memory spaces . I didn't found any manuals to this. best regards
Error compiling Cilk Plus runtime
By angelee@mit.edu (1 reply)
Hi, I am having trouble compiling the runtime library distributed with the GCC4.8 Cilk Plus branch. I was able to build the entire compiler, which compiles the runtime library correctly as part of the build, but I would like to be able to build and package the runtime library by itself, and that's where I ran into issues.  I followed the instructions in libcilkrts/README. At the configure step, I encountered the following error: >  ./configure --prefix=<...> --disable-multilib  ... ./config.status: line 1371: ./../../config-ml.in: No such file or directory This is because the file config-ml.in is in one directory up, but not two (relative from within the libcilkrts/ dir). I got around this by copying the config-ml.in one level up, and I was able to complete configuration successfully. When I try to make, however, I encountered the following error: > make /bin/sh ./libtool  --tag=CXX   --mode=compile g++ -DPACKAGE_NAME=\"Cilk\ Runtime\ Library\" -DPACKAGE_TARNAME=\"...
Branch and bound search: avoid copying of data
By Robert M. (5 replies)
Hi there. I want to implement a multi-threaded version of a branch-and-bound search. The search space is a 16 level 128-ary tree. So far, the intermediate result is stored in an array. this does not contain only the 16 levels, but also other entries, which are deduced from those values and needed to check the constrains for the bound condition. Cilk Plus using work stealing seems well suited to implement this in a convenient way, especially as the framework itself decides whether to split a task or not. The problem is, that every thread needs its own copy of the array, as it is part of the current state. Not knowing where this split might happen, I assume I have to copy the data for every working packet I use. In an single-threaded implementation with recursive calls and copying the array for every single call (a DFS on the search tree) the performance drops dramatically compared to a single-threaded do-try-undo-approach which uses only one array. Is there anything I can do about ...
error: tried to pass SEH exception c0000005 through a spawn
By scott g. (14 replies)
I have a situation that developed trying to port a valid OpenMP application to Intel Cilk Plus (this is my first time using Cilk Plus). I was getting the fatal error at runtime: "error: tried to pass SEH exception c0000005 through a spawn", sometimes two of them in a row. I found that while I had Cilk For and Syncs implemented my queries for cpu counts and thread numbers were still OpenMP. With OpenMP Support=Generate Parallel Code (/Qopenmp) set, if I (inappropriately) simply trade out the corresponding Cilk calls for OpenMP I can induce this error.  I suspect it's actually just the omp_get_thread_num() since that actually occurred in the worker. Of course if OpenMP support is disabled, the OpenMP calls are undefined and the problem goes away entirely. #ifdef USEOPENMP #define CPU_COUNT omp_get_num_procs(); #define CPU_THREAD_NUM omp_get_thread_num() #endif #ifdef USECILKPLUS #define CPU_COUNT __cilkrts_get_nworkers() #define CPU_THREAD_NUM __cilkrts_get_worker_number() #en...
Reducer's performance regression on windows laptop compared to linux
By QIAOMIN Q. (Intel) (14 replies)
The attachment is the code of the eigen sample. I find something strange: when the program runs on my Windows laptop, the Cilk version runs several times slower than the serial one, while on the Linux server the Cilk version runs twice as fast as the serial version. I compared the environments as follows: Windows 7, compiler version 14.0.1, Intel(R) Core(TM) i5 M560 @ 2.67GHz x2: Serial 1.817 sec, cilk_for 24.292 sec. Red Hat Enterprise 6.0, version 14.0.1 (Cannot find 13), Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz x4: Serial 1.028 sec ...
Cilk+ or Cilk++ scheduling policies
By Samer H. (1 reply)
Folks, I am looking for a document that describes the scheduling policies used in either Cilk+ or Cilk++. Does anyone know of one? Thanks for your help. Samer
Benchmark for positive-definite dense matrices
By Abdul J. (2 replies)
Hi all, does anybody know the source of any benchmark from where I can find data sets of positive-definite and dense (more than 50% non-zeros entry) matrices? regards, Abdul Jabbar 


Integration into Microsoft Visual Studio*, compatibility with the GNU* tool chain


The Intel compilers in Intel® Composer XE 2013 for Windows integrate into Microsoft Visual Studio* 2008-2013. This means all parts of Intel Composer XE 2013 – Intel C++, Intel Fortran, and the performance libraries – are usable through Visual Studio, preserving your knowledge of and investment in Visual Studio. Intel C++ compilers are also source and binary compatible with Microsoft Visual C++, which makes it easier to switch to Intel compilers or to use them for the performance-sensitive parts of your application while continuing to use Visual C++ for other parts. It is a similar story on Linux: compilers in Composer XE products for Linux are compatible with the GNU tool chain and are source and binary compatible with gcc.

A choice of suites that include the Intel C or C++ compiler(s):

Software Development Suites

Each suite includes the components listed below for development on Windows*, Linux*, and OS X*.

Build, Debug and Tune**

Intel® Cluster Studio XE
  Windows*: Intel® C++ Composer XE, Intel® Visual Fortran Composer XE, Intel® Trace Analyzer and Collector, Intel® MPI Library, Intel® VTune™ Amplifier XE, Intel® Inspector XE, Intel® Advisor XE
  Linux*: Intel® C++ Composer XE, Intel® Fortran Composer XE, Intel® Trace Analyzer and Collector, Intel® MPI Library, Intel® VTune™ Amplifier XE, Intel® Inspector XE, Intel® Advisor XE
  OS X*: (none listed)

Intel® Parallel Studio XE
  Windows*: Intel® Fortran Composer XE, Intel® VTune™ Amplifier XE, Intel® Inspector XE, Intel® Advisor XE
  Linux*: Intel® Fortran Composer XE, Intel® VTune™ Amplifier XE, Intel® Inspector XE, Intel® Advisor XE
  OS X*: (none listed)

Intel® C++ Studio XE
  Windows*: Intel® C++ Composer XE, Intel® VTune™ Amplifier XE, Intel® Inspector XE, Intel® Advisor XE
  Linux*: Intel® C++ Composer XE, Intel® VTune™ Amplifier XE, Intel® Inspector XE, Intel® Advisor XE
  OS X*: (none listed)

Intel® Cluster Studio
  Windows*: Intel® Composer XE, Intel® Trace Analyzer and Collector, Intel® MPI Library, Intel® MPI Benchmarks
  Linux*: Intel® Composer XE, Intel® Trace Analyzer and Collector, Intel® MPI Library, Intel® MPI Benchmarks
  OS X*: (none listed)

Build***

Intel® Composer XE
  Windows*: Intel® Fortran Compiler, Intel® C++ Compiler, Intel® Math Kernel Library, Intel® Threading Building Blocks, Intel® Integrated Performance Primitives
  Linux*: Intel® Fortran Compiler, Intel® C++ Compiler, Intel® Math Kernel Library, Intel® Threading Building Blocks, Intel® Integrated Performance Primitives
  OS X*: (none listed)

Intel® C++ Composer XE
  Windows*: Intel® C++ Compiler, Intel® Math Kernel Library, Intel® Threading Building Blocks, Intel® Integrated Performance Primitives
  Linux*: Intel® C++ Compiler, Intel® Math Kernel Library, Intel® Threading Building Blocks, Intel® Integrated Performance Primitives
  OS X*: Intel® C++ Compiler, Intel® Math Kernel Library, Intel® Threading Building Blocks, Intel® Integrated Performance Primitives

** Build, Debug and Tune suites include C/C++ and/or Fortran compiler(s), libraries, a threading assistant, error checking, and a performance profiler.
*** Build Suites include C/C++ and/or Fortran compiler(s) and libraries.