Intel C/C++ compilers complete adoption of LLVM

By: James R Reinders

Published: 08/09/2021

The next generation Intel C/C++ compilers are even better because they use the LLVM open source infrastructure.

LLVM helps us with our goal to offer the best C/C++ compilers for Intel architecture. The latest Intel C/C++ compilers, using LLVM, deliver faster compile times, better optimizations, enhanced standards support, and support for GPU and FPGA offloading.

In this blog, I share information regarding our adoption of LLVM. I’ll discuss what it means for users of the compilers, why we did it, and the bright future ahead. While the Intel C/C++ compiler adoption of LLVM is complete, I will also share updates on the important (but not yet complete) Intel Fortran compiler adoption of LLVM.

The benefits of adopting LLVM are numerous. I will offer advice for upgrading from our classic compilers to our LLVM-based compilers. We are committed to making this as seamless as possible while yielding numerous benefits for developers who use the Intel compilers.

Benefits of adopting LLVM

Figure: Build Time Performance on Linux SPECrate 2017 Integer Suite 64 Bit (Estimated)

The LLVM open source project is a collection of modular and reusable compiler and toolchain technologies supporting multiple processor architectures and programming languages. The Clang open source project provides a C/C++ frontend supporting the latest language standards for the LLVM project. LLVM, including Clang, is maintained by a large and very active development community.

There are many benefits in adopting LLVM, but let’s start with faster build times. Clang is fast. We all can appreciate that! We measured a 14% reduction in build times when using the Intel C/C++ compiler included in the Intel oneAPI 2021.3 toolkits. In addition to helping reduce build times, adoption of Clang has allowed us to contribute to, and benefit from, community efforts to support the latest C++ language standards.

Intel has a long history of contributing and supporting open source projects that includes a decade of contributions to LLVM. Our active collaborations today include optimization report additions, expanded floating-point model support, and enhanced vectorization. Intel contributes to LLVM projects directly, and we also have a staging area (Intel project for LLVM technology) for SYCL support. 

The Intel C/C++ compilers can be expected to deliver higher performance on Intel architecture than the base clang+LLVM compilers. Going forward, the default Intel C/C++ compiler is the version (icx) that has adopted the LLVM open source infrastructure. We continue our strong history of contributing to the clang and LLVM projects, including optimizations for both. Not all our optimization techniques get upstreamed, sometimes because they are too new and sometimes because they are very specific to Intel architecture. This is to be expected and is consistent with other compilers that have adopted LLVM.

Figure: SPECrate 2017 INT (Estimated) Performance advantage relative to other compilers on Intel® Xeon Platinum 8380 Processor

With the latest Intel C/C++ compilers, released with the Intel oneAPI toolkits version 2021.3, we made a series of performance measurements. Consistent with our objective to be the leading C/C++ compiler for Intel architecture, our measurements show Intel C/C++ compilers besting other options. We also beat ourselves: the new LLVM-based Intel C/C++ compiler matches or exceeds the Intel C/C++ classic compiler. It’s time to upgrade the compiler you use! I share one example here, and more of our measurements are included at the end of this blog.
 
Intel C/C++ compilers have a history of offering leadership performance. While the classic Intel C/C++ compiler shows an 18% advantage over gcc here, the LLVM-based Intel C/C++ compiler shows a 41% advantage.

To support Intel's evolving platforms, we are focusing new feature and hardware support in our LLVM-based compilers where we have added highly optimized support for GPUs and FPGAs alongside our continuing commitment to provide industry leading CPU optimizations. Our LLVM-based compilers are where we will have support for SYCL, C++20, OpenMP 5.1, and OpenMP GPU target device support.

We encourage users to take advantage of the faster build times, higher levels of optimization, and new capabilities by moving now to our LLVM-based C/C++ compilers. Intel is committed long-term to LLVM, to help with ongoing innovation, and our relentless pursuit of industry leading optimizations.

What happened to the Parallel Studio XE compilers?

Figure: Intel Parallel Studio is now Intel oneAPI toolkits

In 2007, we renamed our tools “Parallel Studio” to emphasize our support for parallelism. At that time, the world was changing as parallel programming was destined to be ubiquitous in the form of multicore processors. It started with dual-core processors supplanting single core processors. Today, core counts are in the dozens and still on an upward trend.

Just like parallel programming for homogeneous systems has become ubiquitous, we see parallel programming for heterogeneous systems on a similar path to being ubiquitous. Unlike multicore parallelism, heterogeneous programming will span compute capabilities from multiple vendors. This threatens to fragment programming unless we all come together to support open multivendor approaches in compilers, libraries, frameworks, and all tooling for software developers.

We named this next generation of our popular tools to emphasize the oneAPI open approach to heterogeneous parallelism. They remain the same product quality tools the industry has relied upon for decades, extended to support heterogeneous programming by embracing the oneAPI specification and SYCL standard. Download and start using the tools right away–at no cost! Community support is available at the Intel Community Forums. Intel continues to offer Priority Support to submit questions, problems, and other technical support issues.

C/C++ is ready now

We recommend that all new projects start with the LLVM-based Intel C/C++ compilers, and all existing projects should make a plan to migrate to the new compiler this year. At some point in the future, the classic C/C++ compilers will enter “Legacy Product Support” mode signaling the end of regular updates to the classic compiler base, and they will no longer appear in oneAPI toolkits.

Figure: CoreMark-Pro on Intel® Core i7-8700K Processor

The new LLVM-based Intel C/C++ compiler has reached parity with the classic version, and it offers the best optimization technology we have. We suggest that all users try the new C/C++ compiler now, enjoy the benefits, and provide feedback.

There is an excellent guide for converting from the classic C/C++ compiler to the LLVM-based compilers. The first thing you’ll notice is that the compiler has a different name (icx). This allows you to have both the classic and the new compilers installed and choose between them. Many users have already made the switch to rely solely on the LLVM-based Intel C/C++ compilers for their products going forward. The latest release notes offer more details on known issues and limitations (release notes for the classic C/C++ compilers are also available). Check out our webinars ("Talk to Experts") for opportunities to hear from experts live or via on-demand viewing of previously recorded sessions.

LLVM-based Intel Fortran compiler is a work in progress

Intel Fortran has long been known for extensive standards support and superior performance.  That tradition will continue with an LLVM-based Intel Fortran compiler once we complete our beta program. We appreciate feedback.

The LLVM-based Fortran compiler beta offers extensive support of Fortran, while some functionality remains a work-in-progress. You can review the status of specific features to see if it is ready for you: a release-by-release status for individual features can be found in our Fortran and OpenMP feature status table for the LLVM-based Fortran. Fortran compiler release notes can be found together for both the classic and beta (LLVM-based) compilers.

I’ll post a blog later this year, updating our adoption of LLVM for Fortran.

Excellent New Chapter for Intel Compilers

The Intel C/C++ and Fortran compiler products have a rich history that started with UNIX System V compilers in the early 1990s, added compiler technology from Multiflow in the mid-1990s, and grew in the 2000s with the fabled DEC/Compaq Fortran team plus Kuck and Associates Inc. (KAI) OpenMP and parallelism expertise. As the Intel compilers enter their fourth decade, they continue their journey with LLVM compiler technology. Users of Intel compilers will continue to see strong standards support, reliable code optimization, and strong dedication to supporting their needs, all with the added mission of leading the way in supporting heterogeneous programming.

We continue to be committed to making Intel C/C++ and Fortran compilers important and useful tools in your quest to build world changing applications.

Learn More – "Talk to the Compiler Experts" Webinars

We offer live interactive sessions hosted by experts in Intel Compiler technologies. These "Talk to Expert" sessions are great to attend live because you can ask questions and get them answered on the spot. After we have our live session, the recording is available on demand (many prior sessions are available now!), and our community forums are a great place to ask questions whenever you have them.

Check the "Talk to Experts" sign-up page to register for detailed information on joining sessions and to get notifications if anything changes.

Two key sessions I recommend are:

Get the Latest Intel Compilers, Now, for Free – Download Now

Users of the Intel compilers can now enjoy the best of both worlds, combining Intel’s decades of expertise in optimization for Intel architecture and OpenMP, with LLVM. 

Download today from the oneAPI toolkit website.

 

Please post comments with your thoughts, feedback, and suggestions on community.intel.com James-Reinders-Blog.

 

More Benchmarks and Configuration Details

Together, these benchmarks show that we've reached the sought-after tipping point where the LLVM-based compiler is fully ready to take on the role of preferred compiler for all our users.

Faster Compile Times

Figures: Build Time Performance on Linux SPECrate 2017 Integer Suite 64 Bit (Estimated); Build Time Performance on Linux SPECrate 2017 Integer Suite 32 Bit (Estimated)

The SPEC CPU 2017 benchmark package contains industry-standardized, CPU intensive suites for measuring and comparing compute intensive performance, stressing a system's processor, memory subsystem and compiler. More information on the SPEC benchmarks can be found at: https://www.spec.org.

Configuration: Testing by Intel as of Jun 10, 2021. Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz, 16G x2 DDR4 2666. Red Hat Enterprise Linux release 8.0 (Ootpa), 4.18.0-80.el8.x86_64. Software: Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.3.0 Build 20210604_000000. Intel(R) oneAPI DPC++/C++ Compiler for applications running on Intel(R) 64, Version 2021.3.0 Build 20210604. Compiler switches: Intel(R) 64 Compiler Classic: -O2 -xCORE-AVX512, Intel(R) oneAPI DPC++/C++ Compiler: -O2 -xCORE-AVX512.

Optimized Performance

SPECrate 2017 (Estimated)

Figures: SPECrate 2017 FP (Estimated) Performance advantage relative to other compilers on Intel® Xeon Platinum 8380 Processor; SPECrate 2017 INT (Estimated) Performance advantage relative to other compilers on Intel® Xeon Platinum 8380 Processor

The SPEC CPU 2017 benchmark package contains industry-standardized, CPU intensive suites for measuring and comparing compute intensive performance, stressing a system's processor, memory subsystem and compiler. More information on the SPEC benchmarks can be found at: https://www.spec.org.

Configuration: Testing by Intel as of Jun 10, 2021. Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz, 2 socket, Hyper Thread on, Turbo on, 32G x16 DDR4 3200 (1DPC). Red Hat Enterprise Linux release 8.2 (Ootpa), 4.18.0-193.el8.x86_64. Software: Intel(R) oneAPI DPC++/C++ Compiler for applications running on Intel(R) 64, Version 2021.3.0 Build 20210604. Intel(R) C++ Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.3.0 Build 20210604_000000, GCC 11.1, Clang/LLVM 12.0.0. SPECint®_rate_base_2017 compiler switches: Intel(R) oneAPI DPC++/C++ Compiler: -xCORE-AVX512 -O3 -ffast-math -flto -mfpmath=sse -funroll-loops -qopt-mem-layout-trans=4. Intel(R) C++ Intel(R) 64 Compiler Classic: -xCORE-AVX512 -ipo -O3 -no-prec-div -qopt-mem-layout-trans=4 -qopt-multiple-gather-scatter-by-shuffles. GCC: -march=skylake-avx512 -mfpmath=sse -Ofast -funroll-loops -flto -mprefer-vector-width=128. LLVM: -march=skylake-avx512 -mfpmath=sse -Ofast -funroll-loops -flto. qkmalloc used for intel compiler. jemalloc 5.0.1 used for gcc and llvm. SPECfp®_rate_base_2017 compiler switches: Intel(R) oneAPI DPC++/C++ Compiler: -xCORE-AVX512 -Ofast -ffast-math -flto -mfpmath=sse -funroll-loops -qopt-mem-layout-trans=4. Intel(R) C++ Intel(R) 64 Compiler Classic: -xCORE-AVX512 -ipo -O3 -no-prec-div -qopt-prefetch -ffinite-math-only -qopt-multiple-gather-scatter-by-shuffles -qopt-mem-layout-trans=4. GCC: -march=skylake-avx512 -mfpmath=sse -Ofast -fno-associative-math -funroll-loops -flto. LLVM: -march=skylake-avx512 -mfpmath=sse -Ofast -funroll-loops -flto.

 

SPECspeed 2017 (Estimated)

Figures: SPECspeed 2017 FP (Estimated) Performance advantage relative to other compilers on Intel® Xeon Platinum 8380 Processor; SPECspeed 2017 INT (Estimated) Performance advantage relative to other compilers on Intel® Xeon Platinum 8380 Processor

The SPEC CPU 2017 benchmark package contains industry-standardized, CPU intensive suites for measuring and comparing compute intensive performance, stressing a system's processor, memory subsystem and compiler. More information on the SPEC benchmarks can be found at: https://www.spec.org.

Configuration: Testing by Intel as of Jun 10, 2021. Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz, 2 socket, Hyper Thread on, Turbo on, 32G x16 DDR4 3200 (1DPC). Red Hat Enterprise Linux release 8.2 (Ootpa), 4.18.0-193.el8.x86_64. Software: Intel(R) oneAPI DPC++/C++ Compiler for applications running on Intel(R) 64, Version 2021.3.0 Build 20210604. Intel(R) C++ Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.3.0 Build 20210604_000000, GCC 11.1, Clang/LLVM 12.0.0. SPECint®_speed_base_2017 compiler switches: Intel(R) oneAPI DPC++/C++ Compiler: -xCORE-AVX512 -O3 -ffast-math -flto -mfpmath=sse -funroll-loops -qopt-mem-layout-trans=4 -fiopenmp. Intel(R) C++ Intel(R) 64 Compiler Classic: -xCORE-AVX512 -ipo -O3 -no-prec-div -qopt-mem-layout-trans=4 -qopt-multiple-gather-scatter-by-shuffles -qopenmp. GCC: -march=skylake-avx512 -mfpmath=sse -Ofast -funroll-loops -flto -fopenmp. LLVM: -march=skylake-avx512 -mfpmath=sse -Ofast -funroll-loops -flto -fopenmp=libomp. jemalloc 5.0.1 used for intel compiler, gcc and llvm. SPECfp®_speed_base_2017 compiler switches: Intel(R) oneAPI DPC++/C++ Compiler: -xCORE-AVX512 -Ofast -ffast-math -flto -mfpmath=sse -funroll-loops -qopt-mem-layout-trans=4 -fiopenmp. Intel(R) C++ Intel(R) 64 Compiler Classic: -xCORE-AVX512 -ipo -O3 -no-prec-div -qopt-prefetch -ffinite-math-only -qopt-multiple-gather-scatter-by-shuffles -qopenmp. GCC: -march=skylake-avx512 -mfpmath=sse -Ofast -fno-associative-math -funroll-loops -flto -fopenmp. LLVM: -march=skylake-avx512 -mfpmath=sse -Ofast -funroll-loops -flto -fopenmp=libomp. jemalloc 5.0.1 used for intel compiler, gcc and llvm.

 

CoreMark-Pro on Intel® Core i7-8700K Processor


CoreMark-Pro aims to test the entire processor, with comprehensive support for multicore technology, a combination of integer and floating-point workloads, and data sets for utilizing larger memory subsystems. For more information on CoreMark-Pro from the Embedded Microprocessor Benchmark Consortium (EEMBC), see https://www.eembc.org/coremark-pro/.

In these benchmark results, both Intel compiler options are close but the numbers show that we have a little more work to do for the Intel LLVM-based compiler to beat the classic compiler. I hope you'll agree that this is close enough given the other outstanding results from our LLVM-based compilers.

Testing by Intel as of  Jun 10, 2021 - Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz, 16G x2 DDR4 2666. Software: Intel(R) C++ Compiler Classic for applications running on Intel(R) 64, Version 2021.3.0 Build 20210604_000000, GCC 11.1, Clang/LLVM 12.0.0.   Red Hat Enterprise Linux release 8.0 (Ootpa), 4.18.0-80.el8.x86_64.  Compiler switches:   Intel(R) C++ Compiler Classic for applications running on Intel(R) 64, Version 2021.1 Build 20201112_000000: icc -xCORE-AVX2 -mtune=skylake -ipo -O3 -no-prec-div -qopt-prefetch.   GCC 11.1: gcc -march=native -mfpmath=sse -Ofast -funroll-loops -flto.   LLVM 12.0.0: clang -Ofast -funroll-loops -flto -static -mfpmath=sse -march=native.

 

CoreMark-Pro on Intel® Atom C3850 Processor

Figures: CoreMark-Pro on Intel® Atom C3850 Processor; Performance Advantage Measured by CoreMark-Pro* on Intel® Atom C3850 Processor

CoreMark-Pro aims to test the entire processor, with comprehensive support for multicore technology, a combination of integer and floating-point workloads, and data sets for utilizing larger memory subsystems. For more information on CoreMark-Pro from the Embedded Microprocessor Benchmark Consortium (EEMBC), see https://www.eembc.org/coremark-pro/.

In these benchmark results, both Intel compiler options are close but the numbers show that we have a little more work to do for the Intel LLVM-based compiler to beat the classic compiler in one case. I hope you'll agree that this is close enough given the other outstanding results from our LLVM-based compilers.

Configuration:  Testing by Intel as of  Jun 10, 2021 - Intel(R) Atom(TM) CPU C3850 @ 2.10GHz, 16G x2 DDR4 2400. Software: Intel(R) C Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.1 Build 20201112_000000, GCC 11.1, Clang/LLVM 12.0.0.  Red Hat Enterprise Linux release 8.0 (Ootpa), 4.18.0-80.el8.x86_64. Compiler switches:  Intel(R) C++ Compiler Classic for applications running on Intel(R) 64, Version 2021.1 Build 20201112_000000: icc -xATOM_SSE4.2 -mtune=goldmont -ipo -O3 -no-prec-div -qopt-prefetch.  GCC 11.1: gcc -march=native -mfpmath=sse -Ofast -funroll-loops -flto.  LLVM 12.0.0: clang -Ofast -funroll-loops -flto -static -mfpmath=sse -march=native.

 

LORE:  Loop Repository for Evaluation of Compilers Benchmarks


LORE tests C language for loop nests extracted from popular benchmarks, libraries, and real applications. Loops cover a variety of properties that can be used by the compiler community to evaluate loop optimization. 65 Benchmarks & Workloads Tested. For more information, see https://www.vectorization.computer

Configuration: Testing by Intel as of Jun 9, 2021 - Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz, 2 socket, 28 cores, HT enabled, Turbo enabled, 384GB RAM. Software: Intel(R) oneAPI DPC++/C++ Compiler for applications running on Intel(R) 64, Version 2021.2.0 Build 20210607, Intel(R) C++ Compiler Classic for applications running on Intel(R) 64, Version 2021.3.0 Build 20210607. Ubuntu 18.04.1 with GCC 10.2.0. Compiler switches: Intel(R) C++ Compiler Classic for applications running on Intel(R) 64, Version 2021.3.0 Build 20210607: ICC OPT - OPT="-Ofast -qopt-prefetch -unroll-aggressive -restrict -xHost -w". ICC OPT512 - OPT="-Ofast -qopt-prefetch -unroll-aggressive -restrict -xHost -w -qopt-zmm-usage=high". Intel(R) oneAPI DPC++/C++ Compiler for applications running on Intel(R) 64, Version 2021.2.0 Build 20210607: ICX OPT - OPT="-Ofast -qopt-prefetch -unroll-aggressive -restrict -xHost -w". ICX OPT512 - OPT="-Ofast -qopt-prefetch -unroll-aggressive -restrict -xHost -w -mprefer-vector-width=512". ICX OPTm - OPT="-Ofast -qopt-prefetch -unroll-aggressive -restrict -march=skylake-avx512 -w". ICX OPT512m - OPT="-Ofast -qopt-prefetch -unroll-aggressive -restrict -march=skylake-avx512 -w -mprefer-vector-width=512".

 

RAJA Performance Suite (RAJAPerf)


The RAJA Performance Suite is designed to explore performance of loop-based computational kernels found in HPC applications. Learn more about RAJA Performance Suite at https://github.com/LLNL/RAJAPerf.

You might note that this demanding benchmark shows parity with our classic compiler, not an improvement. That's still a solid and impressive result. I did not hesitate to include it, because I am showing the benchmarks we ran to ensure we reached the point where the new LLVM-based version is worthy of a full recommendation.

Configuration: Testing by Intel as of Jun 9, 2021 - Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz, 2 socket, 28 cores, HT enabled, Turbo enabled, 384GB RAM. Software: Intel(R) oneAPI DPC++/C++ Compiler for applications running on Intel(R) 64, Version 2021.2.0 Build 20210607, Intel(R) C++ Compiler Classic for applications running on Intel(R) 64, Version 2021.3.0 Build 20210607. Ubuntu 18.04.1 with GCC 10.2.0. Compiler switches: Intel(R) C++ Compiler Classic for applications running on Intel(R) 64, Version 2021.3.0 Build 20210607: ICC OPT - OPT="-Ofast -ansi-alias -xCORE-AVX512", ICC OPT512 - OPT="-Ofast -ansi-alias -xCORE-AVX512 -qopt-zmm-usage=high", setenv KMP_AFFINITY compact,granularity=fine. Intel(R) oneAPI DPC++/C++ Compiler for applications running on Intel(R) 64, Version 2021.2.0 Build 20210607: ICX OPT - OPT="-Ofast -ansi-alias -xCORE-AVX512", ICX OPT512 - OPT="-Ofast -ansi-alias -xCORE-AVX512 -qopt-zmm-usage=high", setenv KMP_AFFINITY compact,granularity=fine.

 

Performance varies by use, configuration, and other factors. Learn more at www.intel.com/PerformanceIndex.  Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See configuration disclosure for details. No product or component can be absolutely secure. Your costs and results may vary. Intel technologies may require enabled hardware, software, or service activation.

 

 
 
 
 
 


 
