Intel® Parallel Studio XE 2016, launched on August 25, 2015, is the latest installment in Intel's developer toolkit for high performance computing (HPC) and technical computing applications. This suite of compilers, libraries, debugging facilities, and analysis tools targets Intel® architecture, including support for the latest Intel® Xeon® processors (codenamed Skylake) and Intel® Xeon Phi™ processors (codenamed Knights Landing). Intel® Parallel Studio XE 2016 helps software developers design, build, verify, and tune code in Fortran, C++, C, and Java.
There are four things that I like to highlight when I describe this year's tool release:
- Intel® Data Analytics Acceleration Library
- Vectorization Advisor
- MPI Performance Snapshot
- High performance support for industry standards, the latest processors, operating systems and their related development environments.
Intel Data Analytics Acceleration Library (Intel® DAAL)
Data scientists are finding Intel® DAAL very exciting because it helps speed up big data analytics. It’s designed for use with popular data platforms including Hadoop*, Spark*, R, and Matlab*, for highly efficient data access. We’ve seen Intel DAAL accelerate PCA by 4-7X, and one customer has seen a 200X speedup for the Alternating Least Squares prediction algorithm, when compared with the latest open source Spark + MLlib (details for both claims are in my blog about DAAL). Intel DAAL was created by the renowned team that creates the Intel® Math Kernel Library (Intel® MKL). Intel DAAL can be thought of as “Intel MKL for Big Data”, but it is actually much more! Many more details on Intel DAAL, including ways to download it today for free, are in my blog about DAAL. Intel DAAL is available for Linux*, OS X* and Windows*.
Vectorization Advisor
Vectorization is the process of using SIMD instructions in processors. In the quest to “modernize” applications to get top performance out of any modern processor, a software developer needs to tackle multithreading, vectorization and fabric scaling. Intel® Advisor XE 2016 provides tools to help with multithreading and vectorization:
- Vectorization Advisor is an analysis tool that helps you identify the loops that will benefit most from vectorization, find the obstacles to vectorization that are particular to your program, explore the benefit of alternative data organization, and increase confidence that transformations aimed at increasing vectorization will preserve the correctness of your original program.
- Threading Advisor is a threading design and prototyping tool that lets you analyze, design, tune, and check threading design options rapidly.
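As a minimal illustration of the kind of obstacle mentioned above, the C++ sketch below contrasts a loop whose iterations are independent with one that carries a dependence from each iteration to the next. The function names `add_arrays` and `prefix_sum` are hypothetical, chosen only for this example; whether a given compiler actually vectorizes either loop depends on the compiler and its options, so treat this as a sketch rather than a statement about any particular tool's output.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Independent iterations: each c[i] depends only on a[i] and b[i],
// so several elements can be processed by one SIMD instruction.
// (add_arrays is a hypothetical name used for illustration.)
void add_arrays(const std::vector<float>& a, const std::vector<float>& b,
                std::vector<float>& c) {
    assert(a.size() == c.size() && b.size() == c.size());
    const std::size_t n = c.size();
    for (std::size_t i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}

// Loop-carried dependence: iteration i reads the value that iteration
// i - 1 just wrote, so the iterations cannot simply run side by side.
// (prefix_sum is a hypothetical name used for illustration.)
void prefix_sum(std::vector<float>& x) {
    for (std::size_t i = 1; i < x.size(); ++i) {
        x[i] += x[i - 1];
    }
}
```

The first loop is the easy case. The second is a classic example of a loop that needs a different algorithm, or has to stay scalar, before SIMD instructions can help.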
Threading Advisor has gained a reputation over the past five years for helping developers find the right approach to multithreading an application more quickly and without costly oversights. The experience of refining that advisor, together with customer feedback on the best ways to give advice based on program analysis, has helped Intel create this new advisor for vectorization.
Vectorization Advisor cannot tell you anything I could not show you how to do yourself. However, when I teach ‘vectorization’ I tend to rattle off a list of things to check, and each item that I suggest to “check” involves using a tool in a particular way. Bringing all of that into one tool makes life easier and definitely makes the process faster and more efficient. One of the key Vectorization Advisor features is a Survey Report that offers integrated compiler report data and performance data all in one place, including GUI-embedded advice on how to fix vectorization issues specific to your code. This page augments that GUI-embedded advice with links to web-based vectorization resources.
An excellent 12-minute introduction to the Vectorization Advisor is available as a video online.
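The "alternative data organization" mentioned earlier usually means choosing between an array of structures (AoS) and a structure of arrays (SoA). The C++ sketch below shows the same operation on both layouts. All type and function names here (`PointAoS`, `PointsSoA`, `scale_x_aos`, `scale_x_soa`) are hypothetical and exist only for illustration; actual speedups depend on the compiler, the processor, and the rest of the program.

```cpp
#include <cstddef>
#include <vector>

// Array of structures (AoS): the x values of consecutive points are
// separated in memory by y and z, so a loop over x is a strided access.
struct PointAoS {
    float x, y, z;
};

void scale_x_aos(std::vector<PointAoS>& pts, float factor) {
    for (std::size_t i = 0; i < pts.size(); ++i) {
        pts[i].x *= factor;
    }
}

// Structure of arrays (SoA): all x values sit next to each other,
// so a loop over x is a unit-stride access, which SIMD loads and
// stores handle most efficiently.
struct PointsSoA {
    std::vector<float> x, y, z;
};

void scale_x_soa(PointsSoA& pts, float factor) {
    for (std::size_t i = 0; i < pts.x.size(); ++i) {
        pts.x[i] *= factor;
    }
}
```

Both functions compute the same result. The difference is only in memory layout, which is why it is worth measuring before committing to a restructuring of your data.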
MPI Performance Snapshot
The MPI Performance Snapshot is a scalable, lightweight performance tool for MPI applications. It collects a variety of MPI application statistics (such as communication, activity, and load balance) and presents them in an easy-to-read format. The tool is not available separately but is provided as part of the Intel® Parallel Studio XE 2016 Cluster Edition.
The MPI Performance Snapshot helps with the following problems that arise when analyzing MPI applications that scale out to thousands of ranks:
- The size of clusters continues to grow, so applications are becoming more and more scalable.
- Large amounts of data are collected when profiling at larger scale, and that data can easily become unmanageable.
- It's hard to identify the key metrics to track when you gather such large amounts of data.
By addressing these three items, MPI Performance Snapshot improves scaling to at least 32K ranks, which is an order of magnitude above what is tolerable with the prior Intel Trace Analyzer and Collector. Therefore, when aiming to optimize a large-scale run (anything above one thousand MPI ranks), I suggest starting with the MPI Performance Snapshot to figure out where you need to dig deeper (which processes are slowing the application down, where the peaks in memory usage are, etc.). Then do another run with the Intel Trace Analyzer and Collector on a subset of selected ranks to get more detailed per-process information, visualize how a communication algorithm is implemented, and see if there are apparent bottlenecks.
MPI Performance Snapshot combines lightweight statistics from the Intel® MPI Library with OS and hardware-level counters to provide you with high-level categorization of your application: MPI vs. OpenMP load imbalance info, memory usage, and a break-down of MPI vs. computation vs. serial time.
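To give a feel for what a load imbalance number can mean, here is a small C++ sketch of one commonly used measure: the ratio of the mean per-rank time to the maximum per-rank time. This is an assumption made for illustration, not the formula MPI Performance Snapshot uses, and the function name `load_balance` is hypothetical.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical helper, not part of any Intel tool: returns the ratio of
// the mean per-rank time to the maximum per-rank time. A value of 1.0
// means every rank took equally long; smaller values mean that some
// ranks sat idle waiting for the slowest one.
double load_balance(const std::vector<double>& seconds_per_rank) {
    if (seconds_per_rank.empty()) {
        return 1.0;  // nothing to compare, treat as balanced
    }
    const double max_t =
        *std::max_element(seconds_per_rank.begin(), seconds_per_rank.end());
    if (max_t <= 0.0) {
        return 1.0;  // no measurable work, treat as balanced
    }
    const double total =
        std::accumulate(seconds_per_rank.begin(), seconds_per_rank.end(), 0.0);
    const double mean_t = total / static_cast<double>(seconds_per_rank.size());
    return mean_t / max_t;
}
```

With made-up timings of 1, 1, 1, and 5 seconds on four ranks, the mean is 2 seconds and the maximum is 5 seconds, so the ratio is 0.4: on average, ranks did useful work for only 40% of the time the slowest rank needed.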
For more details, you should check out the full MPI Performance Snapshot User's Guide and Analyzing MPI Applications with MPI Performance Snapshot on the Intel Trace Analyzer and Collector documentation page.
High performance support for…
The latest processors...
are supported, including the Skylake and Knights Landing microarchitectures.
The latest industry standards...
Intel takes pride in having very strong support for industry standards – aiming to be a leader and to maintain a reputation of being second-to-none.
Intel's Fortran support even includes a feature from the draft Fortran 2015 standard which can help MPI-3 users. The current status of features of Fortran can be found in Dr. Fortran’s blog “Intel® Fortran Compiler - Support for Fortran language standards.”
Operating system support includes Debian* 7.0, 8.0; Fedora* 21, 22; Red Hat* Enterprise Linux 5, 6, 7; SuSE* Linux Enterprise Server 11, 12; Ubuntu* 12.04 LTS (64-bit only), 13.10, 14.04 LTS, 15.04; OS X 10.10; Windows 7 through 10; Windows Server 2008-2012. These are just the versions Intel has tested; many additional operating systems should work (for instance, CentOS).
A series of webinars starting in September 2015 covers many topics related to Intel Parallel Studio XE 2016. The webinars can be attended live and offer interactive question and answer time. Each webinar will also be available for replay after the live session. The first webinar, on September 1, is “What’s New in Intel® Parallel Studio XE 2016?”
Many more learning resources are on the Intel® Parallel Studio XE 2016 website. A number of benchmarks illustrating performance measurements are online as well.
There are many new features that I did not dive into, including great new support for MPI+OpenMP tuning with Intel VTune Amplifier XE, as well as a number of enhancements to Intel® Threading Building Blocks, including the increasingly popular flow graph capabilities and task arenas.
Download Intel® Parallel Studio XE 2016 today
The Intel Performance Libraries are also available via the Community Licensing for Intel® Performance Libraries. Under this option, the libraries are free for anyone who registers, with no royalties and no restrictions on company or project size. The community licensing program offers the current versions of the libraries without Online Service Center access. The Online Service Center offers exclusive 1-on-1 support via an interactive and secure web site where you can submit questions or problems and monitor previously submitted issues. It requires registration after purchase of the software, or special qualification offered to students, educators, academic researchers and open source contributors.