The Parallel Universe Magazine


Issue 37

Leadership Performance with 2nd Generation Intel® Xeon® Scalable Processors: New Features & Tools to Maximize Your HPC, AI, and Analytics Applications

Meet the new 2nd generation Intel® Xeon® Scalable processor family, which has already set 95 performance world records. New features include Intel® Deep Learning Boost for AI deep learning inference acceleration and support for Intel® Optane™ DC persistent memory for data centers.


Parallel Universe Magazine - Issue 25, June 2016


  • Letter from the Editor: Democratization of HPC, by James Reinders
    James Reinders, an expert on parallel programming, is coauthor of the new Intel® Xeon Phi™ Processor High Performance Programming—Knights Landing Edition.
  • Supercharging Python* with Intel and Anaconda* for Open Data Science, by Travis Oliphant
    The technologies that promise to tackle Big Data challenges.
  • Getting Your Python* Code to Run Faster Using Intel® VTune™ Amplifier XE, by Kevin O’Leary
    Providing line-level profiling information with very low overhead.
  • Parallel Programming with Intel® MPI Library in Python*, by Artem Ryabov and Alexey Malhanov
    Guidelines and tools for improving performance.
  • The Other Side of the Chip, by Robert Ioffe
    Using Intel® Processor Graphics for Compute with OpenCL™.
  • A Runtime-Generated Fast Fourier Transform for Intel® Processor Graphics, by Dan Petre, Adam T. Lake, and Allen Hux
    Optimizing FFT without increasing complexity.
  • Indirect Calls and Virtual Functions Calls: Vectorization with Intel® C/C++ 17.0 Compilers, by Hideki Saito, Serge Preis, Sergey Kozhukhov, Xinmin Tian, Clark Nelson, Jennifer Yu, Sergey Maslov, and Udit Patidar
    The newest Intel® C++ Compiler introduces support for indirectly calling a SIMD-enabled function in a vectorized fashion.
  • Optimizing an Illegal Image Filter System, by Yueqiang Lu, Ying Hu, and Huaqiang Wang
    Tencent doubles the speed of its illegal image filter system using a SIMD instruction set and Intel® Integrated Performance Primitives.

Parallel Universe Magazine - Special Issue, June 2016


  • Letter from the Editor: From Hatching to Soaring: Intel® TBB, by James Reinders
    James Reinders, an expert on parallel programming, is coauthor of the new Intel® Xeon Phi™ Processor High Performance Programming – Knights Landing Edition (June 2016), and coeditor of the recent High Performance Parallel Programming Pearls Volumes One and Two (2014 and 2015).
  • The Genesis and Evolution of Intel® Threading Building Blocks, by Arch D. Robison
    A decade after the introduction of Intel Threading Building Blocks, the original architect shares his perspective.
  • A Tale of Two High-Performance Libraries, by Vipin Kumar E.K.
    How Intel® Math Kernel Library and Intel® Threading Building Blocks work together to improve performance.
  • Heterogeneous Programming with Intel® Threading Building Blocks, by Alexei Katranov, Oleg Loginov, and Michael Voss
    With new features, Intel® Threading Building Blocks can coordinate the execution of computations across multiple devices.
  • Preparing for a Many-Core Future, by Kevin O’Leary, Ben Langmead, John O’Neill, and Alexey Kukanov
    Johns Hopkins University adds multicore parallelism to increase performance of its Bowtie 2* application.
  • Leading and Following the C++ Standard, by Alexei Katranov
    Intel® Threading Building Blocks adheres tightly to the C++ standard where it can—and paves the way for supporting parallelism best.
  • Intel® Threading Building Blocks: Toward the Future, by Alexey Kukanov
    The architect of Intel® Threading Building Blocks shares thoughts on the opportunities ahead.

Parallel Universe Magazine - Issue 24, March 2016


  • Letter from the Editor, by James Reinders
    Time-Saving Tips as Spring Begins in the Northern Hemisphere
  • Improve Productivity and Boost C++ Performance
    The new Intel® SIMD Data Layout Template library optimizes C++ code and helps improve SIMD efficiency.
  • Intel® C++ Compiler Standard Edition for Embedded Systems with Bi-Endian Technology
    Intel® C++ Compiler Standard Edition for Embedded Systems with Bi-Endian Technology helps developers looking to overcome platform lock-in.
  • OpenMP* API Version 4.5: A Standard Evolves
    OpenMP* version 4.5 is the next step in the standard’s evolution, introducing new concepts for parallel programming as well as additional features for offload programming.
  • Intel® MPI Library: Supporting the Hadoop* Ecosystem
    With data analytics breaking into the HPC world, the question of using MPI and big data frameworks in the same ecosystem is getting more attention.
  • Finding Your Memory Access Performance Bottlenecks
    The new Intel® VTune™ Amplifier XE Memory Access analysis feature shows how some tough memory problems can be resolved.
  • Optimizing Image Identification with Intel® Integrated Performance Primitives
    Intel worked closely with engineers at China’s largest and most-used Internet service portal to help them achieve a 100 percent performance improvement on the Intel® architecture-based platform.
  • Develop Smarter Using the Latest IoT and Embedded Technology
    A closer look at tools for coding, analysis, and debugging with all Intel® microcontrollers, Internet of Things (IoT) devices, and embedded platforms.
  • Tuning Hybrid Applications with Intel® Cluster Tools
    This article provides a step-by-step workflow for hybrid application analysis and tuning.
  • Vectorize Your Code Using Intel® Advisor XE 2016
    Vectorization Advisor boasts new features that can assist with vectorization on the next generation of Intel® Xeon Phi™ processors.

Parallel Universe Magazine - Issue 23, November 2015


  • Letter from the Editor, by James Reinders
    Computers “Think” More Like Humans, but They Still Need Us
  • Which Tool Do I Use? A Roadmap to Increasing Your Application’s Performance
    By using the correct tool at each phase of your performance tuning, you can greatly increase performance at lower cost.
  • Modernizing Code for Tomorrow’s HPC Problem-Solving
    Tips on code modernization, or increasing parallel programming, that have proven valuable for dedicated HPC software developers, domain specialists, and data scientists alike.
  • Get a Helping Hand from the Vectorization Advisor
    With Vectorization Advisor recommendations, the Hartree Centre was able to get an 18 percent speedup in their code.
  • Optimizing Image Processing
    China’s largest online direct sales company handles several billion product images every day. By using Intel® software development tools, it sped up its image processing 17x, handling 300,000 images in 162 seconds.
  • Boosting Speech Recognition Performance
    Qihoo360 Technology Co., Ltd., a Chinese Internet security company, collaborated with Intel to optimize its Euler* platform, which supports machine learning-related computation models for real businesses.
  • How Fortran Developers Can Boost Productivity with Submodules
    Submodules are now supported in Intel® Fortran Compiler Version 16.0.

Parallel Universe Magazine - Issue 22, September 2015


  • Letter from the Editor, by James Reinders
    Vectorize and Live
  • Putting Vector Programming to Work with OpenMP* SIMD for Intel® Xeon® Processors, Intel® Xeon Phi™ Coprocessors, and Intel® GPUs
    This article describes the C/C++/Fortran SIMD extensions for explicit vector programming available in the OpenMP* 4.0 specification. We explain the semantics of SIMD constructs and clauses with simple examples. In addition, explicit vector programming guidelines and programming examples are provided in Sections 3 and 4 to help programmers write efficient SIMD programs.
  • Vectorization Advisor: A New Tool for Vectorization Advice
    In this article, we demonstrate how to use Intel® Advisor XE on real code examples to optimize vector codes. Intel Advisor XE combines dynamic analysis, static binary analysis, and compiler reports with actionable recommendations for fixing performance bottlenecks.

Parallel Universe Magazine - Issue 21, May 2015


  • Letter from the Editor, by James Reinders
    Happy Birthday, MPI
  • An Introduction to MPI-3 Shared Memory Programming: An All-MPI Alternative to MPI/OpenMP* Programming Worth Considering
    The MPI-3 standard introduces another approach to hybrid parallel programming: the new MPI Shared Memory (SHM) model, which enables incremental changes to existing MPI codes in order to accelerate communication between processes on shared-memory nodes.
  • Intel® MPI Library Conditional Reproducibility
    The Intel® MPI Library uses algorithms that guarantee deterministic reductions for different collective MPI operations. The authors demonstrate the impact of such algorithms using a simple example moving from a repeatable to a conditionally reproducible outcome, without the need to modify the application’s source code.
  • Intel® MPI Memory Consumption
    Memory consumption analysis is a complex task. This article discusses the estimated memory consumption for the Intel® MPI Library and helps users fine-tune library settings for a reduced memory footprint.

Parallel Universe Magazine - Issue 20, February 2015


  • Letter from the Editor, by James Reinders
  • Your Path to Knights Landing, the Next Generation of Intel® Xeon Phi™ technology
    Prepare your application now for Knights Landing, the highly scalable, next-generation Intel® Xeon Phi™ processor/coprocessor that debuts this year.
  • OpenMP* Region Analysis with Intel® VTune™ Amplifier XE
    Intel® VTune™ Amplifier XE can help OpenMP* users more easily understand where to invest their tuning efforts.
  • Walker Molecular Dynamics Laboratory Optimizes Biomedical Software
    Walker Molecular Dynamics Laboratory turned to Intel® VTune™ Amplifier and other Intel® Software Development Products to optimize performance on both Intel® Xeon® and Intel® Xeon Phi™ architectures.
  • Intel® Software Development Products Win HPCwire Awards
    Two Intel® Software Development Products garner top honors from readers and editors alike.
  • Real-World Pearls of Wisdom in High Performance Parallelism
    Leading experts from numerous industries and disciplines whip up delicious code in this “cookbook” of programming for better parallel performance.

Parallel Universe Magazine - Issue 19, September 2014


  • Letter from the Editor, by James Reinders
  • Optimization Reports: Increase Performance with Intel® Compilers
    Compiler optimization reports available in Intel® Parallel Studio XE 2015 can be used to tune code and increase performance. This article covers the types of report data available and how to apply this insight to your applications.
  • How to Design for Scalable Performance—from Multicore to Many-core
    Intel® Advisor XE 2015 creates a framework for software architects to model their design and predict performance scaling and synchronization issues. Here we see how Intel Advisor XE 2015 extends modeling capabilities to support Intel® Xeon Phi™ coprocessors.
  • Additional Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
    A concise look at new instructions that enrich the operations available with Intel AVX-512. These include a group of byte and word (8- and 16-bit) operations known as Byte and Word Instructions, which enhance integer operations. An additional orthogonal capability, known as Vector Length Extensions, enables most AVX-512 instructions to operate on 128 or 256 bits.
  • Digimarc Takes Embedded Digital Watermarking to the Next Level
    A case study reveals how Digimarc optimized code and used vectorization to upgrade its SDKs. The results include meeting aggressive performance and time-to-market goals, while continuing to make its mark in digital watermark innovation.
  • High Performance Parallelism Pearls
    The latest book compiled by parallel programming evangelists and expert Intel engineers James Reinders and Jim Jeffers distills the experience of 69 experts into a 28-chapter “cookbook” on the inventive ways to get the most from Intel® multicore and many-core processors.

Parallel Universe Magazine - Issue 18, June 2014


  • Letter from the Editor, by James Reinders
    Speaking in Code
  • Graduate from MIT to GCC Mainline with Intel® Cilk™ Plus
    Intel® Cilk™ Plus provides a higher level of abstraction than other threading frameworks. This article explores its capabilities for expressing task and data parallelism.
  • Flow Graphs, Speculative Locks, and Task Arenas in Intel® Threading Building Blocks
    A look at Intel® TBB features, including the flow graph interface, speculative locks that take advantage of the Intel® Transactional Synchronization Extensions (Intel® TSX) technology, and user-managed task arenas that provide enhanced concurrency control and work isolation.
  • 20 Years of the MPI Standard: Now with a Common Application Binary Interface
    Examines MPI compatibility issues and resolution, as well as the potential of the upcoming common MPI Application Binary Interface (ABI).
  • Mastering Performance Challenges with the New MPI-3 Standard
    Find out how to measure the overlap of communication and computation, and how an MPI application can benefit from nonblocking collective communication.
  • An OpenMP* Timeline
    An infographic time-capsule of OpenMP*.
  • Leverage Your OpenCL™ Investment on Intel® Architectures
    Get more out of OpenCL™—from cross-device portability to the Intel® Xeon Phi™ coprocessor.

Parallel Universe Magazine - Issue 17, March 2014


  • Letter from the Editor, by James Reinders
    Performance Master Class
  • Turbocharging Open Source Python, R, and Julia-based HPC Applications, by Vipin Kumar E.K.
    With a few simple steps, Python, R, and Julia can be built and installed with Intel® Math Kernel Library (Intel® MKL) support using Intel® compilers for out-of-the-box performance improvements.
  • Multithreading and Task Analysis with Intel® VTune™ Amplifier XE 2013, by Vladimir Tsymbal
    Save significant time and effort spent designing and supporting parallel algorithms, such as pipeline, by improving parallelism through thread management.
  • Speed Threading Performance: Enabling Intel® TSX using Intel® C++ Compiler, by Anoop Madhusoodhanan Prabha
    Intel® Transactional Synchronization Extensions (Intel® TSX) can be used to exploit the inherent concurrency of a program by allowing concurrent execution of a critical section.
  • Tachyon Ray Tracer: Porting on the Intel® Xeon Phi™ Coprocessor, by Roman Lygin and Dmitry Durnov
    Learn how to port Tachyon, an open source ray tracer and part of the SPEC MPI* suite, to the Intel® Xeon Phi™ coprocessor or Intel® Xeon® processor.
  • Building Native Android* Apps With Intel® C++ Compiler, by Anoop Madhusoodhanan Prabha
    What can the Intel® C++ Compiler do for your Android* applications? Take a quick look at key features and benefits.

Parallel Universe Magazine - Issue 16, November 2013


  • Letter from the Editor: Performance Hits the Streets, by James Reinders
  • Introducing Intel® Cluster Studio XE 2013 SP1, by James Tullos
    Provides a quick reference to the newest features of this HPC software development tool suite. These include the latest improvements to the Intel® MPI Library and the Intel® Trace Analyzer and Collector to help distributed memory programs run faster and more effectively.
  • Full Throttle: OpenMP* 4.0, by Michael Klemm and Christian Terboven
    OpenMP takes a quantum leap with new features supporting OpenMP tasks, SIMD instructions, and the effective integration of application code, third-party libraries, and hardware to achieve a highly efficient solution.
  • Profiling MPI Communications—Techniques for High Performance, by James Tullos
    Focuses on Intel® Trace Analyzer and Collector (ITAC), a performance analysis tool which is part of Intel® Cluster Studio XE SP1. ITAC provides the ability to profile and analyze MPI applications to find areas for performance improvement.
  • Pexip Speeds Videoconferencing with Intel® Parallel Studio XE, by Stephen Blair-Chappell
    See how Pexip has been able to match, and even exceed, the performance of traditional conferencing systems by designing a cost-effective alternative with expanded processing capabilities.

Parallel Universe Magazine - Issue 15, August 2013


  • Letter from the Editor: Results Matter, by James Reinders
  • Efficient Software Development: What’s New in Intel® Parallel Studio XE 2013 Service Pack 1, by Kirill Rogozhin
    Explores new capabilities to help efficiently program for coprocessors, create powerful parallel frameworks, and find and fix complex performance issues on the latest hardware.
  • Coprocessor Debugging Support in Intel® Parallel Studio XE, by Keven Boell
    Examines debug solutions for native applications and offload programs running partially or completely on the Intel® Xeon Phi™ coprocessor. Intel Parallel Studio XE tools provide a single source-line view of the program’s control flow and variables on the command line, as well as in Eclipse* and Microsoft Visual Studio*.
  • Full Scale Ahead: The Weather Research and Forecast (WRF) Model and Intel® Cluster Studio XE 2013, by Mark Lubin, Scott McMillan, Christopher G. Kruse, Davide Del Vento, and Raffaele Montuoro
    The WRF Model is a next-generation, mesoscale weather prediction system used across a wide range of meteorological applications. The authors demonstrate WRF scalability on “commodity” supercomputers using Intel® Cluster Studio XE 2013 software tools, including Intel® compilers and the Intel® MPI Library.


Get The Latest Issue

Intel’s quarterly magazine helps you take your software development into the future with the latest tools, tips, and training to expand your expertise.

