GPU-Quicksort: How to Move from the OpenCL™ Platform to Data Parallel C++
Everyone is talking about Data Parallel C++ (DPC++), the heterogeneous, portable programming language based on the Khronos SYCL* standard. Learn how to use it to simplify programming, improve efficiency and innovation, and implement GPU-Quicksort for multiple data types.
Contents: Letter from the Editor: The Parallel Universe Turns 10 by Henry A. Gabb, Senior Principal Engineer, Intel Corporation GPU-Quicksort: How to Move from OpenCL™ to Data Parallel C++ by Robert Ioffe, Senior Exascale Performance Software Engineer, Intel Corporation Optimizing the Performance of oneAPI Applications: Getting the Most from this Unified, Standards-Based Programming Model by Kevin O’Leary, Software Technical Consulting Engineer, Intel Corporation Speeding Up Monte Carlo Simulation with Intel® oneMKL: Intel® oneAPI Math Kernel Library (Beta) Data Parallel C++ Usage Models by Alina Elizarova and Pavel Dyakov, Math Algorithm Engineers, and Gennady Fedorov, Software Technical Consulting Engineer, Intel Corporation Bringing Accelerated Analytics at Scale to Intel® Architecture: Unifying Data Science with Traditional Analytics on Modern Hardware by Venkat Krishnamurthy, Product Vice President, and Kathryn Vandiver, Senior Director, Platform and Core Engineering, OmniSci A New Approach to Parallel Computing Using Automatic Differentiation: Getting Top Performance on Modern Multicore Systems by Dmitri Goloubentsev, Head of Automatic Adjoint Differentiation, Matlogica, and Evgeny Lakshtanov, Principal Researcher, Department of Mathematics, University of Aveiro, Portugal and Matlogica LTD 8 Rules for Parallel Programming for Multicore: There are Some Consistent Rules that can Help you Solve the Parallelism Challenge and Tap Into the Potential of Multicore by James Reinders, Founding Editor and Editor Emeritus of The Parallel Universe Book Review: The OpenMP Common Core Making OpenMP Simple Again by Ruud van der Pas, Senior Principal Software Engineer, Oracle Corporation
January 10, 2020
Contents: Letter from the Editor: Happy New Year and Welcome to the Era of oneAPI by Henry A. Gabb, Senior Principal Engineer, Intel Corporation Heterogeneous Programming Using oneAPI: How to Deliver Uncompromised Performance for Diverse Workloads Across Multiple Architectures by Nitya Hariharan, Application Engineer; Rama Kishan Malladi, Performance Modeling Engineer; Amarpal S. Kapoor, Technical Consulting Engineer; Kevin P O’Leary, Technical Consulting Engineer; Intel Corporation Accelerating Compression on Intel® FPGAs: How oneAPI is Making FPGAs More Accessible than Ever by Andrei Hagiescu, FPGA Software Engineer, and David Cashman, FPGA Software Engineer, Intel Corporation Is Your Game GPU-Bound? It’s Easy to Find Out with a GPU and Device Context Queue Analysis by Oleg Fedyaev, Graphics Software Engineer, Intel Corporation New Threading Capabilities in Julia* v1.3: Unleashing the Full Power of Modern CPUs by Jameson Nash and Jeff Bezanson, Julia Computing, Inc., and Kiran Pamnany, Caltech Fast Gradient Boosting Tree Inference for Intel® Xeon® Processors: How to Boost Prediction Quality and Performance Using GBT in Intel® Data Analytics Acceleration Library by Kirill Shvets, Machine Learning Engineer, and Egor Smirnov, Software Engineering Manager, Intel Corporation K-means Acceleration with 2nd Generation Intel® Xeon® Scalable Processors: Intel’s Hardware and Software Stack for Big Data by Alexander Andreev, Machine Learning Engineer, and Egor Smirnov, Software Engineering Manager, Intel Corporation Measuring Graph Analytics Performance: What Is Graph Analytics? And Why Does It Matter? by Henry A. Gabb, Senior Principal Engineer, Intel Corporation, and Editor, The Parallel Universe
October 4, 2019
Contents: Letter from the Editor: See You at the Intel® HPC Developer Conference by Henry A. Gabb, Senior Principal Engineer, Intel Corporation Accelerating XGBoost* for Intel® Xeon® Processors: How to Maximize Processor Performance for Machine Learning by Egor Smirnov, Software Engineering Manager, Intel Corporation Detecting and Mitigating False Sharing in Multi-Processors: Get Big Performance Benefits for Your Multithreaded Applications by Ramesh Peri, Senior Principal Engineer, Intel Corporation Speeding Up Simulation Analysis with yt* and Intel® Distribution for Python*: How to Boost Speed up to 4.6x on Intel® Xeon® Scalable Processors by Salvatore Cielo, PhD, Scientific Computing Expert, Leibniz Supercomputing Centre; Luigi Iapichino, PhD, Scientific Computing Expert, Leibniz Supercomputing Centre; Fabio Baruffa, PhD, Technical Consulting Engineer, Intel Corporation Intel® Software Guard Extensions: Using Hardware-Based Isolation and Memory Encryption to Provide More Code Protection in your Solutions by Rama Kishan Malladi, Performance Modeling Engineer, Intel Corporation Verizon Maximizes Customer Satisfaction: Optimizing Application Performance with Powerful Profiling, Guest Editorial by Dennis O’Connell, Senior Director of Performance Engineering, Verizon Composable Threading Is Coming to Julia*: Flexible Parallelism in a Productivity Language Editorial by Henry A. Gabb, Senior Principal Engineer, Intel Corporation
July 11, 2019
Contents: Letter from the Editor: Black Holes and High-Performance Computing by Henry A. Gabb, Senior Principal Engineer, Intel Corporation Leadership Performance with 2nd-Generation Intel® Xeon® Scalable Processors: New Features and Tools to Maximize Your HPC, AI, and Analytics Applications by Amarpal S. Kapoor, Technical Consulting Engineer; Rama Kishan V. Malladi, Performance Modeling Engineer; and Avinash Karani and Nitya Hariharan, Application Engineers; Intel Corporation Using the Latest Performance Analysis Tools to Prepare for Intel® Optane™ DC Persistent Memory: Getting Past Bottlenecks and Storage Issues by Jackson Marusarz, Technical Consulting Engineer, and Kevin O’Leary, Senior Technical Consulting Engineer, Intel Corporation Measuring the Impact of NUMA Migrations on Performance: Weighing the Tradeoffs to Maximize Performance by Gurbinder Gill, Graduate Research Assistant, University of Texas at Austin, and Ramesh V. Peri, Senior Principal Engineer, Intel Corporation Parallelism in Python: Directing Vectorization with NumExpr*: Boosting Performance for Computing with Arrays and Numerical Expressions by Fabio Baruffa, PhD, Technical Consulting Engineer, Intel Corporation Turbo-Charged Open Shading Language on Intel® Xeon® Processors with Intel® Advanced Vector Extensions 512: Up to 2x Faster Full Renders Speed Digital Content Creation by Steena Monteiro, Software Engineer, and Alex M. Wells, Principal Engineer, Intel Corporation The Performance Optimization and Productivity (PoP) Project: Pursuing the Never-Ending Quest for Performance by Mike Croucher, Developer Advocate, Numerical Algorithms Group (NAG) Seven Ways HPC Software Developers Can Benefit from Intel® Software Investments: Taking Another Look at Intel and HPC Software by James Reinders, Editor Emeritus, The Parallel Universe
April 9, 2019
Contents: Letter from the Editor: Onward to Exascale Henry A. Gabb, Senior Principal Engineer, Intel Corporation Effectively Train and Execute Machine Learning and Deep Learning Projects on CPUs Nathan Greeneltch and Jing Xu, Software Technical Consulting Engineers, Intel Corporation Parallelism in Python* Using Numba* David Liu, Software Technical Consulting Engineer, Intel Corporation Boosting the Performance of Graph Analytics Workloads Stijn Eyerman, Wim Heirman, and Kristof Du Bois, Research Scientists, and Joshua B. Fryman and Ibrahim Hur, Principal Engineers, Intel Corporation How Effective is Your Vectorization? Kevin O’Leary, Technical Consulting Engineer, Intel Corporation Improving Performance using Vectorization for Particle-in-Cell Codes Bei Wang, HPC Software Engineer, Princeton University; Carlos Rosales-Fernandez, Software Technical Consulting Engineer, Intel Corporation; and William Tang, Professor, Princeton Plasma Physics Laboratory Boost Performance for Hybrid Applications with Multiple Endpoints in Intel® MPI Library Rama Kishan Malladi, Graphics Performance Modeling Engineer, and Dr. Amarpal Singh Kapoor, Technical Consulting Engineer, Intel Corporation Innovate System and IoT Apps Ramya Chandrasekaran and Thorsten Moeller, Product Marketing Engineers, Intel Corporation
January 7, 2019
Contents: Letter from the Editor: Happy New Year...and May 2019 Bring You High Performance by Henry A. Gabb, Senior Principal Engineer, Intel Corporation Intel® Rendering Framework Using Software-Defined Visualization by Rob Farber, Global Technology Consultant, TechEnablement Why Intel® Xeon® processors excel at visualization Unifying AI, Analytics, and HPC on a Single Cluster by Allene Bhasker and Keith Mannthey, Solution Architects, Data Center Group, Intel Corporation Maximizing efficiency and lowering costs for tomorrow's enterprise Advancing OpenCL™ for FPGAs by Martin C. Herbordt, Professor, Department of Electrical and Computer Engineering, Boston University Boosting performance with Intel® FPGA SDK for OpenCL™ software technology Parallelism in Python* by David Liu, Software Technical Consulting Engineer, and Anton Malakhov, Software Development Engineer, Intel Corporation Dispelling the myths with tools to achieve parallelism Remove Memory Bottlenecks Using Intel® Advisor by Kevin O’Leary and Alex Shinsel, Technical Consulting Engineers, Intel Corporation Understanding how your program is accessing memory helps you get more from your hardware MPI-3 Non-Blocking I/O Collectives in Intel® MPI Library by Nitya Hariharan, Amarpal Singh Kapoor, and Rama Kishan Malladi, Technical Marketing Engineers, Core and Visual Computing Group, Intel Corporation; Md Vasimuddin, Research Scientist, Parallel Computing Lab, Intel Labs Speeding up I/O for HPC applications
March 22, 2018
Contents: Letter from the Editor: Computer Vision Coming Soon to a Browser Near You by Henry A. Gabb Computer Vision for the Masses by Sajjad Taheri, Alexandru Nicolau, Alexander Veidenbaum, Ningxin Hu, and Mohammad Reza Haghighat Bringing computer vision to the Open Web Platform*. Up Your Game by Giselle Gomez How to optimize your game development―no matter what your role―using Intel® Graphics Performance Analyzers. Harp-DAAL for High-Performance Big Data Computing by Judy Qiu The key to simultaneously boosting productivity and performance. Understanding the Instruction Pipeline by Alex Shinsel The key to adaptability in modern application programming. Parallel CFD with the HiFUN* Solver on the Intel® Xeon® Scalable Processor by Rama Kishan Malladi, S.V. Vinutha, and Austin Cherian Maximizing HPC platforms for fast numerical simulations. Improving VASP* Materials Simulation Performance by Fedor Vasilev, Dmitry Sivkov, and Jeongnim Kim Using the latest Intel® Software Development Tools to make more efficient use of hardware.
October 1, 2017
Contents: Letter from the Editor: Meet Intel® Parallel Studio XE 2018, by Henry A. Gabb Henry A. Gabb is a long-time high-performance and parallel computing practitioner and has published numerous articles on parallel programming. Driving Code Performance with Intel® Advisor’s Flow Graph Analyzer, by Vasanth Tovinkere, Pablo Reble, Farshad Akhbari, and Palanivel Guruvareddiar Optimizing performance for an autonomous driving application. Welcome to the Adult World, OpenMP*, by Barbara Chapman After 20 years, it’s more relevant than ever. Enabling FPGAs for Software Developers, by Bernhard Friebe, and James Reinders Boosting efficiency and performance for automotive, networking, and cloud computing. Modernize Your Code for Performance, Portability, and Scalability, by Jackson Marusarz What’s new in Intel® Parallel Studio XE. Dealing with Outliers, by Oleg Kremnyov, Mikhail Averbukh, and Ivan Kuzmin How to find fraudulent transactions in a real-world dataset. Tuning for Success with the Latest SIMD Extensions and Intel® Advanced Vector Extensions 512, by Xinmin Tian, Hideki Saito, Sergey Kozhukhov, and Nikolay Panchenko Best practices for taking advantage of the latest architectural features. Effectively Using Your Whole Cluster, by Rama Kishan Malladi Optimizing SPECFEM3D_GLOBE* performance on Intel® architecture. Is Your Cluster Healthy?, by Brock A. Taylor Must-have cluster diagnostics in Intel® Cluster Checker. Optimizing HPC Clusters, by Michael Hebenstreit Enabling on-demand BIOS configuration changes in HPC clusters.
January 1, 2017
Contents: Letter from the Editor: The Changing HPC Landscape Still Looks the Same, by Henry A. Gabb Henry A. Gabb is a long-time high-performance and parallel computing practitioner and has published numerous articles on parallel programming. The Present and Future of the OpenMP* API Specification, by Michael Klemm, Alejandro Duran, Ravi Narayanaswamy, Xinmin Tian, and Terry Wilmarth How the gold standard parallel programming language has improved with each new version. Reducing Packing Overhead in Matrix-Matrix Multiplication, by Kazushige Goto, Murat Efe Guney, and Sarah Knepper Improve performance on multicore and many-core Intel® architectures, particularly for deep neural networks. Identify Scalability Problems in Parallel Applications, by Vladimir Tsymbal How to improve scalability for Intel® Xeon® and Intel® Xeon Phi™ Processors using new Intel® VTune™ Amplifier memory analysis. Vectorization Opportunities for Improved Performance with Intel® AVX-512, by Martyn Corden Examples of how Intel® Compilers can vectorize and speed up loops. Intel® Advisor Roofline Analysis, by Kevin O’Leary, Ilyas Gazizov, Alexandra Shinsel, Zakhar Matveev, and Dmitry Petunin A new way to visualize performance optimization trade-offs. Intel-Powered Deep Learning Frameworks, by Pubudu Silva Your path to deeper insights.
July 1, 2016
Contents: Letter from the Editor: Democratization of HPC, by James Reinders James Reinders, an expert on parallel programming, is coauthor of the new Intel® Xeon Phi™ Processor High Performance Programming—Knights Landing Edition. Supercharging Python* with Intel and Anaconda* for Open Data Science, by Travis Oliphant The technologies that promise to tackle Big Data challenges. Getting Your Python* Code to Run Faster Using Intel® VTune™ Amplifier XE, by Kevin O’Leary Providing line-level profiling information with very low overhead. Parallel Programming with Intel® MPI Library in Python*, by Artem Ryabov and Alexey Malhanov Guidelines and tools for improving performance. The Other Side of the Chip, by Robert Ioffe Using Intel® Processor Graphics for Compute with OpenCL™. A Runtime-Generated Fast Fourier Transform for Intel® Processor Graphics, by Dan Petre, Adam T. Lake, and Allen Hux Optimizing FFT without increasing complexity. Indirect Calls and Virtual Functions Calls: Vectorization with Intel® C/C++ 17.0 Compilers, by Hideki Saito, Serge Preis, Sergey Kozhukhov, Xinmin Tian, Clark Nelson, Jennifer Yu, Sergey Maslov, and Udit Patidar The newest Intel® C++ Compiler introduces support for indirectly calling a SIMD-enabled function in a vectorized fashion. Optimizing an Illegal Image Filter System, by Yueqiang Lu, Ying Hu, and Huaqiang Wang Tencent doubles the speed of its illegal image filter system using a SIMD instruction set and Intel® Integrated Performance Primitives.
Contents: Letter from the Editor: From Hatching to Soaring: Intel® TBB, by James Reinders James Reinders, an expert on parallel programming, is coauthor of the new Intel® Xeon Phi™ Processor High Performance Programming – Knights Landing Edition (June 2016), and coeditor of the recent High Performance Parallel Programming Pearls Volumes One and Two (2014 and 2015). The Genesis and Evolution of Intel® Threading Building Blocks, by Arch D. Robison A decade after the introduction of Intel Threading Building Blocks, the original architect shares his perspective. A Tale of Two High-Performance Libraries, by Vipin Kumar E.K. How Intel® Math Kernel Library and Intel® Threading Building Blocks work together to improve performance. Heterogeneous Programming with Intel® Threading Building Blocks, by Alexei Katranov, Oleg Loginov, and Michael Voss With new features, Intel® Threading Building Blocks can coordinate the execution of computations across multiple devices. Preparing for a Many-Core Future, by Kevin O’Leary, Ben Langmead, John O’Neill, and Alexey Kukanov Johns Hopkins University adds multicore parallelism to increase performance of its Bowtie 2* application. Leading and Following the C++ Standard, by Alexei Katranov Intel® Threading Building Blocks adheres tightly to the C++ standard where it can—and paves the way for supporting parallelism best. Intel® Threading Building Blocks: Toward the Future, by Alexey Kukanov The architect of Intel® Threading Building Blocks shares thoughts on the opportunities ahead.
March 1, 2016
Contents: Letter from the Editor, James Reinders Time-Saving Tips as Spring Begins in the Northern Hemisphere Improve Productivity and Boost C++ Performance The new Intel® SIMD Data Layout Template library optimizes C++ code and helps improve SIMD efficiency. Intel® C++ Compiler Standard Edition for Embedded Systems with Bi-Endian Technology Intel® C++ Compiler Standard Edition for Embedded Systems with Bi-Endian Technology helps developers looking to overcome platform lock-in. OpenMP* API Version 4.5: A Standard Evolves OpenMP* version 4.5 is the next step in the standard’s evolution, introducing new concepts for parallel programming as well as additional features for offload programming. Intel® MPI Library: Supporting the Hadoop* Ecosystem With data analytics breaking into the HPC world, the question of using MPI and big data frameworks in the same ecosystem is getting more attention. Finding Your Memory Access Performance Bottlenecks The new Intel® VTune™ Amplifier XE Memory Access analysis feature shows how some tough memory problems can be resolved. Optimizing Image Identification with Intel® Integrated Performance Primitives Intel worked closely with engineers at China’s largest and most-used Internet service portal to help them achieve a 100 percent performance improvement on the Intel® architecture-based platform. Develop Smarter Using the Latest IoT and Embedded Technology A closer look at tools for coding, analysis, and debugging with all Intel® microcontrollers, Internet of Things (IoT) devices, and embedded platforms. Tuning Hybrid Applications with Intel® Cluster Tools This article provides a step-by-step workflow for hybrid application analysis and tuning. Vectorize Your Code Using Intel® Advisor XE 2016 Vectorization Advisor boasts new features that can assist with vectorization on the next generation of Intel® Xeon Phi™ processors.