When I started using Apache Spark* for large-scale data analytics, MLlib* was the only option for doing machine learning within this framework. MLlib is a good, but limited, package. BigDL brought deep learning to Spark, so when it was first released in December 2016, I asked the developers for an overview (see BigDL: Optimized Deep Learning on Apache Spark in The Parallel Universe, Issue 28). In this issue's feature article, we check in with them a year and a half later to see what's new: Advancing Artificial Intelligence on Apache Spark with BigDL.
Imagine being able to develop applications in your favorite programming language, deploy them on the Web, and achieve near-native performance. In our last issue, we talked about OpenCV.js*, a new technology for performing sophisticated computer vision computations within a Web browser (see Computer Vision for the Masses in The Parallel Universe, Issue 32). In this issue, Why WebAssembly* Is the Future of Computing on the Web describes another new technology for running complex computations inside the browser.
I once evaluated a programming tool that gathered reams of performance data for my application and displayed it concisely. At first, I was enthralled by the amount of data at my disposal and the colorful GUI. But when the novelty wore off, I realized that none of the data was actionable or otherwise helpful in tuning my application. Code Modernization in Action demonstrates, step by step, how to turn Intel® Parallel Studio XE analyses into code optimizations.
Non-volatile memory is becoming an increasingly important hardware technology. In-Persistent Memory Computing with Java* describes libraries that allow Java applications to use this technology. Expect more articles on non-volatile memory in future issues of The Parallel Universe.
Faster Gradient-Boosting Decision Trees describes one of the many new features and enhancements in the latest release of Intel® Data Analytics Acceleration Library (Intel® DAAL).
The Intel® MPI Library is also being continuously optimized to take advantage of new hardware and changes to the MPI standard. "Always overlap communication and computation to hide latency" has been the conventional wisdom in MPI programming for as long as I can remember. But demonstrating the performance benefit of communication-computation overlap in real applications has been elusive. So when non-blocking collectives were added to the MPI standard a few years ago, I didn't pay much attention. Hiding Communication Latency Using MPI-3 Non-Blocking Collectives is causing me to rethink these new functions, especially when I need to harness a large number of compute nodes to process a large dataset.
Future issues of The Parallel Universe will bring you articles on using FPGAs for deep learning, threading in Python*, new approaches to large-scale distributed data analytics, new features in Intel® software tools, and much more. Be sure to subscribe so you won't miss a thing.