Building Faster Data Applications on Spark* Clusters Using Intel® Data Analytics Acceleration Library

  • Overview

Apache Spark* is an open-source cluster computing framework that is widely used for big data processing. Intel® Data Analytics Acceleration Library (Intel® DAAL) provides optimized implementations of many fundamental machine learning and data analysis algorithms for Intel architecture. The library is flexible enough to be plugged into different big data frameworks. When Intel DAAL is used with Spark*, users get the native performance of the underlying hardware while keeping the ease of use and development productivity of the Spark environment. This webinar describes the integration by showing how to use the Intel DAAL Java API within Spark programs. The presentation walks through a few code examples that run Intel DAAL algorithms (e.g., PCA and K-Means) on a Spark cluster. Programming tips, performance, and upcoming Intel DAAL features are also discussed.
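To give a feel for what running an algorithm such as K-Means on partitioned data involves, here is a minimal plain-Java sketch of one K-Means iteration split into a per-partition local step, an associative merge, and a final master step. It is not taken from the webinar or its sample code, and it deliberately calls neither the Spark API nor the Intel DAAL Java API. The class `DistributedKMeansSketch`, its methods, and the toy data are all hypothetical names chosen for illustration. On a real cluster, the local step would presumably run inside a map over the partitions and the merge inside a reduce.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Plain-Java illustration of the "local partial result, then merge" pattern.
 * Assumption: names and data are hypothetical; no Spark or Intel DAAL calls are used.
 */
public class DistributedKMeansSketch {

    /** Per-cluster coordinate sums and point counts computed on one partition. */
    public static final class Partial {
        public final double[][] sums;
        public final long[] counts;

        public Partial(int k, int dim) {
            sums = new double[k][dim];
            counts = new long[k];
        }
    }

    /** Local step: assign every point of one partition to its nearest centroid. */
    public static Partial localStep(double[][] partition, double[][] centroids) {
        Partial p = new Partial(centroids.length, centroids[0].length);
        for (double[] point : partition) {
            int best = 0;
            double bestDist = Double.MAX_VALUE;
            for (int c = 0; c < centroids.length; c++) {
                double d = 0.0;
                for (int j = 0; j < point.length; j++) {
                    double diff = point[j] - centroids[c][j];
                    d += diff * diff; // squared Euclidean distance
                }
                if (d < bestDist) {
                    bestDist = d;
                    best = c;
                }
            }
            for (int j = 0; j < point.length; j++) {
                p.sums[best][j] += point[j];
            }
            p.counts[best]++;
        }
        return p;
    }

    /** Merge step: add two partial results; associative, so it fits a reduce. */
    public static Partial merge(Partial a, Partial b) {
        Partial out = new Partial(a.counts.length, a.sums[0].length);
        for (int c = 0; c < a.counts.length; c++) {
            out.counts[c] = a.counts[c] + b.counts[c];
            for (int j = 0; j < a.sums[c].length; j++) {
                out.sums[c][j] = a.sums[c][j] + b.sums[c][j];
            }
        }
        return out;
    }

    /** Master step: new centroid = sum / count; an empty cluster keeps its old centroid. */
    public static double[][] masterStep(Partial total, double[][] oldCentroids) {
        double[][] updated = new double[oldCentroids.length][];
        for (int c = 0; c < oldCentroids.length; c++) {
            updated[c] = oldCentroids[c].clone();
            if (total.counts[c] > 0) {
                for (int j = 0; j < updated[c].length; j++) {
                    updated[c][j] = total.sums[c][j] / total.counts[c];
                }
            }
        }
        return updated;
    }

    /** One full iteration over a list of partitions (stands in for map + reduce). */
    public static double[][] iterate(List<double[][]> partitions, double[][] centroids) {
        Partial total = partitions.stream()
                .map(part -> localStep(part, centroids))
                .reduce(DistributedKMeansSketch::merge)
                .orElseThrow(() -> new IllegalArgumentException("no partitions"));
        return masterStep(total, centroids);
    }

    public static void main(String[] args) {
        // Two toy partitions, as if the data set were split across two workers.
        List<double[][]> partitions = Arrays.asList(
                new double[][] {{0.0, 0.0}, {1.0, 1.0}},
                new double[][] {{9.0, 9.0}, {10.0, 10.0}});
        double[][] centroids = {{0.0, 0.0}, {10.0, 10.0}};
        double[][] updated = iterate(partitions, centroids);
        System.out.println(Arrays.deepToString(updated)); // prints [[0.5, 0.5], [9.5, 9.5]]
    }
}
```

Because `merge` only adds sums and counts, partial results can be combined in any order, which is what makes this shape of computation a natural fit for a cluster. A tuned library would replace the inner loops of `localStep` with optimized native kernels; the sketch only shows the data flow.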

Download Sample Code [22 KB]
View PDF [1.22 MB]

Benchmark results were obtained prior to the implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown". Implementation of these updates may make these results inapplicable to your device or system.

Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information, see Performance Benchmark Test Disclosure.