BigDL: Distributed Deep Learning on Apache Spark

BigDL is a distributed deep learning library for Apache Spark*; with BigDL, users can write their deep learning applications as standard Spark programs, which can directly run on top of existing Spark or Hadoop* clusters. The three features discussed are: deep learning support, high single-node Intel® Xeon® performance, and efficient scale-out leveraging Spark architecture.

Hi, I'm Radhika, and in this video I'm going to cover some high-level features of BigDL, which is a deep learning library for Apache Spark. Remember to follow the links below for more resources. Stay with me to learn more.

As I mentioned before, BigDL is a distributed deep learning library for Apache Spark. With BigDL, users can write their deep learning applications as standard Spark programs, which can directly run on top of existing Spark or Hadoop clusters. 

Spark is the leading framework for distributed machine learning, so the addition of deep learning to the super popular Spark framework is important: it allows Spark developers to perform a wide range of data analysis tasks, including data wrangling, interactive queries, and stream processing, within a single framework. That helps avoid the complexity inherent in using multiple frameworks and libraries.

Three important features offered by BigDL are rich deep learning support, high single-node Xeon performance, and last but not least, efficient scale-out leveraging Spark architecture. Let's go into the details. The first one is rich deep learning support. More than [INAUDIBLE] BigDL provides comprehensive support for deep learning, including numeric computing and high-level neural networks. In addition, users can load pre-trained Caffe or Torch models into Spark programs using BigDL.

The second one is high single-node Xeon performance. To achieve high performance, BigDL uses the Intel Math Kernel Library and multi-threaded programming in each Spark task. Consequently, it is orders of magnitude faster than out-of-the-box open-source Caffe, Torch, [INAUDIBLE] on a single-node Xeon. The third is efficient scale-out leveraging Spark architecture. BigDL can efficiently scale out to perform data analytics at big data scale by leveraging Apache Spark, as well as efficient implementations of synchronous SGD and all-reduce communications on Spark.
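To make the synchronous SGD and all-reduce idea concrete, here is a minimal single-process sketch in plain Python. It is not BigDL's implementation and uses no BigDL or Spark APIs; the function names (local_gradient, all_reduce_mean, sync_sgd), the toy linear model, and the in-memory "partitions" are all assumptions made for illustration. Each simulated worker computes a gradient on its own data partition, the gradients are averaged so that every worker would see the same result, and the shared weights are updated once per step.

```python
from typing import List, Tuple

Vector = List[float]
Sample = Tuple[Vector, float]  # (features, target)


def local_gradient(weights: Vector, partition: List[Sample]) -> Vector:
    """Mean-squared-error gradient of a linear model on one worker's partition."""
    grad = [0.0] * len(weights)
    for features, target in partition:
        prediction = sum(w * x for w, x in zip(weights, features))
        error = prediction - target
        for i, x in enumerate(features):
            grad[i] += 2.0 * error * x / len(partition)
    return grad


def all_reduce_mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of the workers' vectors.

    In a real all-reduce every worker receives this same result; here it is
    simulated in one process. Workers are weighted equally regardless of
    partition size, which is a simplification.
    """
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def sync_sgd(partitions: List[List[Sample]], weights: Vector,
             learning_rate: float, steps: int) -> Vector:
    """Synchronous SGD: all workers finish a step before the weights change."""
    for _ in range(steps):
        # Each (simulated) worker computes a gradient on its own partition.
        gradients = [local_gradient(weights, part) for part in partitions]
        # Combine the gradients so all workers share one averaged gradient.
        averaged = all_reduce_mean(gradients)
        # Apply one identical update to the shared weights.
        weights = [w - learning_rate * g for w, g in zip(weights, averaged)]
    return weights


if __name__ == "__main__":
    # Toy data following y = 2 * x, split across two hypothetical workers.
    data = [[([1.0], 2.0), ([2.0], 4.0)], [([3.0], 6.0)]]
    print(sync_sgd(data, [0.0], learning_rate=0.05, steps=200))
```

The "synchronous" part is the barrier implied by the loop: no weight update happens until every partition's gradient for that step has been collected. On a real cluster, the list comprehension would correspond to work done in parallel across tasks.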

Let's talk about typical BigDL users. You want to use BigDL if you want to analyze big data using deep learning on the same Hadoop or Spark cluster where the data is stored; if you want to add deep learning functionalities, either training or prediction, to your big data programs or workflow; or if you want to leverage existing Hadoop or Spark clusters to run your deep learning applications, which can then be dynamically shared with other workloads.

So how can BigDL benefit customers? With this new unified platform, customers can eliminate large volumes of unnecessary dataset transfer between separate systems, eliminate separate hardware clusters and move towards a CPU cluster, and reduce system complexity and latency for end-to-end learning.

Ultimately, customers can achieve better scale, higher performance and resource utilization, ease of use, and better TCO. I hope the information I shared with you inspires you to join and contribute to this project. For more information about BigDL, check out the links below. Don't forget to like this video and subscribe to the Intel Software YouTube channel, and check us out on Facebook.