Analytics Zoo: Unifying Analytics + AI for Apache Spark*

By Jason Dai, senior principal engineer, Intel Corporation

Continued advancements in artificial intelligence applications have brought deep learning to the forefront of a new generation of data analytics development. In particular, we are seeing increasing demand from organizations to apply deep learning technologies (such as computer vision, natural language processing, generative adversarial networks, etc.) to their big data platforms and pipelines.

Today this often requires manually “stitching together” many separate components (e.g., Apache Spark*, TensorFlow*, Caffe*, Apache Hadoop* Distributed File System (HDFS), Apache Storm*/Kafka*, and others), in what can be a complex and error-prone process.

At Intel, we have been working extensively with open source community users and with several partners and customers, including JD.com, UCSF, Mastercard*, and many others, to build deep learning and AI applications on Apache Spark. To streamline end-to-end development and deployment, Intel developed Analytics Zoo, a unified analytics + AI platform that seamlessly unites Spark, TensorFlow, Keras, and BigDL programs into an integrated pipeline that can transparently scale out to large Apache Hadoop/Spark clusters for distributed training or inference.

Analytics Zoo also provides developers and users alike with rich analytics and AI support for the end-to-end pipeline, including:

  • Easy-to-use abstractions, such as Spark DataFrame and ML pipeline support, transfer learning support, a POJO-style model serving API, and more.

  • Common feature engineering operations for images, text, and 3D images.

  • Built-in deep learning models, such as text classification, recommendation, and object detection.

  • Reference use cases, such as time-series anomaly detection, fraud detection, image similarity search, and more.

I am pleased to announce that we have recently open-sourced Analytics Zoo, making the platform available for wide community use and contributions.

Early users such as World Bank, Cray, Talroo, Baosight, Midea/KUKA, and others have built analytics + AI applications on top of Analytics Zoo for a wide range of workloads. These include transfer-learning-based image classification, sequence-to-sequence prediction for precipitation nowcasting, neural collaborative filtering for job recommendations, and unsupervised time-series anomaly detection, among others.

Intel is committed to continuing its collaboration with the open source community and users as together we advance the convergence of analytics and AI on Apache Spark.
