Derive Value from Data Analytics & AI at Scale with Open Source Software and Intel® Platform Technologies

Strata New York
According to the Cisco Global Cloud Index, global data center traffic is growing at a 25% compound annual growth rate. The question is, "How will organizations turn that data deluge into value, for a sustainable competitive advantage, at scale?"
At Strata Data New York on September 11–13, Intel will showcase technology innovations that deliver that value, using open source software such as Apache Spark*, BigDL, and Analytics Zoo as a catalyst. Built and optimized for the latest Intel platform technologies, such as Intel® Xeon® Scalable processors and Intel® Optane™ DC Persistent Memory, these open source projects bring greater capacity and performance to big data analytics and AI solutions.
As an open source software leader, Intel is the number-one upstream contributor to the Linux* kernel, and we have provided a steady stream of source code contributions and optimizations for Apache Spark*, the emerging unified analytics and AI engine for large-scale data processing. 
Spark sits at the intersection of AI, streaming analytics, and batch analytics, and offers ease of use for developers writing in Java*, Scala*, or Python*. Adding even more value to the Spark platform, Intel's open source contributions also include the Optimized Analytics Package to accelerate Spark queries, the BigDL deep learning library/framework, and the Analytics Zoo analytics and AI platform for Apache Spark and BigDL.

A Key Trend: Fast Data

Companies have started transitioning from big data to fast data. Fast data is data in motion. It provides the ability to learn continuously, make decisions, and take actions as soon as data arrives, typically in milliseconds.
Imagine the scenario for a credit card company when a person swipes a card to make a purchase: analytics or AI applications need to immediately run hundreds of input variables, such as location, time, recent purchases, and previous transactions, through complex logic to decide whether to approve or decline the transaction, all within milliseconds.
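To make the card-swipe scenario concrete, here is a minimal sketch of per-event scoring. It is an illustration under stated assumptions, not a production fraud model: the `Txn` record, the three features (amount versus recent average, distance from the previous swipe, and hour of day), the point weights, and the 60-point threshold are all hypothetical stand-ins for the hundreds of variables and the trained model a real system would use. The function also times its own decision, since the latency budget is measured in milliseconds.

```python
import time
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Txn:
    amount: float
    lat: float
    lon: float
    hour: int  # local hour of day, 0-23


def km_between(a: Txn, b: Txn) -> float:
    """Haversine great-circle distance between two transaction locations, in km."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def risk_points(txn: Txn, history: list) -> int:
    """Hypothetical rule weights; a real system would use a trained model over many more features."""
    points = 0
    if history:
        avg = sum(t.amount for t in history) / len(history)
        if txn.amount > 5 * avg:                # unusually large purchase
            points += 40
        if km_between(history[-1], txn) > 500:  # far from the previous swipe
            points += 40
    if txn.hour < 5:                            # unusual time of day
        points += 20
    return points


def decide(txn: Txn, history: list, threshold: int = 60):
    """Return (decision, risk points, elapsed milliseconds) for one incoming transaction."""
    start = time.perf_counter()
    points = risk_points(txn, history)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return ("decline" if points >= threshold else "approve"), points, elapsed_ms
```

For example, given three recent New York purchases averaging $40, a $900 purchase in London at 3 a.m. trips all three rules (100 points) and is declined, while a $35 purchase nearby at 3 p.m. scores 0 and is approved.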
Implementing such a use case can create an extreme processing bottleneck. Many learning algorithms iterate a computation over a training dataset, updating the model parameters until the model converges. To accelerate training, it's common to cache the huge dataset and parameters in memory. However, memory capacity is often the limiting constraint.
That is exactly why we believe Intel® Optane™ DC persistent memory can be a real game changer for fast data. Our benchmark testing shows that Spark SQL (Spark's module for working with structured data) performs eight times faster [1] at a 2.6TB data scale using Intel Optane DC persistent memory versus a comparable system using dynamic random access memory (DRAM) dual in-line memory modules (DIMMs) and solid-state drives (SSDs)**. Even greater improvements were noted with the Apache* Cassandra "not only SQL" (NoSQL) database: 9x reads and 11x users [2,3].
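The iterate-until-convergence pattern described above, and why caching matters for it, can be sketched in a few lines. This is a minimal, hypothetical example in plain Python, not Spark code: `load_dataset` stands in for an expensive read from storage, and `train` runs full-batch gradient descent on a toy linear model until the parameter updates fall below a tolerance. Because every iteration makes a complete pass over the data, reading it once and reusing the in-memory copy avoids dozens of re-reads; in Spark the analogous step would be calling `cache()` or `persist()` on the training Dataset or RDD, and the size of that cached working set is what runs into memory limits.

```python
import random


def load_dataset(n: int, seed: int = 0) -> list:
    """Hypothetical stand-in for an expensive read from disk or a remote store.
    Generates (x, y) pairs with y = 3x + 2 plus a little Gaussian noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        rows.append((x, 3.0 * x + 2.0 + rng.gauss(0.0, 0.01)))
    return rows


def train(data: list, lr: float = 0.5, tol: float = 1e-9, max_iters: int = 10_000):
    """Full-batch gradient descent on mean squared error.
    Every iteration makes a complete pass over `data`, so keeping it in memory
    avoids re-reading the dataset from storage on each pass."""
    w, b = 0.0, 0.0
    n = len(data)
    for it in range(1, max_iters + 1):
        gw = sum(2.0 * (w * x + b - y) * x for x, y in data) / n
        gb = sum(2.0 * (w * x + b - y) for x, y in data) / n
        new_w, new_b = w - lr * gw, b - lr * gb
        # Converged when neither parameter moves by more than `tol`.
        if abs(new_w - w) < tol and abs(new_b - b) < tol:
            return new_w, new_b, it
        w, b = new_w, new_b
    return w, b, max_iters


cached = load_dataset(1_000)  # read once (the "cache" step)
w, b, iters = train(cached)   # iterate many times over the in-memory copy
```

The learned `w` and `b` land close to the generating values of 3 and 2 after a few dozen passes; with a dataset too large for DRAM, each of those passes would otherwise spill to or re-read from slower storage.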

Open Source Software Advancements

Intel advancements for analytics workloads go beyond our silicon innovations to include in-memory database optimizations and upstream contributions to numerous open source projects. As an ecosystem leader and open source software contributor, Intel aims to optimize all major deep learning frameworks and topologies, including TensorFlow*, Caffe*, MXNet*, and Chainer* to run well on Intel® architecture.  Visit https://ai.intel.com/framework-optimizations/ for details.
As a top contributor to Apache Spark, Intel has also open sourced the BigDL deep learning library/framework and Analytics Zoo.
BigDL was created natively for Apache Spark, which makes it easy to perform deep learning model training and inference on existing Intel Xeon processor-based big data clusters. It is highly optimized through the Intel® Math Kernel Library for Deep Neural Networks (Intel MKL-DNN). BigDL is the latest software to be included in the Intel® Select Solutions family, which delivers faster, easier, optimized capabilities that are pre-tested and verified by Intel and our ecosystem partners.
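Training on an existing cluster typically means data-parallel training: each worker holds a slice of the data, computes gradients on it, and the gradients are combined into a single model update. The sketch below illustrates that general technique (synchronous data-parallel gradient descent) in plain Python. It does not use BigDL's API or reproduce its actual parameter-synchronization mechanism; threads stand in for Spark executors, the toy linear model is hypothetical, and the point is the synchronization pattern, not speed.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def make_partitions(num_partitions: int, rows_per_partition: int, seed: int = 1) -> list:
    """Hypothetical training data already split across workers: y = -1.5x + 0.5."""
    rng = random.Random(seed)
    partitions = []
    for _ in range(num_partitions):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(rows_per_partition)]
        partitions.append([(x, -1.5 * x + 0.5) for x in xs])
    return partitions


def local_gradient(partition: list, w: float, b: float):
    """Runs on one worker: summed MSE gradient over that worker's rows only."""
    gw = gb = 0.0
    for x, y in partition:
        err = w * x + b - y
        gw += 2.0 * err * x
        gb += 2.0 * err
    return gw, gb, len(partition)


def train_data_parallel(partitions: list, lr: float = 0.5, iters: int = 200):
    """Synchronous data-parallel gradient descent.
    Each step: send the current (w, b) to all workers, compute local gradients
    in parallel, aggregate them into one global gradient, then update once."""
    w, b = 0.0, 0.0
    k = len(partitions)
    with ThreadPoolExecutor(max_workers=k) as pool:
        for _ in range(iters):
            results = list(pool.map(local_gradient, partitions, [w] * k, [b] * k))
            n = sum(count for _, _, count in results)
            gw = sum(g for g, _, _ in results) / n
            gb = sum(g for _, g, _ in results) / n
            w -= lr * gw
            b -= lr * gb
    return w, b


w, b = train_data_parallel(make_partitions(num_partitions=4, rows_per_partition=250))
```

Because the per-partition gradient sums are added before dividing by the total row count, each synchronous step is mathematically the same as one full-batch step on a single machine; distributing the work changes where the computation happens, not the result.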
To unify analytics and AI on one platform, we recently open sourced Analytics Zoo, which unites Spark, TensorFlow, Keras, and BigDL programs in one pipeline. The entire pipeline can transparently scale out to a large Spark/Hadoop cluster for distributed training or inference. In addition, Analytics Zoo provides high-level pipeline APIs, pre-trained deep learning models, and reference use cases.
Analytics Zoo
Analytics Zoo includes high-level pipeline APIs, built-in deep learning models, and reference use cases to provide an end-to-end analytics and AI platform.
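The idea of one end-to-end pipeline that scales out by running unchanged on every partition of the data can be sketched as follows. This is a conceptual, hypothetical example, not the Analytics Zoo API: `parse`, `normalize`, and `model` are made-up stages standing in for Spark data preparation and a TensorFlow, Keras, or BigDL model, and a thread pool stands in for cluster executors, in the spirit of Spark's `mapPartitions`.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable, List

Stage = Callable[[list], list]


def pipeline(*stages: Stage) -> Stage:
    """Compose stages left to right into one function that processes a batch of records."""
    return lambda batch: reduce(lambda acc, stage: stage(acc), stages, batch)


def parse(batch: List[str]) -> list:
    """ETL-style step: turn raw "a,b" strings into float pairs."""
    return [tuple(float(v) for v in line.split(",")) for line in batch]


def normalize(batch: list) -> list:
    """Feature scaling, assuming (hypothetically) that raw values lie in [0, 10]."""
    return [(a / 10.0, b / 10.0) for a, b in batch]


def model(batch: list) -> List[int]:
    """Stand-in for a trained model: a fixed linear scorer with a 0.5 threshold."""
    return [1 if 0.7 * a + 0.3 * b > 0.5 else 0 for a, b in batch]


def run_distributed(records: list, pipe: Stage, num_partitions: int = 4) -> list:
    """Split records into contiguous partitions, run the whole pipeline on each
    partition in parallel, and concatenate the outputs in their original order."""
    size = max(1, -(-len(records) // num_partitions))  # ceiling division, at least 1
    parts = [records[i:i + size] for i in range(0, len(records), size)]
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        outputs = pool.map(pipe, parts)  # map() yields results in input order
        return [y for out in outputs for y in out]


PIPE = pipeline(parse, normalize, model)
RAW = ["8,2", "1,1", "6,9", "3,4", "9,9", "0,5"]
```

The design choice worth noting is that the same composed `PIPE` object runs on a single batch or on many partitions, so `run_distributed(RAW, PIPE)` and `PIPE(RAW)` return identical predictions; only the execution strategy changes as the data grows.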

Next Steps

Your organization can use existing infrastructure as an AI foundation, reducing hardware cost and the total time to solution. And with open source software as a catalyst, Intel innovations in communications, storage/memory and computer processing can help your enterprise move faster, store more, and process everything, to turn the data deluge into value. 
If you're in New York, come see my Strata Data keynote at 9:30am on Thursday, 09/13/2018, at the Javits Center, Location 3E. And be sure to check out these Intel sessions:
Wed. Sept. 12th
How the blurring of memory and storage is revolutionizing the data era
  • 1:15pm–1:55pm, Location: 1A 04/05
A deep learning approach for precipitation nowcasting with RNN using Analytics Zoo on BigDL
  • 2:55pm–3:35pm, Location: 1A 15/16
Thu. Sept. 13th
A high-performance system for deep learning inference and visual inspection
  • 1:10pm–1:50pm, Location: 1A 15/16
Job recommendations leveraging Deep Learning using Analytics Zoo on Apache Spark and BigDL
  • 2:00pm–2:40pm, Location: 1A 15/16
In addition, stop by the Intel booth #717 to check out our theater sessions and these impressive demos:
  • Accelerating and Simplifying Apache Spark Analytics/AI at Scale
  • Apply Machine Learning to IoT Data for Real-Time Insights
  • Accelerating Capital Risk Analysis 
  • Brain Tumor Segmentation 

Learn More

Visit our Advanced Data Analytics page to get more details about Intel® big data technologies for analytics and AI. 

Note: Intel Optane DC Persistent Memory opportunity equals 2022 data center memory SAM. 
1 – 8x (8/2/2018)
2, 3 – 9x reads/11x users (5/24/2018)
Performance results are based on testing and may not reflect all publicly available security updates. No product can be absolutely secure. See detailed configurations in backup slides for details. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to www.intel.com/benchmarks
For more complete information about compiler optimizations, see our Optimization Notice.