Global data center traffic is increasing at a 25% compound annual growth rate according to the Cisco Global Cloud Index. The question is, “How will organizations turn that data deluge into value, for a sustainable competitive advantage, at scale?”
At Strata Data New York on September 11-13, 2018, Intel showcased technology innovations to deliver that value, using open source software like Apache Spark*, BigDL* and Analytics Zoo* as a catalyst. Built and optimized for the latest platform technologies from Intel such as Intel® Xeon® Scalable processors and Intel® Optane™ DC persistent memory, they bring greater capacity and performance to big data analytics and AI solutions.
As an open source software leader, Intel is the number-one upstream contributor to the Linux* kernel, and we have provided a steady stream of source code contributions and optimizations for Apache Spark*, the emerging unified analytics and AI engine for large-scale data processing.
Spark fills a role at the intersection of AI, streaming analytics, and batch analytics, offering ease of use for developers writing in Java*, Scala*, or Python*. Adding even more value to the Spark platform, open source contributions from Intel also include the Optimized Analytics Package to accelerate Spark queries, the BigDL deep learning library and framework, and the Analytics Zoo analytics and AI platform for Apache Spark and BigDL.
A Key Trend: Fast Data
Companies have started transitioning from big data to fast data. Fast data is data in motion. It provides the ability to learn continuously, make decisions, and take actions as soon as data arrives, typically in milliseconds.
Imagine the scenario for a credit card company when a person swipes a card to purchase something: analytics or AI applications need to immediately run hundreds of input variables such as location, time, recent purchases, and previous transactions through complex logic to determine whether to approve or decline the transaction—all within milliseconds.
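As a rough illustration of that kind of decision, the Python sketch below scores a single transaction from a handful of input variables and measures how long the decision takes. Everything in it is a hypothetical stand-in: the Transaction fields, the risk weights, the threshold, and the function names are assumptions made for this example, not a real fraud model or any card issuer's logic. A production system would evaluate hundreds of variables with a trained model.

```python
import time
from dataclasses import dataclass


@dataclass
class Transaction:
    """Hypothetical subset of the input variables available at swipe time."""
    amount: float
    country: str
    hour_of_day: int
    purchases_last_hour: int


def decide(txn, home_country="US", risk_threshold=0.5):
    """Return 'approve' or 'decline' using illustrative, hand-picked weights."""
    risk = 0.0
    if txn.country != home_country:
        risk += 0.3  # purchase far from the cardholder's usual location
    if txn.amount > 1000:
        risk += 0.3  # unusually large purchase
    if txn.hour_of_day < 5:
        risk += 0.1  # purchase in the early morning hours
    if txn.purchases_last_hour > 3:
        risk += 0.2  # burst of recent purchases
    return "decline" if risk >= risk_threshold else "approve"


def decide_with_timing(txn):
    """Run the decision and report how long it took, in milliseconds."""
    start = time.perf_counter()
    decision = decide(txn)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return decision, elapsed_ms


routine = Transaction(amount=25.0, country="US", hour_of_day=14, purchases_last_hour=1)
unusual = Transaction(amount=2500.0, country="FR", hour_of_day=3, purchases_last_hour=5)

routine_decision, routine_ms = decide_with_timing(routine)
unusual_decision, unusual_ms = decide_with_timing(unusual)
print(routine_decision, unusual_decision)
```

The timing wrapper is there to make the latency budget visible: in a fast data pipeline, the elapsed time per decision matters as much as the decision itself.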
Implementing such a use case can create an extreme processing bottleneck. Many learning algorithms iterate a computation over a training dataset, updating the model parameters on each pass until the model converges. To accelerate training, it’s common to cache the entire dataset and the parameters in memory. However, memory capacity is often the limiting factor.
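The iterate-until-convergence pattern described above can be sketched in a few lines of Python. This is a minimal, single-machine illustration that assumes a toy linear model and a tiny synthetic dataset; the function name, learning rate, and tolerance are choices made for this example and are not taken from Spark or BigDL. The point is that every pass re-reads the whole dataset, which is why keeping it in memory matters.

```python
def train_until_converged(dataset, learning_rate=0.05, tolerance=1e-9,
                          max_iterations=10_000):
    """Fit y = w * x + b by batch gradient descent over an in-memory dataset."""
    # Materialize the dataset once so every pass reads it from memory.
    cached = list(dataset)
    n = len(cached)
    w, b = 0.0, 0.0
    iteration = 0
    for iteration in range(1, max_iterations + 1):
        # One full pass over the cached data to compute the gradients
        # of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum((w * x + b - y) * x for x, y in cached)
        grad_b = 2.0 / n * sum((w * x + b - y) for x, y in cached)
        new_w = w - learning_rate * grad_w
        new_b = b - learning_rate * grad_b
        # Stop once the parameters have effectively stopped moving.
        converged = (abs(new_w - w) < tolerance
                     and abs(new_b - b) < tolerance)
        w, b = new_w, new_b
        if converged:
            break
    return w, b, iteration


# Synthetic points on the line y = 2x + 1 (an assumption for the demo).
points = [(float(x), 2.0 * x + 1.0) for x in range(5)]
w, b, iterations = train_until_converged(points)
print(f"w={w:.4f} b={b:.4f} after {iterations} passes over the data")
```

Even this toy problem takes hundreds of passes to converge. At terabyte scale, each pass that has to go back to storage instead of memory multiplies the training time accordingly.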
That is exactly why we believe Intel® Optane™ DC persistent memory can be a real game changer for fast data. Our benchmark testing shows that Spark SQL (Spark's module for working with structured data) runs eight times faster¹ at a 2.6TB data scale with Intel Optane DC persistent memory than on a comparable system using dynamic random access memory (DRAM) dual in-line memory modules (DIMMs) and solid-state drives (SSDs). Even greater improvements were noted with the Apache Cassandra* “not only SQL” (NoSQL) database²,³.
Open Source Software Advancements
Intel advancements for analytics workloads go beyond our silicon innovations to include in-memory database optimizations and upstream contributions to numerous open source projects. As an ecosystem leader and open source software contributor, Intel aims to optimize all major deep learning frameworks and topologies, including TensorFlow*, Caffe*, MXNet*, and Chainer*, to run well on Intel® architecture. Visit Framework Optimizations for details.
BigDL was created natively for Apache Spark, which makes it easy to perform deep learning model training and inference on existing Intel Xeon processor-based big data clusters. It is highly optimized through the Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN). BigDL is the latest software to be included in the Intel® Select Solutions family, which delivers faster, easier, optimized capabilities that are pre-tested and verified by Intel and our ecosystem partners.
To unify analytics and AI on one platform, we open sourced Analytics Zoo. It unites Spark, TensorFlow, Keras, and BigDL programs into one pipeline. The entire pipeline can transparently scale out to a large Spark/Hadoop cluster for distributed training or inference. In addition, it provides high-level pipeline APIs, pre-trained deep learning models, and reference use cases.
Your organization can use existing infrastructure as an AI foundation, reducing hardware cost and the total time to solution. And with open source software as a catalyst, Intel innovations in communications, storage/memory and computer processing can help your enterprise move faster, store more, and process everything, to turn the data deluge into value.
Visit our Advanced Data Analytics page to get more details about big data technologies for analytics and AI.
Note: Intel Optane DC persistent memory opportunity equals 2022 data center memory SAM.
1 – 8x (8/2/2018)
2,3 – 9x reads/11x users (5/24/2018)
Performance results are based on testing and may not reflect all publicly available security updates. No product can be absolutely secure. See detailed configurations in backup slides for details. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to www.intel.com/benchmarks.
For more information: Data-Centric Innovation