This article shares the experience and lessons learned by the Baosight and Intel teams while building an unsupervised time series anomaly detection project using long short-term memory (LSTM) models on Analytics Zoo.
In the manufacturing industry, and particularly in the steel industry, there are two ways to avoid producing defective products caused by device failure: maintain equipment regularly, or replace equipment components before they fail. Both approaches can be unnecessarily expensive. Alternatively, it is possible to collect large amounts of vibration data from different devices and use that data to detect anomalies in device status automatically. Efficient retrieval of time series data and automatic failure detection at scale are key to avoiding much of this unnecessary cost.
Recurrent neural networks (RNNs), especially LSTMs, are widely used in signal processing and time series analysis. As connectionist models, RNNs capture the dynamics of sequences via cycles in the network of nodes. In this project, we use LSTMs to model statistics of vibration signals. In the following sections, we use the lifecycle data from the University of Cincinnati's Center for Intelligent Maintenance Systems (IMS) to showcase the analytics pipeline.
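The article names two of the statistics it models, peak and RMS (shown in Figure 3), but does not describe how they are extracted from the raw signal. The plain-Scala sketch below only illustrates what those two statistics mean for a window of raw vibration samples. The object name, the non-overlapping windows, and dropping a trailing partial window are assumptions for illustration, not the project's actual preprocessing.

```scala
// Hypothetical preprocessing helper (not taken from the original project):
// split raw vibration samples into fixed-size windows and summarize each one.
object VibrationStats {
  final case class WindowStats(peak: Double, rms: Double)

  /** Peak: the largest absolute amplitude in the window. */
  def peak(samples: Seq[Double]): Double =
    samples.map(x => math.abs(x)).max

  /** RMS: square root of the mean of the squared amplitudes. */
  def rms(samples: Seq[Double]): Double =
    math.sqrt(samples.map(x => x * x).sum / samples.size)

  /** Non-overlapping windows of windowSize samples; a trailing partial
    * window is dropped (an assumption made for this sketch). */
  def summarize(signal: Seq[Double], windowSize: Int): Seq[WindowStats] = {
    require(windowSize > 0, "windowSize must be positive")
    signal.grouped(windowSize)
      .filter(_.size == windowSize)
      .map(w => WindowStats(peak(w), rms(w)))
      .toSeq
  }
}
```

Each window then contributes one time step to the series of statistics that the LSTM learns to predict.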
Analytics Zoo is an analytics and AI platform (based on Apache Spark*, BigDL, and other frameworks) open-sourced by Intel. It makes it easy to build end-to-end deep learning applications for big data that run directly on standard Apache Hadoop*/Spark clusters based on Intel® Xeon® processors, with no GPUs needed.
We built an end-to-end LSTM-based anomaly detection pipeline on Apache Spark and Analytics Zoo that applies unsupervised learning to a large set of time series data. A sequence of vibration signals spanning the 50 seconds leading up to the current time is used as input to the LSTM model, which then predicts the next data point. When the actual next data point is far from the model's prediction, we consider it an anomaly.
The entire end-to-end pipeline is illustrated in Figure 1.
Figure 1. Anomaly detection pipeline for vibration time series based on Analytics Zoo and Apache Spark*.
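In this pipeline the model sees a window of recent data and predicts the next data point. One way to turn a time series into such training pairs is sketched below in plain Scala. The names unroll, unrollLength, and targetIndex, and the choice of predicting a single feature, are illustrative assumptions rather than the project's actual code. Each window has the layout (time steps, features), which is the per-sample layout a Keras-style LSTM expects for its inputShape.

```scala
// Hypothetical sketch: build (input window, next value) pairs for a
// next-step predictor from a series of per-step feature vectors.
object Unroll {
  final case class Sample(window: Vector[Vector[Double]], target: Double)

  /** Each sample uses unrollLength consecutive steps as input and the value
    * of feature targetIndex at the following step as its label. A series
    * that is not longer than unrollLength yields no samples. */
  def unroll(series: Vector[Vector[Double]], unrollLength: Int, targetIndex: Int): Vector[Sample] = {
    require(unrollLength > 0, "unrollLength must be positive")
    (0 until series.length - unrollLength).toVector.map { start =>
      Sample(
        window = series.slice(start, start + unrollLength),
        target = series(start + unrollLength)(targetIndex)
      )
    }
  }
}
```

Consecutive windows overlap by all but one step, so a series of n steps yields n - unrollLength samples.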
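The LSTM model is built with the Analytics Zoo Keras-style API: three stacked LSTM layers, each followed by dropout, and a final dense layer that outputs the predicted next value.

  import com.intel.analytics.bigdl.utils.Shape
  import com.intel.analytics.zoo.pipeline.api.keras.layers.{Dense, Dropout, LSTM}
  import com.intel.analytics.zoo.pipeline.api.keras.models.Sequential

  // inputShape is (time steps, features) per sample; the batch dimension is omitted
  def buildModel(inputShape: Shape): Sequential[Float] = {
    val model = Sequential[Float]()
    // The first two LSTM layers return full sequences for the next LSTM layer
    model.add(LSTM[Float](8, returnSequences = true, inputShape = inputShape))
    model.add(Dropout[Float](0.2))
    model.add(LSTM[Float](32, returnSequences = true))
    model.add(Dropout[Float](0.2))
    // The last LSTM layer returns only its final output
    model.add(LSTM[Float](15, returnSequences = false))
    model.add(Dropout[Float](0.2))
    // A single output unit: the predicted next value
    model.add(Dense[Float](outputDim = 1))
    model
  }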
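To see why the first two LSTM layers set returnSequences = true and the last one does not, the plain-Scala sketch below traces tensor shapes through the same stack of layers (LSTM layers of 8, 32, and 15 units with dropout, then a one-unit dense layer) under standard Keras shape semantics. It does not use Analytics Zoo; the Layer types and the trace function are hypothetical names for illustration only.

```scala
// Illustrative shape trace in plain Scala (no Analytics Zoo dependency).
// Shapes exclude the batch dimension, as with a Keras-style inputShape.
object ShapeTrace {
  sealed trait Layer
  final case class Lstm(units: Int, returnSequences: Boolean) extends Layer
  case object DropoutLayer extends Layer
  final case class DenseLayer(units: Int) extends Layer

  def outputShape(input: List[Int], layer: Layer): List[Int] = (layer, input) match {
    // returnSequences = true: one output vector per time step
    case (Lstm(units, true), List(steps, _)) => List(steps, units)
    // returnSequences = false: only the last time step's output
    case (Lstm(units, false), List(_, _)) => List(units)
    // Dropout never changes the shape
    case (DropoutLayer, shape) => shape
    // Dense replaces the last dimension with its number of units
    case (DenseLayer(units), shape) if shape.nonEmpty => shape.init :+ units
    case _ => throw new IllegalArgumentException(s"unsupported layer $layer for shape $input")
  }

  /** Returns the input shape followed by the output shape of every layer. */
  def trace(input: List[Int], layers: List[Layer]): List[List[Int]] =
    layers.scanLeft(input)(outputShape)
}
```

For a hypothetical input of 10 time steps with 4 features, trace returns (10, 4), (10, 8), (10, 8), (10, 32), (10, 32), (15), (15), (1). Only the last LSTM layer collapses the time dimension, so the dense layer produces one predicted value per sample.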
Figure 3 compares the LSTM model's predictions with the ground truth of the vibration time series. Only two statistics, the peak and the RMS of the same channel, are shown here; the other statistics show similar fluctuations. The red points are detected anomalies, the orange line is the LSTM model's prediction, and the blue line is the ground truth. The model successfully detects the device failure at the end, as well as the spikes after 600 time steps. Some of the earlier fluctuations are also flagged, providing early warnings.
Figure 3. Comparison between recurrent neural network (RNN) predictions (orange lines) and ground truth (blue lines) of vibration time series for the same channel's peak data (upper chart) and RMS data (lower chart).
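The article flags a point as an anomaly when the observed value is far from the LSTM's prediction, but does not say how "far" is measured. The sketch below assumes the simplest rule, an absolute-error threshold. It is plain Scala rather than an Analytics Zoo API, and detect, Point, and threshold are hypothetical names.

```scala
// Hypothetical sketch: mark a time step as anomalous when the absolute
// difference between ground truth and prediction exceeds a fixed threshold.
object DetectAnomalies {
  final case class Point(index: Int, truth: Double, prediction: Double, anomaly: Boolean)

  def detect(truth: Seq[Double], prediction: Seq[Double], threshold: Double): Seq[Point] = {
    require(truth.size == prediction.size, "truth and prediction must be aligned")
    truth.zip(prediction).zipWithIndex.map { case ((y, yHat), i) =>
      Point(i, y, yHat, anomaly = math.abs(y - yHat) > threshold)
    }
  }
}
```

A fixed threshold keeps the example short; in practice the threshold might instead be derived from the distribution of prediction errors, for example a high percentile of the errors observed on normal data.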
By adopting an unsupervised deep learning approach, we can efficiently apply time series anomaly detection to big data at scale, using the end-to-end Spark and BigDL pipeline provided by Analytics Zoo and running directly on standard Hadoop/Spark clusters based on Intel Xeon processors. These capabilities, namely collecting and processing massive amounts of time series data (such as logs and sensor readings) and applying RNNs to learn patterns and predict expected values in order to identify anomalies, are critical for many emerging smart systems in areas such as industrial, manufacturing, AIOps, and IoT. Time series anomaly detection is likely to play a key role in use cases such as monitoring and predictive maintenance. (The Analytics Zoo project also provides a simple example of unsupervised anomaly detection using its Keras-style API.)