This article shares the experience and lessons learned by the Baosight and Intel teams in building an unsupervised time series anomaly detection project using long short-term memory (LSTM) models on Analytics Zoo.
In the manufacturing industry, particularly in the steel industry, there are two common ways to avoid producing unqualified products because of device failure. One is to maintain equipment regularly; the other is to replace equipment components before they fail. Both approaches can be unnecessarily expensive. However, it is possible to collect a massive amount of vibration data from different devices and use it to automatically detect anomalies in device status. Efficient time series data retrieval and automatic failure detection at scale are the key to avoiding much of this unnecessary cost.
Recurrent neural networks (RNNs), especially LSTMs, are widely used in signal processing and time series analysis. As connectionist models, RNNs capture the dynamics of sequences via cycles in the network of nodes. In this project, we use LSTMs to model statistics of vibration signals. In the following section, we use lifecycle data from the University of Cincinnati’s Center for Intelligent Maintenance Systems (IMS), which is available for download, to showcase the analytics pipeline.
Analytics Zoo Solution
Analytics Zoo is an analytics + AI platform (based on Apache Spark*, BigDL, etc.) open-sourced by Intel, which makes it easy to build end-to-end deep-learning applications for big data that can run directly on standard Apache Hadoop*/Spark clusters based on Intel® Xeon® processors (no GPUs needed).
We have built an end-to-end LSTM-based anomaly detection pipeline on Apache Spark and Analytics Zoo that applies unsupervised learning to a large set of time series data. A sequence of vibration signals leading up to the current time (covering 50 seconds) is used as input to the LSTM model, which then tries to predict the next data point. When the actual next data point is distant from the model’s prediction, we consider it an anomaly.
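As a rough illustration of this windowing idea, the sketch below turns a single feature series into pairs of (previous 50 values, next value), which is the shape of data a next-point predictor trains on. It is plain Scala without Spark or Analytics Zoo, and the names `UnrollSketch`, `unroll`, and `lookback` are hypothetical, not taken from the project. The real pipeline works on several features per second; a single series is assumed here for brevity.

```scala
object UnrollSketch {
  // Turn a 1-D feature series into (window, nextValue) pairs.
  // `lookback` is the number of past points in each window
  // (50 in the article). Hypothetical helper, not project code.
  def unroll(series: Vector[Double],
             lookback: Int): Vector[(Vector[Double], Double)] = {
    require(lookback > 0, "lookback must be positive")
    if (series.length <= lookback) Vector.empty
    else (0 until series.length - lookback).toVector.map { i =>
      // Window covers indices i .. i + lookback - 1; target is the next point.
      (series.slice(i, i + lookback), series(i + lookback))
    }
  }

  def main(args: Array[String]): Unit = {
    // Synthetic series 0.0, 1.0, ..., 59.0 stands in for one feature.
    val series = Vector.tabulate(60)(_.toDouble)
    val pairs = unroll(series, lookback = 50)
    println(s"number of training pairs: ${pairs.length}") // 10
    println(s"first target: ${pairs.head._2}")            // 50.0
  }
}
```

A series of 60 points with a window of 50 yields 10 pairs, so the first 50 points of any series never receive a prediction and cannot be scored as anomalies.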
The entire end-to-end pipeline is illustrated in Figure 1.
Figure 1. Anomaly detection pipeline for vibration time series based on Analytics Zoo and Apache Spark*.
- It first reads raw data in Apache Spark as resilient distributed datasets (RDDs), then extracts features, and finally outputs the features into a DataFrame. In the raw data, each data set describes a test-to-failure experiment and consists of individual files, each a 1-second vibration signal snapshot recorded at 20 kHz, as illustrated in Figure 2. To train and test our models, we extracted statistics of each second as features, including root mean square (RMS), kurtosis, peak, and the energy values of eight bands obtained from a three-level wavelet packet decomposition.
- It further processes the features in RDDs, including wavelet-domain denoising, normalizing values with a standard scaler, unrolling the feature sequence with a length of 50 (so that the model can learn the pattern from the previous 50 seconds to predict the next point), and finally transforming the data into an RDD of Samples.
- It then uses the Keras-style API in Analytics Zoo to build a time series anomaly detection model (which consists of three LSTM layers followed by a dense layer, as shown below), and trains the model (which learns from the 50 previous values to predict the next one).
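    import com.intel.analytics.zoo.pipeline.api.keras.layers.{Dense, Dropout, LSTM}
    import com.intel.analytics.zoo.pipeline.api.keras.models.Sequential

    // inputShape is assumed to be defined earlier in the pipeline.
    val model = Sequential[Float]()
    // Three stacked LSTM layers, each followed by dropout.
    model.add(LSTM[Float](8, returnSequences = true, inputShape = inputShape))
    model.add(Dropout[Float](0.2))
    model.add(LSTM[Float](32, returnSequences = true))
    model.add(Dropout[Float](0.2))
    model.add(LSTM[Float](15, returnSequences = false))
    model.add(Dropout[Float](0.2))
    // A single output unit predicts the next value.
    model.add(Dense[Float](outputDim = 1))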
- It finally evaluates the model and detects anomalies on the test data or the full dataset. A data point is flagged as an anomaly when it is distant from the RNN prediction. In this project, we set the expected proportion of anomalies in the entire dataset to 10%; that is, the 10% of ground-truth points most distant from the predictions are selected as anomalies. This threshold is a parameter that should be adjusted for each use case.
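Some of the non-model steps in the list above can be sketched in plain Scala without Spark. This is a minimal illustration rather than the project's code: the object and function names are hypothetical, kurtosis is computed in its non-excess population form, the scaler uses the population standard deviation, distance is taken as absolute error, and the wavelet packet features and denoising are omitted. The article does not specify these details, so all of them are assumptions.

```scala
object FeatureAndThresholdSketch {
  // All helpers assume a non-empty input sequence.

  // Root mean square of one vibration snapshot.
  def rms(xs: Seq[Double]): Double =
    math.sqrt(xs.map(x => x * x).sum / xs.length)

  // Peak: largest absolute amplitude in the snapshot.
  def peak(xs: Seq[Double]): Double = xs.map(x => math.abs(x)).max

  // Kurtosis as fourth central moment over squared variance
  // (non-excess, population form; an assumption).
  def kurtosis(xs: Seq[Double]): Double = {
    val mean = xs.sum / xs.length
    val m2 = xs.map(x => math.pow(x - mean, 2)).sum / xs.length
    val m4 = xs.map(x => math.pow(x - mean, 4)).sum / xs.length
    m4 / (m2 * m2)
  }

  // Standard scaling to zero mean and unit population standard deviation.
  def standardScale(xs: Seq[Double]): Seq[Double] = {
    val mean = xs.sum / xs.length
    val std = math.sqrt(xs.map(x => math.pow(x - mean, 2)).sum / xs.length)
    if (std == 0.0) xs.map(_ => 0.0) else xs.map(x => (x - mean) / std)
  }

  // Indices of the `fraction` of points whose ground truth is farthest
  // from the prediction (0.1 corresponds to the 10% used in the article).
  def detectAnomalies(truth: Seq[Double],
                      predicted: Seq[Double],
                      fraction: Double): Seq[Int] = {
    require(truth.length == predicted.length, "series must have equal length")
    require(fraction >= 0.0 && fraction <= 1.0, "fraction must be in [0, 1]")
    val distances = truth.zip(predicted).map { case (t, p) => math.abs(t - p) }
    val count = math.round(fraction * distances.length).toInt
    distances.zipWithIndex
      .sortBy { case (d, _) => -d } // largest distance first
      .take(count)
      .map { case (_, i) => i }
      .sorted
  }

  def main(args: Array[String]): Unit = {
    val snapshot = Seq(1.0, -1.0, 1.0, -1.0)
    println(s"rms=${rms(snapshot)} peak=${peak(snapshot)} kurtosis=${kurtosis(snapshot)}")

    // Ten points where only index 4 deviates from a flat prediction.
    val truth = Seq(1.0, 1.0, 1.0, 1.0, 9.0, 1.0, 1.0, 1.0, 1.0, 1.0)
    val predicted = Seq.fill(10)(1.0)
    println(detectAnomalies(truth, predicted, fraction = 0.1)) // List(4)
  }
}
```

Because the fraction is fixed in advance, this rule always flags the same share of points, even on a healthy machine. That is why the threshold needs tuning per use case.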
Figure 3 compares the LSTM model predictions with the ground truth of the vibration time series. Only two statistics are shown here, namely the peak and RMS of the same channel; other statistics show similar fluctuations. The red points are the detected anomalies, the orange line is the prediction of the LSTM model, and the blue line is the ground truth. The model successfully detects the failure of the device at the end, as well as the spikes after 600 timesteps. Some of the early fluctuations also trigger warnings.
Figure 3. Comparisons between recurrent neural network (RNN) predictions (orange lines) and ground truth (blue lines) of the vibration time series for the same channel’s peak data (upper chart) and RMS data (lower chart).
By adopting an unsupervised deep-learning approach, we can efficiently apply time series anomaly detection to big data at scale, using the end-to-end Spark and BigDL pipeline provided by Analytics Zoo and running directly on standard Hadoop/Spark clusters based on Intel Xeon processors. These capabilities, such as collecting and processing massive time series data (for example, logs and sensor readings) and applying RNNs to learn patterns and predict expected values in order to identify anomalies, are critical for many emerging smart systems, such as industrial, manufacturing, AIOps, and IoT systems. Time series anomaly detection is likely to play a key role in use cases such as monitoring and predictive maintenance. (Here is one simple example of unsupervised anomaly detection using the Analytics Zoo Keras-style API.)