IEI Tank* AIoT Developer Kit and AWS Greengrass*: Running Machine Learning Prediction on the Edge

By Rosalia Nyurguhun, Published: 09/10/2018, Last Updated: 09/10/2018


AWS Cloud diagram

In this tutorial, we will set up a basic machine learning prediction model to run as an Amazon Web Services (AWS)* Lambda function in an AWS Greengrass* group. We will use basic K-means clustering to train the model for motor fault prediction. The Lambda function will use the resources of the Greengrass Core, which will be set up on an IEI Tank* AIoT Developer Kit, and will send status updates of its ML prediction process to the Greengrass group using MQTT messages.


IEI TANK with Ubuntu* 16.04 OS

AWS account

AWS Greengrass

AWS Greengrass* Setup

First, we need to set up the Greengrass Core on the IEI TANK. Follow the instructions in modules 1 and 2 of the linked documentation, Environment Setup for Greengrass and Installing the Greengrass Core Software in AWS Greengrass.

Go to the AWS console, select Services from the top left ribbon, enter IoT in the search bar, and select IoT Core. On the IoT Core page, select Software from the bottom left. Download the AWS Greengrass Core SDK by clicking Configure Download. Choose Python* 2.7 and click Download Greengrass Core SDK. After the package has downloaded, untar it:

tar -xzvf greengrass-core-python-sdk-1.0.0.tar.gz

Go to the HelloWorld folder and unzip the file:

cd aws_greengrass_core_sdk/examples/HelloWorld

Contents of the unzipped folder will be used later in the tutorial to create a zip folder for AWS Lambda.

IEI Tank* Setup

Because AWS Greengrass needs Python* 2.7, we need to install packages specifically for Python 2.7:

sudo apt install python-pip
sudo pip2 install pandas numpy matplotlib scipy sklearn
sudo pip2 install -U pandas numpy matplotlib scipy sklearn

Clone the Motor-Defect-Detector GitHub* repository and go to the Kmeans folder:

git clone
cd motor-defect-detector/Kmeans/

We will be using the Bearing Data Set for K-means basic model training and prediction. Download the Bearing Data Set by going to the website.

Install the apps to extract the files:

sudo apt-get install p7zip-full unrar

Unzip the data set:

7za x IMS.7z

Extract the rar files (only the first and second test sets are used in this tutorial):

unrar x 1st_test.rar 
unrar x 2nd_test.rar

Downgrading Code to Python* 2.7

Before we can use the GitHub repository code, we need to implement some changes to downgrade it from Python* 3.5 to Python 2.7, and run the training script. To modify the script on your own, follow these two steps.

In the Kmeans folder, open the script and add the following as its first line:

from __future__ import print_function

Replace input with raw_input throughout the file, as in the following line:

filedir_testset1 = raw_input("enter the complete directory path for the testset1")
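
If the script has to keep working under both interpreters while you migrate, one common pattern is to bind a single name to whichever prompt function exists. The sketch below is only an illustration of that pattern; read_line and prompt_for_directory are hypothetical names that do not appear in the repository.

```python
# Pick the prompt function that returns a plain string on this interpreter.
try:
    read_line = raw_input  # Python 2.7
except NameError:
    read_line = input      # Python 3: raw_input no longer exists


def prompt_for_directory(label, reader=read_line):
    """Ask for a directory path and strip stray whitespace from the answer.

    `reader` is injectable so the function can be exercised without a terminal.
    """
    return reader("enter the complete directory path for the " + label + " ").strip()
```

Calling prompt_for_directory("testset1") then behaves like the raw_input line above on Python 2.7 and like input on Python 3.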


Alternatively, you can also get the completely modified training script from the Sample Code section of this article.

Training the Model

In the Kmeans folder, train the K-means model and follow the prompts:

enter the complete directory path for the testset1 /<path-to>/motor-defect-detector/Kmeans/1st_test/
enter the complete directory path for the testset2 /<path-to>/motor-defect-detector/Kmeans/2nd_test/

Training is done on the Bearing Data Set to improve prediction of motor defects. The script outputs the kmeanModel.npy file, which is used later for the actual prediction of motor defects.
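
Conceptually, predicting with a trained K-means model means assigning each new sample to the cluster whose centre is closest. The pure-Python sketch below illustrates that idea only; it is not the repository's code, the function names and toy centroids are made up for illustration, and the tutorial itself relies on scikit-learn's KMeans.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_cluster(sample, centroids):
    """Return the index of the centroid that lies closest to the sample."""
    distances = [squared_distance(sample, centroid) for centroid in centroids]
    return distances.index(min(distances))


# Toy example with two made-up cluster centres in a two-feature space.
toy_centroids = [[0.0, 0.0], [10.0, 10.0]]
toy_label = predict_cluster([1.0, 2.0], toy_centroids)  # closest to the first centre
```

In the tutorial, each sample has five frequency features rather than two, and the cluster index returned for a sample is what the Lambda function later counts to decide whether a bearing looks faulty.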

AWS* Lambda Setup

In this section, we will create a compressed folder and create the AWS Lambda function with it. Then, we will deploy the Lambda in our Greengrass group.

Copy the Greengrass files into the Kmeans folder:

cp -r <path-to>/aws_greengrass_core_sdk/examples/HelloWorld/greengrasssdk .

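Create the Lambda script from the Sample Code section of this article and move it into the Kmeans folder. Its file name must match the module part of the handler name used later (kmeans_test).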

Compress files into a zip folder:

zip -r greengrasssdk/ kmeanModel.npy
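
The same package can also be assembled from Python with the standard zipfile module, which makes it easy to see exactly which files end up in the archive. This is a sketch under assumptions: build_lambda_package is a hypothetical helper, and the archive name in the usage note is a placeholder rather than a name taken from this article.

```python
import os
import zipfile


def build_lambda_package(source_dir, archive_path, entries):
    """Zip the listed files and folders (relative to source_dir) into archive_path."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for entry in entries:
            full_path = os.path.join(source_dir, entry)
            if os.path.isdir(full_path):
                # Walk folders so every nested file keeps its relative path.
                for root, _dirs, files in os.walk(full_path):
                    for name in files:
                        file_path = os.path.join(root, name)
                        archive.write(file_path, os.path.relpath(file_path, source_dir))
            else:
                archive.write(full_path, entry)
    return archive_path
```

For example, build_lambda_package(".", "kmeans_lambda.zip", ["greengrasssdk", "kmeanModel.npy"]) run from the Kmeans folder would bundle the SDK folder and the trained model; add the Lambda script and any helper modules it imports to the list as well.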

Go to the AWS console, click Services at the top left, type Lambda in the search bar, and select it. The Lambda Management Console will open. Click Create function:

AWS console services

If it is not already selected, select Author from scratch and fill out the outlined fields:

Author from scratch

Click Create function.

Upload the zip file, change the handler name to kmeans_test.function_handler, and click Save:

kmeans test

Click on Actions, select Create new version and add a version description. Click Publish:

LATEST version

Go to the IoT Core console. Choose Greengrass from left-side menu, select Groups underneath it, and select your group from the main window:

Greengrass tank

Select Lambdas from the left-side menu. Click Add Lambda in the top right corner of the screen:

tank add Lambda

Select Use Existing Lambda:

Existing Lambda

Select kmeans_test from the menu and click Next:

kmeans test python 2.7

Choose the version and click Finish:

Lambda version

Click on the dotted area and select Edit Configuration:

tank edit configuration

Change Memory Limit to 1024 MB and Timeout to 25 seconds, and set the Lambda lifecycle to long-lived:

Group specific Lambda configuration
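
A long-lived Lambda function stays resident on the Core instead of being started for each invocation, which is why the Lambda function in the Sample Code section re-schedules itself with threading.Timer rather than waiting to be called. The sketch below shows that self-rescheduling pattern in isolation. It is a bounded variant written for illustration: run_periodically and its max_runs parameter are hypothetical, whereas the tutorial's function repeats indefinitely.

```python
import threading


def run_periodically(task, interval_seconds, max_runs):
    """Call task(), then re-arm a threading.Timer until max_runs calls have been made.

    Returns an Event that is set once the final run has finished.
    """
    done = threading.Event()
    state = {"runs": 0}

    def tick():
        task()
        state["runs"] += 1
        if state["runs"] < max_runs:
            timer = threading.Timer(interval_seconds, tick)
            timer.daemon = True  # do not keep the process alive just for the timer
            timer.start()
        else:
            done.set()

    tick()  # first run happens immediately, as in the tutorial's script
    return done
```

Each run happens on a fresh timer thread, so a slow task delays the next run instead of overlapping with it.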

Locate the paths that the environment variables need to point to. For example, to locate Python packages like numpy, run this command:

locate 2.7/dist-packages/numpy
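
If the locate database is missing or out of date, the interpreter itself can report where a package is installed. The helper below is a small sketch of that alternative; package_dir is a hypothetical name, and it only works for modules that are backed by a file on disk.

```python
import importlib
import os


def package_dir(package_name):
    """Return the directory that holds the named importable package."""
    module = importlib.import_module(package_name)
    return os.path.dirname(os.path.abspath(module.__file__))
```

Run it with the Python 2.7 interpreter, for example package_dir("numpy"), so that the reported path is the Python 2.7 one that the Lambda function will use.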

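Add environment variables with the paths to the packages and to the 2nd_test folder as values (the Lambda function in the Sample Code section reads the data location from TESTSET2 and TESTSET2FOLDER):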

2nd test Lambda configuration
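
Because a mistyped variable name or path only shows up after deployment, it can help to validate the values at the start of the Lambda function. The following is a sketch under assumptions: require_dir_from_env is a hypothetical helper and is not part of the sample code, which calls os.environ.get directly.

```python
import os


def require_dir_from_env(var_name, environ=None):
    """Return the directory named by an environment variable, or raise if unusable."""
    environ = os.environ if environ is None else environ
    path = environ.get(var_name)
    if not path:
        raise KeyError("environment variable %s is not set" % var_name)
    if not os.path.isdir(path):
        raise IOError("%s points to a missing directory: %s" % (var_name, path))
    return path
```

Inside the Lambda function this could replace the bare lookup, for example filedir = require_dir_from_env("TESTSET2"), so that a missing local resource is reported clearly instead of failing later in os.listdir.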

Click Update on the bottom of the page.

Click the small grey back button and select Resources. Click the blue Add a local resource button:

screen add a local resource

Create a local resource to access the Kmeans folder on your IEI Tank. Attach kmeans_test Lambda to it with read and write access:

screen selected options 2nd_test folder

Create two more local resources for the Python packages folder and the 2nd_test folder, with read-only access. You should see a similar screen when you’re done:

Go to Subscriptions. Click Add Subscription or Add your first Subscription:

Screen add your first subscription

For the source, choose from the Lambdas tab, and select kmeans_test. For the target, select IoT Cloud:

screen from the Lambdas tab and select kmeans_test

Click Next. Add hello/world for the topic and click Next:

screen add hello world

Click Finish.

On the group header, click Actions, select Deploy and wait until it is successfully completed:

screen select deploy actions

Go to the AWS IoT console. Select Test from the left-side menu. Type hello/world in the topic field, change MQTT payload display to display it as strings, and click Subscribe to topic:

screen subscribe to topic

After some time, messages should display on the bottom of the screen:

MQTT client screen
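
The messages shown in the test client are the payloads that the Lambda function publishes to hello/world. The sketch below wraps that call so the topic is defined in one place and the logic can be tried without a Greengrass Core. RecordingClient and publish_status are hypothetical names; the only behaviour assumed of the real client is the publish(topic=..., payload=...) call used in the Sample Code section.

```python
class RecordingClient(object):
    """Stand-in for the Greengrass client; it only records what would be published."""

    def __init__(self):
        self.messages = []

    def publish(self, topic, payload):
        self.messages.append((topic, payload))


def publish_status(client, message, topic="hello/world"):
    """Send one status message to the topic the subscription was created for."""
    client.publish(topic=topic, payload=message)


# Local dry run with the stand-in client.
demo_client = RecordingClient()
publish_status(demo_client, "Loaded K-means model.")
```

On the Core, the same function would be called with the client created by greengrasssdk.client('iot-data') instead of the stand-in.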


We have successfully set up the basic K-means model for motor defect detection as a Lambda function. As a next step, you can explore automatic updates: one Lambda can be set up to look for new test sets and, once it finds them, trigger the automatic download of the new sets and create a new learning script based on them. The model is then updated to give new, improved predictions.

Sample Code

from __future__ import print_function
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import cluster
from utils import cal_max_freq,create_dataframe,elbow_method
import os

try:
    # reading all the files from testset1 and testset2
    filedir_testset1 = raw_input("enter the complete directory path for the testset1 ")
    filedir_testset2 = raw_input("enter the complete directory path for the testset2 ")
    all_files_testset1 = os.listdir(filedir_testset1)
    all_files_testset2 = os.listdir(filedir_testset2)

    # relative path of the dataset, after the current working directory
    path_testset2 = "2nd_test/"
    path_testset1 = "1st_test/"

    testset1_freq_max1,testset1_freq_max2,testset1_freq_max3,testset1_freq_max4,testset1_freq_max5 = cal_max_freq(all_files_testset1,path_testset1)
    testset2_freq_max1,testset2_freq_max2,testset2_freq_max3,testset2_freq_max4,testset2_freq_max5 = cal_max_freq(all_files_testset2,path_testset2)

except IOError:
    print("you have entered either the wrong data directory path for either testset1 or testset2")

result1 = create_dataframe(testset1_freq_max1,testset1_freq_max2,testset1_freq_max3,testset1_freq_max4,testset1_freq_max5,7)
result2 = create_dataframe(testset2_freq_max1,testset2_freq_max2,testset2_freq_max3,testset2_freq_max4,testset2_freq_max5,0)

result3 = create_dataframe(testset1_freq_max1,testset1_freq_max2,testset1_freq_max3,testset1_freq_max4,testset1_freq_max5,2)
result3 = result3[:1800]

result4 = create_dataframe(testset2_freq_max1,testset2_freq_max2,testset2_freq_max3,testset2_freq_max4,testset2_freq_max5,1)
result4 = result4[:800]

#creating the final result
print("creating the final result")
frames = [result1,result3,result2,result4]
result = pd.concat(frames)

X = result[["fmax1","fmax2","fmax3","fmax4","fmax5"]]

#elbow method: to calculate the optimal no of cluster

k_means = cluster.KMeans(n_clusters = 8,n_init = 10,max_iter = 1000,n_jobs = -1,random_state = 42)
kmeans_model = k_means.fit(X)
label = kmeans_model.labels_

#plot the labels
print("plotting the labels")

#save the model
print("saving the model")
filename = "kmeanModel.npy"
np.save(filename,kmeans_model)

from __future__ import print_function

import time
from threading import Timer
import os
import greengrasssdk
import platform

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from utils import cal_max_freq, plotlabels

# Creating a greengrass core sdk client
client = greengrasssdk.client('iot-data')

# Retrieving platform information to send from Greengrass Core
my_platform = platform.platform()

def kmeans_test_run():
    client.publish(topic='hello/world', payload='Started kmeans test run.')
    try:
        filedir = os.environ.get("TESTSET2")
        client.publish(topic='hello/world', payload='Got data dir.')
        #filepath ="2nd_test/"
        filepath = os.environ.get("TESTSET2FOLDER")
        client.publish(topic='hello/world', payload='Got data folder.')
        # load the files
        all_files = os.listdir(filedir)
        client.publish(topic='hello/world', payload='Got all files.')
        freq_max1, freq_max2, freq_max3, freq_max4, freq_max5  =  cal_max_freq(all_files, filedir)
        client.publish(topic='hello/world', payload='Got all frequencies.')
    except IOError:
        print("you have entered either the wrong data directory path or filepath")
        client.publish(topic='hello/world', payload='Wrong data dir or folder.')

    # load the model
    filename = "kmeanModel.npy"
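    # allow_pickle is needed on NumPy releases where np.load no longer unpickles objects by default
    model = np.load(filename, allow_pickle=True).item()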
    client.publish(topic='hello/world', payload='Loaded K-means model.')
    # checking the iteration
    if (filepath == "1st_test/"):
        rhigh = 8
    else:
        rhigh = 4
    testlabels = []
    for i in range(0,rhigh):
        print("Checking for the bearing",i+1)
        result = pd.DataFrame()
        result['freq_max1'] = list((np.array(freq_max1))[:,i])
        result['freq_max2'] = list((np.array(freq_max2))[:,i])
        result['freq_max3'] = list((np.array(freq_max3))[:,i])
        result['freq_max4'] = list((np.array(freq_max4))[:,i])
        result['freq_max5'] = list((np.array(freq_max5))[:,i])

        X = result[["freq_max1","freq_max2","freq_max3","freq_max4","freq_max5"]]

        label = model.predict(X)
        labelfive = list(label[-100:]).count(5)
        labelsix = list(label[-100:]).count(6)
        labelseven = list(label[-100:]).count(7)
        totalfailur = labelfive+labelsix+labelseven#+labelfour
        # divide by 100.0: under Python 2.7, "/" on two integers truncates to 0 here
        ratio = (totalfailur/100.0)*100
        if(ratio >= 25):
            client.publish(topic='hello/world', payload='Bearing is suspected to fail.')
        else:
            client.publish(topic='hello/world', payload='Bearing is in normal condition.')

    # Asynchronously schedule this function to be run again in 5 seconds
    Timer(5, kmeans_test_run).start()

# Start executing the function above
kmeans_test_run()

# This is a dummy handler and will not be invoked
# Instead the code above will be executed in an infinite loop for our example
def function_handler(event, context):
    return
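
The decision rule inside kmeans_test_run can be read in isolation: it counts how many of the last 100 predicted labels fall into clusters 5, 6, or 7 and reports a suspected failure when that share reaches 25 percent. The standalone sketch below restates the rule for experimentation. bearing_status and its parameters are hypothetical names; the label values, window, and threshold mirror the script above.

```python
def bearing_status(labels, failure_labels=(5, 6, 7), window=100, threshold_percent=25.0):
    """Classify a bearing from the most recent `window` cluster labels."""
    recent = list(labels)[-window:]
    failures = sum(1 for label in recent if label in failure_labels)
    # Multiplying by 100.0 keeps the division in floating point on Python 2.7.
    ratio = failures * 100.0 / window
    if ratio >= threshold_percent:
        return "Bearing is suspected to fail."
    return "Bearing is in normal condition."
```

Only the tail of the label sequence matters, so early failure-like readings that are followed by a long run of normal ones do not trigger the warning.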

About the Author

Rosalia Nyurguhun is a software engineer at Intel in the Core and Visual Computing Group, working on scale enabling projects for the Internet of Things.
