The Internet of Things (IoT) ecosystem is made up of a variety of elements that perform different tasks in the collection, aggregation, and analysis of data. In a simplified ecosystem, these tasks are performed at three stages: at the edge devices, at the gateway, and in the cloud (see Figure 1). Each element differs in the resources available to it and the constraints placed on it, so the development approach and tools can differ as well.
Figure 1. A simplified Internet of Things ecosystem
This article offers an example IoT ecosystem. It shows the constraints and challenges for each element and how the elements work together to meet the desired requirements.
A Sample Use Case
Imagine the following scenario: Monitoring patients in a residential care setting for heart abnormalities requires that those patients be connected to a variety of equipment, effectively rendering them immobile. The IoT changes this requirement through the use of wearable technology and real-time communication of data for immediate discovery. Wearable heart monitors (shown in Figure 2) communicate their data to the cloud through on-premises IoT gateways (distributed through the residential care facility).
Figure 2. Wearable heart monitors communicating data through Internet of Things gateways to the cloud
Let’s explore this scenario along with the individual elements and their development approaches.
The edge device in this scenario is a wristband that measures heart rate. Less processing capability is typically available at the edge, so the wristband captures the patient's heart signal and sends it to a local IoT gateway through its Bluetooth* interface. The wearable device therefore needs to process sensor data on board, perform a minimal analysis of the signal, and then communicate over a wireless protocol to the gateway.
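As a rough illustration of the "minimal analysis" an edge device might perform, the following sketch estimates heart rate from beat-to-beat intervals and flags readings outside a band. The function names and the threshold values are hypothetical and chosen only for illustration; they are not clinical guidance. Python is used here for brevity.

```python
from statistics import mean

# Hypothetical limits for illustration only; not clinical guidance.
LOW_BPM = 40.0
HIGH_BPM = 150.0


def heart_rate_bpm(beat_intervals_s):
    """Average heart rate (beats per minute) from beat-to-beat intervals in seconds."""
    if not beat_intervals_s:
        raise ValueError("need at least one beat interval")
    return 60.0 / mean(beat_intervals_s)


def needs_attention(beat_intervals_s, low=LOW_BPM, high=HIGH_BPM):
    """Return True when the average rate falls outside the [low, high] band."""
    bpm = heart_rate_bpm(beat_intervals_s)
    return bpm < low or bpm > high
```

A device could run a check like this on each short window of beats and open a connection to the gateway only when the result is True or when enough routine data has accumulated.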
An ideal device here is the Intel® Edison board (see Figure 3), which is capable of complex processing (it has a dual-core Intel® Atom™ processor) but comes in a small package with minimal power requirements. The board is capable of processing the signal data and performing an initial assessment of the data and its meaning to the patient. Communication between the edge device and the gateway can use TCP/IP, a standard networking protocol suite used worldwide, carried over the Bluetooth link. With TCP/IP, a simple sockets-based interface permits communication of data through streams.
Figure 3. The Intel® Edison module
You can write firmware for your Intel® Edison board in a variety of languages, but the most common is the lingua franca of embedded development: C. Because the Intel® Edison board is capable of running Linux*, C is a natural choice and a staple language for this platform.
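The stream-socket exchange described above can be sketched as follows. Production firmware would more likely be written in C; Python is used here only to keep the example short. The wire format (a 4-byte length prefix, then a patient ID and 16-bit samples) and all function names are assumptions made for this sketch, not part of any Intel or Bluetooth specification.

```python
import socket
import struct

# Hypothetical wire format: 4-byte big-endian length, then that many payload bytes.
HEADER = struct.Struct("!I")


def send_frame(sock, payload):
    """Send one length-prefixed frame over a connected stream socket."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def _recv_exact(sock, count):
    """Read exactly count bytes; a stream may deliver them in several pieces."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the stream mid-frame")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_frame(sock):
    """Receive one length-prefixed frame and return its payload."""
    (length,) = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return _recv_exact(sock, length)


def encode_samples(patient_id, samples_bpm):
    """Pack a 32-bit patient ID, a 16-bit count, and 16-bit heart-rate samples."""
    count = len(samples_bpm)
    return struct.pack(f"!IH{count}H", patient_id, count, *samples_bpm)


def decode_samples(payload):
    """Inverse of encode_samples: return (patient_id, list of samples)."""
    patient_id, count = struct.unpack_from("!IH", payload)
    samples = struct.unpack_from(f"!{count}H", payload, 6)
    return patient_id, list(samples)
```

Because a stream has no message boundaries, the length prefix lets the gateway know where one batch of samples ends and the next begins. For local experimentation, a pair of connected sockets from socket.socketpair() can stand in for the Bluetooth link.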
The IoT gateway is the aggregation point for collecting data from nearby edge devices, in this case the wearable heart monitors. When the edge device has sufficient data to communicate, or when an event that requires immediate attention occurs, the device opens a stream socket over Bluetooth to the IoT gateway. When the connection is established, the data is streamed to the gateway; how much is sent at once depends on the size of the data to communicate, the battery level, and so on. The data could also be streamed in parts as the patient moves in and out of the IoT gateway's range. If the data indicates an emergency, the gateway immediately alerts an attendant in the facility over its Wi-Fi interface. The IoT gateway can also identify the patient's location.
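The gateway's routing decision can be sketched as a small class: emergency data triggers an alert right away, and everything else is queued per patient for later upload. The class name, the callback signatures, and the emergency test are all hypothetical placeholders for whatever the facility's software actually provides.

```python
from collections import defaultdict


class GatewayRouter:
    """Route incoming samples: alert on emergencies, queue the rest for the cloud."""

    def __init__(self, alert, is_emergency):
        self._alert = alert                # callable(patient_id, samples), e.g. notify an attendant
        self._is_emergency = is_emergency  # callable(samples) -> bool
        self._pending = defaultdict(list)  # patient_id -> queued samples

    def handle(self, patient_id, samples_bpm):
        """Process one batch of samples received from an edge device."""
        if self._is_emergency(samples_bpm):
            self._alert(patient_id, samples_bpm)
            return "alerted"
        self._pending[patient_id].extend(samples_bpm)
        return "queued"

    def drain(self):
        """Return all queued nonemergency data and empty the queue."""
        batch = dict(self._pending)
        self._pending.clear()
        return batch
```

Passing the alert and emergency checks in as callables keeps the routing logic independent of how alerts are actually delivered, which also makes it easy to exercise without any hardware.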
Nonemergency data is compressed and communicated to the cloud using Message Queuing Telemetry Transport (MQTT). This lightweight protocol suits the IoT well: data is published as messages to named topics, and interested parties subscribe to those topics (a publish/subscribe model).
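A minimal sketch of the compress-and-publish step might look like the following. The topic layout, the JSON payload shape, and the function names are assumptions for illustration. To keep the example free of external dependencies, the MQTT client is represented by any callable that accepts a topic and a payload.

```python
import json
import zlib


def topic_for(patient_id):
    """Hypothetical topic layout: one topic per patient."""
    return f"carehome/patients/{patient_id}/heart-rate"


def pack_readings(samples_bpm):
    """Serialize samples as compact JSON and compress them with zlib."""
    raw = json.dumps({"bpm": samples_bpm}, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw)


def unpack_readings(payload):
    """Inverse of pack_readings, as a cloud-side subscriber might apply it."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))["bpm"]


def publish_batch(publish, batch):
    """Publish each patient's samples; publish is any callable(topic, payload)."""
    for patient_id, samples in sorted(batch.items()):
        publish(topic_for(patient_id), pack_readings(samples))
```

With an MQTT client library such as Eclipse Paho, the publish method of a connected client could likely be passed in directly as the callable, since it takes the topic and payload as its first two arguments.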
Intel® IoT Gateway technology supports all these requirements (see Figure 4). In addition, Intel® IoT Gateways provide management protocols and security capabilities to ensure that communication is secure and the device is manageable. The gateway is also capable of supporting several operating systems, from Wind River* Linux to Windows® 10 IoT Core and Snappy Ubuntu* Core (Linux).
Figure 4. An Intel® IoT Gateway
The gateway software is most commonly developed in C or C++ and relies on interfaces such as the sockets library (for TCP/IP streams) and numerical libraries for signal analysis and compression. You could also use interpreted languages such as Python* in this scenario, but depending on the number of edge devices you have to manage, C/C++ would provide better performance.
In the cloud, patient data is received through MQTT publish messages (based on a previous subscribe message for each patient from the cloud application). This data is then ingested into the Apache Hadoop* Distributed File System (HDFS), a distributed file system designed for large datasets. Once there, the data can be processed in one of two ways: batch or real time.
Like early computing, early big data processing systems were batch oriented: you'd create an application to process your data, submit it to the system, and receive your results later. Big data has since grown up. In addition to batch, there's now stream-oriented big data processing, which supports real-time processing of data as it arrives.
But batch and stream processing aren't mutually exclusive: they can work together on one cluster. As Figure 5 shows, Yet Another Resource Negotiator (YARN), a resource manager that allows multiple big data frameworks to operate on the same cluster and data, runs above HDFS. Above YARN are two separate frameworks: the batch side, supported by traditional MapReduce functionality, and the real-time side, supported by Apache Spark*. Clusters that support these use models commonly use powerful CPUs such as the Intel® Xeon® processor family.
Figure 5. Batch and real-time processing with Apache Hadoop* and Apache Spark*
For immediate processing of patient data, you rely on the Spark side of the cluster. Spark enables you to build applications that process data as it arrives as a stream. In this way, patient data can immediately be analyzed for irregularities. You can write Spark applications in the Scala, Java*, or Python language.
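The kind of per-record logic a streaming job might apply can be sketched in plain Python. This is not Spark code: it is a generator that keeps a sliding window of recent readings per patient and reports when the spread within a window is large, which is the sort of check one might later port into a Spark application. The window size, the spread threshold, and the function name are hypothetical.

```python
from collections import deque


def irregular_windows(readings, window=5, max_spread=30):
    """Yield (patient_id, spread) when a patient's last `window` readings
    differ by more than max_spread beats per minute.

    readings is any iterable of (patient_id, bpm) pairs, consumed as it arrives.
    """
    recent = {}  # patient_id -> deque holding the most recent readings
    for patient_id, bpm in readings:
        buf = recent.setdefault(patient_id, deque(maxlen=window))
        buf.append(bpm)
        if len(buf) == window:
            spread = max(buf) - min(buf)
            if spread > max_spread:
                yield patient_id, spread
```

Because the function is a generator, it produces results as soon as each reading is consumed rather than waiting for the whole dataset, which mirrors the difference between stream and batch processing.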
The patient data can also be batch-processed to look for patterns that multiple patients share. Hadoop processes data using the MapReduce paradigm, and higher-level tools such as Apache Hive* and Apache Pig* simplify this by generating the MapReduce applications from scripts. Hive and Pig allow you to build pipelines of queries over large datasets. For machine learning applications, you can also apply Apache Mahout*, a set of algorithm libraries for Hadoop that includes collaborative filtering, clustering, and classification algorithms.
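To make the MapReduce paradigm concrete, here is a single-process sketch in plain Python that computes the average heart rate per patient. It does not use Hadoop; it only mimics the three phases (map, shuffle, reduce) that a framework would distribute across a cluster. The record shape and function names are assumptions for illustration.

```python
from collections import defaultdict


def map_reading(record):
    """Map phase: emit (key, value) pairs; here (patient_id, (bpm, 1))."""
    patient_id, bpm = record
    yield patient_id, (bpm, 1)


def reduce_readings(patient_id, values):
    """Reduce phase: combine all values for one key into an average."""
    total = sum(bpm for bpm, _ in values)
    count = sum(n for _, n in values)
    return patient_id, total / count


def run_job(records):
    """Run map, then shuffle (group by key), then reduce, all in one process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_reading(record):
            groups[key].append(value)  # shuffle: collect values by key
    return dict(reduce_readings(key, values) for key, values in sorted(groups.items()))
```

For example, run_job([("a", 60), ("b", 80), ("a", 80)]) groups the two readings for patient "a" and averages them separately from patient "b". A Hive or Pig script expressing the same grouping and averaging would typically be translated into jobs of this shape.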
Using Mahout, you can analyze your patient data to first group your patients who have similar characteristics into clusters, and then search for patterns within these clusters to better understand them using collaborative filtering.
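The clustering step can be illustrated with a small k-means sketch in plain Python. This is not Mahout's implementation or API; it is a generic version of the algorithm on small tuples (for instance, hypothetical age and resting heart rate pairs), with the initial centroids supplied by the caller so that the result is deterministic.

```python
import math


def _nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def _mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def kmeans(points, centroids, max_iters=20):
    """Cluster points around the given initial centroids.

    Returns (labels, centroids), where labels[i] is the cluster index of points[i].
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iters):
        labels = [_nearest(p, centroids) for p in points]
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            # Keep the old centroid if no point was assigned to it.
            updated.append(_mean_point(members) if members else old)
        if updated == centroids:
            break  # assignments have stabilized
        centroids = updated
    labels = [_nearest(p, centroids) for p in points]
    return labels, centroids
```

Once patients are grouped this way, each cluster can be examined separately for shared patterns, as the paragraph above describes.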
The benefits of this type of application go beyond protecting individual lives. Using the patient data collected from a large population permits predictive analysis in the cloud to optimize the search for signals that could precede an event. The data, coupled with information about the user, could also help tune the analysis of patient data as a function of the user's race, age, and other factors. The IoT has the potential to make contributions at the individual level and, with cloud-based analytics, across the overall population. Intel processors and gateways help simplify the development of IoT ecosystems.
- Visit the Intel® IoT Gateway web site to learn more.
- Visit the Intel® Edison Module web site to learn more.
- Visit the Wind River* Linux* web site to learn more.