IoT with Amazon Kinesis and Spark Streaming

The Internet of Things (IoT) is an increasingly important topic in application development, because these devices constantly send a high velocity of data that needs to be processed and analyzed. Amazon Kinesis and Amazon IoT are a perfect pair for receiving and analyzing this data, and Spark Streaming can be used to process the data as it arrives.

Today we will look at Amazon IoT, Kinesis, and Spark Streaming and build a streaming pipeline.

Amazon provides an IoT data generator called the Simple Beer Simulator (SBS). The simulator generates random JSON data representing what might come from an IoT device hooked up to a beer dispenser, such as temperature, humidity, and flow rate.
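
The exact schema depends on the simulator, but a single reading looks something like the following (the field names here are illustrative):

    {
        "deviceId": "SBS01",
        "temperature": 34.2,
        "humidity": 61.8,
        "flowRate": 1.4
    }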

The sample data above will be streamed into Amazon IoT and passed to Kinesis via a rule.

Creating the Kinesis Stream

Log in to the AWS console, go to Kinesis, and create a Kinesis stream called iot-stream.

One shard is plenty for this example, as we won't be stressing the application with a large volume of devices and data. In a real-world scenario, increasing the number of shards in a Kinesis stream improves application scalability.
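
If you prefer the command line, the same stream can be created with the AWS CLI (assuming your credentials and default region are already configured):

    aws kinesis create-stream --stream-name iot-stream --shard-count 1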

Create an IoT Rule

Log in to the AWS console and head over to IoT. Click on Create a new rule.

IoT Rule

Name: a name for your rule
Attribute: *
Topic filter: /sbs/devicedata/#
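
In SQL form, this rule corresponds to a query statement along the lines of:

    SELECT * FROM '/sbs/devicedata/#'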

Create an IoT Action

Select Kinesis as a destination for your messages.

On the next screen you will need to configure a role that allows the rule to publish to Kinesis.

Click Create Role to automatically create a role with the correct policies, then click through to finish creating the rule. If you are using an existing role, you may want to click the Update role button; this will add the correct Kinesis stream to the role policy.

Create IAM User

The SBS uses boto3 to publish messages to Amazon IoT, and as such it requires permission to access the appropriate resources.

Create a user with the AWSIoTFullAccess policy and generate an access key and secret.

In the sbs directory there is a credentials file that should be updated with your access key and secret.
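
Assuming the standard AWS credentials file format that boto3 understands, it looks something like this, with your own values substituted:

    [default]
    aws_access_key_id = YOUR_ACCESS_KEY_ID
    aws_secret_access_key = YOUR_SECRET_ACCESS_KEY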

Build the Docker container for the SBS
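
From inside the sbs directory (the image tag here is just an example):

    docker build -t sbs .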

Run the Docker container
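
The exact flags depend on how the image is set up; a minimal run might look like this:

    docker run -d --name sbs sbs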

At this point you should have data being sent to Kinesis via Amazon IoT.

Spark Streaming

The Scala app I created reads data from Kinesis and simply saves the results to CSV files.

You will need to create a user that has access to read from the Kinesis stream. This credential is different from the one used for the SBS. Here I am just using my key, which has admin access to everything in the account; in a real-world scenario you should restrict this key to reading only the iot-stream.

Define a case class to use as a holder for the JSON data we receive from Kinesis.
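
A minimal sketch, assuming the field names from the sample payload shown earlier:

    case class Beer(deviceId: String, temperature: Double, humidity: Double, flowRate: Double)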

Connect to the Kinesis stream.
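
With the spark-streaming-kinesis-asl connector this looks roughly like the following; the application name, region, endpoint, and batch interval are assumptions you should adjust for your own setup:

    import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kinesis.KinesisUtils

    val sparkConf = new SparkConf().setAppName("SBSStreamingApp")
    val ssc = new StreamingContext(sparkConf, Seconds(10))

    // One receiver is enough here because iot-stream only has a single shard
    val kinesisStream = KinesisUtils.createStream(
      ssc, "sbs-spark-app", "iot-stream",
      "https://kinesis.us-east-1.amazonaws.com", "us-east-1",
      InitialPositionInStream.LATEST, Seconds(10),
      StorageLevel.MEMORY_AND_DISK_2)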

Over the course of the job we will receive a series of RDDs from the IoT DStream, one per batch interval. We iterate over these with foreachRDD, parsing the JSON into our case class. Once we have an RDD of Beer objects we can write the data out to disk.
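
A sketch of that per-batch processing; json4s is an assumption (the original code may use a different JSON library), and the output directory is a placeholder:

    import org.json4s._
    import org.json4s.jackson.JsonMethods._

    kinesisStream.foreachRDD { rdd =>
      // Each Kinesis record arrives as a byte array containing the JSON payload
      val beers = rdd.map { bytes =>
        implicit val formats: Formats = DefaultFormats
        parse(new String(bytes)).extract[Beer]
      }

      if (!beers.isEmpty()) {
        // Write the batch out as comma-separated lines under a unique directory
        beers
          .map(b => s"${b.deviceId},${b.temperature},${b.humidity},${b.flowRate}")
          .saveAsTextFile(s"/data/csv/${System.currentTimeMillis()}")
      }
    }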

The complete code listing is below.
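
Assembled from the snippets above, a complete sketch of the application looks something like this (names, region, and output path are assumptions; the actual code is in the GitHub repository linked at the end of the post):

    import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kinesis.KinesisUtils
    import org.json4s._
    import org.json4s.jackson.JsonMethods._

    case class Beer(deviceId: String, temperature: Double, humidity: Double, flowRate: Double)

    object SBSStreamingApp {

      def main(args: Array[String]): Unit = {
        val batchInterval = Seconds(10)

        val sparkConf = new SparkConf().setAppName("SBSStreamingApp")
        val ssc = new StreamingContext(sparkConf, batchInterval)

        // AWS credentials are resolved through the default provider chain
        // (environment variables, credentials file, instance profile, ...)
        val kinesisStream = KinesisUtils.createStream(
          ssc, "sbs-spark-app", "iot-stream",
          "https://kinesis.us-east-1.amazonaws.com", "us-east-1",
          InitialPositionInStream.LATEST, batchInterval,
          StorageLevel.MEMORY_AND_DISK_2)

        kinesisStream.foreachRDD { rdd =>
          val beers = rdd.map { bytes =>
            implicit val formats: Formats = DefaultFormats
            parse(new String(bytes)).extract[Beer]
          }

          if (!beers.isEmpty()) {
            beers
              .map(b => s"${b.deviceId},${b.temperature},${b.humidity},${b.flowRate}")
              .saveAsTextFile(s"/data/csv/${System.currentTimeMillis()}")
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }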

Compile the jar using sbt
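
Assuming the project uses the sbt-assembly plugin to build a fat jar:

    sbt assembly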

Copy the jar to the container
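
Where the jar needs to go depends on how the compose file mounts the project; one option is to copy the sbt output into the project's spark directory (the jar name here is a placeholder):

    cp target/scala-2.11/sbs-spark-assembly-1.0.jar ../spark/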

Running Spark

In order to facilitate running Spark we again turn to Docker. My Docker image is based on the work by Getty Images. I did have to make some minor adjustments to their Spark image to upgrade to Hadoop 2.8, as well as remove an AWS library from the Hadoop classpath.

Build the container
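
From the spark directory, something along the lines of:

    docker build -t spark .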

Run both the master and the worker with Docker Compose
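
With a docker-compose.yml describing the master and worker services:

    docker-compose up -d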

Exec into the container to run spark-submit
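
The container name, class name, and jar path below are placeholders; adjust them for your setup:

    # on the host
    docker exec -it spark_master_1 /bin/bash

    # inside the container
    spark-submit --class SBSStreamingApp --master spark://master:7077 \
      /path/to/sbs-spark-assembly-1.0.jar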

Let the Spark job run for a few minutes. Eventually you should see some CSV files in the <project_root>/spark/data/data/csv directory.

The complete code for this post can be found on GitHub.

In reality this entire exercise could have been done with Kinesis Firehose, which would deliver the data to S3 directly without using Spark. However, I wanted to illustrate the use of Spark with Kinesis, because in future posts I would like to show you how to do something more interesting with your IoT data.
