How to set up a Data Lake integration with HiveMQ
The HiveMQ Data Lake extension forwards MQTT messages from HiveMQ directly into a cloud data lake via object storage (in this example, AWS S3), typically in Parquet format. Data can further be routed to a wide range of analytics, machine learning, and processing tools.
This guide covers how to set up:
AWS S3 bucket for MQTT ingestion
HiveMQ for Data Lake integration
These steps cover the ingestion of HiveMQ MQTT messages into an S3 bucket. Downstream data processing, such as with AWS Lake Formation, falls outside the scope of this document.
📋 Prerequisites
AWS access with permissions to create IAM resources and S3 buckets
HiveMQ installed
MQTT CLI installed
Data Lake extension license file (.elic)
Instructions
Step 1: Create an S3 bucket for MQTT ingestion
The S3 bucket acts as the primary repository for all MQTT messages forwarded from HiveMQ.
Steps:
Sign in to AWS and navigate to S3:
Click Create bucket:
Leave all form values at their defaults and enter a Bucket name (in this example, “hivemq-mqtt”). Click Create bucket.
Open the newly created bucket, go to the Properties tab, and copy the ARN:
Note: Keep the ARN handy; it is needed in the next section.
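If you prefer the command line, the bucket can also be created with the AWS CLI. A minimal sketch, where the bucket name and region are examples to adjust; note that S3 bucket ARNs always follow the fixed format arn:aws:s3:::&lt;bucket-name&gt;, so the ARN can be derived without looking it up in the console:

```shell
# Create the bucket (bucket names are globally unique; adjust name and region)
aws s3api create-bucket \
  --bucket hivemq-mqtt \
  --region eu-central-1 \
  --create-bucket-configuration LocationConstraint=eu-central-1

# S3 bucket ARNs follow a fixed format, so the ARN for this bucket is:
echo "arn:aws:s3:::hivemq-mqtt"
```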
Step 2: Create an IAM user with write access to the bucket
In this section, we create an IAM user and a custom policy. The user is granted only the specific AWS permission required to write MQTT messages directly to your S3 bucket.
Steps:
Navigate to IAM:
In the left pane, select Users → Click Create user
Specify a User name, click Next:
Select the Attach policies directly radio button, then click Create policy:
Switch the Policy editor to JSON and add:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "mqttToS3",
"Effect": "Allow",
"Action": [
"s3:PutObject"
],
"Resource": "<ARN>/*"
}
]
}
Replace <ARN> with the bucket ARN you copied earlier, then click Next. This grants the user write access to the S3 bucket and nothing else.
Fill out the Policy name, then click on Create policy.
Back in the Set permissions window, click the refresh button on the Permissions policies, and select the policy that was just created, then click Next:
On the next screen, you will see the policy under Permissions summary. Click Create user.
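For scripted or repeatable setups, the same user and policy can be created with the AWS CLI. A sketch with example names, assuming the policy JSON above has been saved locally as mqtt-to-s3-policy.json; replace &lt;ACCOUNT_ID&gt; with your AWS account ID:

```shell
# Create the IAM user that HiveMQ will authenticate as
aws iam create-user --user-name hivemq-data-lake

# Create the write-only policy from the JSON document above
aws iam create-policy \
  --policy-name mqttToS3 \
  --policy-document file://mqtt-to-s3-policy.json

# Attach the policy to the user
aws iam attach-user-policy \
  --user-name hivemq-data-lake \
  --policy-arn "arn:aws:iam::<ACCOUNT_ID>:policy/mqttToS3"
```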
Step 3: Create an access key for the IAM user
During HiveMQ startup, the Data Lake extension authenticates with AWS using an access key. Let's create that key.
Steps:
Back on the Users screen, find the newly created user and click on it
Go to the Security credentials tab, and under Access keys select Create access key:
Select Other as the Use case, then click Next.
Enter a description of what the key is required for in the Description tag value box. Click on Create access key.
Copy the access key and secret access key values to be used in the next section.
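The access key can also be generated with the AWS CLI. A sketch, assuming an example user named hivemq-data-lake; the secret access key is returned only once, so capture it immediately:

```shell
# Returns AccessKeyId and SecretAccessKey in the JSON response
aws iam create-access-key --user-name hivemq-data-lake
```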
Step 4: Configure HiveMQ to connect to your S3 repository
With the S3 bucket and IAM user set up, let's configure the Data Lake extension's config.xml file in HiveMQ.
Steps:
Copy the aws-credentials and config.xml files from
<hivemq>/extensions/hivemq-data-lake-extension/conf/examples
to
<hivemq>/extensions/hivemq-data-lake-extension/conf
Copy the access key credentials created in Step 3 into the aws-credentials file:
[default]
aws_access_key_id=<AWS_ACCESS_KEY_ID>
aws_secret_access_key=<AWS_SECRET_ACCESS_KEY>
Best practice: Use secrets to store the access key credentials in Kubernetes deployments, or environment variables for on-premises deployments.
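For a Kubernetes deployment, one way to follow that best practice is to store the credentials file in a Secret and mount it into the extension's conf directory. A sketch with example names (the secret name and namespace are assumptions):

```shell
# Create a Secret from the local aws-credentials file
kubectl create secret generic hivemq-aws-credentials \
  --namespace hivemq \
  --from-file=aws-credentials
```

The Secret can then be mounted as a volume at the path referenced in config.xml through your HiveMQ deployment spec.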
Compose your Data Lake config.xml (replace the <> placeholder values with your own):
<hivemq-data-lake-extension xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="config.xsd">
<aws-credential-profiles>
<aws-credential-profile>
<id>aws-credentials</id>
<profile-file><hivemq>/extensions/hivemq-data-lake-extension/conf/aws-credentials</profile-file>
</aws-credential-profile>
</aws-credential-profiles>
<mqtt-to-s3-routes>
<mqtt-to-s3-route>
<id>my-mqtt-to-s3-route</id>
<mqtt-topic-filters>
<mqtt-topic-filter>factory/machine/sensor</mqtt-topic-filter>
</mqtt-topic-filters>
<aws-credential-profile-id>aws-credentials</aws-credential-profile-id>
<bucket>hivemq-mqtt</bucket>
<region><region bucket was created in></region>
<processor>
<parquet>
<columns>
<column>
<name>topic</name>
<value>mqtt-topic</value>
</column>
<column>
<name>payload</name>
<value>mqtt-payload</value>
</column>
<column>
<name>timestamp</name>
<value>timestamp</value>
</column>
</columns>
</parquet>
</processor>
</mqtt-to-s3-route>
</mqtt-to-s3-routes>
</hivemq-data-lake-extension>
Note: See our documentation for the Azure Blob Storage configuration.
Copy the Data Lake .elic license file into
<hivemq>/license
Delete the DISABLED file in
<hivemq>/extensions/hivemq-data-lake-extension
Start the HiveMQ service.
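The last three steps can be scripted as follows. A sketch, where <hivemq> stands for your HiveMQ home directory, the license file name is an example, and the service name assumes a systemd-based installation:

```shell
# Install the Data Lake extension license
cp hivemq-data-lake.elic <hivemq>/license/

# Enable the extension by removing its DISABLED marker file
rm <hivemq>/extensions/hivemq-data-lake-extension/DISABLED

# Start HiveMQ (adjust for non-systemd installations)
sudo systemctl start hivemq
```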
Publish a message:
mqtt pub -i testclient -t 'factory/machine/sensor' -m 'test message from hivemq'
Verify that the MQTT message has been sent to the S3 bucket:
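One way to check from the command line, assuming the AWS CLI is configured with a principal that can list and read the bucket (the IAM user created above is write-only by design, so use a different identity for this check):

```shell
# Parquet objects written by the extension should appear here
aws s3 ls s3://hivemq-mqtt/ --recursive
```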