Disaster recovery runbook for Kubernetes deployment

This article explains how to recover your HiveMQ cluster and its persistent data if the cluster is down or if you see the following warning in your hivemq.log:

Not all replicas are currently reachable. More nodes than the replication factor of X have left the cluster in a too short time frame.

 Instructions

1. Create a separate Pod to Access PVCs

Create a manifest file (pvc-access-pod.yaml) for a temporary pod that mounts the HiveMQ PVCs so you can access their data.
Note: Make sure you have at least 4 CPUs, 4 GB of RAM, and sufficient disk space (at least 2.5 times the combined size of all data folders).
The following manifest is an example for a two-node HiveMQ cluster. For larger clusters, add a volume and a volume mount for each additional node's PVC:

apiVersion: v1
kind: Pod
metadata:
  name: pvc-access-pod
spec:
  containers:
    - name: pvc-access-container
      image: busybox
      command: ["/bin/sh", "-c", "while true; do sleep 3600; done"]
      resources:
        requests:
          memory: "4Gi"              # Request minimum 4GB of memory
          cpu: "4"                   # Request minimum 4 CPUs
          ephemeral-storage: "100Gi" # Request 100Gi of ephemeral storage
        limits:
          memory: "4Gi"              # Limit to 4GB of memory
          cpu: "4"                   # Limit to 4 CPUs
          ephemeral-storage: "150Gi" # Limit to 150Gi of ephemeral storage
      volumeMounts:
        - name: data0
          mountPath: /mnt/data0
        - name: data1
          mountPath: /mnt/data1
  volumes:
    - name: data0
      persistentVolumeClaim:
        claimName: data-hivemq-0
    - name: data1
      persistentVolumeClaim:
        claimName: data-hivemq-1
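The disk-space rule in the note above is simple arithmetic: add up the sizes of the data folders and multiply by 2.5. The following is a minimal sketch. It assumes the sizes are given in KiB, as printed by `du -sk`, and it rounds the result up to whole GiB. The helper name `required_gib` is hypothetical:

```shell
#!/bin/sh
# Sketch: minimum disk space in GiB (rounded up) = 2.5 x combined data size.
# required_gib is a hypothetical helper; pass sizes in KiB, e.g. from `du -sk`.
set -eu

required_gib() {
  total_kib=0
  for kib in "$@"; do
    total_kib=$((total_kib + kib))
  done
  # 2.5x with integer math: total * 5 / 2, rounded up to a whole KiB
  needed_kib=$(( (total_kib * 5 + 1) / 2 ))
  # Convert KiB to GiB (1 GiB = 1048576 KiB), rounding up
  echo $(( (needed_kib + 1048575) / 1048576 ))
}

# Usage wherever the data folders are visible, e.g. inside pvc-access-pod:
#   required_gib $(du -sk /mnt/data* | cut -f1)
```

For example, two data folders of 20 GiB each (20971520 KiB) give `required_gib 20971520 20971520`, which prints 100.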

Apply the manifest:

kubectl apply -f pvc-access-pod.yaml -n <namespace>
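Before you open a shell in the next step, you can block until the pod is ready. This sketch uses `kubectl wait`. The helper name `wait_for_pod` is an assumption, as is the `KUBECTL` variable, which exists only so you can swap in a stub for dry runs. The 600s default timeout is also an assumption, chosen because attaching large volumes can take a while:

```shell
#!/bin/sh
# Sketch: block until the pod reports Ready, or fail after the timeout.
# wait_for_pod is a hypothetical helper; the 600s default is an assumption.
set -eu

KUBECTL="${KUBECTL:-kubectl}"   # unquoted below on purpose, e.g. "kubectl --context x"

wait_for_pod() {
  namespace="$1"
  pod="$2"
  timeout="${3:-600s}"
  $KUBECTL wait --for=condition=Ready "pod/$pod" -n "$namespace" --timeout="$timeout"
}

# Usage:
#   wait_for_pod <namespace> pvc-access-pod
```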

2. Access the Pod

Once the pod is running, open a shell session to it:

kubectl exec -it pvc-access-pod -n <namespace> -- sh

3. Archive the PVC Data

Inside the pod, run the following commands to archive the data from the mounted PVCs:
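The following is a minimal sketch of these commands. It assumes the /mnt/data<N> mount paths from the manifest above and the BusyBox tar that ships in the pod. The helper name `archive_pvcs` is hypothetical. It loops over every /mnt/data* directory, so the same commands also work for clusters with more than two nodes:

```shell
#!/bin/sh
# Sketch: archive each mounted PVC directory (e.g. /mnt/data0) into
# <dest>/<dir-name>.tar (e.g. /tmp/data0.tar).
# archive_pvcs is a hypothetical helper, not part of any HiveMQ tooling.
set -eu

archive_pvcs() {
  src_root="$1"   # directory that contains the PVC mount points
  dest_dir="$2"   # where the .tar files are written
  for dir in "$src_root"/data*; do
    [ -d "$dir" ] || continue      # skip when the glob matched nothing
    name=$(basename "$dir")
    # -C enters the mount point first, so the archive stores relative paths
    tar -cf "$dest_dir/$name.tar" -C "$dir" .
    echo "created $dest_dir/$name.tar"
  done
}

# Usage inside pvc-access-pod:
#   archive_pvcs /mnt /tmp
```

For the two-node example, this is equivalent to running `tar -cf /tmp/data0.tar -C /mnt/data0 .` and `tar -cf /tmp/data1.tar -C /mnt/data1 .` directly.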

These commands create two tar files, data0.tar and data1.tar, in the /tmp directory of the pod.

4. Copy the Tar Files Locally

Exit the pod shell and use the kubectl cp command to copy the tar files from the pod to your local machine:

Verify the files are present in your current directory:
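The following sketch covers both the copy and the verification commands. It assumes the pod name from the manifest and a two-node cluster. `kubectl cp` needs a tar binary in the container, which the busybox image provides. The helper `copy_archives` is hypothetical. So is the `KUBECTL` variable, which exists only so you can swap in a stub for dry runs:

```shell
#!/bin/sh
# Sketch: copy /tmp/data<N>.tar from the pod into the current directory.
# copy_archives is a hypothetical helper name.
set -eu

KUBECTL="${KUBECTL:-kubectl}"   # unquoted below on purpose, e.g. "kubectl --context x"

copy_archives() {
  namespace="$1"   # namespace of pvc-access-pod
  pod="$2"         # pod name, e.g. pvc-access-pod
  count="$3"       # number of HiveMQ nodes (2 in the example manifest)
  i=0
  while [ "$i" -lt "$count" ]; do
    # kubectl cp <namespace>/<pod>:<remote path> <local path>
    $KUBECTL cp "$namespace/$pod:/tmp/data$i.tar" "./data$i.tar"
    i=$((i + 1))
  done
}

# Usage on your workstation (not inside the pod):
#   copy_archives <namespace> pvc-access-pod 2
#   ls -lh data0.tar data1.tar
```

For two nodes, this runs `kubectl cp <namespace>/pvc-access-pod:/tmp/data0.tar ./data0.tar` and the same command for data1.tar.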

5. Extract the Tar Files Locally

Create a folder to store the extracted data:

Extract the contents of the tar files:
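The following is a minimal sketch of creating the folder and extracting the archives. It assumes data0.tar, data1.tar, and so on are in the current directory. The helper name `extract_archives` is hypothetical. Each archive is extracted into its own subfolder (extracted_data/data0, extracted_data/data1), so files with the same relative path on different nodes cannot overwrite each other. Whether the Disaster Recovery tool expects exactly this layout is an assumption:

```shell
#!/bin/sh
# Sketch: extract every data<N>.tar found in <src_dir> into <out_dir>/data<N>/.
# extract_archives is a hypothetical helper name.
set -eu

extract_archives() {
  src_dir="$1"   # directory holding data0.tar, data1.tar, ...
  out_dir="$2"   # e.g. extracted_data
  mkdir -p "$out_dir"
  for archive in "$src_dir"/data*.tar; do
    [ -f "$archive" ] || continue          # no archives found
    name=$(basename "$archive" .tar)       # data0.tar -> data0
    # One subfolder per node keeps identically named files apart
    mkdir -p "$out_dir/$name"
    tar -xf "$archive" -C "$out_dir/$name"
  done
}

# Usage in the directory where you copied the archives:
#   extract_archives . extracted_data
```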

This consolidates all the data into the extracted_data directory.

6. Run the HiveMQ Disaster Recovery Tool

Download and set up the HiveMQ Disaster Recovery tool if you haven't already.

Run the tool with the extracted data directories:

This command processes the PVC data and exports the backup file to the export/ directory.

Note: This step can take anywhere from a few minutes to several hours, depending on the amount of data to be restored.

7. Clean Up Resources

Once the process is complete, clean up the resources to avoid unnecessary costs.

Delete the temporary pod:

kubectl delete pod pvc-access-pod -n <namespace>

8. Upload Backup

  1. Once the recovery tool has finished, you will find a new folder containing the backup file in your export folder.
  2. You now have a backup file that can be restored on the running HiveMQ cluster. You can restore it in one of two ways: via the Control Center web UI or via the REST API.

    1. To restore via the Control Center, follow these steps:

      1. In HiveMQ’s Control Center, open the Admin > Backup page in your browser and upload the backup file generated by the recovery tool.

      2. The Control Center shows the import progress and displays a message once the import is complete. You can also track the import progress in your monitoring dashboard.
