All members of a HiveMQ cluster log messages indicating that replication is in progress for as long as the process is ongoing:
Code Block |
---|
INFO - Starting cluster replication process. This may take a while. Please do not shut down HiveMQ.
[...]
INFO - Replication is still in progress. Please do not shut down HiveMQ. |
While these messages are being logged, the cluster is at risk of data loss if more than (replica count - 1)
brokers are removed from the cluster.
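As a minimal sketch of the threshold described above, the following Python snippet encodes the "replica count - 1" rule. The function names are hypothetical and only illustrate the arithmetic; they are not part of HiveMQ.

```python
def max_safe_removals(replica_count: int) -> int:
    """Number of brokers that may leave during replication without
    exceeding the 'replica count - 1' threshold stated above."""
    if replica_count < 1:
        raise ValueError("replica_count must be at least 1")
    return replica_count - 1


def removal_risks_data_loss(replica_count: int, brokers_removed: int) -> bool:
    """True if more than (replica count - 1) brokers are removed."""
    return brokers_removed > max_safe_removals(replica_count)
```

For example, with a replica count of 2, removing one broker while replication is in progress stays within the threshold, while removing two does not.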
Once all necessary data has been exchanged, the brokers log the following:
Code Block |
---|
INFO - Finished cluster replication successfully in 30000 ms. |
This indicates that all necessary persistent data has reached its replication factor on the target hosts. However, this log message does not mark the completion of all replication-related tasks. The individual brokers may still see I/O and CPU load stemming from the replication process.
Prerequisite read: How to ensure a smooth rolling upgrade?
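The log messages quoted above can be tracked with a small script. The following is a sketch only: it assumes the log lines are available as an iterable of strings, and the function name and the state labels (`unknown`, `in_progress`, `finished`) are illustrative choices, not HiveMQ terminology.

```python
import re
from typing import Iterable, Optional, Tuple

# Message fragments taken from the log lines quoted in this article.
STARTED = "Starting cluster replication process"
IN_PROGRESS = "Replication is still in progress"
FINISHED = re.compile(r"Finished cluster replication successfully in (\d+) ms")


def replication_status(log_lines: Iterable[str]) -> Tuple[str, Optional[int]]:
    """Return (state, duration_ms) based on the most recent replication message.

    duration_ms is only set when the latest message is the 'Finished' line.
    """
    state, duration_ms = "unknown", None
    for line in log_lines:
        if STARTED in line or IN_PROGRESS in line:
            # A new or ongoing replication overrides an earlier 'Finished' line.
            state, duration_ms = "in_progress", None
        else:
            match = FINISHED.search(line)
            if match:
                state, duration_ms = "finished", int(match.group(1))
    return state, duration_ms
```

A `finished` state only reflects the log line; as noted above, brokers may still be working through replication-related tasks afterwards.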
Info |
---|
To determine whether the cluster has returned to its baseline load for the traffic it is experiencing, the metrics below give further insight. Observing these metrics in addition to the log line above can be helpful when performing rolling upgrades in clusters that are operating close to the limits of their hardware’s capabilities. |
A join process (where a fresh broker with no state joins the cluster) is complete once the following metrics return to 0:
Code Block |
---|
com.hivemq.internal.singlewriter.* topic-tree.remove-locally.queued
com.hivemq.internal.singlewriter.client-session-subscription-persistence.remove-locally.queued
com.hivemq.internal.singlewriter.client-session-persistence.remove-locally.queued
com.hivemq.internal.singlewriter.client-queue-persistence.remove-local.queued
com.hivemq.internal.singlewriter.client-event-persistence.remove-bucket.queued |
Further, the replication batches should also have reached 0 again:
Code Block |
---|
com.hivemq.replication.batches-queued
com.hivemq.replication.batches-sent |
Ensuring that replication-related tasks are resolved before rotating the next node is the least impactful way to perform such topology changes. Once the above tasks are complete, the cluster has returned to its baseline load and the rolling upgrade can be continued.
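The check that the metrics listed above have returned to 0 could be scripted roughly as follows. This is a sketch under assumptions: how the metric values are collected (for example, through whatever monitoring system is in use) is left open, and the function names and the `WATCHED` list are hypothetical. The list contains only the fully qualified names shown above; the topic-tree metric would need to be added under the exact name the broker reports.

```python
from typing import Iterable, List, Mapping

# Metric names copied from the lists in this article.
WATCHED = [
    "com.hivemq.internal.singlewriter.client-session-subscription-persistence.remove-locally.queued",
    "com.hivemq.internal.singlewriter.client-session-persistence.remove-locally.queued",
    "com.hivemq.internal.singlewriter.client-queue-persistence.remove-local.queued",
    "com.hivemq.internal.singlewriter.client-event-persistence.remove-bucket.queued",
    "com.hivemq.replication.batches-queued",
    "com.hivemq.replication.batches-sent",
]


def pending_metrics(values: Mapping[str, float], watched: Iterable[str]) -> List[str]:
    """Return the watched metrics that are missing or not yet back at 0."""
    # A missing metric is treated as pending, because its state is unknown.
    return [name for name in watched if values.get(name) != 0]


def back_at_baseline(values: Mapping[str, float], watched: Iterable[str] = WATCHED) -> bool:
    """True once every watched metric reports 0."""
    return not pending_metrics(values, watched)
```

In a rolling upgrade script, `back_at_baseline` would be evaluated for each broker before the next node is rotated, with `pending_metrics` indicating which tasks are still outstanding.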