Reducing load during a rolling upgrade
The “Finished cluster replication successfully” log message only confirms that the cluster is no longer at risk of data loss. Background tasks related to replication may still be running.
Please wait for these tasks to finish before continuing the rolling upgrade. Individual brokers may still be experiencing I/O and CPU load caused by the replication process. To make sure the cluster has returned to its baseline load for the traffic it is currently handling, check the metrics described in Replication task metrics for Rolling Upgrade.
Monitoring these metrics in addition to the log message is especially helpful when performing rolling upgrades on clusters that operate close to the limits of their hardware.
Context
During a rolling upgrade, you will see the following messages in the HiveMQ log:
INFO - Starting cluster replication process. This may take a while. Please do not shut down HiveMQ.
[...]
INFO - Replication is still in progress. Please do not shut down HiveMQ.
Every member of the HiveMQ cluster logs messages like these for as long as replication is in progress.
Issue:
While these messages are being logged, the cluster is at risk of data loss if more than (replica count - 1) brokers are removed from the cluster.
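To make the threshold concrete: if the replica count is n, the cluster tolerates the removal of at most n - 1 brokers while replication is still in progress. For example, with a replica count of 2, removing a second broker before replication has finished can lose data. The minimal Python sketch below encodes only that rule. The function name `removal_is_safe` is hypothetical and not part of any HiveMQ API, and the sketch assumes the replica count is the total number of copies kept for each piece of data, which is consistent with the (replica count - 1) threshold above.

```python
def removal_is_safe(replica_count: int, brokers_removed: int) -> bool:
    """Hypothetical helper: True if removing `brokers_removed` brokers while
    replication is still in progress stays within the tolerated limit of
    (replica_count - 1) brokers described in this article."""
    if replica_count < 1:
        raise ValueError("replica_count must be at least 1")
    if brokers_removed < 0:
        raise ValueError("brokers_removed must not be negative")
    return brokers_removed <= replica_count - 1


# Example: with a replica count of 2, at most one broker may be removed.
for removed in range(3):
    print(f"replica count 2, {removed} broker(s) removed -> safe: {removal_is_safe(2, removed)}")
```

Once the finished message described in the solution below has been logged, this limit no longer applies to the data that has been replicated.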
Solution:
Once all necessary data has been exchanged, each broker logs the following message:
INFO - Finished cluster replication successfully in 30000 ms.
This message indicates that all necessary persistent data has been replicated to the target hosts and has reached its replication factor.
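If you want to check this condition from a script rather than by reading the log, a minimal Python sketch could look like the one below. It assumes the messages appear in the log with the wording shown in this article, possibly with additional prefixes such as timestamps, so it matches substrings rather than whole lines. The function `replication_state` and its return values are hypothetical and not a HiveMQ tool; the most recent relevant line determines the reported state.

```python
import re
from typing import Iterable, Optional, Tuple

# Message fragments taken from the log excerpts in this article. Real log lines
# may carry timestamps or other prefixes, so we search for substrings.
STARTED = "Starting cluster replication process"
IN_PROGRESS = "Replication is still in progress"
FINISHED = re.compile(r"Finished cluster replication successfully in (\d+) ms")


def replication_state(lines: Iterable[str]) -> Tuple[str, Optional[int]]:
    """Return the replication state implied by the most recent relevant line.

    Returns ("finished", duration_ms), ("in_progress", None) or ("unknown", None).
    """
    state: Tuple[str, Optional[int]] = ("unknown", None)
    for line in lines:
        match = FINISHED.search(line)
        if match:
            state = ("finished", int(match.group(1)))
        elif STARTED in line or IN_PROGRESS in line:
            state = ("in_progress", None)
    return state


if __name__ == "__main__":
    # Hypothetical excerpt; in practice, pass the lines of the broker's log file.
    sample = [
        "INFO - Starting cluster replication process. This may take a while. Please do not shut down HiveMQ.",
        "INFO - Replication is still in progress. Please do not shut down HiveMQ.",
        "INFO - Finished cluster replication successfully in 30000 ms.",
    ]
    print(replication_state(sample))  # ('finished', 30000)
```

Because every cluster member logs these messages, such a check would need to run against the log of each broker. A "finished" result only reflects the log message; it does not show whether the replication-related I/O and CPU load has subsided.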
Important
Even after this message appears, individual brokers may still be experiencing I/O and CPU load caused by the replication process. To make sure the cluster has returned to its baseline load for the traffic it is currently handling, check the metrics described in Replication task metrics for Rolling Upgrade.