Replication task metrics for Rolling Upgrade
Prerequisite read: Reducing load during a rolling upgrade
To confirm that the cluster has returned to its baseline load under the traffic it is currently handling, the metrics listed below give further insight. Observing these metrics in addition to the previously mentioned log line can be helpful when performing rolling upgrades in clusters that operate close to the limits of their hardware’s capabilities.
Waiting until replication-related tasks have finished before rotating the next node is the least impactful way to perform such topology changes.
A join process (where a fresh broker with no state joins the cluster) is complete once the following queued-task metrics return to 0:
com.hivemq.internal.singlewriter.topic-tree.remove-locally.queued
com.hivemq.internal.singlewriter.client-session-subscription-persistence.remove-locally.queued
com.hivemq.internal.singlewriter.client-session-persistence.remove-locally.queued
com.hivemq.internal.singlewriter.client-queue-persistence.remove-local.queued
com.hivemq.internal.singlewriter.client-event-persistence.remove-bucket.queued
In addition, the replication batch metrics should also have returned to 0:
com.hivemq.replication.batches-queued
com.hivemq.replication.batches-sent
Once all of the above metrics are back at 0, the cluster has returned to its baseline load and the rolling upgrade can continue.
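The check described above can be automated. The following is a minimal sketch in Python, not an official HiveMQ tool. It assumes that the current metric values have already been collected into a dictionary keyed by the metric names exactly as listed above. How those values are obtained (for example via JMX or a monitoring extension) is outside the scope of the sketch, and the function names pending_metrics and safe_to_continue are hypothetical.

```python
from typing import Mapping

# Metric names copied verbatim from the lists above.
JOIN_TASK_METRICS = (
    "com.hivemq.internal.singlewriter.topic-tree.remove-locally.queued",
    "com.hivemq.internal.singlewriter.client-session-subscription-persistence.remove-locally.queued",
    "com.hivemq.internal.singlewriter.client-session-persistence.remove-locally.queued",
    "com.hivemq.internal.singlewriter.client-queue-persistence.remove-local.queued",
    "com.hivemq.internal.singlewriter.client-event-persistence.remove-bucket.queued",
)
REPLICATION_BATCH_METRICS = (
    "com.hivemq.replication.batches-queued",
    "com.hivemq.replication.batches-sent",
)


def pending_metrics(samples: Mapping[str, float]) -> list[str]:
    """Return the watched metrics that are missing or not yet back at 0.

    `samples` is assumed (hypothetically) to map metric names to their
    current values for the node or cluster being observed.
    """
    watched = JOIN_TASK_METRICS + REPLICATION_BATCH_METRICS
    # A missing metric yields None, which is treated as "not yet at 0".
    return [name for name in watched if samples.get(name) != 0]


def safe_to_continue(samples: Mapping[str, float]) -> bool:
    """True only when every watched metric is present and equal to 0."""
    return not pending_metrics(samples)
```

A caller would typically poll the metrics at a fixed interval and rotate the next node only after safe_to_continue has returned True. Treating a missing metric as pending is a deliberately conservative choice, so that a gap in monitoring does not get mistaken for an idle cluster.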