Replication task metrics for Rolling Upgrade

Prerequisite read: https://hivemq.atlassian.net/wiki/spaces/KB/pages/2536833025

The metrics below give further insight into whether the cluster has returned to its baseline load for the traffic it is experiencing. Observing these metrics in addition to the mentioned log line can be helpful when performing rolling upgrades in clusters that are operating close to the limits of their hardware’s capabilities.
Waiting until replication-related tasks have completed before rotating the next node is the least impactful way to perform such topology changes.

A join process (where a fresh broker with no state joins the cluster) is complete once the following metrics have returned to 0:

com.hivemq.internal.singlewriter.topic-tree.remove-locally.queued
com.hivemq.internal.singlewriter.client-session-subscription-persistence.remove-locally.queued
com.hivemq.internal.singlewriter.client-session-persistence.remove-locally.queued
com.hivemq.internal.singlewriter.client-queue-persistence.remove-local.queued
com.hivemq.internal.singlewriter.client-event-persistence.remove-bucket.queued

In addition, the replication batch metrics should have returned to 0:

com.hivemq.replication.batches-queued
com.hivemq.replication.batches-sent

Once all of the metrics above have returned to 0, the cluster is back at its baseline load and the rolling upgrade can continue with the next node.
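
The check described above could be automated along the following lines. This is a minimal sketch, not an official HiveMQ tool: the helper names pending_metrics and safe_to_continue are hypothetical, and the snapshot is assumed to be a mapping from metric name (exactly as listed in this article) to its current value. How the snapshot is collected depends on your monitoring setup and is deliberately left out.

```python
from typing import List, Mapping

# Metric names copied from this article; a join is considered complete
# once all of them have returned to 0.
JOIN_METRICS = (
    "com.hivemq.internal.singlewriter.topic-tree.remove-locally.queued",
    "com.hivemq.internal.singlewriter.client-session-subscription-persistence.remove-locally.queued",
    "com.hivemq.internal.singlewriter.client-session-persistence.remove-locally.queued",
    "com.hivemq.internal.singlewriter.client-queue-persistence.remove-local.queued",
    "com.hivemq.internal.singlewriter.client-event-persistence.remove-bucket.queued",
)

REPLICATION_METRICS = (
    "com.hivemq.replication.batches-queued",
    "com.hivemq.replication.batches-sent",
)


def pending_metrics(snapshot: Mapping[str, float]) -> List[str]:
    """Return the watched metrics that are missing or non-zero.

    `snapshot` is an assumed input format: metric name -> current value.
    A metric absent from the snapshot is treated as pending, so that a
    scrape failure never looks like a finished join.
    """
    watched = JOIN_METRICS + REPLICATION_METRICS
    return [name for name in watched if snapshot.get(name) != 0]


def safe_to_continue(snapshot: Mapping[str, float]) -> bool:
    """True only if every watched metric is present and equal to 0."""
    return not pending_metrics(snapshot)
```

In practice such a check would be run repeatedly against every node in the cluster, and the next node would only be rotated once safe_to_continue holds for each snapshot and the log line from the prerequisite article has also been observed.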