...
Check the metrics and make sure that the client event task count keeps growing while the node is in the joining state:
com_hivemq_persistence_executor_client_events_tasks
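One way to sample the counter twice and compare is sketched below. How you read the metric depends on your monitoring setup; the commented `curl` line assumes the HiveMQ Prometheus Monitoring Extension is installed (the port `9399` and `/metrics` path are assumptions, adjust to your configuration):

```shell
# Take a sample, e.g. via the Prometheus monitoring extension (assumption):
#   curl -s http://localhost:9399/metrics \
#     | awk '/com_hivemq_persistence_executor_client_events_tasks/ {print $2}'

metric_growing() {
  # $1 = earlier sample, $2 = later sample (plain integers)
  if [ "$2" -gt "$1" ]; then
    echo "growing ($1 -> $2)"
  else
    echo "not growing ($1 -> $2)"
  fi
}

# Example: two samples taken ~30 s apart
metric_growing 100 250
```

If the value keeps increasing for as long as the node stays in the joining state, the first condition for this issue is met.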
Check the config.xml and make sure that Client Event History is enabled
/opt/hivemq/conf/config.xml
```xml
<client-event-history>
    <enabled>true</enabled>
    <lifetime>604800</lifetime> <!-- 7 days -->
</client-event-history>
```
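A quick command-line check is sketched below; it assumes the `<enabled>` element is the first line inside the `<client-event-history>` block, as in the snippet above:

```shell
check_event_history() {
  # $1 = path to config.xml, typically /opt/hivemq/conf/config.xml
  # Assumes <enabled> directly follows the <client-event-history> opening tag.
  if grep -A1 '<client-event-history>' "$1" | grep -q '<enabled>true</enabled>'; then
    echo "enabled"
  else
    echo "disabled or not configured"
  fi
}

# Usage:
#   check_event_history /opt/hivemq/conf/config.xml
```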
If both checks are positive, apply the following workaround:
On all nodes of the HiveMQ cluster, modify the HiveMQ config.xml to disable the Client Event History and save the file.
```xml
<client-event-history>
    <enabled>false</enabled>
    <lifetime>604800</lifetime> <!-- 7 days -->
</client-event-history>
```
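When editing many nodes, the flag can also be flipped with a one-line `sed` edit, for example as part of a loop over the cluster hosts. This is only a sketch: it assumes GNU `sed` (for `-i`) and only touches the `<enabled>` element inside the `<client-event-history>` block:

```shell
disable_event_history() {
  # $1 = path to config.xml; flips <enabled> from true to false, but only
  # within the <client-event-history> ... </client-event-history> block.
  sed -i '/<client-event-history>/,/<\/client-event-history>/ s|<enabled>true</enabled>|<enabled>false</enabled>|' "$1"
}

# Usage:
#   disable_event_history /opt/hivemq/conf/config.xml
```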
One by one, stop all nodes of the cluster except the last one. Stop only those nodes that cannot finish the join process. When stopping a node, monitor the hivemq.log for a successful shutdown, and only then stop the next node.
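The per-node sequence could look like the sketch below. The systemd unit name and the exact shutdown log message are assumptions; check what your HiveMQ version actually writes to hivemq.log before relying on the pattern:

```shell
wait_for_shutdown() {
  # $1 = path to hivemq.log, $2 = pattern of the shutdown message
  # (the exact wording varies by version -- verify it in your own logs).
  until grep -qi "$2" "$1"; do sleep 2; done
  echo "node stopped cleanly"
}

# On each joining node, in turn (service name is an assumption):
#   systemctl stop hivemq
#   wait_for_shutdown /opt/hivemq/log/hivemq.log 'shut down'
```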
On the last node of the cluster, modify the run.sh file so that it ensures a stateful restart.
```bash
JAVA_OPTS="$JAVA_OPTS -DstatefulCluster=true"
```
Restart the service on the last node of the cluster (stateful restart).
Start the service on the remaining nodes of the cluster one by one, monitoring the logs for a successful start and for the end of the join replication process.
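A rolling start can be sketched with a small log-polling helper. Host names, the service name, and the "started" / "replication finished" log messages are all assumptions here; verify the wording in your own hivemq.log before relying on the patterns:

```shell
wait_for_log_line() {
  # $1 = log file, $2 = pattern to wait for (case-insensitive)
  until grep -qi "$2" "$1"; do sleep 2; done
  echo "matched: $2"
}

# Rolling start, one node at a time (sketch):
#   for node in hivemq-2 hivemq-3; do
#     ssh "$node" systemctl start hivemq
#     # then, on that node, wait for startup and join replication:
#     #   wait_for_log_line /opt/hivemq/log/hivemq.log 'started'
#     #   wait_for_log_line /opt/hivemq/log/hivemq.log 'replication'
#   done
```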
Upgrade the cluster to a version later than 4.18 to ensure that the Client Event History issue is fixed.
Re-enable the Client Event History on all nodes of the cluster and restart them in a stateful manner.
...