How to isolate a node from the cluster

In situations where a node behaves unexpectedly or experiences memory consumption issues, it's advisable to isolate the node from the cluster. This allows you to analyze the problem further by creating heap dumps or thread dumps. This guide will walk you through the steps to isolate a node.

Isolating a node may increase the load (memory, CPU, and network) on the other cluster members. If heap consumption is already very high, this could lead to a total cluster outage. Only run this script on a production system after a member of the HiveMQ Support team has confirmed it.

 Instructions

  1. Create a Bash script named "isolate_node.sh" with the following content:

    #!/bin/bash

    # Block incoming traffic on port 7800
    iptables -A INPUT -p tcp --dport 7800 -j DROP

    # Block outgoing traffic on port 7800
    iptables -A OUTPUT -p tcp --dport 7800 -j DROP

    # Dump the active rules (output is discarded, so nothing is persisted)
    iptables-save > /dev/null

  2. Run the script with sudo permissions on the affected node:

    sudo chmod +x isolate_node.sh
    sudo ./isolate_node.sh

  3. After executing the script, check the "hivemq.log" file of the node where the command was run. You should observe that the cluster size has decreased to 1. Additionally, the "hivemq.log" of other nodes should indicate that the cluster size has been reduced by one.

  4. You can also verify the changes in the Control Center or in your monitoring dashboard.

  5. With the node isolated, you can proceed to create heap dumps or thread dumps for analysis.

  6. Once the required data is collected from the node, remove the iptables rules to allow the node to rejoin the cluster. The following command flushes all rules in the filter table, including any that were not added by the script:

    sudo iptables -F

  7. After executing the above command, the isolated node will rejoin the cluster. If you prefer to shut down this node and create a new one instead, you can skip step 6.
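
If the node has other firewall rules that must stay in place, flushing everything in step 6 may be too broad. As a minimal sketch, assuming the isolation rules were added exactly as in isolate_node.sh, the following prints `iptables -D` commands that delete only those two rules. The function name `rejoin_commands` and the variable `CLUSTER_PORT` are hypothetical names chosen for illustration.

```shell
#!/bin/bash
# Sketch: remove only the two DROP rules added by isolate_node.sh,
# leaving any other firewall rules untouched.

CLUSTER_PORT=7800   # port blocked by isolate_node.sh

# Print the iptables commands that delete the isolation rules.
# Each rule spec must match the one used with "iptables -A" exactly.
rejoin_commands() {
  local chain
  for chain in INPUT OUTPUT; do
    echo "iptables -D ${chain} -p tcp --dport ${CLUSTER_PORT} -j DROP"
  done
}

# Only print the commands so they can be reviewed first.
rejoin_commands
```

After reviewing the output, the commands could be applied with `rejoin_commands | sudo sh`. Printing before executing keeps the sketch safe to try on a node that is not isolated.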

By following these steps, you can effectively isolate a node in your cluster for analysis and troubleshooting purposes.
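
For step 5, the dumps can be taken with the standard JDK tools `jstack` and `jmap`. The sketch below only prints the commands for a given process ID. It assumes both tools are on the PATH and are run as the same user as the broker process. The function name `dump_commands`, the output directory, and the PID `12345` are placeholders, not values from this guide.

```shell
#!/bin/bash
# Sketch: print the JDK commands for a thread dump and a heap dump.
# Assumption: jstack and jmap from the JDK are available on the PATH.

# Usage: dump_commands <pid> [output_dir]
dump_commands() {
  local pid="$1"
  local out_dir="${2:-/tmp}"
  # Thread dump, including lock information
  echo "jstack -l ${pid} > ${out_dir}/threads-${pid}.txt"
  # Heap dump of live objects in binary (hprof) format
  echo "jmap -dump:live,format=b,file=${out_dir}/heap-${pid}.hprof ${pid}"
}

# Placeholder PID; substitute the real process ID of the broker.
dump_commands 12345 /tmp
```

A heap dump file can be roughly as large as the used heap, so the output directory should have enough free space before the printed commands are run.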
