Shared Subscription Sharding Explained

In version 4.1, HiveMQ introduces a new sharding concept for shared subscriptions.

How does it work?

  • Each Shared Subscription (defined by share name + topic filter, e.g. $share/my-share-name/test/topic) has its own queue.

  • These Shared Subscription Queues are split into smaller, partial queues – shards.

    • There can be an arbitrary number of these shards split across an arbitrary number of HiveMQ cluster nodes and members of the shared subscription.

    • This results in a very even distribution of the shards (and therefore of the queued messages) across the cluster.

  • Incoming PUBLISH messages that match a shared subscription are distributed among the existing shards based on a consistent hashing algorithm.

  • A second round of consistent hashing determines the specific subscriber that polls a batch of messages from a local shard (a simplified sketch of both hashing rounds follows this list).

    • Only shared subscribers that are currently able to consume additional messages are allowed to poll.

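The HiveMQ implementation of these two hashing rounds is internal, but the general technique can be illustrated with a small, self-contained sketch. Everything below (the ConsistentHashRing class, the virtual-node count, the shard and subscriber identifiers, the message key) is an illustrative assumption and not HiveMQ API: round one hashes an incoming PUBLISH onto one of the shards, round two hashes the shard onto the set of subscribers that can currently take more messages.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Minimal consistent-hash ring sketch: maps keys onto a ring of members
 * (shards in round one, poll-ready subscribers in round two).
 */
final class ConsistentHashRing {

    // Virtual nodes per member smooth out the distribution on the ring.
    private static final int VIRTUAL_NODES = 100;

    private final TreeMap<Long, String> ring = new TreeMap<>();

    ConsistentHashRing(final List<String> members) {
        for (final String member : members) {
            for (int i = 0; i < VIRTUAL_NODES; i++) {
                ring.put(hash(member + "#" + i), member);
            }
        }
    }

    /** Returns the member responsible for the key: the next ring position clockwise. */
    String memberFor(final String key) {
        final SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    private static long hash(final String value) {
        try {
            final byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            // Use the first 8 bytes of the MD5 digest as the ring position.
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xffL);
            }
            return h;
        } catch (final Exception e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}

public class SharedSubscriptionShardingSketch {

    public static void main(String[] args) {
        // Round one: the queue of "$share/my-share-name/test/topic" is split into shards;
        // an incoming PUBLISH is hashed onto exactly one of them.
        final List<String> shards = List.of("shard-0", "shard-1", "shard-2", "shard-3");
        final ConsistentHashRing shardRing = new ConsistentHashRing(shards);
        final String shard = shardRing.memberFor("publish-packet-42"); // hypothetical message key
        System.out.println("PUBLISH goes to " + shard);

        // Round two: only subscribers that can currently take more messages are on the ring;
        // the selected subscriber polls a batch of messages from its local shard.
        final List<String> pollReadySubscribers = List.of("subscriber-A", "subscriber-C");
        final ConsistentHashRing subscriberRing = new ConsistentHashRing(pollReadySubscribers);
        final String poller = subscriberRing.memberFor(shard);
        System.out.println(poller + " polls a batch from " + shard);
    }
}
```

A property of consistent hashing that makes it a good fit here: when a node or subscriber joins or leaves, only a small fraction of keys move to a different member, so the distribution stays even without reshuffling everything.
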
Benefits

  • With two rounds of consistent hashing, linear scaling to an arbitrary number of shared subscribers and HiveMQ nodes is achieved.

    • This enables applications with immense message throughput to use shared subscriptions and consume all messages.

  • In contrast to HiveMQ 3, the preferred distribution of messages no longer depends on which HiveMQ node the publishing clients are locally connected to; instead, it is based on the location of the subscribers.

  • Individual slow-consuming clients have very little impact on the overall performance of the shared subscription.

  • No messages are lost even if a number of shared subscribers permanently lose their connection.

The best scalability is achieved when the members of a shared subscription are evenly distributed across the cluster.