Show / Hide Table of Contents

Gossip-based Communication

Gossip-based communication is a way to spread a message (rumor) across all cluster nodes using infection-style message exchange. This approach is very efficient in comparison to naive broadcasting especially in large-scale clusters. But how the node knows how to join the cluster and discover other peers? The answer is membership protocol that allows to discover and keep the view of cluster members.

HyParView

HyParView is an implementation of membership protocol for peers that want to communicate with each other using infection-style message exchange. This algorithm is highly scalable up to thousands of nodes. However, it has one major drawback: every node has partial view of the entire cluster.

.NEXT offers transport-agnostic implementation of HyParView algorithm represented by PeerController class. It's an entry point for building a peer with messaging capabilities. The controller is responsible for all key aspects of the algorithm:

  • Implementation of joining procedure
  • Maintaining a list of neighbors
  • Announcing of new peers
  • Removing peers from the mesh

IPeerConfiguration interface provides the configuration model of HyParView-enabled peer. When the peer configured properly, you need to bootstrap the controller using StartAsync method. The method accepts an address of the contact node optionally. This address is not needed when you starting the first peer in the mesh. However, if you already have a mesh of peers then you need to announce a new peer correctly. Contact node is responsible for announcing joined peer across all peers in the mesh. There is no preference in choosing the appropriate node for that purpose.

When the node is joined to the mesh then it will be able to discover a list of neighbors automatically. PeerDiscovered and PeerGone events of PeerController class provide ability to track changes in the list of visible peers. Neighbors property provides a list of the peers visible by the local node.

StopAsync method provides a way for graceful shutdown of the node. All neighbors will be informed that the stopped node is no longer accessible and it must be removed from the list of neighbors.

EnqueueBroadcastAsync method can be used to broadcast the message (rumor) to all neighbors. This is a core of Gossip-based messaging. The method requires an implementation of IRumorSender interface that provides the delivery of the message using the specific transport. Additionally, the method controls the delivery status. If it suspects that the peer is unavailable then it removes the peer from the list of neighbors.

Integration with ASP.NET Core

DotNext.AspNetCore.Cluster provides implementation of HyParView protocol using HTTP transport and PeerController class on the top of ASP.NET Core infrastructure.

There are two main components of HyParView implementation that must be registered in ASP.NET Core application:

  • Singleton service that implements PeerController abstract class and maintain the state of the peer
  • ASP.NET Core Middleware that is responsible for processing HyParView messages over HTTP

The following code demonstrates a basic setup of HyParView:

using DotNext.Net.Cluster.Consensus.Raft.Http.Embedding;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;

sealed class Startup
{
    public void Configure(IApplicationBuilder app)
    {
        app.UseHyParViewProtocolHandler();	// informs that processing pipeline should handle HyParView-specific requests
    }

    public void ConfigureServices(IServiceCollection services)
    {
        services.AddOptions();
    }
}

IHost host = new HostBuilder()
    .ConfigureWebHost(webHost => webHost
        .UseKestrel(options => options.ListenLocalhost(80))
        .UseStartup<Startup>()
    )
    .JoinMesh()  //registers all necessary services required for normal cluster node operation
    .Build();

Note that JoinMesh method should be called after ConfigureWebHost. Otherwise, the behavior of this method is undefined.

JoinMesh method has overloads that allow to specify custom configuration section containing the configuration of the local peer.

UseHyParViewProtocolHandler method should be called before registration of any authentication/authorization middleware.

Dependency Injection

The application may request the following services from ASP.NET Core DI container:

  • IPeerMesh<HttpPeerClient> provides access to all peers visible from the local node, discovery events, configured HttpClient for communication with neighbors
  • PeerController maintains the state of the local peer, handles HyParView-specific messages and exposes EnqueueBroadcastAsync method for rumor spreading

Configuration

The application should be configured properly to serve HyParView messages. The following JSON represents the example of configuration:

{
    "protocolVersion" : "http2",
    "protocolVersionPolicy" : "RequestVersionOrLower",
    "requestTimeout" : "00:01:00",
    "clientHandlerName" : "HyParViewClient",
    "contactNode" : "https://192.168.0.2:3232/",
    "localNode" : "https://192.168.0.1:3232",
    "activeViewCapacity" : 5,
    "passiveViewCapacity" : 10,
    "activeRandomWalkLength" : 3,
    "passiveRandomWalkLength" : 2,
    "shuffleActiveViewCount" : 2,
    "shufflePassiveViewCount" : 5,
    "shuffleRandomWalkLength" : 2,
    "queueCapacity" : 15,
    "lowerShufflePeriod" : 1000,
    "upperShufflePeriod" : 1500,
}
Configuration parameter Required Default Value Description
protocolVersion No auto HTTP protocol version to be used for the communication between members. Possible values are auto, http1, http2, http3
protocolVersionPolicy No RequestVersionOrLower Specifies behaviors for selecting and negotiating the HTTP version for a request. Possible values are RequestVersionExact, RequestVersionOrHigher, RequestVersionOrLower
requestTimeout No 00:30:00 Request timeout used to access peers across the network using HTTP client
clientHandlerName No HyParViewClient The name to be passed into IHttpMessageHandlerFactory to create HttpMessageInvoker used by HyParView client code
contactNode No N/A The address of the contact node used as an entry point to announce the local node. This property should not be specified if the local node is the first node launched as a part of the mesh
localNode Yes N/A The addres of the local node visible by other peers in the mesh
activeViewCapacity No 5 The maximum number of neighbors visible by the local node. The recommended value of this property can be calculated as log(n) + c, where n is a maximum number of peers, c is a constant value. According to HyParView paper, the recommended value of c is 1
passiveViewCapacity No 10 The maximum number of peers stored in the backlog that is used to replace the peers removed from the active view. The recommended value of this property can be calculated as k(log(n) + c), where k is a constant value. According to HyParView paper, the recommended value of k is 6
activeRandomWalkLength No 3 The maximum number of hops a ForwardJoin request is propagated
passiveRandomWalkLength No 2 The value specifies at which point in the walk the peer is inserted into passive view. The value should be less than activeRandomWalkLength configuration property
shuffleActiveViewCount No activeViewCapacity / 2 The number of peers from active view to be included into Shuffle message
shufflePassiveViewCount No shufflePassiveViewCount / 2 The number of peers from passive view to be included into Shuffle message
shuffleRandomWalkLength No passiveRandomWalkLength The maximum number of hops a Shuffle message is propagated
queueCapacity No activeViewCapacity + passiveViewCapacity The capacity of the internal queue used to process HyParView messages
lowerShufflePeriod No 1000 The lower bound of randomly selected shuffle period, in milliseconds
upperShufflePeriod No 3000 The upper bound of randomly selected shuffle period

If lowerShufflePeriod < upperShufflePeriod then the actual period will be chosen randomly in the specified range. If lowerShufflePeriod = upperShufflePeriod then the actual period is determined by a constant value. Otherwise, the controller never sends Shuffle message to other peers.

Controlling node lifetime

The service implementing PeerController abstract class is registered as singleton service. It starts receiving HyParView-specific messages immediately. Therefore, you can loose some events raised by the service such as PeerDiscovered at starting point. To avoid that, you can implement IPeerLifetime interface and register implementation as a singleton.

using DotNext.Net;
using DotNext.Net.Cluster.Discovery.HyParView;

internal sealed class MemberLifetime : IPeerLifetime
{
	private static void OnPeerDiscovered(PeerController controller, PeerEventArgs args) {}

	void IPeerLifetime.OnStart(PeerController controller)
	{
		controller.PeerDiscovered += OnPeerDiscovered;
	}

	void IPeerLifetime.OnStop(PeerController controller)
	{
		controller.PeerDiscovered -= OnPeerDiscovered;
	}
}

Example

There is HyParView playground represented by HyParViewPeer application. You can find this app here. This playground allows to test HyParView membership protocol in real world.

The following command starts the first peer in the mesh:

cd <dotnext>/src/examples/HyParViewPeer
dotnet run -- 3262

3262 is a port number that can be used to access the peer. When a mesh has at least one launched peer, you can select any peer as a contact node and specify its port when starting a new node:

cd <dotnext>/src/examples/HyParViewPeer
dotnet run -- 3263 3262

3263 is a port of the started node. The second argument (port 3262) specifies the port of the contact node. This node must be launched.

You can launch as many peers as you want, but the port number must be unique for each instance.

The terminal window of the peer will display discovery events. Each peer exposes the following HTTP resources that can be examined using Web Browser:

  • GET https://localhost:3262/neighbors can be used to obtain a list of neighbors visible by the peer. You can change 3262 to the port number of the appropriate peer. According to HyParView, each peer may have partial view of the entire cluster so the list of neighbors may differ
  • GET https://localhost:3262/rumor can be used to spread the rumor across all peers in the mesh. The terminal window associated with each launched peer will display the identifier of the broadcast message
  • Edit this page
☀
☾
In this article
Back to top
Supported by the .NET Foundation
☀
☾