I have a distributed system design problem. The problem brief is
We have many clients to a service. The service needs to be fault-tolerant and so it will have many replicas. When a client can no longer access the service it will switch to a replica and inform all the other clients will switch to the same replica. (It is not acceptable to load-balance across the replicas as the replicas' data values are not exactly the same but all the clients must return the same data values at all times.)
My current design is to have the client, on failing to reach the service, ask for a new service leader. When a new service leader is established it will then notify all the clients to use it.
As with many distributed coordination designs I need a distributed group manager. I am considering the use of JGroups and/or Apache Zookeeper in a solution.
Is there an existing recipe or recipes that I should be looking at to solve this problem?
Update: We finally decided to have a master make a new copy of the repository, copy it to all service hosts, and notify the service hosts to switch to the new repository. The master tracks who has switched and raises an alarm if there is missing switchers after a fixed interval.