Tuesday, September 20, 2011

The Chubby lock service for loosely-coupled distributed systems

The Chubby lock service is a system built by Google to provide coarse-grained locking and reliable storage for low-volume data.  Its primary goals are availability and reliability rather than throughput or latency.  Because of these performance characteristics, it is used only for coarse-grained locking, not fine-grained locking.  Google's experience suggests this has not been a problem in practice: most usage of Chubby has been for naming and for metadata/configuration storage.  Before Chubby, Google services used ad hoc methods that could duplicate work or require human intervention.  With Chubby, that intervention is no longer needed, which eases the maintenance of large distributed systems.  At its core, Chubby uses Paxos for distributed consensus.

The Chubby service usually runs on about five machines, all running Paxos.  Reads are served by the master alone, and writes are propagated to all machines through the Paxos protocol.  A simple filesystem-like interface makes the data easy to browse.  Any directory or file can act as an advisory lock, and clients can hold locks for long periods of time.  Timeouts ensure the system makes progress when clients fail.  To improve performance, clients can cache data, but they must stay connected to Chubby so that the master can invalidate cached entries when necessary.  A write blocks until the cache invalidations complete, which keeps the data consistent.  Caching greatly reduces communication between clients and the master, which improves scalability.  The main factor in scaling Chubby is reducing communication, not raising server performance.
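
To make the caching behavior concrete, here is a minimal single-process sketch of write-time cache invalidation.  It is not Chubby's actual API: the Master and Client classes, their method names, and the example path are all hypothetical, and the sketch ignores the network, sessions, and Paxos replication entirely.  It only illustrates the ordering described above, where the master invalidates every cached copy before a write takes effect.

```python
class Master:
    """Toy stand-in for the master (hypothetical, not Chubby's real API)."""

    def __init__(self):
        self.data = {}      # path -> contents
        self.cachers = {}   # path -> set of clients caching that path

    def read(self, client, path):
        # Remember that this client now holds a cached copy of the path.
        self.cachers.setdefault(path, set()).add(client)
        return self.data.get(path)

    def write(self, path, value):
        # Invalidate every cached copy before the write takes effect,
        # so no client can serve stale data from its cache afterwards.
        for client in self.cachers.pop(path, set()):
            client.invalidate(path)
        self.data[path] = value


class Client:
    """Toy client with a local cache (hypothetical)."""

    def __init__(self, master):
        self.master = master
        self.cache = {}
        self.master_reads = 0   # counts round trips to the master

    def read(self, path):
        if path not in self.cache:
            self.master_reads += 1
            self.cache[path] = self.master.read(self, path)
        return self.cache[path]

    def invalidate(self, path):
        self.cache.pop(path, None)


master = Master()
master.write("/ls/cell/config", "v1")    # illustrative path only
client = Client(master)
client.read("/ls/cell/config")           # goes to the master
client.read("/ls/cell/config")           # served from the local cache
master.write("/ls/cell/config", "v2")    # invalidates the cached copy first
print(client.read("/ls/cell/config"), client.master_reads)
```

Repeated reads of an unchanged file cost one trip to the master, which is the communication saving the paragraph above attributes to caching.  In a real deployment the invalidation is a network message that must be acknowledged (or the client's lease must expire), which is why the write has to block.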

The Chubby model may become more popular for distributed systems.  Instead of relying on an ad hoc algorithm for metadata or configuration, Google created a centralized service that provides those features.  Even in massively distributed systems, some things are best done in a centralized fashion, and some global data must be shared by all nodes.  Therefore, there is always a need for some centralized service.  A centralized service can be a single point of failure, but that is not the case for Chubby, because it uses Paxos to agree upon values among several machines.  This lets it tolerate some failures and partitions, so the centralized service can provide greater reliability and availability.  In the future, more consideration will probably have to be given to network latencies and their non-uniformity.  I think distributed systems will become more global, spanning more than a single datacenter, so a centralized service will need to handle varying latencies and an increased probability of partitions.  The Chubby algorithms will have to change to account for longer latencies and cross-world network traffic.
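
The claim that Paxos tolerates some failures and partitions can be made concrete with the majority-quorum arithmetic that Paxos relies on.  This is a small sketch and not Chubby code; the function names are my own.

```python
def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a majority quorum."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas can be down while a majority is still reachable."""
    return replicas - majority(replicas)


def can_make_progress(replicas: int, reachable: int) -> bool:
    """True if the reachable replicas are enough to form a majority."""
    return reachable >= majority(replicas)


for n in (3, 5, 7):
    print(n, "replicas: quorum of", majority(n),
          "tolerates", tolerated_failures(n), "failures")
```

With five machines, any three form a quorum, so the service keeps working with two machines down or cut off.  The same arithmetic shows why partitions matter more across datacenters: the side of a partition that cannot reach a majority cannot make progress.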
