Monday, September 19, 2011

Cluster-Based Scalable Network Services

The popularity of internet services has been growing because they eliminate software distribution and make customer support of the product easier.  There are several reasons to build network services, but deploying them also poses several challenges, including scalability, availability, and cost effectiveness.  The authors of this paper propose a solution: a cluster of commodity machines connected by a high-speed interconnect and organized in a three-layer architecture.

A key observation behind the design of these systems is that many internet services do not need the full consistency or durability of ACID semantics.  BASE (basically available, soft state, eventually consistent) semantics are usually good enough, and they allow for easier deployment and better performance.  With strict consistency relaxed, fewer messages are required during partial failures, which improves both availability and performance.  BASE semantics also simplify maintenance of the cluster because they allow stateless worker nodes, which make fault tolerance easier and scalability better.
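To make the soft-state idea concrete, here is a minimal sketch of a cache that prefers a stale answer over no answer when its data source is unreachable.  It is only an illustration of the BASE tradeoff, not code from the paper: the class name `SoftStateCache`, the `fetch` callback, and the `ttl_seconds` parameter are all assumptions of mine.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class SoftStateCache:
    """Cache whose entries may be lost or stale without breaking the service.

    Hypothetical illustration of BASE-style soft state; not from the paper.
    """

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # assumed call to the authoritative source
        self._ttl = ttl_seconds  # how long an entry counts as fresh
        self._clock = clock      # injectable so the behavior is testable
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        now = self._clock()
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]  # fresh hit, no backend traffic
        try:
            value = self._fetch(key)
        except Exception:
            # Source unavailable: serve stale data if we have any,
            # trading consistency for availability.
            return entry[0] if entry is not None else None
        self._entries[key] = (value, now)
        return value
```

Because every entry can be rebuilt from the source, a node holding this cache can crash and restart empty without any recovery protocol, which is the property that keeps failure handling simple.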

The layered approach provides a clean separation of tasks and makes the system modular, which has many benefits.  Splitting the workers from the front end lets the workers be stateless, so they scale easily.  Because the workers are stateless, overflow workers can be allocated to handle bursty traffic efficiently.  In the TACC (transformation, aggregation, caching, customization) layer, the components are modular, so they can be composed to implement a wide range of services, which simplifies implementation.  TranSend and HotBot are two real-world internet services that successfully implement the layered concept on a cluster of commodity machines.  Several experiments on TranSend show that the design decisions behind BASE and the layered approach can achieve linear scalability, along with other benefits such as load balancing and graceful handling of bursty requests.
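The composition and overflow ideas can be sketched in a few lines.  This is a toy model under my own assumptions, not the paper's API: `compose`, `FrontEnd`, `add_overflow_replica`, and the two example workers are hypothetical names, and round-robin dispatch stands in for whatever load balancing a real system would use.

```python
from typing import Callable, List

# A stateless worker: its output depends only on its input.
Worker = Callable[[str], str]


def compose(*stages: Worker) -> Worker:
    """Chain workers so each stage's output feeds the next stage."""
    def pipeline(data: str) -> str:
        for stage in stages:
            data = stage(data)
        return data
    return pipeline


class FrontEnd:
    """Round-robin dispatcher; any replica can serve any request."""

    def __init__(self, replicas: List[Worker]) -> None:
        self._replicas = list(replicas)
        self._next = 0

    def add_overflow_replica(self, worker: Worker) -> None:
        # There is no state to migrate, so absorbing a burst
        # amounts to appending another replica.
        self._replicas.append(worker)

    def handle(self, request: str) -> str:
        worker = self._replicas[self._next % len(self._replicas)]
        self._next += 1
        return worker(request)


# Hypothetical workers, for illustration only.
def collapse_whitespace(text: str) -> str:
    return " ".join(text.split())


def truncate(text: str) -> str:
    return text[:20]


service = compose(collapse_whitespace, truncate)
front_end = FrontEnd([service])
front_end.add_overflow_replica(service)
```

Since the front end never cares which replica answers, adding or removing workers changes capacity without changing behavior.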

The future will have even more internet services, or cloud services.  Most companies are now developing cloud services to provide highly available, scalable products.  This means cluster-based services will be even more prevalent in the future.  I think the architecture will probably still look the same as today, but with more public clusters, like Amazon's EC2.  Not all companies can afford to build large clusters, so more internet services will be built on top of public clouds, as Netflix is.  This means layers and modularity will become even more important.  There will probably be a growing library of modules that service developers can use.  These modules will be implementations of commonly used service components, such as load balancers and monitoring tools.  They may be available through the cloud provider, or they may look like open source software.  I think the clean layered concept may disappear, but only because services will do more and become more complicated, and a simple layered approach will not satisfy all cases.  Services will probably look like a DAG of internal services with defined APIs.  There will also be more emphasis on data storage, and on different consistency and availability models, as data becomes more and more important.  The new models will be necessary because internet services will have to serve the entire world, not just a single data center.  The global model will have different performance and availability characteristics, which will drive different semantics for the data.
