Sunday, September 25, 2011

Megastore: Providing Scalable, Highly Available Storage for Interactive Services


Megastore is a system which provides scalability, availability and consistency over the wide area network.  Many systems focus on scalability and performance, but consistent semantics are cleaner and simpler to reason about.  Also, Megastore tries to solve the issue of data loss with some replication schemes across diverse geographic regions.  Megastore also uses paxos to provide synchronous replication of a log, which is unlike most usages of paxos.  Megastore provides fully serializable ACID semantics for fine-grained partitions of the data.

Megastore blends the scalability of a noSQL system, and the semantic convenience of a traditional RDBMS.  Megastore provides availability by using synchronous fault tolerant log replication, and provides scalability by partitioning data into entity groups.  Paxos is used to synchronously replicate the write ahead log to all the replicas of each entity group, which provides ACID semantics within the group.  Across groups, 2PC is used to coordinate transactions.  At Google, there is usually some natural way to partition the data.  Several optimizations with paxos are developed to improve the write transactions.  A leader is chosen to allow for 1 round trip writes, but does not require a dedicated master.  The first writer accessing the leader gets to commit with 1 round trip, but other writers have to use 2 phases.  Witness and read-only replicas allow improve performance for writes and reads.  The commit point of Megastore is the durability point, and the visibility point is afterwards.  Most of the write scalability is achieved by partitioning the data and reducing the write conflicts.  At Google, most applications say 5 9's of read and write availability.  The reads were usually around 10s of milliseconds, and the writes were around 100-400ms, because of the WAN communication.  Most of the reads by applications were serviced by the local replica as expected.

I think future data stores will be increasingly geographically distributed, because region diversity is crucial for fault tolerance and availability.  Also, users are more geographically distributed which necessitates distributed data stores.  Eventual consistent data stores have been increasingly popular these days because of the high scalability it affords, but I think in the future higher consistency models will be required and developed.  Higher consistency is easier to reason about and develop for, and future systems will provide consistency.  Megastore shows that a consistent system can be built with decent performance, and the future will bring more multi-datacenter systems with higher consistency models.  With newer consistency models will also bring new programming models to handle the nuances of different consistency and failure modes.

No comments:

Post a Comment