Saturday, October 22, 2011

PNUTS: Yahoo!'s Hosted Data Serving Platform and Don't Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS

PNUTS is a scalable, geographically distributed data store for web applications.  Web services usually need scalability, low response time, and high availability, and can tolerate relaxed consistency; PNUTS is designed around those requirements.  It also provides features other systems usually don't, such as geographic distribution and a timeline consistency model instead of eventual consistency.  Timeline consistency is achieved per record: each record has a master replica through which all of its updates are ordered, so every replica applies them in the same order.  APIs exist to read the latest version of a record or a possibly older version, depending on what the application needs.  The data is partitioned into tablets either by range or by hash, and each tablet is replicated.  Replication across regions is asynchronous and goes through a reliable message broker, which works like a pub/sub system.  The broker functions both as a durable write-ahead log (WAL) and as the replication mechanism.  It also delivers messages in order, and this property is used to achieve timeline consistency per record.  In the experiments, the average latency for client requests was around 100 ms; latency increased as the number of writes in the system increased, and adding storage servers decreased latency almost linearly.
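To make the per-record timeline idea concrete, here is a minimal in-memory sketch.  It is not PNUTS code: the class names (Broker, RecordMaster, Replica) and the methods write(), pull(), read_any(), and read_latest() are made up for illustration, and a plain Python list stands in for the ordered message broker.

```python
class Broker:
    """Ordered log standing in for the message broker (hypothetical)."""

    def __init__(self):
        self.log = []

    def publish(self, message):
        self.log.append(message)


class RecordMaster:
    """Orders all writes to the records it masters by assigning versions."""

    def __init__(self, broker):
        self.broker = broker
        self.latest = {}  # key -> (version, value)

    def write(self, key, value):
        version = self.latest.get(key, (0, None))[0] + 1
        self.latest[key] = (version, value)
        # Publishing to the broker doubles as the log and as replication.
        self.broker.publish((key, version, value))
        return version

    def read_latest(self, key):
        return self.latest.get(key)


class Replica:
    """A remote replica that consumes broker messages in order."""

    def __init__(self, broker):
        self.broker = broker
        self.cursor = 0   # position of the next unread message
        self.store = {}   # key -> (version, value)

    def pull(self, max_messages=None):
        end = len(self.broker.log)
        if max_messages is not None:
            end = min(end, self.cursor + max_messages)
        # In-order delivery means versions of a record never go backwards.
        for key, version, value in self.broker.log[self.cursor:end]:
            self.store[key] = (version, value)
        self.cursor = end

    def read_any(self, key):
        # May be stale, but is always some point on the record's timeline.
        return self.store.get(key)
```

A lagging replica can return an older version from read_any(), but it never sees versions out of order; a reader that needs the newest version asks the master via read_latest().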

COPS is an ALPS (availability, low latency, partition-tolerance, high scalability) key-value store which provides the causal+ consistency property.  ALPS properties are desirable for internet services, but COPS also provides stronger consistency guarantees, which makes it easier to develop for.  The causal property ensures that the system maintains the causal dependencies between operations, and the + adds convergent conflict handling.  Convergent conflict handling ensures that all replicas resolve conflicts in the same way, so that replicas don't diverge permanently.  Causal dependencies can arise in three ways: between operations within a single execution thread, when a get() returns the value written by an earlier put() (gets-from), and by transitivity.  COPS is the first system to provide the causal+ property in a scalable way, rather than through a single master.  COPS relies on a linearizable key-value store within each cluster and replicates asynchronously between clusters using a replication queue.  Additional operations are available to ensure the causal+ property: get_by_version(), put_after(), and dep_check().  There is also per-record metadata to store dependencies, and, for COPS-GT, older versions as well.  Garbage collection is occasionally performed to clear out old versions and unnecessary dependencies.  In the system, only 5 seconds of metadata needs to be stored, since transactions are only allowed to run for 5 seconds.  In experiments, the throughput increased as the put:get ratio decreased, because the number of dependencies decreased.  Also, COPS-GT usually has lower throughput than COPS because of the stronger guarantees, but as the value size increases, the inter-operation time increases and COPS-GT can achieve similar performance.  COPS also scales much better than single-master log-shipping systems.
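The dependency checking described above can be sketched roughly as follows.  This is a simplified illustration, not the COPS implementation: the Cluster class, the argument lists of put_after() and dep_check(), the pending buffer, and the use of "highest version wins" as the convergent conflict rule are all assumptions made for the example.

```python
class Cluster:
    """One datacenter's store that applies replicated writes causally."""

    def __init__(self):
        self.store = {}    # key -> (version, value)
        self.pending = []  # replicated writes waiting on dependencies

    def dep_check(self, key, version):
        # A dependency is satisfied once this cluster holds that version
        # of the key, or a newer one.
        current = self.store.get(key)
        return current is not None and current[0] >= version

    def put_after(self, key, value, version, deps):
        # deps is a list of (key, version) pairs this write depends on.
        self.pending.append((key, value, version, deps))
        self._drain()

    def _drain(self):
        # Keep applying buffered writes until no more become eligible.
        progress = True
        while progress:
            progress = False
            for write in list(self.pending):
                key, value, version, deps = write
                if all(self.dep_check(k, v) for k, v in deps):
                    self.pending.remove(write)
                    current = self.store.get(key)
                    # Assumed conflict rule: the highest version wins,
                    # so every cluster converges on the same value.
                    if current is None or version > current[0]:
                        self.store[key] = (version, value)
                    progress = True

    def get(self, key):
        return self.store.get(key)
```

For example, if an album entry that depends on a photo is replicated before the photo itself, the album write stays buffered and invisible until the photo arrives:

```python
remote = Cluster()
remote.put_after("album", "refers to photo", 1, [("photo", 1)])
print(remote.get("album"))   # None, the dependency is not yet satisfied
remote.put_after("photo", "image bytes", 1, [])
print(remote.get("album"))   # (1, 'refers to photo')
```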

Both of these systems provide consistency stronger than eventual consistency across geographically distributed datacenters.  The stronger consistency is useful, but future systems will have to deal better with fault tolerance and data loss.  Both systems rely on an asynchronous channel (a message broker in PNUTS, a replication queue in COPS) to deliver replication messages, so if a datacenter fails, any messages that have not yet been delivered will be lost, and the corresponding data with them.  Datacenters can fail, as the Amazon outage showed, and modular datacenters are becoming more popular, so less reliability is built into each one.  Future systems will have to try to prevent data loss in the face of datacenter failures.
