Tuesday, October 4, 2011

Bigtable: A Distributed Storage System for Structured Data

Bigtable is a flexible, high performance, scalable distributed storage system for structured data built by Google.  Bigtable has a richer data model than a key-value store, but does not provide the full relational model.  Bigtable can scale up to petabytes of data and thousands of machines.  It can support a wide range of applications and configurations, and clients can control many aspects such as data layout, data format, and data locality.  Bigtable is essential a distributed, sorted map, which maps keys (row string, column string, timestamp) to string data.  The keys are lexicographically ordered, and columns are grouped into column families.  The architecture has a master which coordinates all the tablets, and tablet servers which handle tablets.  Clients usually communicate directly with the tablet servers.  SStables stored in GFS are used for the durable data, which are ordered, immutable maps.  Writes go to a redo log in GFS, and the recent writes are stored in a memtable.  When the memtable fills up, it is flushed and written to a new SStable, and occasionally, a major compaction merges several SStables into one.  Reads are served from the memtable and the merged SStables.  Locality groups are groups of column families which correspond to an SStable.  This can provide vertical partitioning in order to group related columns together.  Two pass compression scheme is used to achieve 10:1 compression ratios, better than gzip.  Experiments show that Bigtable scales up well, but not linearly, because of the imbalance in the load and the saturation of the network from GFS to the tablet servers.

Bigtable has inspired many other distributed data stores recently.  The apis of these systems are usually limited to simple key-value type of access.  However, not all applications fit the model of single row access and transactions.  I think the future of these systems will be to add more database concepts to the systems without significantly harming the scalability and availability.  Some of these features include transactions, higher level languages, and different levels of consistency.  Bigtable type of systems will look more and more like distributed parallel databases.

No comments:

Post a Comment