Sunday, September 11, 2011

Performance Modeling and Analysis of Flash-based Storage Devices

SSDs are becoming more prevalent in computers today, and will become the standard storage solution for datacenters and clusters. The authors of this paper developed performance models for SSDs because the architecture of flash devices is vastly different from that of spinning hard disks. Flash can provide low latency, but it also forces slow updates and expensive block-level erasures. The authors used workload characteristics such as read/write ratio, request size, and queue depth to model latency, bandwidth, and IO throughput. Their linear regression models predicted real-world performance well for two OLTP and two web search workloads.
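To make the modeling idea concrete, here is a minimal sketch of fitting a linear regression from workload characteristics to latency. It is not the authors' model: the coefficients, the feature grid, and the helper names (`solve`, `fit_linear`, `predict`) are all made up for illustration, and the "measurements" are synthetic values generated from a known linear formula so the fit can be checked.

```python
# Illustrative only: synthetic data, hypothetical coefficients, pure standard library.

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_linear(features, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    rows = [[1.0] + list(f) for f in features]  # leading 1.0 is the intercept term
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)

def predict(coeffs, feature):
    """Evaluate intercept + sum(coefficient * feature value)."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], feature))

# Assumed "true" relationship used only to generate the synthetic samples:
# latency_ms = 0.2 - 0.1 * read_fraction + 0.01 * request_size_kb + 0.05 * queue_depth
TRUE_COEFFS = [0.2, -0.1, 0.01, 0.05]

features = [(rf, size, qd)
            for rf in (0.0, 0.5, 1.0)      # fraction of requests that are reads
            for size in (4, 16, 64)        # request size in KB
            for qd in (1, 8, 32)]          # queue depth
targets = [predict(TRUE_COEFFS, f) for f in features]

coeffs = fit_linear(features, targets)
estimate = predict(coeffs, (0.7, 8, 4))    # predicted latency for an unseen workload
print([round(c, 4) for c in coeffs], round(estimate, 4))
```

Because the synthetic data is exactly linear, the fit recovers the generating coefficients. Real device measurements are noisy and not perfectly linear, which is why how well such a model predicts real workloads is the interesting result.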

In the next 5 to 10 years, SSDs will play a bigger part in computing infrastructure. Currently, SSD technology is not cost effective enough to store all the necessary data. Hard disks still have a huge advantage in cost per bit. However, the cost per unit of storage for SSDs is slowly improving, and SSDs could become the main storage system in warehouse-scale clusters. Disks will become the "archival tape" of the future. SSDs have very different performance characteristics, as the paper shows, and some access patterns are better served by SSDs while others are better served by hard drives. When SSDs replace hard disks as the main storage for clusters, the software stack will have to be modified to take advantage of those characteristics. Most software assumes hard disk performance for secondary storage when optimizing the program. It will be very important for programmers to adapt to the changing performance characteristics in order to take advantage of SSDs.

Since the performance of SSDs is different from hard disks, system architecture may also have to change for distributed storage systems. Distributed file systems such as GFS or HDFS assume that hard disks are the storage medium, so they adopt an append-only model, which is optimized for hard disks because hard disks are very good at sequential writes. SSDs usually do not write as fast as hard disks because of the block-level erase requirement, but they have great random read performance. These differences will cause the distributed file systems to change.
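The cost of the block-level erase requirement can be illustrated with a toy model. This is a deliberately naive sketch, not how a real SSD behaves: it assumes a hypothetical 64-page block, no flash translation layer, and an in-place update that must erase the whole block and reprogram every live page. The class and function names are made up for the example.

```python
PAGES_PER_BLOCK = 64  # assumed value for illustration; real devices vary

class FlashBlock:
    """Toy flash block: a page can only be programmed while it is erased."""

    def __init__(self, pages=PAGES_PER_BLOCK):
        self.pages = [None] * pages  # None marks an erased page
        self.programs = 0            # page program operations performed
        self.erases = 0              # block erase operations performed

    def program(self, index, data):
        if self.pages[index] is not None:
            raise ValueError("page must be erased before it is programmed")
        self.pages[index] = data
        self.programs += 1

    def erase(self):
        self.pages = [None] * len(self.pages)
        self.erases += 1

    def update_in_place(self, index, data):
        """Naive update: save live pages, erase the block, reprogram everything."""
        saved = list(self.pages)
        saved[index] = data
        self.erase()
        for i, page in enumerate(saved):
            if page is not None:
                self.program(i, page)

def append(block, data):
    """Append-style write: program the first erased page, leave the rest alone."""
    index = block.pages.index(None)  # raises ValueError if the block is full
    block.program(index, data)
    return index

# Two half-full blocks holding the same 32 pages of data.
in_place = FlashBlock()
log = FlashBlock()
for i in range(32):
    in_place.program(i, f"v0-{i}")
    log.program(i, f"v0-{i}")

in_place.update_in_place(0, "v1-0")  # rewrite page 0 where it sits
new_index = append(log, "v1-0")      # write the new version to a free page

print(in_place.programs, in_place.erases)  # 64 programs, 1 erase
print(log.programs, log.erases)            # 33 programs, 0 erases
```

In this simplified model a single in-place update costs one erase plus 32 extra page programs, while the append costs one page program. Real SSDs hide much of this behind remapping and garbage collection, so the numbers here only show the direction of the effect, not its size.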
