May 23, 2019
An Overview of AWS Storage: Block vs. Object Storage Services
Understanding the basics of storage technology will help you choose the right service for your needs.
Cloud providers offer a growing number of services for developing and deploying applications. Suitable combinations of these offerings allow you to build highly performant, scalable and cost-optimized solutions in the cloud. Because storage is critical to nearly all of these solutions, I am going to explore the storage options available in Amazon Web Services (AWS) for fulfilling different use cases.
Since this is introductory material, the discussion is intentionally kept at a fairly high level. In later postings, I will go into additional details and cover the nonfunctional business requirements of storage services such as read-write capacity, availability, data durability, encryption, data access patterns and cost.
The fundamental building blocks of any storage service are individual drives. Setting aside differences in the physical interface between the host computer and the drive (parallel, serial, etc.), there are two basic kinds: magnetic hard drives with spinning disks and solid-state drives (SSDs). While SSDs are faster than magnetic drives, they are also more expensive. AWS has developed a set of abstractions/services on top of these core building blocks. Careful analysis of the end use cases helps you select the right set of abstractions for a cost-optimized and robust storage solution.
Block Storage vs. Object Storage
At a high level, AWS offers two types of storage services: block storage and object storage. The fundamental difference is the unit of storage in these abstracted services. As the name implies, in block storage, each file is divided into multiple blocks of a fixed byte size. The block size is normally set to a value such as 512, 2048 or 4096 bytes and is typically configured when the drive is formatted, for example during OS installation. When applications write data to the hard drive, these blocks are the “units” of data transfer between the host computer and the hard drive.
The object storage abstraction, on the other hand, treats each file as a single unit, and manipulation of the objects (such as add, update, delete) is driven via a high-level application programming interface (API). Setting aside nonfunctional aspects such as durability, redundancy and fault tolerance, object storage ultimately still uses hard drives to store the objects.
This being one of the fundamental concepts, let us illustrate it with an example. Let’s say that after the final edit, this blog post is 5120 bytes long and the block size chosen for the hard drive during OS installation is 512 bytes. A simple calculation (5120 / 512 = 10) shows that the file occupies exactly 10 blocks on the hard drive. After the post is submitted for review, the blog editor finds that only a single character needs to be updated in the middle of the file (say, in the fifth block). With block storage, only the 512 bytes of the fifth block, the block that contains the updated character, need to be transferred from the host computer’s memory to the hard drive.
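The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration: the function names and the byte offset of the edited character (2100) are assumptions chosen so that the edit falls in the fifth block, and nothing here calls an AWS API.

```python
BLOCK_SIZE = 512   # bytes per block, as in the example
FILE_SIZE = 5120   # length of the blog post in bytes


def blocks_needed(file_size: int, block_size: int) -> int:
    """Number of blocks required to hold file_size bytes (rounded up)."""
    return (file_size + block_size - 1) // block_size


def block_number(byte_offset: int, block_size: int) -> int:
    """1-based number of the block that contains the given byte offset."""
    return byte_offset // block_size + 1


def bytes_rewritten(byte_offset: int, length: int, block_size: int) -> int:
    """Bytes sent to the drive when `length` bytes starting at byte_offset change.

    Whole blocks are the unit of transfer, so every touched block is resent.
    """
    first = byte_offset // block_size
    last = (byte_offset + length - 1) // block_size
    return (last - first + 1) * block_size


# Hypothetical position of the single edited character.
edit_offset = 2100

total_blocks = blocks_needed(FILE_SIZE, BLOCK_SIZE)
edited_block = block_number(edit_offset, BLOCK_SIZE)
transferred = bytes_rewritten(edit_offset, 1, BLOCK_SIZE)

print(total_blocks, edited_block, transferred)
```

With these assumed values, the file occupies 10 blocks, the edit lands in block 5, and a single 512-byte block is rewritten.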
However, in the object storage scenario, the operation would consist of sending the entire 5120 bytes from the host computer to the “AWS object service” through a high-level API (via the command line or the AWS console). Behind the scenes, the updated content may well be stored on block storage media (the secret sauce of AWS). If we compare the byte array of the original content to the new one, the only difference would be the single byte in the fifth block. However, AWS object storage does not provide a mechanism for updating just that single character, or even just the fifth block; the whole object must be replaced.
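To make the contrast concrete, here is a minimal in-memory sketch that counts the bytes sent for the same one-character edit under each model. ToyBlockDevice and ToyObjectStore are hypothetical classes invented for this illustration, and so are the key name and the edit offset; they only mimic the behavior described above and are not how any AWS service is implemented or called.

```python
BLOCK_SIZE = 512


class ToyBlockDevice:
    """Hypothetical block device: data is addressed and rewritten per block."""

    def __init__(self, data: bytes, block_size: int = BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = [
            bytearray(data[i:i + block_size])
            for i in range(0, len(data), block_size)
        ]

    def write_block(self, index: int, block: bytes) -> int:
        """Overwrite one block (0-based index); return the bytes sent."""
        self.blocks[index] = bytearray(block)
        return len(block)

    def read_all(self) -> bytes:
        return b"".join(self.blocks)


class ToyObjectStore:
    """Hypothetical object store: an object can only be replaced as a whole."""

    def __init__(self):
        self.objects = {}

    def put(self, key: str, body: bytes) -> int:
        """Store (or replace) the whole object; return the bytes sent."""
        self.objects[key] = bytes(body)
        return len(body)


original = b"a" * 5120
offset = 2100  # assumed position of the edited character (fifth block)
updated = original[:offset] + b"b" + original[offset + 1:]

# Block storage: resend only the block that contains the change.
device = ToyBlockDevice(original)
idx = offset // BLOCK_SIZE
start = idx * BLOCK_SIZE
block_bytes = device.write_block(idx, updated[start:start + BLOCK_SIZE])

# Object storage: resend the entire object under the same key.
store = ToyObjectStore()
store.put("post.txt", original)
object_bytes = store.put("post.txt", updated)

print(block_bytes, object_bytes)
```

Both stores end up holding the same updated content, but the block device receives 512 bytes for the edit while the object store receives all 5120.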
How Storage Use Cases Differ
At a cursory glance, it may appear that object storage is highly inefficient, since the entire content must be resent for every update. However, object storage supports an entirely different set of use cases than block storage. While block storage is suitable for content that changes frequently, object storage is suitable for content that is written once and rarely modified, such as backups, static website content or a data lake/data mart in which the requirement is to retain the original untransformed records.
In my next blog post, I will build on this basic understanding of block and object storage and explain the various service offerings AWS has built around these storage technologies.