How to Transfer Data to a Hyperscale Cloud Provider

Organizations have a few options for moving data to the Microsoft and Amazon clouds.

We’ve come to think of copying or transferring data as actually “moving” or “flowing,” like water through pipes or tubes. A popular, silly characterization of the internet as a “series of tubes” went viral in 2006 and was widely lampooned or adopted as fact, depending on the observer’s level of knowledge. But, of course, what is really going on is that sequences of ones and zeros (bits), forming patterns recognizable by or useful to computer programs (such as hypervisors, operating systems and applications), are being relayed between storage media, whether internal or external to the device itself.

These bit patterns can be relayed at blistering speeds: commonly in the high hundreds of millions of (8-bit) bytes per second (Bps) within internal memory or over modern external bus technology, in the low hundreds of millions of Bps over local switched Ethernet networks, and in the tens to hundreds of millions of bits per second (bps) over dedicated or public wide-area network connectivity. In addition, the further removed the traffic is from on-board memory and storage, the more overhead the transmission incurs and the less efficient it becomes.

The relaying of data is possible thanks to bit-pattern transmission protocols, both physical and logical, ranging from basic to advanced technologies such as Serial Advanced Technology Attachment (SATA), Fibre Channel (FC), Internet Small Computer System Interface (iSCSI), Ethernet frames, Ethernet switching, TCP packetization of the data, routing, encryption, tunneling and security checkpoints. These protocols keep the relayed data private and error-free, but every one of them adds delay.

Getting Physical with Data Transfer

With the above in mind, it becomes clear that relaying all or part of an organization’s data patterns from a leased or owned data center to a hyperscale cloud provider’s storage services in a timely manner can be a significant challenge. Even a small business can use terabytes of storage. Most organizations cannot justify the cost of, or do not even have access to, anything faster than 100Mbps full-duplex (the same transmission rate for both sending and receiving), since a connection at that speed costs an average of $1,700 per month in the United States.

As an example, relaying 5 terabytes (TB), which is 5,000 gigabytes (GB) or 5,000,000 megabytes (MB), over a 100Mbps (100,000,000 bits per second) internet connection would take a theoretical minimum of about 4.6 days. In practice, it would take much longer because of the overhead and delays discussed previously. While technologies and tricks such as compression, de-duplication and WAN acceleration have been developed to reduce this transmission time, they are most effective when the same or similar data is transmitted repeatedly, and they are less effective or ineffective when the data is encrypted. Some medium-size businesses need to transmit 500TB of data, which at the same rate would take well over a year to relay.
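The arithmetic behind these estimates can be reproduced with a minimal sketch. It assumes decimal units (1TB = 10^12 bytes) and a link that runs at its full rated speed; the function name `transfer_days` and the `efficiency` parameter are illustrative, not taken from any vendor tool.

```python
SECONDS_PER_DAY = 86_400
TB = 10**12                 # decimal terabyte in bytes (assumption: 5 TB = 5,000,000 MB)
LINK_100_MBPS = 100 * 10**6 # 100,000,000 bits per second


def transfer_days(size_bytes: float, link_bps: float, efficiency: float = 1.0) -> float:
    """Days needed to relay size_bytes over a link rated at link_bps.

    efficiency is a hypothetical fudge factor (0 < efficiency <= 1) for
    protocol overhead; 1.0 gives the theoretical minimum.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = (size_bytes * 8) / (link_bps * efficiency)
    return seconds / SECONDS_PER_DAY


print(round(transfer_days(5 * TB, LINK_100_MBPS), 2))       # 4.63
print(round(transfer_days(500 * TB, LINK_100_MBPS)))        # 463
print(round(transfer_days(5 * TB, LINK_100_MBPS, 0.8), 2))  # 5.79
```

The last call uses an assumed 80 percent link efficiency purely to show how quickly overhead stretches the theoretical minimum; real-world efficiency depends on the protocols and path involved.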

Meanwhile, the data is changing every day, and the initial copy would be worthless by the time it was completed. So the initial relaying of many terabytes or petabytes of data to a remote cloud service over the network can be impractical, or simply impossible in a reasonable amount of time. Cloud providers have had ample time to grapple with this problem and have developed technologies and processes to enable large-scale data relaying. All involve a shipment of physical media.

How Microsoft Transfers Data

For Microsoft Cloud, the Azure Import/Export service provides a self-serve mechanism to create copy jobs using up to 10 hard drives per job, each a single 3.5” drive of up to 10TB. The data copy is performed using Azure PowerShell, encrypted using Microsoft’s BitLocker encryption technology and journaled before the drives are shipped out.
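As a rough planning aid, the per-job limits described here translate into a drive and job count for a given data set. This is a sketch under stated assumptions: it takes the limits as 10 drives per job and 10TB per drive, treats each drive's full nominal capacity as usable (file system and journal overhead are ignored), and uses a hypothetical helper name, `import_export_plan`, that is not part of any Azure tooling.

```python
import math


def import_export_plan(data_tb: float, drive_tb: float = 10, drives_per_job: int = 10):
    """Return (drives, jobs) needed to ship data_tb terabytes.

    Assumes every drive can be filled to its nominal capacity, which is
    optimistic; leave headroom when planning for real.
    """
    if data_tb <= 0:
        raise ValueError("data_tb must be positive")
    drives = math.ceil(data_tb / drive_tb)
    jobs = math.ceil(drives / drives_per_job)
    return drives, jobs


print(import_export_plan(5))    # (1, 1)
print(import_export_plan(105))  # (11, 2)
print(import_export_plan(500))  # (50, 5)
```

Under these assumptions, a 500TB migration would mean 50 drives spread across five jobs, which gives a sense of the handling effort involved before any data reaches Azure.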

The local data copy requires a relatively robust 64-bit system, as well as the infrastructure needed to attach traditional spinning-disk hard drives (solid state drives are not supported at this time) that can later be disconnected and shipped out.

Many factors determine how long the data will take to copy to the drives. The drives must first be encrypted by the BitLocker software, which can take many hours on large drives, but the encryption can be done before the copy process so that it does not add to the copy time. Typically, this system will be connected over the local network to the other systems where the data resides, and that network will be the limiting factor in the data copy speed.

Once received by Microsoft, the drives are attached to the Azure infrastructure in the target Azure region and copied to either a page or block blob within an Azure storage account container. There is no service level agreement (SLA) for the service, but practically speaking, Microsoft begins processing and performing the copy job soon after receipt of the hardware. This service exists for other Azure services as well, such as importing local Outlook personal folder files to Office 365.

How Amazon Tackles Transfers

For Amazon’s AWS Cloud, there are even more options. AWS Snowball is similar to Azure’s Import/Export service, except that AWS provides the hardware. The device is designed to transfer data into buckets residing under Amazon’s Simple Storage Service (S3). Once you create the Snowball job in the AWS console, a ruggedized system that can store up to 50TB or 80TB of data is automatically sent to you. The system has a specially programmed Kindle device built in, which provides the interface for configuring the unit’s three 10Gbps interfaces: traditional RJ45 Ethernet, fiber SFP+ and copper SFP+. When the job is complete, the screen of the device displays the shipping label for the shipment back to AWS.

Accelerating Migrations with AWS Snowball

Client software provided by Amazon performs the copy operations, but a command line and even application program interfaces (APIs) are also provided to allow programmatic automation. As with Azure’s Import/Export service, the system requires a relatively powerful workstation for the encryption operations (the client encrypts the source files in memory before sending them to the Snowball device, rather than using Microsoft’s BitLocker technology). Because the target device is networked, however, parallel copies from multiple workstations can be run simultaneously for the highest copy performance. As with Azure, there is no SLA, but the import job typically begins about a day after Amazon receives the unit.
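To see why shipping a device can beat the network, it helps to express the whole round trip as an effective transfer rate. The sketch below is illustrative only: the sustained local copy rate (1Gbps) and the two-day transit time are assumptions rather than AWS figures, the one-day queue reflects the typical delay mentioned above, and the time AWS then spends importing the data is not counted. The function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400
TB_BITS = 8 * 10**12  # bits in one decimal terabyte


def shipped_device_days(capacity_tb: float, local_copy_bps: float,
                        transit_days: float, queue_days: float = 1.0) -> float:
    """Days to fill a device locally, ship it, and wait for the import to start."""
    load_days = capacity_tb * TB_BITS / local_copy_bps / SECONDS_PER_DAY
    return load_days + transit_days + queue_days


def effective_mbps(capacity_tb: float, total_days: float) -> float:
    """Average rate, in megabits per second, achieved over the whole process."""
    return capacity_tb * TB_BITS / (total_days * SECONDS_PER_DAY) / 10**6


# Assumed scenario: one 80TB device, 1Gbps sustained local copy, 2 days in transit.
days = shipped_device_days(80, local_copy_bps=10**9, transit_days=2)
print(round(days, 1))                   # about 10.4 days end to end
print(round(effective_mbps(80, days)))  # roughly 700 Mbps on average
```

Even with these conservative assumptions, the shipped device averages several times the throughput of a 100Mbps link, and running parallel copies to raise the local copy rate improves the figure further.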

Snowball’s Process Flow for Importing Data to AWS

AWS provides Snowball Edge as an extension of Snowball. It has more storage (100TB) per device, and multiple units can be clustered together. Beyond being a repository for data, Snowball Edge is designed to be a storage device that your systems and applications can actually run on during data migration.

The AWS Snowball Edge device

AWS even provides an exabyte-level data import service called Snowmobile. It is a 45-foot-long shipping container filled with linked storage devices, delivered to your data center by a semi-truck. Consulting services are provided to help attach this massive storage system to your local network for copy operations.

As we have explored, moving large amounts of an organization’s data to a cloud provider can be impossible without a physical shipment of devices. The limits of affordable high-speed telecommunication make it necessary to use a device that can be loaded locally at the data center, bypassing those limits. Both Azure and AWS provide mechanisms for this task, with AWS providing a fuller range of options at this time. Mountains can be moved, bit by bit, enabling your business to leverage the awesome power of hyperscale cloud providers.
