With Disaster Recovery as a Service, Insurer Church Mutual Is Ready for Anything
By adopting a new data replication and DRaaS strategy, the company has made sure it will be up and running when its customers need it most.
By Calvin Hennick
Business and technology journalist | January 16, 2019
As an insurance company, Wisconsin-based Church Mutual needs to be there for its customers in times of disaster. But until recently, the organization’s disaster recovery plan for its own data center was a source of uncertainty.
“The plan we had in place was flawed, with a variety of different issues,” explains Craig Huss, assistant vice president and CISO for the company. “We would back up our data here in Wisconsin, and then we would send those backup tapes to an offsite storage facility. To test the solution, twice a year, we would take a set of tapes and ship them to a third party, and they would read them and try to restore them.”
Huss says that the testing yielded “less-than-desirable” results: “We were less than 50 percent successful in recovering data, and we were looking at 12 to 24 hours after a disaster just to get our tapes out of the library and get them shipped to a recovery center.”
Even if the tape backup solution had worked as planned during an actual emergency, it would have been slow and would have resulted in significant data loss. Tape backups ran only once per day, so the tapes were often missing up to 12 hours of company data.
“In all, it was going to take up to four days to get our company back up and running after a disaster,” Huss says. “We needed something with a lot more reliability and speed.”
Moving Toward a New DR Strategy
Church Mutual began taking steps toward a new, improved disaster recovery solution in 2016, when the company replaced aging storage infrastructure at its Merrill, Wis., headquarters.
“The primary driver behind this is that we can be there for our policyholders when they need to make a claim,” Huss adds. “We needed to put ourselves in a position where we wouldn’t be impacted when our customers needed us in the time of a disaster. That’s why we made the investment we did.”
The company purchased the IBM Storwize V7000 storage area network, in part to take advantage of the solution’s SAN-to-SAN replication capabilities. “We knew we wanted to do something that was more real-time in terms of replicating data,” says Huss.
However, Church Mutual leaders knew that backing up their entire environment, and testing their DR solution on an ongoing basis, would require extensive planning, expertise and manpower. Huss and others wanted to adopt a Disaster Recovery as a Service (DRaaS) solution that would minimize hands-on work for internal staff, both during testing and in an actual emergency. Leaders didn’t want IT staffers pulled away from their regular duties for routine DR testing, and they worried that employees simply wouldn’t be available to report for work if a natural disaster powerful enough to take the company’s primary data center offline struck the area.
“We had really good organizational support,” says Huss. “We have an enterprise risk assessment team within our company, and part of their function is to look for key risks to the organization. Disaster recovery was at the top of the list, due both to the natural disasters that were happening across the country, as well as the poor history of our DR solutions.”
A number of Church Mutual’s core applications are written in RPG, a legacy programming language that Huss says proved to be incompatible with public cloud providers. And so, instead of adopting a public cloud DRaaS solution, the company began searching for a partner that could help architect a workable DR strategy, conduct regular tests of the solution, and then actually implement the strategy in the case of a disaster.
Architecting a Two-Site Solution
Shortly after engaging with CDW, Church Mutual procured space at a colocation facility in Fitchburg, Wis., just outside Madison and about 170 miles from Church Mutual’s headquarters. That is far enough away that the two sites are unlikely to be hit by the same disaster, but close enough that Huss and his team can drive to the secondary site if needed.
Church Mutual duplicated all of the servers and storage equipment from its primary data center at the colocation site, then worked with CDW to devise a data replication and disaster recovery plan that would protect the company in case of disaster.
The new solution includes:
Ongoing SAN-to-SAN Data Replication
Because Church Mutual has the IBM Storwize V7000 SAN in both of its data centers, the infrastructure at the primary site is able to replicate data to the backup site nearly in real time. “What the SAN-to-SAN replication does for us is, when we make a change to the data here, that block of data gets replicated to the secondary site and is backed up almost instantaneously throughout the day,” says Huss. “While our recovery time objective is 12 hours, we’d only potentially lose about an hour’s worth of data, or less, if we lost the primary data center.”
“We’re as close to real-time as we can be for having that information stored in two spots,” Huss adds. “When we went to the SAN-to-SAN replication, that eliminated all of the logistics around locating and shipping backup tapes. The data is automatically replicated to the disaster recovery site and is available to us at any time.”
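Two numbers matter here: the recovery time objective (RTO), or how quickly systems must be back online, and the recovery point objective (RPO), or how much recent data the company can afford to lose. The short Python sketch below plugs in the figures quoted in this article: up to four days to recover and up to 12 hours of lost data with tape, versus a 12-hour RTO and roughly an hour of potential data loss with SAN-to-SAN replication. The RecoveryProfile class and improvement function are illustrative names, and setting a worst-case tape figure against a target objective is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryProfile:
    """Recovery figures, in hours, for one DR approach (illustrative model)."""
    name: str
    rto_hours: float  # recovery time objective: how long until systems are back
    rpo_hours: float  # recovery point objective: how much recent data could be lost


# Figures as reported in the article; tape RTO uses Huss's "up to four days".
TAPE = RecoveryProfile("tape backups", rto_hours=4 * 24, rpo_hours=12)
SAN_REPLICATION = RecoveryProfile("SAN-to-SAN replication", rto_hours=12, rpo_hours=1)


def improvement(old: RecoveryProfile, new: RecoveryProfile) -> tuple[float, float]:
    """Return (RTO ratio, RPO ratio) as old value divided by new value."""
    return old.rto_hours / new.rto_hours, old.rpo_hours / new.rpo_hours


rto_gain, rpo_gain = improvement(TAPE, SAN_REPLICATION)
print(f"RTO: {TAPE.rto_hours:g} h -> {SAN_REPLICATION.rto_hours:g} h ({rto_gain:g}x faster)")
print(f"RPO: {TAPE.rpo_hours:g} h -> {SAN_REPLICATION.rpo_hours:g} h ({rpo_gain:g}x less data at risk)")
```

By this rough measure, recovery is about eight times faster and the window of data at risk is about twelve times smaller.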
Solution architects from CDW developed scripts to build out the logical partitions that enable the solution. “All it takes to switch over is for the replication to be stopped,” says Bill Brey, a managed services engineer for CDW. “Then you have access to that data as it is — from that moment on — as long as you bring up the logical partitions at the recovery site.”
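The switchover Brey describes comes down to an ordering rule: replication stops first, and only then do the logical partitions come up at the recovery site. The Python sketch below models just that rule. It is not CDW’s scripts or an IBM Storwize command set; the RecoverySite class, its methods and the LPAR names are hypothetical.

```python
from enum import Enum, auto


class ReplicationState(Enum):
    ACTIVE = auto()   # primary SAN is still copying changed blocks to the recovery SAN
    STOPPED = auto()  # replication halted; the recovery-site copy is frozen and usable


class RecoverySite:
    """Toy model of the switchover order (hypothetical names, not a real API)."""

    def __init__(self, lpars: list[str]) -> None:
        self.replication = ReplicationState.ACTIVE
        self.lpars = lpars
        self.running: list[str] = []

    def stop_replication(self) -> None:
        # Step 1: stopping replication leaves the data "as it is" at that moment.
        self.replication = ReplicationState.STOPPED

    def bring_up_lpars(self) -> list[str]:
        # Step 2: partitions may only start once the data is no longer being replicated.
        if self.replication is not ReplicationState.STOPPED:
            raise RuntimeError("stop replication before bringing up LPARs")
        self.running = list(self.lpars)
        return self.running


site = RecoverySite(["lpar-core-apps", "lpar-batch"])
site.stop_replication()
print(site.bring_up_lpars())  # ['lpar-core-apps', 'lpar-batch']
```

Raising an error on the wrong order mirrors Brey’s caveat: the replicated data is only useful once the partitions are brought up against a stopped, consistent copy.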
Cisco UCS Central
“To create a more seamless solution, I recommended that we deploy UCS Central,” explains Bart Kniess, a CDW principal consulting engineer. “We leveraged UCS Central to reach their existing site and make it more consistent. And then we used those same policies and profiles at their recovery site. Consistency is important when we’re talking about DR. When there are differences, that’s when exceptions start coming in. UCS Central created consistency across the data centers, as well as full visibility of both data centers.”
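Kniess’s point that differences between sites are “when exceptions start coming in” can be illustrated with a simple drift check: compare the settings that govern each site and flag anything that differs. In Church Mutual’s case, UCS Central supplies the same policies and profiles to both data centers; the Python sketch below is not the UCS Central API, and the setting names and values are hypothetical.

```python
from typing import Optional


def config_drift(
    primary: dict[str, str], recovery: dict[str, str]
) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Return each setting whose value differs between sites (None means missing)."""
    keys = primary.keys() | recovery.keys()
    return {
        key: (primary.get(key), recovery.get(key))
        for key in sorted(keys)
        if primary.get(key) != recovery.get(key)
    }


# Hypothetical service-profile settings; real profiles carry far more fields.
primary_profile = {"bios_policy": "vmware-host", "boot_order": "san-first", "vnic_template": "prod-a"}
recovery_profile = {"bios_policy": "vmware-host", "boot_order": "local-first", "vnic_template": "prod-a"}

print(config_drift(primary_profile, recovery_profile))
# {'boot_order': ('san-first', 'local-first')}
```

An empty result means the two sites match; any entry is a potential exception waiting to surface during a failover.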
Veeam and VMware
The recovery site utilizes Veeam disaster recovery software to back up the virtualized servers that run many of Church Mutual’s applications. “VMware is our primary virtualization software that we use to virtualize Windows and Linux servers,” Huss says. “Together, the VMware and Veeam environment is our future state. We do Veeam backups, and that allows us to recover point-in-time, full-server images. So, in addition to having the replicated data, we have a Veeam backup of the servers that we can recover from.”
The virtual machines are classified into tiers so that the disaster recovery team can bring up the most important systems first. The recovery environment also lets the team test individual systems in isolation. “An administrator can bring up a single console and can recover any one of the servers with a click of a button,” says Huss. “We don’t have to do a full DR test to test a single device or single server.”
“We’re able to test in a bubble,” adds Kniess. “We can ‘fail’ the primary site, and it brings up the VMs at the recovery site, but they’re not actually online. All the VMs are technically stood up, but they’re isolated on their own little network.”
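Tiered recovery and “bubble” testing can be sketched as two decisions: sort the VMs so the most critical tier comes up first, and in test mode attach every recovered VM to an isolated network so it never touches production. This Python sketch illustrates the idea only; it is not Veeam’s or VMware’s API, and the VM names, tier numbers and network names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    tier: int  # 1 = most critical; lower tiers are recovered first


def recovery_plan(vms: list[VM], test_mode: bool) -> list[tuple[str, str]]:
    """Order VMs by tier (then name) and choose the network each one attaches to."""
    # In a test, every VM lands on an isolated network so it cannot collide with production.
    network = "isolated-test-net" if test_mode else "production-net"
    ordered = sorted(vms, key=lambda vm: (vm.tier, vm.name))
    return [(vm.name, network) for vm in ordered]


inventory = [VM("reporting", 3), VM("policy-admin", 1), VM("claims", 1), VM("email", 2)]
for name, net in recovery_plan(inventory, test_mode=True):
    print(name, net)
```

Passing a single-item list, such as `recovery_plan([VM("claims", 1)], test_mode=True)`, corresponds to testing one server without running a full DR test.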
The Value of a Partner
In addition to designing and building out Church Mutual’s disaster recovery solution, CDW tests the system each quarter — and stands ready to direct Church Mutual’s DR efforts if an actual disaster strikes. “If we declare a disaster, they have the resources to come in and recover our environment on our behalf,” says Huss. “Because we can recover our systems without the involvement of our internal IT staff, the resilience of our company is that much better.”
“Even DR testing can have a pretty big impact to your IT team,” Huss adds. “Employees aren’t working on projects that add business value when they’re working on DR. We got our team back during testing, and we can let CDW manage our DR solution if we need it.”
CDW solution architects developed a DR run book that calls for different team members to take charge of particular aspects of the recovery process. In meetings with Church Mutual, there were as many as 10 CDW team members sitting with internal IT staffers. “Any time you have more diverse knowledge about your environment, you’re going to be better off,” Brey says. “CDW brings a storage team, a network team, a virtualization team and a Windows team to this process. When a disaster hits, we start up a conference bridge, we start our run book up, and boom, it’s like clockwork.”
“It’s not just the data center,” adds Fei Xiao, a CDW managed services engineer. “It’s the right work for the right people. We provide Church Mutual with more flexible staff that they may not have.”
Huss says that the combination of new technologies and CDW’s services team has eliminated a major source of worry, and a real liability, for the company. “We could test every day if we wanted to now,” he says. “To have that level of reliability is huge.”
Always Have a Backup Plan
Survey data shows that many organizations use multiple layers of disaster recovery. The Uptime Institute asked more than 450 businesses how they maintain data resiliency. Respondents could select more than one method, so the percentages add up to more than 100 percent:
- Regular backups to a secondary site: 68%
- Near real-time replication to a secondary site: 51%
- Disaster Recovery as a Service: 42%
- Replication of workloads and data to two or more sites: 40%
- Use of cloud-based high-availability services: 36%
Source: Uptime Institute, “Global Data Center Survey,” March 2018.