The Wolverine

Wolverine is one of the most awesome superheroes in the Marvel Universe. It’s not because of his fierceness, the fact that he’s Canadian, or the fact that he has indestructible metal claws coming out of his fists. It’s because no matter what happens to him, no matter what harm he endures, he has a healing factor that lets him come back and continue to fight, time and time again. Wolverine can push bullets out from his wounds to heal, or regenerate from an atomic blast if he has to. He is the ultimate example of resilience and a metaphor for what you want in a cloud application — a soldier who keeps fighting for your business without being stopped.

Here are 5 ways you can make your cloud application into… The Wolverine.

1. Use a cloud infrastructure that is, itself, highly redundant and highly available

There are two ways to engineer for high availability in an application. The first is to run it on resilient infrastructure, which puts the responsibility on the infrastructure’s architects to deliver infrastructure that keeps on running, no matter what happens to any one component. The second is to build a resilient application, where software developers build their application with the expectation that the underlying hardware will fail, making sure that their application adapts accordingly. This can be a difficult problem to solve[1] and hire for, so the most direct route is to ensure that you are on a resilient and reliable infrastructure.

cloud.ca is an example of a cloud that is built to be a highly resilient infrastructure. It is a virtual private cloud[2] built with redundancy at every layer of the underlying hardware. If one component fails, there is always another one to back it up and keep it running.

The advantage of using a solution that is highly available at the hardware level is that it makes writing software much easier. Application developers can build their application assuming the infrastructure is perfect and still enjoy a high degree of uptime (cloud.ca has an SLA of 99.99% on the infrastructure side). Developers don’t have to check things like whether the persistent storage is still there or whether the network is up, because they can be confident they always are[3]. This can speed innovation in a highly competitive development environment.
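To make the saving concrete, here is a minimal Python sketch of the kind of defensive retry logic that developers end up writing when the infrastructure offers no such guarantees. The `flaky_read` function is a hypothetical stand-in for an unreliable storage or network call, not a real API:

```python
import time

def with_retries(operation, attempts=3, delay=0.1):
    """Retry an operation that may fail transiently on unreliable infrastructure."""
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except OSError as error:  # network/storage errors typically surface as OSError
            last_error = error
            time.sleep(delay)  # back off briefly before retrying
    raise last_error

# Hypothetical flaky call: fails twice, then succeeds on the third attempt.
calls = {"count": 0}

def flaky_read():
    calls["count"] += 1
    if calls["count"] < 3:
        raise OSError("transient storage error")
    return "data"

result = with_retries(flaky_read)
print(result)  # "data", after two silent retries
```

On a highly available infrastructure, this boilerplate largely disappears from application code.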

Highly resilient infrastructure allows for the porting of traditional ISV applications to the cloud, using virtualization tools such as XenApp or XenDesktop[4], without having to rewrite the entire application as a web app. Highly available infrastructure is also a solid solution for websites and web apps that were built to run on traditional, non-virtualized bare-metal servers.

2. Build your application to switch availability zones when one fails

Availability zones are groupings of machines that are in the same failure domain. In large public clouds, you will find multiple availability zones in a single geographic region, thus allowing you to replicate your data and production servers in a nearby location. This has the advantage of low bandwidth costs for replication, and gives you the ability to fail over quickly if any one zone fails.

At a minimum, a “pilot light” disaster recovery implementation[5] in a second availability zone is required to replicate your data. Operational or automated systems can then be put in place for rapid failover should the first zone fail. An active-active implementation allows for even faster failover to a second AZ, without having to provision new instances to get traffic-ready. You can also load balance traffic across multiple availability zones.
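The active-active pattern above can be sketched in a few lines: route traffic round-robin across zones, skipping any zone whose health check fails. This is an illustrative simulation only; the zone names and health states are made up, and a real deployment would use a load balancer or DNS health checks rather than application code:

```python
import itertools

def balanced(zones, is_healthy):
    """Yield healthy zones in round-robin order, skipping failed ones."""
    for zone in itertools.cycle(zones):
        if is_healthy(zone):
            yield zone

# Hypothetical health state: zone-b has gone dark.
health = {"zone-a": True, "zone-b": False, "zone-c": True}
router = balanced(["zone-a", "zone-b", "zone-c"], lambda z: health[z])

served = [next(router) for _ in range(4)]
print(served)  # ['zone-a', 'zone-c', 'zone-a', 'zone-c']
```

Traffic simply alternates between the surviving zones; no new instances need to be provisioned when one zone fails.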

3. Build your application to fail over to another region entirely

The trouble with the multi-AZ option is that because of shared configurations within a region, multiple availability zones can go dark at the same time[6]. The next level of redundancy comes from the ability to fail over to multiple regions within a cloud provider’s infrastructure.

In recent news[7], Amazon Web Services had a major outage in its US-East-1 region. A failure cascade meant that APIs and EC2 instances were affected in all the availability zones in that region. Only companies like Netflix, which had planned for inter-region DR and failover, were able to recover quickly. The rest experienced something like 6–8 hours of downtime due to this region-wide failure.

Multi-region redundancy is a more robust implementation, yet it is more expensive and more difficult to implement than the multi-AZ option.

Reddit user JoeCoT writes:

[Amazon has] lots of ways to handle AZ failure. Few ways to handle region failure. Spanning your systems across multiple regions requires lots of custom work, and there are no easy tools for doing so.

Take for example, my company’s system. We have servers across all 3 availability zones in the East, and I’m adding database and web servers in Oregon and Frankfurt. But when I add servers in different AZs in East, they can communicate with each other easily, with subnet routing handled by Amazon’s setup. To add servers in other regions, I have to do tons of custom VPN setup to get them to be on the same internal network.

For AWS specifically, Netflix shares its own tools and processes with the community on its Open Source Software Center[8] website. These kinds of tools can be built or adapted for use with other public clouds as well.
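At its core, multi-region failover reduces to a preference order: serve from the home region while it is healthy, and fall back to the next replica region when it is not. A minimal sketch, with illustrative region names (the health set would come from real health checks in practice):

```python
# Preferred serving order: home region first, then replicas.
REGION_PREFERENCE = ["us-east-1", "us-west-2", "eu-central-1"]

def route_region(healthy_regions):
    """Choose the most-preferred region that is currently healthy."""
    for region in REGION_PREFERENCE:
        if region in healthy_regions:
            return region
    raise RuntimeError("all regions are down")

# Simulated region-wide outage in us-east-1:
chosen = route_region({"us-west-2", "eu-central-1"})
print(chosen)  # 'us-west-2'
```

The hard part, as the Reddit comment above notes, is not this routing decision but the cross-region networking and data replication that make the fallback region ready to serve.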

4. Build a hybrid cloud for your application

When you combine a private cloud (virtualized hardware managed by a cloud platform such as CloudStack or OpenStack) with a public cloud (AWS, Azure, or cloud.ca), you get a hybrid cloud.

Hybrid clouds allow you to “own the base and rent the peaks”[9]. They allow you to buy, provision, and configure hardware for base or average levels of utilization, while giving you the headroom to burst traffic into a public cloud as required, whenever you get massive traffic spikes, or get higher than normal traffic at certain times of year[10]. This way, you don’t have to stay over-provisioned for an entire year only to serve the highest demand periods.

Hybrid clouds provide both a blessing and a curse in terms of control. You’ll get a greater sense of control over the infrastructure due to your ability to manage hardware for the private cloud components. But you’ll also have to hire a team of people to manage the infrastructure, and divert resources that could be more effectively spent on innovating and improving your application. You’ll also have to continually replace the underlying hardware every several years — something you don’t have to worry about with cloud infrastructure.

Hybrid clouds can often be a good solution for companies who have invested heavily into hardware and are looking for burst capacity, or who wish to start transitioning to the cloud. If the hardware is compatible with a number of private cloud deployments, it can be leveraged into the hybrid cloud operational model.

5. Use multiple clouds, regions and availability zones — the ultimate in resilient cloud computing

A multi-cloud solution is the most robust (and most expensive) solution to secure your cloud infrastructure and application. It is the use of multiple clouds (most notably public and/or virtual private IaaS clouds like AWS, cloud.ca, or Azure) for load balancing, disaster recovery, and rapid failover in the case of an outage in any one cloud or region.

The benefits of multi-cloud are many:

  • High availability and redundancy across multiple clouds and systems means that your application is built to withstand failure in any one cloud IaaS, even if it’s region-wide or a global failure.
  • Geographic redundancy — when using availability zones in multiple regions and multiple clouds — provides business continuity and protection against large-scale disasters in one region or cloud-wide system failures.
  • The effort required to build your application in multiple clouds means you’ll have a standardized approach to the use of cloud infrastructure, allowing for portability across multiple clouds. You can remain cloud-agnostic in your operational approach.
  • Avoid single-vendor lock-in (and over-reliance on one vendor’s technologies)[11]. Systems will be in place to easily switch to the cloud that provides better performance, at a lower price, and with more availability.
  • Multi-cloud allows you to meet jurisdictional requirements for data storage in certain countries, states, and provinces.
  • You will have a greater sense of control over your own destiny in the cloud.
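The portability described above usually rests on a thin cloud-agnostic layer: the application codes against one interface, and each provider gets a small adapter behind it. A minimal sketch of the idea (the class names and `launch_instance` method are illustrative, not any provider's real API):

```python
class CloudProvider:
    """Minimal interface every provider adapter must implement."""
    name = "abstract"

    def launch_instance(self, size):
        raise NotImplementedError

class CloudCA(CloudProvider):
    name = "cloud.ca"

    def launch_instance(self, size):
        # A real adapter would call the provider's API here.
        return f"{self.name}:{size}"

class AWS(CloudProvider):
    name = "aws"

    def launch_instance(self, size):
        return f"{self.name}:{size}"

def deploy(providers, size="medium"):
    """Deploy to every configured cloud; any one can fail independently."""
    return [provider.launch_instance(size) for provider in providers]

print(deploy([CloudCA(), AWS()]))  # ['cloud.ca:medium', 'aws:medium']
```

Because the application only ever sees the `CloudProvider` interface, adding or swapping a cloud means writing one adapter, not rewriting the application.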

An example of multi-cloud computing is Peerio’s use of cloud.ca and AWS to meet their availability and jurisdictional requirements.

While multi-cloud is not yet provided by a single technology solution[12], it can be delivered as an operational recipe, built and managed by an experienced team who will customize the recipe for the application’s workload. Teams like CloudOps can help you to build and configure your application to work with multiple clouds, and support your application platform while on it.

Which You Choose Depends on Your Business Needs

Ultimately, which solution(s) you’ll choose depends on your budget and the business case for building resiliency into your cloud application. For businesses where every minute, every view, and every transaction is important, you’ll want to combine robust infrastructure with some sort of multi-AZ or multi-region application. If you’re very forward-thinking and want to build for the future — to control your own destiny in the cloud — you should look at how to build your application to work with multiple clouds. The added benefit is that this allows for easy portability and greater redundancy than simply relying on one cloud provider’s system.

Follow us on Twitter and LinkedIn to receive all the latest updates from the cloud.ca blog!

If you're interested in testing out our highly available virtual private cloud, you can sign up for a free 7-day trial of cloud.ca today!

 


  1. Marc Cavage wrote an article about the challenges of building distributed systems.

  2. A virtual private cloud can be thought of as a “gated community cloud”: only customers who are properly vetted and whose resource requirements are understood are included in the cloud’s customer base. Not just anyone can sign up with a credit card and remain anonymous to the sales team. This allows the cloud provider to protect against hackers, spammers, and others who might misuse the cloud infrastructure and cause problems for other users. cloud.ca provides “The control of a private cloud with the economics of a public cloud”.

  3. Justin Warren explains this well in his post on Resilient Apps or Hardware?: A DevOps Conundrum.

  4. CloudOps, a cloud computing consultancy and managed services provider in Montreal, has written a white paper titled The Road to SaaS, where they explain the imperative for traditional ISV vendors to move to a Software-as-a-Service model of delivery. You can download it here. CloudOps provides consulting services for putting companies in the cloud. The team has helped many ISVs with a successful transition to SaaS, including Taleo, Silanis, and Tecsys.

  5. Wikipedia entry for Disaster recovery

  6. Network World was talking about how multiple availability zones in an AWS region can fail back in 2011.

  7. I go over this in my previous post: What Amazon’s Recent Service Disruption Means for Cloud Computing.

  8. See Netflix’s Open Source Software Center website.

  9. “CloudOps began with the question, How do you get the best balance of public cloud and private cloud? Our answer was, Own the base, rent the peaks.” - Ian Rae, Founder and CEO, CloudOps

  10. They do, however, require a much larger capital expenditure up-front, and are best suited for organizations who operate on a CAPEX model for IT spending, as opposed to OPEX.

  11. The Economist writes that “[firms] that use more than one cloud provider to host their data are less vulnerable [to the risks of vendor lock-in]”.

  12. Some notable tools that help with multi-cloud deployments include Docker and Kubernetes. Docker packages an application and its dependencies into containers, which lends itself to microservice architectures; a Dockerized application can reasonably be expected to run on any cloud. Kubernetes provides container orchestration, allowing you to manage a cluster of Linux containers as a single system, accelerating development and simplifying operations.