Decisions Blog

Through earthquakes, unplanned outages, and grid failures: Keep your applications running

On October 4, 2021, Facebook experienced a massive service outage that took over eight hours to resolve. If a technology company as sophisticated as Facebook can experience such an event, then any business can.

While Facebook’s issue was not a disaster recovery (DR) scenario, it forces the hard questions: 

  • How well can your business processes survive a disaster scenario?
  • Do you have a clear list of critical vs. non-critical systems and services?
  • Do you know how often your teams test disaster recovery readiness?
  • What do your uptime availability percentages translate to in minutes, hours, and days of downtime each year?

Below is a table that shows you how much downtime you should expect given a specific uptime guarantee percentage:

Availability Percentage     Approx. Downtime Per Year
95%                         18 days
99%                         4 days
99.9%                       9 hours
99.99%                      1 hour
99.999%                     5 minutes
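
The downtime figures above follow from simple arithmetic: the allowed downtime is (1 - availability) multiplied by the length of a year. Here is a minimal Python sketch of that calculation, assuming a 365-day year. The function name is illustrative only and is not part of any Decisions product.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year of 8,760 hours


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the hours of downtime per year allowed by an uptime percentage."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


# Print the allowed downtime for each availability level discussed above.
for pct in (95, 99, 99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}%: {hours:.2f} hours ({hours * 60:.1f} minutes)")
```

Under that assumption, 95% works out to 438 hours (about 18.25 days), 99.9% to 8.76 hours, and 99.999% to a little over 5 minutes, which the approximate figures above round.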

 

The cloud doesn’t mean you’re safe

Proper disaster recovery planning and testing are critical to keeping your business technology running. Even if your application is hosted by one of the major cloud providers, you are not guaranteed to be 100% safe from disaster events.

In reality, using services like Microsoft Azure just means your data is deployed on physical machines in Microsoft data centers spread across the world. These buildings are just as susceptible to earthquakes and other major events as any other facility.

 

Then how can you stay safe from disaster?

How do you protect your applications and services from costly interruptions when access to this data is critical for business continuity and reputation? How do you avoid going facedown like Facebook? 

Redundancy! 

That means keeping backups and copies of your data available for immediate or near-immediate redeployment when the inevitable failure occurs (because it is not a matter of if, but when).

You can create this redundancy in two ways: through High Availability (HA), a “hot” backup that is available immediately, and through Disaster Recovery (DR), a “cold” backup that can be spun up as soon as a failure is identified.

HA protection

HA protects your users from service interruption:

  1. Prevents temporary user interruption during maintenance windows.
  2. Handles device-level failure.
  3. Restores service in seconds or minutes.

In Decisions, HA is enabled through an Enterprise license and is configured by running two or more Production Servers together in a cluster. These servers are often located in the same data center or cloud region, run behind a load balancer, and point at the same database.
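
As a rough illustration of why a second server helps: if each server is independently available a fraction a of the time, a cluster that stays up while at least one server is up is available 1 - (1 - a)^n of the time. The Python sketch below assumes independent server failures and ignores the shared load balancer and database, so treat its result as an optimistic upper bound rather than a guarantee. The function name is hypothetical and is not part of Decisions.

```python
def cluster_availability(node_availability: float, nodes: int) -> float:
    """Availability of a cluster that is up while at least one node is up.

    Assumes node failures are independent and ignores shared components
    (load balancer, database), so the result is an optimistic upper bound.
    """
    if not 0 <= node_availability <= 1:
        raise ValueError("node_availability must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    # The cluster is down only when every node is down at the same time.
    return 1 - (1 - node_availability) ** nodes


# Compare a single 99% server with a two-server cluster of the same servers.
print(f"1 node:  {cluster_availability(0.99, 1):.4%}")
print(f"2 nodes: {cluster_availability(0.99, 2):.4%}")
```

Under these assumptions, two servers that are each available 99% of the time behave like a single 99.99% service. Because both servers share one database and usually one region, a regional failure still takes the whole cluster down, which is the gap DR is meant to cover.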

DR protection 

DR protects you from external events outside your control:

  1. Actual disaster scenarios (earthquakes, tsunamis, conflicts).
  2. Utility/grid failures.
  3. Restores service in minutes, hours, or days.

DR capability is enabled through a DR license. A DR server is a fully configured Decisions server that sits in offline mode until it is called to be active. It is located in a different geographic or cloud region than the production server(s). It is likely installed against a read-only replica of the production server’s database, which is also hosted separately.
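
One way to picture the “offline until called” behavior is a monitor that activates the DR server only after production has failed several consecutive health checks, so that a single blip does not trigger a failover. The Python sketch below is purely hypothetical: the FailoverMonitor class, its threshold, and the manual fail-back policy are illustrative assumptions, not the Decisions API or its actual failover mechanism.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Tracks production health checks and decides when to activate DR.

    Hypothetical illustration only; not part of the Decisions platform.
    """

    failure_threshold: int = 3      # consecutive failures before failover
    consecutive_failures: int = 0
    dr_active: bool = False

    def record_check(self, production_healthy: bool) -> bool:
        """Record one health-check result and return True if DR is active."""
        if self.dr_active:
            # Assumption: failing back to production is a manual operator step.
            return True
        if production_healthy:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.dr_active = True
        return self.dr_active


monitor = FailoverMonitor(failure_threshold=3)
checks = (True, False, False, True, False, False, False, True)
print([monitor.record_check(ok) for ok in checks])
```

In this run, two failures followed by a recovery do not trigger failover. DR becomes active only on the third consecutive failure and stays active afterward. A higher threshold reduces false alarms at the cost of a longer restoration time, which is the trade-off behind DR restoration being measured in minutes, hours, or days.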

 

By adding HA and DR servers to your production environment, you can better ensure that critical business applications stay online, minimizing the risk of failure, protecting your revenue stream, and keeping your company’s reputation intact.

Interested in more information about High Availability or Disaster Recovery servers? Contact us.