Oh no, the system’s down! Where’s the emergency runbook?


In his blog post “What would happen if Godzilla visited your city?” my colleague Paul DiMarzio discussed how city operations can benefit from IBM’s Smarter Cities initiative. Here I would like to look at what would happen if Godzilla visited your data center.

The impact of disasters on organizations increases as organizations become more interconnected, globally integrated and interdependent.

And it’s not just the organization itself; it’s the functioning of a broader ecosystem that may be interrupted when one organization in a chain fails.

Away with the disaster recovery runbook

Organizations are increasingly reviewing their business operations and transforming their business continuity assurance from a disaster recovery mode to a continuous availability setup.

  • Some organizations are no longer willing to take the risk of incomplete business recovery procedures that are tested annually and found incomplete annually.
  • Other organizations have reassessed the criticality of certain business functions and found that their business recovery procedures must be extended to cover functions previously earmarked as not business critical.
  • Supply chain partners and regulatory bodies define service requirements that imply (near) continuous operations, even in the case of major disaster situations.

It’s time for us to rethink business continuity and move toward continuous availability.

There are different IBM solutions to accommodate business continuity scenarios in line with the business impact of an outage. The IBM Geographically Dispersed Parallel Sysplex (GDPS) solution is the crown jewel in these developments to support the ultimate goal of continuous operations.

A unique continuous availability solution

Don’t even try to decode the acronym; it’s too confusing. Just regard it as a name. Instead, let me explain. GDPS is a continuous availability solution for business applications. This solution takes advantage of IBM System z technologies and hence covers all business applications running on System z, but it can also span applications running on distributed servers.

By monitoring and controlling the applications running in your data center or data centers, and automating failure recovery for these applications, GDPS can move entire business application workloads across data centers in the case of failures or disasters. And it can do that with the minimum effect on application and business availability.

How does GDPS do it?

This all sounds great on paper, but I’m sure you wonder if that is a dream or if it is real. It is real. GDPS brings together a number of technologies and adds intelligence to these bare technologies.

To explain this we must get technical. Here are some of the technologies:

  • Remote copy technology is a function in modern storage solutions that can copy data in near real time to a remote location.
  • With that comes a freeze operation, in which a storage system blocks input/output (I/O) from the host system to the affected volumes on the primary storage system on a site. A freeze operation stops mirroring updates between the primary and secondary volumes to ensure data consistency in the secondary subsystem or site.
  • The next is IBM HyperSwap, a function that allows a server (or cluster of servers) to transparently switch the I/O operations of the application to the “mirror copy” without affecting the active applications.
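
To make the interplay of these three technologies concrete, here is a minimal toy sketch in Python. It is not GDPS code and uses no IBM API: the names `Volume`, `MirroredPair`, `freeze` and `swap` are hypothetical and only illustrate the sequence described above, where writes are mirrored, a freeze blocks I/O and stops mirroring, and a swap redirects I/O to the mirror copy.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    """A toy storage volume: an ordered log of written records."""
    name: str
    records: list = field(default_factory=list)


class MirroredPair:
    """Toy synchronous mirror between a primary and a secondary volume.

    Hypothetical model for illustration only; it does not represent
    any real storage or GDPS interface.
    """

    def __init__(self):
        self.primary = Volume("site1")
        self.secondary = Volume("site2")
        self.mirroring = True   # remote copy is active
        self.frozen = False     # host I/O is blocked while True

    def write(self, record):
        """Host write: goes to the primary and, while mirroring, to the secondary."""
        if self.frozen:
            raise RuntimeError("I/O is frozen; write is held")
        self.primary.records.append(record)
        if self.mirroring:
            self.secondary.records.append(record)

    def freeze(self):
        """Block host I/O and stop mirroring so the secondary stays consistent."""
        self.frozen = True
        self.mirroring = False

    def swap(self):
        """HyperSwap-like step: make the mirror copy active and resume I/O."""
        self.primary, self.secondary = self.secondary, self.primary
        self.frozen = False


pair = MirroredPair()
pair.write("order-1")
pair.write("order-2")
pair.freeze()                                   # failure detected on site1
consistent_copy = list(pair.secondary.records)  # snapshot of the mirror copy
pair.swap()                                     # applications now use the site2 copy
pair.write("order-3")                           # I/O continues on the new primary
```

In this sketch the application keeps writing after the swap without knowing that its data now lives on the other site, which is the effect the real technologies aim for.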

The intelligence layer on top of these technologies is what the GDPS offering is about.

Through its advanced automation software, GDPS automates failure recovery. When the underlying hardware or software stack signals a failure, GDPS takes action. GDPS has the intelligence built in to determine what action to take in case of a failure.

This could go as far as switching the complete set of active workloads to another data center site.
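
The idea of mapping failure signals to recovery actions can be sketched as a simple policy table. This is a hypothetical illustration, not how GDPS is configured: the signal names, the `Action` values and the `decide` function are all assumptions made for the example.

```python
from enum import Enum


class Action(Enum):
    RESTART_WORKLOAD = "restart the workload in place"
    SWAP_STORAGE = "switch I/O to the mirror copy"
    SITE_SWITCH = "move all active workloads to the other site"


# Hypothetical policy table: failure signal -> recovery action.
POLICY = {
    "workload_failure": Action.RESTART_WORKLOAD,
    "primary_storage_failure": Action.SWAP_STORAGE,
    "site_failure": Action.SITE_SWITCH,
}


def decide(signal):
    """Return the recovery action for a failure signal.

    Returns None for an unknown signal, meaning no automated action
    is defined and an operator should be alerted instead.
    """
    return POLICY.get(signal)


for signal in ("workload_failure", "site_failure", "fan_warning"):
    action = decide(signal)
    print(signal, "->", action.value if action else "alert an operator")
```

The point of the sketch is that the decision is made ahead of time and encoded in automation, rather than looked up in a runbook during the outage.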

Capabilities that improve operational efficiency

GDPS automates operational tasks. A spin-off advantage of implementing GDPS is that advanced facilities for simplified operations become available. These are very useful in the day-to-day operations of the data center and further reduce the chance of human error.

For planned maintenance operations, like hardware or (application) software upgrades or testing, GDPS functions can be exploited to reduce risk and avoid application downtime. For example, these might include:

  • Starting and stopping individual workloads
  • Switching a workload from one site to another
  • Performing planned site switches, such as switching all executing workloads to another site
  • Controlling entire systems (starting, stopping and so on) from a single screen

Matching business impact with continuous availability solutions

GDPS comes in several forms that match the sophistication of the continuity solution (and hence the price) with the business impact of an outage. These different solutions also take into account the data center layout (one or more data centers), the distance between the data centers and your specific recovery time and recovery point objectives. 


So if you have one data center and you want to guard your applications against infrastructure failures within the data center, or if you have three data centers that are 500 kilometers apart and you tolerate neither data loss nor application unavailability, GDPS can provide a solution.

And as mentioned, it can be extended to manage your distributed workloads running in clusters on, among others, AIX, Linux, HP-UX and Windows.

Find more details on GDPS on the IBM System z site. You can also tweet any comments or questions to me @NdeGreef1.

Niek de Greef is an Executive IT Architect working for IBM in The Netherlands. Niek has more than 20 years of experience in IT. His areas of expertise include technology strategy, enterprise architecture, application integration, software engineering, and infrastructure architecture. You can reach him on Twitter: @NdeGreef1.


2 Responses to Oh no, the system’s down! Where’s the emergency runbook?

  1. Be Domeinregistratie says:

    When copying the data, the hard disk gets disconnected automatically, and you cannot copy the data. This happens quite often with me.

  2. Harddrive Recovery says:

    Good share! An extensive familiarity on the field of electronics and computer hardware can define the root problem accurately. Most of the time computer will inform us if there is something wrong with it.
