Oh no, the system’s down! Where’s the emergency runbook?
In his blog post “What would happen if Godzilla visited your city?” my colleague Paul DiMarzio discussed how city operations can benefit from IBM’s Smarter Cities initiative. Here I would like to look at what would happen if Godzilla visited your data center.
The impact of a disaster grows as organizations become more interconnected, globally integrated and interdependent.
And it’s not just the organization itself; it’s the functioning of a broader ecosystem that may be interrupted when one organization in a chain fails.
Away with the disaster recovery runbook
Organizations are increasingly reviewing their business operations and transforming their business continuity assurance from a disaster recovery mode to a continuous availability setup.
- Some organizations are no longer willing to take the risk of incomplete business recovery procedures that are tested annually and found incomplete annually.
- Other organizations have reassessed the criticality of certain business functions and found that their business recovery procedures must be extended to cover functions previously earmarked as not business critical.
- Supply chain partners and regulatory bodies define service requirements that imply (near) continuous operations, even in the case of major disaster situations.
It’s time for us to rethink business continuity and move toward continuous availability.
IBM offers several business continuity solutions, each matched to the business impact of an outage. The IBM Geographically Dispersed Parallel Sysplex (GDPS) solution is the crown jewel among them, supporting the ultimate goal of continuous operations.
A unique continuous availability solution
Don’t even try to decode the acronym; it’s too confusing, so just regard it as a name. Instead, let me explain what it does. GDPS is a continuous availability solution for business applications. It takes advantage of IBM System z technologies and hence covers all business applications running on System z, but it can also span applications running on distributed servers.
By monitoring and controlling the applications running in your data center or data centers, and by automating failure recovery for these applications, GDPS can move entire business application workloads across data centers in the case of failures or disasters. And it can do that with minimal effect on application and business availability.
How does GDPS do it?
This all sounds great on paper, but you may wonder whether it is a dream or reality. It is real. GDPS brings together a number of technologies and adds intelligence on top of them.
To explain how, we need to get a little technical. Here are some of the technologies:
- Remote copy technology is a function in modern storage solutions that can copy data in near real time to a remote location.
- Closely related is the freeze operation, in which the storage system blocks input/output (I/O) from the host system to the affected volumes on the primary storage system at a site. The freeze stops mirroring updates between the primary and secondary volumes, so that the data in the secondary subsystem or site remains consistent.
- Finally, there is IBM HyperSwap, a function that allows a server (or cluster of servers) to transparently switch the I/O operations of applications to the “mirror copy” without affecting the active applications.
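To make these three building blocks a little more concrete, here is a toy Python model of a mirrored volume pair. It is a sketch under simplifying assumptions, not how GDPS or any storage system is actually implemented: the class and method names (`Volume`, `MirroredPair`, `freeze`, `hyperswap`) are hypothetical, remote copy is modeled as synchronous, and a "volume" is just a list of records.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    """A toy storage volume: just an ordered list of written records."""
    name: str
    records: list = field(default_factory=list)


class MirroredPair:
    """Hypothetical model of a primary volume mirrored to a secondary one.

    Not a GDPS or storage-system API; it only illustrates remote copy,
    freeze and HyperSwap as described in the text.
    """

    def __init__(self, primary: Volume, secondary: Volume) -> None:
        self.primary = primary
        self.secondary = secondary
        self.mirroring = True  # remote copy is active
        self.frozen = False    # host I/O is not blocked

    def write(self, record: str) -> None:
        """Application write, directed at whichever volume is primary."""
        if self.frozen:
            raise RuntimeError("I/O blocked: the volume pair is frozen")
        self.primary.records.append(record)
        if self.mirroring:
            # Modeled as synchronous remote copy: the secondary receives
            # the same update before the write is considered complete.
            self.secondary.records.append(record)

    def freeze(self) -> None:
        """Block host I/O and stop mirroring, so the secondary keeps a
        consistent image of the data up to the last completed write."""
        self.frozen = True
        self.mirroring = False

    def hyperswap(self) -> None:
        """Make the mirror copy the new primary and resume I/O.

        Applications keep calling write(); only the target volume changes.
        Mirroring stays off until the failed side is repaired.
        """
        self.primary, self.secondary = self.secondary, self.primary
        self.mirroring = False
        self.frozen = False


# Example: a failure is detected at site A during normal operation.
pair = MirroredPair(Volume("site-A"), Volume("site-B"))
pair.write("tx-1")
pair.write("tx-2")
pair.freeze()     # react to the failure signal
pair.hyperswap()  # applications now write to site B
pair.write("tx-3")
print(pair.primary.name, pair.primary.records)  # site-B ['tx-1', 'tx-2', 'tx-3']
```

The point of the model is the ordering: the freeze happens first, so the surviving copy is consistent, and only then does the swap redirect I/O. The application code that calls `write()` never changes.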
The intelligence layer on top of these technologies is what the GDPS offering is about.
Through its advanced automation software, GDPS automates failure recovery. When the underlying hardware or software stack signals a failure, GDPS takes action, using its built-in intelligence to determine which action to take.
This could go as far as switching the complete set of active workloads to another data center site.
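The decision logic can be pictured as a policy that maps failure signals to recovery actions, escalating from a component-level fix to a full site switch. The sketch below is purely illustrative: the event fields, component names, action labels and the escalation rule (treat a site as lost when every monitored component there reports a failure) are assumptions for the example, not GDPS's actual policy or interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureEvent:
    site: str       # e.g. "site-A"
    component: str  # hypothetical component name, e.g. "disk-1"
    kind: str       # hypothetical category: "disk" or "system"


def decide_actions(
    events: list[FailureEvent], monitored: dict[str, set[str]]
) -> list[tuple[str, str]]:
    """Map failure events to recovery actions (illustrative policy only).

    `monitored` lists the components watched at each site. If every
    monitored component at a site has failed, the site is treated as lost
    and all its workloads are switched away; otherwise each failure gets a
    component-level action.
    """
    actions: list[tuple[str, str]] = []
    for site in sorted({e.site for e in events}):
        site_events = [e for e in events if e.site == site]
        failed = {e.component for e in site_events}
        if failed >= monitored[site]:
            # Escalate: the whole site is considered unavailable.
            actions.append(("switch_site", site))
            continue
        for e in sorted(site_events, key=lambda ev: ev.component):
            if e.kind == "disk":
                # Redirect I/O to the mirror copy of the failed disk.
                actions.append(("hyperswap", e.component))
            elif e.kind == "system":
                # Restart the failed system's workloads on a surviving one.
                actions.append(("restart_workloads_elsewhere", e.component))
    return actions


monitored = {"site-A": {"disk-1", "sys-1"}, "site-B": {"disk-2", "sys-2"}}

# A single disk failure is handled locally.
print(decide_actions([FailureEvent("site-A", "disk-1", "disk")], monitored))
# [('hyperswap', 'disk-1')]

# Every monitored component at site A fails: switch the whole site.
print(decide_actions(
    [FailureEvent("site-A", "disk-1", "disk"),
     FailureEvent("site-A", "sys-1", "system")],
    monitored,
))
# [('switch_site', 'site-A')]
```

A real policy would weigh many more signals and timings; the sketch only shows why a single rule set can cover both a local disk failure and the loss of an entire site.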
Capabilities that improve operational efficiency
GDPS automates operational tasks. A spin-off advantage of implementing GDPS is that it makes advanced facilities for simplified operations available. In the day-to-day operation of the data center these are very useful and further reduce the chance of human error.
For planned maintenance, such as hardware or (application) software upgrades or testing, GDPS functions can be exploited to reduce risk and avoid application downtime. These functions include, for example:
- Starting and stopping individual workloads
- Switching a workload from one site to another
- Performing planned site switches, such as moving all running workloads to another site
- Controlling entire systems (starting, stopping and so on) from a single screen
Matching business impact with continuous availability solutions
GDPS comes in several forms that match the sophistication of the continuity solution (and hence the price) to the business impact of an outage. These different forms also take into account the data center layout (one or more data centers), the distance between the data centers, and your specific recovery time objective (RTO) and recovery point objective (RPO).
So whether you have one data center and want to guard your applications against infrastructure failures within it, or three data centers 500 kilometers apart and can tolerate neither data loss nor application unavailability, GDPS can provide a solution.
And as mentioned earlier, GDPS can be extended to manage distributed workloads running in clusters on AIX, Linux, HP-UX and Windows, among others.
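Why does the distance between data centers matter so much in this matching? Partly because of physics. With synchronous remote copy, a write completes only after the remote site has acknowledged it, so every write pays at least one round trip over the fiber. Light travels through optical fiber at roughly 200,000 km/s, or about 200 km per millisecond, so 500 km adds about 5 ms to every write, while 20 km adds about 0.2 ms. The sketch below computes that propagation penalty and applies a hypothetical rule of thumb for picking a replication mode; the latency budget, the function names and the rule itself are assumptions for illustration, not GDPS sizing guidance.

```python
# Approximate speed of light in optical fiber: about 200,000 km/s,
# i.e. roughly 200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def sync_write_penalty_ms(distance_km: float) -> float:
    """Lower bound on the extra latency of one synchronously mirrored
    write: a single round trip over the fiber (propagation delay only)."""
    return 2 * distance_km / FIBER_KM_PER_MS


def choose_replication(
    distance_km: float, rpo_seconds: float, max_penalty_ms: float
) -> str:
    """Hypothetical rule of thumb, not GDPS sizing guidance."""
    if rpo_seconds > 0:
        # Some data loss is tolerated: asynchronous copy avoids the penalty.
        return "asynchronous"
    if sync_write_penalty_ms(distance_km) <= max_penalty_ms:
        return "synchronous"
    # Zero data loss wanted, but the distance is too large for every write
    # to wait: mirror synchronously to a closer site, asynchronously beyond.
    return "synchronous nearby + asynchronous to the distant site"


for km in (20, 100, 500):
    print(km, "km:", sync_write_penalty_ms(km), "ms per write")
print(choose_replication(500, rpo_seconds=0, max_penalty_ms=1.0))
```

The real penalty is higher than this lower bound (switching equipment and storage processing add to it), which only strengthens the point: the further apart the sites, the more a zero-data-loss requirement pushes you toward multi-site designs.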
Niek de Greef is an Executive IT Architect working for IBM in The Netherlands. Niek has more than 20 years of experience in IT. His areas of expertise include technology strategy, enterprise architecture, application integration, software engineering, and infrastructure architecture. You can reach him on Twitter: @NdeGreef1.