Cyber Resilience in Financial Services

Severe but Plausible:  By 2030 all financial institutions in the US are fully cyber resilient, meaning that they are prepared for a severe but plausible attack that will knock out all operational capabilities, and they will continue to support their customers in spite of it.

Because of my involvement in Sheltered Harbor, I'm often asked to discuss resilience in financial services. I'm at the stage in my life where I have a lot to reflect upon. So, when I was asked to be a keynote presenter at a financial services resilience conference hosted by Dell and AWS, I thought: "This is like preaching to the choir." I've been around long enough to remember the days when this industry pioneered the use of 'Non-Stop Computing' (and requestor-server technology, by the way) using Tandem computers, new-fangled personal computers and all that followed. Resilience was the name of the game then, like it is now.

Okay. Now that I've dated myself, let me advance the calendar to 2030. Yes, I'll skip today and then come back. And since I am an optimist at heart, I'll describe the 2030 where we've survived today's perils.

I predict that by 2030 all financial institutions in the US are fully cyber resilient. By that I mean that they are all prepared for a devastating attack that will knock out all operational capabilities, and they know that they will continue to support their customers in spite of it. (My timing may be off by a few years, but I think we can get there. I'm an optimist. Actually, we must get there.) The current dynamic of whack-a-mole is unsustainable. I want to note that in my view of 2030, today's version of Sheltered Harbor will no longer be necessary. (Let me explain: I've been involved with Sheltered Harbor since we came up with the trusted approach that allows us to maintain public confidence even after a dire cyber-attack wipes out the operational capabilities of a bank, credit union or brokerage. That was back in 2016/17. As a not-for-profit industry initiative, we have been refining the approach and educating the entire industry and the ecosystem that supports it ever since.)

Even in the early days, it occurred to me that Sheltered Harbor was pioneering the development of a new paradigm for dealing with a disaster of a different type and magnitude, one which is not addressed well enough by our current DR/BCP approaches. Internally, I thought, "We are creating DR 3.0." In my 2030 view, we think differently. In 2030 we all know that cyber is a very real (if not the most significant) risk to our operations, and so we have changed our thinking to incorporate cyber resilience in our planning. We have moved all of our critical operational systems to zero-trust cloud architectures, which can be inherently more resilient.

In Carlos' 2030, any system can be re-instantiated in a new zero-trust environment in a matter of minutes. AND, copies of all critical data are held in environments that are isolated from normal operations, and therefore not subject to ransomware and related risks.

Yes, this is a simplistic - some might say naïve - view. But, aside from the timeline, which I admit may be a little aggressive, it is where the world is going - predominantly because of the costs involved in maintaining legacy systems. My vision assumes that we will expedite our progress, because it gives us the extra benefit of cyber resilience, which is much needed and invaluable.

Getting back to surviving today's perils:

We have a lot of resources dedicated to cyber security, because this is and has been the financial industry's model forever. We protect customers' assets by protecting the institution. We're Very Good At It. No private industry is better at protecting data than financial services. Of course, we also know that when it comes to protecting the institution and the assets, there is no rest. So, we keep investing in cyber security because the adversaries always have an incentive to get smarter. Those customer assets have real value! We keep whacking that mole.

I have been in data protection and implementing cryptography since the early '80s. The bad actors have gotten so much smarter since then. Even the most advanced technologies don't stop them (because our adversaries always find the weakest link and exploit it). I consider myself blessed that in 2016 I got the opportunity to join a small group of industry experts who recognized that we needed to switch hats. Not from white to black or blue or red, but to a whole different side of the game. I have been leading Sheltered Harbor's efforts to define cyber resilience so that we - the financial industry - could maintain public confidence even after a completely debilitating attack on a bank, broker or credit union. This work has been enlightening, challenging, and rewarding. Along the way, we have learned what it means to be cyber resilient, and what needs to happen for an institution to survive an extreme event. (It takes me back to my early days when we were learning about redundant disk arrays, fault-tolerant operating systems, and fail-over capable transaction processing. And wow! Did things get really good when we started running hot-hot data centers and cross-regional application failover. There were a lot of new details that had to be worked out for all of this to become the normal way to architect resilient enterprise systems. That was resilience then …) Ironically, hot-hot is now an Achilles' heel that makes us highly vulnerable on the cyber front.

CYBER RESILIENCE TODAY

So what does cyber resilience mean? And how can we achieve it?

According to the industry's learnings from the Sheltered Harbor initiative, to be resilient against a severe but plausible cyber attack, a financial institution needs at least three things:

  1. To have complete confidence that we will never lose our data - even to those adversaries who want to deny us access.
  2. An assured way to continue operations very shortly after we have lost our operational systems. I'm talking minutes or hours - NOT days or weeks.
  3. A way by which all of our stakeholders, counterparties, regulators, etc., can have the same confidence that we do in our ability to recover from a crisis, so that they will continue to trust us - even in this seemingly dire scenario.

CONFIDENCE IN AVAILABILITY OF OUR DATA

For confidence in the availability of our data, we have learned how to protect an isolated copy of our critical data using a mature data vaulting process that ensures the security and integrity of the original data, and we have educated the broader ecosystem on how to enable that process in their data protection products. Highest in that learning has been the five characteristics that are necessary for a vault to be cyber-resilient:

  1. Secure
  2. Isolated
  3. Immutable
  4. Survivable and Accessible
  5. Distributed

Any vault that is missing one or more of these attributes is not truly cyber-resilient. Equally important to the vault are the processes to ensure that the vault will hold exactly what we need on the worst of days. After all, if we ever need that data, it is because we have no better alternative. And in that case, failure is not an option. We have to have complete confidence in the contents of that vault.  So, the processes to put our most critical data into the vault as well as those to recover it from the vault must be proven, tested, and unimpeachable in order to ensure that the data is exactly as we intend it.
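To make the point about proven, tested processes a little more concrete, here is a minimal sketch of one way to check that what comes out of a vault is exactly what went in. It is an illustration only, not Sheltered Harbor's specification: the function names (build_manifest, verify_restore) and the idea of a per-file SHA-256 manifest are assumptions made for this example, and a real vaulting process would also need to protect the manifest itself.

```python
import hashlib
import hmac


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of a file's contents as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Record one digest per file at the moment the data is vaulted.

    Hypothetical helper: 'files' maps a file name to its raw contents.
    """
    return {name: digest(content) for name, content in files.items()}


def verify_restore(manifest: dict[str, str], restored: dict[str, bytes]) -> list[str]:
    """Compare restored files against the manifest taken at vaulting time.

    Returns a sorted list of problems; an empty list means every file
    recorded in the manifest came back unchanged and nothing extra appeared.
    """
    problems = []
    for name, expected in manifest.items():
        content = restored.get(name)
        if content is None:
            problems.append(f"missing: {name}")
        elif not hmac.compare_digest(digest(content), expected):
            problems.append(f"altered: {name}")
    # Files that were never vaulted should not show up in a restore.
    for name in restored.keys() - manifest.keys():
        problems.append(f"unexpected: {name}")
    return sorted(problems)


if __name__ == "__main__":
    # Made-up sample data, purely for illustration.
    vaulted = {
        "accounts.csv": b"id,balance\n1,100\n",
        "customers.csv": b"id,name\n1,Ann\n",
    }
    manifest = build_manifest(vaulted)

    tampered = dict(vaulted)
    tampered["accounts.csv"] = b"id,balance\n1,999\n"

    print(verify_restore(manifest, vaulted))   # clean restore: no problems
    print(verify_restore(manifest, tampered))  # the altered file is reported
```

The design choice worth noting is that the check is run on the restore side as well as the vaulting side. Rehearsing the recovery path regularly, and not only the backup path, is what gives confidence in the contents of the vault before the worst of days.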

CONTINUITY OF OPERATIONS IN A NEW PRODUCTION INSTANCE

To ensure continued operations, we have to re-institute an old approach. We need to be sure that an alternate platform that is completely isolated from our operating plant will be available to support our critical operations in very short order. In the old days, we used to call this the cold data center. With today's cloud-enabled architectures, it is possible to initialize a complete instance or many instances of a whole system from the ground up within minutes. But that by itself is not enough. We have to prepare our people (and possibly have alternate processes) so that when the balloon goes up - when we sound General Quarters - everyone knows exactly what they must do in order to support our most critical functions in spite of a successful attack. In short, we must prepare for successful cyber attacks - in addition to continuing to protect against such events. Today's Business Continuity and Disaster Recovery Plans are generally not designed to react at the pace needed to survive a cyber event that could come on suddenly, and they require too much rebuilding to allow for a quick recovery of critical functions, such as being able to communicate directly with our customers.

(In a subsequent article I'll walk you through the difference between a traditional DR/BCP approach and one that is focused specifically on quick recovery of critical customer services during a devastating cyber attack.)

MAINTAINING TRUST

The last part of the resilience preparation triad provides us all the confidence to trust in the stricken institution's ability to survive. As a result of this trust, we can comfortably provide whatever support has been previously planned for this scenario. So, how do we establish and maintain this confidence? By defining standards for this alternate scenario, and including clear verification frameworks that can be reviewed, tested and validated - as Sheltered Harbor has accomplished with its certification frameworks. By providing a method for independent verification of adherence to the standards, the industry can self-police with publicly visible validations. 

WHAT'S NEW?

You may ask yourself, why don't we have this already? Isn't this why we have so many industry groups focusing on security and resilience? The simple reality is that we are still at the early stages of defining and implementing operational and cyber resilience. Our ecosystem of advisors, auditors, vendors and even regulators is learning, as is the rest of the sector. As a sector, we are leading other industries, but we have a way to go yet. We didn't have true high-availability distributed systems until the mid-1980s, and it took the industry a few years to leverage them fully. We didn't have the risks then that we have today - particularly with the severity and speed of a cyber-attack. Our current business continuity plans are generally not designed to recover from an extreme outage instantly (or even within a few hours). We are still learning how to do this at scale, and still building the ecosystem that will support comprehensive, cost-efficient cyber resilience across the sector.

Not all of our systems are architected to live completely in code - actually very few are. And for the most part, our most critical systems don't exist in a zero-trust cloud architecture today. So, our ability to fully stand up a new instance of our operating plant is still hampered by long recovery times. Similarly, few of our institutions are far enough down the path of defining their resilience plans - never mind getting our people up to speed with how to react. However, with collective efforts such as those by Sheltered Harbor and related initiatives, we are getting closer every day. The paths have been paved, and now we have to get everyone on the road and up to speed. I'm hoping that by 2030 we will be fully there.

Carlos Recalde, President & CEO
Insights into resilience against severe but plausible events, as defined by leading U.S. financial firms