Learn About Recovery and Decision-Making 
in Cybersecurity Incident Response

Welcome to the fifth and final episode of our Energy Talks miniseries titled, Why Should You Talk About Incident Response? Join OMICRON cybersecurity consultant Simon Rommer as he explores the different process steps involved in cybersecurity incident response alongside other experts from the electrical power and cybersecurity industries.

In this episode, Simon discusses the critical cybersecurity incident response steps of recovery and decision-making, as well as lessons learned, with his guest Dr. Marie Moe, who is a Principal Security Consultant based in the United Kingdom at Mandiant Consulting, which is part of Google Cloud as a global provider of threat intelligence solutions, cybersecurity services and training.

Simon and Marie emphasize the importance of thorough incident investigation, the role of business decisions in recovery, and the need for continuous improvement in incident response processes.

They also discuss the necessity of collaboration between IT and OT teams, the significance of post-incident reviews, and the proactive measures organizations should take to prepare for future incidents.

If you haven’t already listened to Part 1 through Part 4 of this miniseries, be sure to check them out:

#85: Why Should You Talk About Incident Response? | Part 1 - OMICRON

#95: Why Should You Talk About Incident Response? | Part 2 - OMICRON

#97: Why Should You Talk About Incident Response? | Part 3 - OMICRON

#101: Why Should You Talk About Incident Response? | Part 4 - OMICRON


“A detailed documentation of lessons learned is vital for improving future incident response efforts.”

Dr. Marie Moe

Principal Security Consultant - Mandiant Consulting

Here Are The Key Topics from This Episode

1. Recovery and Resilience: In the final episode of the Energy Talks cybersecurity miniseries, Dr. Marie Moe emphasizes the complexity of determining when an incident investigation is truly complete, highlighting the risks of premature conclusions.

2. Crisis as a Catalyst for Improvement: Cyber incidents can serve as unexpected opportunities to implement long-overdue system improvements and hardening measures. As recovery concludes, it's essential to support the people involved, conduct thorough debriefs, and document lessons learned to enhance future incident response.

3. Strengthening Readiness: After a cyber incident, organizations must go beyond restoring systems by addressing communication gaps and updating critical response plans. Marie emphasizes the importance of cross-team collaboration and of leveraging the momentum to invest in long-term cybersecurity improvements.

4. A Mistake That Revealed Everything: Marie Moe shares a humorous real-life story where a threat actor accidentally sent their own malware creation tool instead of the intended payload, giving analysts unexpected insight into the attack’s origin.

Scott: Hello everyone! My name is Scott Williams from the podcast team at OMICRON. This is the last episode in our Energy Talks podcast miniseries about cybersecurity titled “Why Should You Talk About Incident Response?” Your host of this miniseries is Simon Rommer, who is an OT Security Consultant in the OMICRON Power Utility Communications Team. Simon is currently exploring the steps involved in the incident response process with his guests. So, without further delay, I hand over the microphone to Simon. Hi Simon!

Simon: Thank you, Scott. I welcome our listeners to this fifth and final episode in our Energy Talks cybersecurity miniseries, where we explore the critical role of IT and OT in power systems cybersecurity and discuss the steps involved in the incident response process according to SANS. In our previous episode, I spoke with Stefan Mikis, who is the head of managed security services at SEC Consult. We discussed the containment and eradication steps of the incident response process, and we talked about recovery.

In this episode, my guest is Dr. Marie Moe, who is a principal consultant in the EMEA Strategic Consultancy Service team at Mandiant. She provides advice to organizations across a wide range of industries with the goal of helping clients transform their cybersecurity capabilities and improve incident readiness. Mandiant is part of Google Cloud, and together they have a very good view of the global cyber landscape through their various customers.

Simon: We will discuss the last part of the incident response process, recovery and lessons learned. Marie, welcome to this episode of Energy Talks about incident response.

Marie: Thank you for inviting me.

Simon: So in the last episode we talked a lot about decision makers and keeping the company running if possible. We also talked about restoring systems during the incident and the loop of containment, eradication and recovery. Let's start with the question of when to break out of this loop and deem the investigation concluded.

Marie: Yeah, this can be a bit tricky. I would say it depends a lot on the nature of the incident, and you also have to find your level of confidence in whether you've completely managed to expel the threat actor from your systems. I've seen some espionage-type incidents where the clients struggled for months, even years, to be confident that they could trust their previously compromised systems to be recovered and cleared of all threat actor presence, or even of potential access for the threat actor, like web shells that they haven't found or any type of backdoor in systems. In some cases, the victim has to completely eradicate and rebuild their IT environments, especially legacy environments, to get to that level of confidence.

Sadly, I've also seen many incidents where the client concluded the investigation prematurely and unfortunately had to deal with the threat actor regaining their access. Or, when looking into further incidents, they come to the realization that the threat actor had been there the whole time, had not been properly expelled, and had kept a foothold ever since the last investigation.

So, one thing that could help you prepare for getting to that point of confidence in your recovery is to be sure to spend enough resources on the thorough scoping and containment phase of your investigation, which you probably discussed in earlier shows of this podcast. You need to perform some thorough sweeps of your environment to look for any threat actor presence and find their indicators of compromise. In some cases, you even pivot on those indicators to do another sweep of the entire environment, several times as you go through the investigation…

Marie: …to make sure you haven't missed anything.
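The repeated sweep-and-pivot loop Marie describes can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not Mandiant's tooling or any real product's API: the function name `sweep_with_pivoting`, the host names, and the artifacts are all hypothetical, and `hosts` is assumed to map each host to the set of suspicious artifacts that analysts have already flagged on it. Each round marks every host that shares a known indicator, then pivots on the other artifacts found there, and stops when a full sweep turns up nothing new.

```python
def sweep_with_pivoting(hosts, seed_iocs, max_rounds=10):
    """Sweep all hosts repeatedly, pivoting on newly found indicators.

    hosts:      dict of host name -> set of suspicious artifacts (assumed input)
    seed_iocs:  indicators of compromise known at the start
    Returns (compromised_hosts, known_iocs, converged).
    """
    known = set(seed_iocs)
    compromised = set()
    for _ in range(max_rounds):
        new_iocs = set()
        for host, artifacts in hosts.items():
            if artifacts & known:                # host shows a known indicator
                compromised.add(host)
                new_iocs |= artifacts - known    # pivot on what else is there
        if not new_iocs:
            return compromised, known, True      # a full sweep found nothing new
        known |= new_iocs
    return compromised, known, False             # round limit hit, scope unproven


# Hypothetical environment: one seed indicator leads to two further hosts.
hosts = {
    "hmi-01": {"webshell.aspx", "10.0.0.66"},
    "eng-ws-02": {"10.0.0.66", "svc_backup2"},
    "historian": {"svc_backup2"},
    "scada-db": {"unrelated.tmp"},
}
compromised, known, converged = sweep_with_pivoting(hosts, {"webshell.aspx"})
print(sorted(compromised), converged)
```

In this example a single sweep would have found only `hmi-01`; the other two hosts surface only after pivoting. The `converged` flag is a reminder that a sweep cut short by time pressure gives a lower bound on the scope, not proof that the threat actor is gone.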

Simon: So, you mentioned some really interesting things there. Basically, what you say is you can never be 100% sure. And if you're not quite sure and it's important enough, then you have to rebuild, right? In my past, I also had a case where we came to an attack and the attacker had been there all along, for the past eight years or so.

Basically, you have to be confident, but who makes these decisions or how can you be confident enough?

Marie: Yeah, so ultimately, it's always a business decision to determine when you reach that level of confidence to know that your identification and your recovery are complete and it's time to return to business as usual. For production environments and OT system owners, I think the cost and downtime due to investigations will many times put a lot of pressure on the incident responders in this phase to conclude quickly, so that if things are shut down, they can be turned on again. But on the other hand, you need to be sure, because the cost can also be considerable if you make a wrong decision and you're not properly mitigating the incident. If it's affecting OT systems, of course, health and safety, and also environmental impact, should be the first things taken into consideration.

So, it's important that when it comes to actually shutting down production facilities, this is something the OT engineers will be involved in deciding much more than the cyber experts, because they know what the consequences would be if we shut down these systems, and the consequences of turning things off in order to carry out those investigations. So at this time, if you're a system owner, it's really important to have your hired staff and also your hired experts to consult with to make a properly informed decision. One thing I want to emphasize is that it's super important to also have a really close relationship with your third-party vendors, your OEMs, and your integrators in this phase, because they would be a crucial part of the process for you to recover, to rebuild systems, and to have confidence in that recovery. So working closely with those third parties will be key in this phase.

Simon: This echoes very nicely what we said in a previous episode: that the security people are just service contractors, or rather that we provide services, and the real decisions have to be made by the engineers in the field and the crisis board, where we are just one voice of many, and that you have to consult other parties as well. So we are not the superheroes we like to be, but we are just one part of the team.

This is really nice because our listeners are mostly the engineers, and this is always the moment when I say: please, if there is a security person needing your help, remember that we are not the bad guys. We want to get rid of the bad guys. And this brings us to the next point.

Even though you can manually control a substation, it's always a business decision on how much money it costs versus normal operations. So if I may conclude what you said, it's basically a business decision and it's always coming down to risk versus cost.

But is this also the point where you start rebuilding and recovering the system, because you said that you sometimes have to rebuild the whole system? Or is this a point where you're already rebuilding everything and it's just the last step of saying, okay, we are done now and we go full throttle on recovery?

Marie: Yes, as you said, in some cases you really need to start from scratch again and do a rebuild, but not always. But if you're in that position, this could also be a great opportunity to perform some hardening of systems to make sure that they are more resilient and more defensible against future attacks. So, look into your visibility, any forms of monitoring tools, and any logging coverage, and make sure that your systems are properly patched. In some cases you might even want to start re-architecting some systems at this point, because they might be legacy systems or they might be lacking some built-in security mechanisms. So, you could take this as an opportunity to do that type of hardening when you have to rebuild systems anyway.

“Security people are just one voice on the crisis board—we’re not the superheroes, we’re part of the team.”

Simon: So, it's basically an involuntary mandated patch window or maintenance window if you want to say it like this.

Marie: Yeah, think of it as an opportunity. Seize the chance to do the work that you maybe have put off for way too long.

Simon: Hahaha. Especially in our industry, nothing is as permanent as a temporary solution. So maybe seize the chance as you said. But we also said, so we are one part of many voices or one part of the bigger team. We talked about the crisis board already in the last episode, we described the crisis board a little bit, what its roles are. But what is happening with the crisis board? Are they instantly dismissed? Is everybody going back to their daily business?

Marie: Yeah, so I think it's important to not only think about tools and technical systems here. I think you also have to think about the people who have been involved in incident response. They've been working overtime. They've been dealing with this cyber crisis for a while. They might be exhausted. They might even be a little bit traumatized, and they will also need some type of closure to move on from the incident. So a good after-action process should have a debrief. It should have lessons identified first and then lessons learned, which is really important. And that should be documented. It should be followed up on so that you can then improve your incident response processes for future crises.

Simon: So, you already talked about the topics that we need to discuss, but this is just the preparation for the next incident, right? Because there are just two types of companies, I like to say: ones that have been breached and ones that just don't know it yet. And if we look at the incident response process, the process is never-ending, right? It just feeds back into the start. We talked about preparation in almost all the episodes. Basically, if you prepared well enough, then the processes can go smoothly. So what you say is also that the last step is just preparation for the next incident, right?

Marie: Yeah, that's correct.

Simon: So are there some special processes that need to be discussed or reviewed? How would you go about it in a real-life scenario? Is there a retrospective, or are you collecting thoughts via email?

Marie: Yeah, you would carry out an after-action debrief, a retrospective, or whatever you want to call it, a hot wash-up maybe: some kind of process where you go through what went great during this incident response, what did not go so well, any gaps and so on. And you should use the outcome of this to improve upon your existing incident response plans and procedures. You might even have detected that you're lacking some of those processes. Typical documents that you should review are the incident response plan, your business continuity plan, your communication plan, and maybe also your crisis management plan if the incident was escalated to a real crisis. These should all be reviewed and updated based on your findings in that after-action debrief.

“Never waste a good crisis. This is the time to focus on improvement, identify gaps, and be better prepared for the next one.”

Simon: So, the incident is over, we have done our due diligence in documentation. We have improved some processes, we have noted down some information, some mobile numbers or some technicalities and some persons that we need to contact the next time. But this doesn't mean that we are safe, right?

Marie: No, not always. I mean, there are lots of other things you can do. Just to give you some examples of possible pain points that I've commonly seen when organizations discuss what didn't go so well during incidents: one I've seen a lot is problems with handovers between teams, and with teams working together across silos in the organization, with escalation paths and communication channels. These plans that I mentioned are often designed assuming that you can use normal communication channels. However, you might be in an incident where you suddenly don't trust your normal communication tools. You don't trust your emails, you don't trust your company chat platform, because of threat actor presence. In those cases, you might need to have some alternative communication platform to switch over to, and that might not have been documented in the plans. So that's one possible thing to update, which I've seen a lot of organizations struggling with.

Another thing I've seen, as I mentioned, is that there might be silos between teams. Often the incident response plan is created and updated by the cyber teams, while the communication plans, the BCP, and the crisis management plans might be maintained by your crisis management teams. That might make it harder to identify the correct escalation paths and to see at which point the cyber incident is escalating to a real crisis, where the whole crisis management organization needs to come in and handle it.

There is also the tension between teams that are maybe normally not working together, like IT and OT teams, which I'm sure you've discussed a lot in this podcast previously. You might talk to someone from the IT team, and they have a completely different view of how the network looks than someone from the OT team. They might even sketch up completely different network diagrams and care about different types of systems. And they might not agree on who's responsible for what in the incident response. So those are typical pain points and things to work on improving after a real incident.

Simon: So, one of the exercises would also be to write things like that down and be prepared for the next time you have the same cooperation between the same teams again. And there is something called playbooks, where the out-of-band communication, for example, and the key personnel can be noted. But…

Simon: …what we have seen in our experience, especially regarding the thing you mentioned about communication between teams with different languages, is that even though both may speak German, English, Italian, whatever, the IT language and the OT language are vastly different. And this kind of communication can only be improved through training and through cooperation. So, would you say that having a tabletop exercise or having regular exercises would improve this?

Marie: Yes, absolutely. So, you need to have a good feedback process in order to update your processes, as we discussed. But you also need to look at people and processes. You need to bring together the teams that need to work together in an actual incident, and make sure that the lessons learned are shared across the teams, across the silos.

Also including those third parties, those OEMs, integrators, vendors into that discussion, I think is of importance. You can, for instance, look into updating your contracts, your SLAs. You might even think about onboarding new retainers so that you are properly prepared for any future incidents.

Simon: So, these are the steps that should be taken after the incident. But after all the systems are restored, the IEDs are configured, networks have been cleaned and swept, and passwords have been reset, what else needs to be done? What happens next?

Marie: I say never waste a good crisis. This is the time to use the momentum caused by the incident to focus on improvement, to identify those gaps, to be better prepared for the next one. You'll probably never have a better time than just after an incident to actually get the CEO and the board on your side to invest in cybersecurity and incident response preparedness. So, as examples of things you can do, like you mentioned, you could do security maturity assessments, tabletop exercises, and purple team exercises, and you can invest in your personnel through training programs that are tailored to your teams and their needs when it comes to cybersecurity.

Simon: Purple team exercise is a really good keyword for our upcoming episodes, where we're going to look at the offensive side of things. So purple team, for our listeners, is a mixture of red team and blue team. I'm part of the blue team. I guess, Marie, you are also part of the blue team. So we are the good guys. And there is the red team, which are supposedly the bad guys, but they're also good guys who help…

Marie: Yes, definitely.

Simon: …you out in the end. So, with this, is there anything you want to add to the whole process? So, we've talked about what to do after an incident and after the incident is before the incident. If you want to add something.

Marie: Yeah, you did ask earlier whether, when the incident is over, we are safe now. I don't know if I answered that properly, because I want to emphasize that there is a heightened risk of another incident just when you think that the current one has been closed out. One possibility is that a threat actor has not had all their access eradicated, as we discussed earlier. Another possibility is that other threat actors see that you have been compromised, and they might be inspired to target your organization based on that. Let's say that you, for instance, decided to pay a ransomware threat actor. That could single you out to other ransomware threat actors as an easy target or as someone who is willing to pay. So it's definitely something to be wary of. Due to this risk, spend extra time and resources on hardening, and also maybe increase monitoring of systems for a good amount of time after the incident has been closed off. That's one thing I would like to recommend.

Simon: Basically, the money is better spent in fixing your cyber security posture than paying some ransomware actor.

Marie: Yeah, I'm not going to comment on that, but just saying that if you pay, there's a risk you might be seen as an attractive target.

Simon: I would just ask you before the end if you can share some stories with us.

Marie: Yeah. I do have a story about attribution, which might be fun. So there was one incident where a company received several typical spear phishing emails with malware embedded in an attachment. I think the attachments were typically PDF files that had some malware embedded into them. And we did have quite a lot of suspicion about who the threat actor behind this was. This is quite a long time ago, but it was related to Chinese threat actors and industrial espionage. And so we did have quite a lot of indicators as to which threat actor it was.

What we did not know was the tool that they were using to create these spear phishing attachments. But then the threat actors made a crucial mistake in one of those spear phishing emails, which actually led our analysts to the exact tool that was being used. They basically used a tool which had a nice interface where you could just combine a malware binary with a PDF document, and you could even choose how the icon was going to look on your laptop when you downloaded the attachment. The problem, however, was that the threat actor, instead of incorporating the malware into the PDF, incorporated the tool which they used to create the malware.

Simon: So basically, you got the tool for free.

Marie: Our analysts got the tool for free, which was amazing. And the tool was in Chinese. It was pretty easy to understand that this was coming from a Chinese threat actor.

Simon: This is a really funny story, I was not expecting this.

Marie: This is from my previous role, when I worked at the Norwegian CERT team.

Simon: I think I've heard this story before somewhere, in a talk or something. But the problem with attribution, and this is a whole different topic, is that it's always kind of political, right? So when I was working with a previous company, we were investigating a breach in, how to say it without breaking my NDA, a party that was closely related to our government, and we knew exactly who the adversary was, but officially,…

Marie: Yeah, it could be.

Simon: …attribution was not possible. So, yeah. But maybe we're going to talk about attribution in a future episode, which would be really nice to…

Marie: Yeah, this was just one of the more funny stories I was thinking of when you asked me for funny stories because that was really funny.
