Chances & Risks of Operational Technology (OT) in the Energy and Aviation Sectors

Welcome to the 6th episode of our Energy Talks miniseries, Cybersecurity in the Power Grid, in which we take a 360-degree look at how power grids can best protect their infrastructures from cyber attacks.

In this episode, Andreas Klien, OMICRON cybersecurity expert and Business Area Manager for Power Utility Communication, discusses the differences and similarities between using operational technology (OT) in the aviation and electrical power sectors with his guest, Ron Brash, a renowned vulnerability researcher in the aviation industry and Vice President of Technical Research and Integration at aDolus in British Columbia, Canada.

Andreas and Ron exchange stories about when things have gone wrong from their many years of experience in the aviation and electrical power industries. They highlight the contributing factors that have led to or nearly caused breaches in cybersecurity from both the OT and IT perspectives and exchange useful strategies in their related industry to help others reduce risk.


“If you know that you have OT devices with vulnerable agent components, you will have to revise your risk management strategy to include defense at multiple layers.”

Ron Brash

Vice President of Technical Research and Integration, aDolus

Here Are the Key Topics from This Episode

Aviation and Power Grid Cybersecurity: Andreas Klien and Ron Brash set the stage for exploring the similarities and differences between the two industries, laying the foundation for a deeper dive into cybersecurity challenges and best practices.

Navigating the Complexities of Supply Chain Risks: They discuss the multifaceted nature of supply chain attacks, emphasizing the challenges in identifying and mitigating risks. They explore scenarios ranging from software infections to hardware vulnerabilities, highlighting the need for visibility, actionability, and a shift away from vendor dependence.

Network Structures in Aviation and Power Grids: Our experts follow a suitcase through the OT of an airport to give an overview of the structure of these OT networks and how interconnected they are. Then, they compare them to typical power grid OT networks, from SCADA down to the plant level.

Vulnerabilities and Risks in Legacy OT Systems: This chapter focuses on the vulnerabilities and risks associated with legacy operational technology (OT) systems. Andreas and Ron examine how errata documents can be a good way to find disguised vulnerabilities in devices.

Scott Williams, Introduction Host Welcome to Energy Talks, a regular podcast series with expert discussions on topics related to power system testing, data management, and cyber security in the power industry. 

Hello everyone! My name is Scott Williams from the podcast team at OMICRON. This is the sixth episode of our special Energy Talks miniseries called “Cyber Security and the Power Grid,” in which we provide you with a 360-degree view of how power grids can best safeguard their infrastructures from cyber-attacks. In this episode, OMICRON cybersecurity expert, Andreas Klien, will be your host, and a special guest will join him to discuss the chances and risks of operational technology, or OT, in the energy and aviation sectors. So, without further delay, welcome Andreas, and thank you for hosting this episode. 

Andreas Klien, Interview Host Thanks Scott! Hello everyone and welcome to the sixth episode of our Energy Talks cybersecurity miniseries. My name is Andreas Klien and I will be your host for this episode about aviation cybersecurity, its crossover with power grid cybersecurity, and what we can learn from each other. I have been focused on power grid communications for almost 19 years. For 14 of those years, I've focused on power grid cybersecurity, and I'm responsible for leading the Power Utility Communications business unit at OMICRON. Joining me for this episode is Ron Brash, a renowned vulnerability researcher and firmware dissector, especially of OT and critical infrastructure device firmware. He is also the Vice President of Technical Research and Integration at aDolus and a winner of the 40 Under 40 Engineering Leaders award.

Ron Brash, Expert Guest Thank you, Andreas, for having me. It's great to be in respected company. Those are really kind words, but I think the real pragmatic engineer is actually not me. It's Andreas. At least in all of our conversations.

Andreas Klien Thank you so much. It's an honor to have you here. I just recently learned that you've received the 40 under 40 Engineering Leaders award. That's very impressive. When did you receive this award?

Ron Brash I think it happened during the dark years of Covid. I never fully figured out who nominated me. But truthfully, it's a pretty humbling award, especially since I'm not an engineer by trade and am not allowed to use that title in my country. I believe it's more about being able to ask hard questions and go straight to the cause. And in some industries, that's kind of funny. Why? Because it's an award for something that's not exactly welcome there. But here we are, asking hard questions about energy, oil and gas, and aviation. So, let's do this.

Andreas Klien During our last encounter in Copenhagen at an industrial cybersecurity conference, we brainstormed the concept for this episode. Engaging in discussions about cybersecurity in aviation, you shared a plethora of intriguing stories. As someone with a keen interest in airplanes, I found myself captivated. It sparked the idea: why not delve into exchanging spine-chilling tales about cybersecurity in aviation and juxtapose them with similar scenarios in the power grid realm? Perhaps, through this exchange, we can glean valuable insights from each other's experiences.

Ron Brash Indeed, the aviation industry serves as a compelling example when exploring the nuances of terms like "safe" and "unsafe," particularly within the context of cybersecurity. While the theft of credit card data may be deemed unsafe in conventional cybersecurity discussions, the notion of "unsafe" takes on a significantly weightier meaning in aviation. Within the industrial cyber critical infrastructure sector, which encompasses aviation, there's a prevailing acknowledgment of the potential catastrophic consequences, often referred to metaphorically as "making craters." This perspective underscores the importance of exercising caution when applying terms like "unsafe" within the industrial and transportation domains, highlighting the distinct challenges and considerations inherent in these sectors compared to traditional cybersecurity contexts.

Explaining why operations must halt, or why a flying, mobile operational technology (OT) site must be grounded, is a very different challenge from explaining why a typical cybersecurity event is unsafe. The latter usually involves scenarios such as personal data breaches, ransomware attacks, and other threats capable of disrupting manufacturing sites, which are vastly different from the aviation industry's concept of 'unsafe'.

In aviation, the paramount concern revolves around 'airworthiness'. Delving into topics like 'airworthiness' and 'flight systems' on search engines like Google can be an overwhelming endeavor. In this field, assertions must be made with absolute certainty; it's not as straightforward as releasing a Common Vulnerabilities and Exposures (CVE) that might stir controversy. Such an approach simply doesn't translate to the realm of aircraft. Every decision must be executed with meticulous precision, a lesson underscored by recent incidents involving the Boeing 737.

One crucial consideration is the manufacturer's own uncertainty. It's concerning that two planes, constructed sequentially and equipped with identical parts and certified components, can exhibit starkly different behaviors. Even with the same pilot at the helm, subtle variations can manifest. Therefore, when discussing 'unsafe' within the cybersecurity context, all these variables must be taken into account. Each aircraft, or 'tail', generates distinct telemetry and undergoes unique retrofitting.

In the aviation sector, I've noticed that everything is bespoke, and you're navigating through intricate systems of systems. Take, for instance, a particular incident involving a heads-up display, perhaps around 2009 or 2013. This Rockwell Collins heads-up display was designed for default installation in an aircraft, with a second unit available as an option. Both were certified and deemed airworthy. However, when installed on a specific aircraft and Wi-Fi was enabled, both displays would inexplicably go blank mid-flight.

Andreas Klien Those cool glass screens we often see in Dreamliner cockpit pictures? 

Ron Brash Yes, exactly. 

Andreas Klien Are you referring to the overhead displays or the ones that comprise the dashboard?

Ron Brash The overhead display, but the dashboard as well. They typically feature a primary and secondary pilot setup, although in some cases, the secondary option might be added later, especially for airlines on a tighter budget or in developing countries.

Andreas Klien I remember that you consistently refer to airplanes as flying Operational Technology (OT) sites, which is an intriguing perspective. Viewing them as flying OT networks susceptible to attacks, particularly when grounded but even while airborne, adds a fascinating dimension to the discussion. What caught my attention was your observation that airplanes behave differently. I had assumed that airplanes, being mass-produced, would behave uniformly due to identical components. Could you elaborate on what you meant by this? Are there variations in components, or is there another factor at play?

Ron Brash That's a fantastic question! When you acquire an aircraft directly from the manufacturer, it typically adheres to a set of standard reference specifications governing its construction. These specifications detail crucial aspects, such as the inclusion of two Pratt & Whitney turbines and the selection of avionics, often sourced from companies like Honeywell, for functions like ground avoidance.

In theory, all the assembled components should align precisely with the bill of materials provided by the manufacturer. However, there's room for adaptations. Airlines can add code to existing systems to enhance safety protocols or optimize fuel efficiency. These adaptations are crucial for specific scenarios, such as flight and thrust logic.

What's intriguing is that when you purchase a plane directly from the manufacturer, its specifications may vary. Now, let's delve into some fascinating observations. While working on projects involving Dreamliners and MAXs, I've noticed that planes communicate differently. Imagine an additional in-flight entertainment system: it might come from various suppliers like Thales or Panasonic. Over time, logs reveal intriguing messages, such as "running out of space" or "bad flash cell."

During retrofits, subtle changes occur. Legacy technology from different eras blends together, resulting in unique behaviors. And occasionally, something truly mysterious happens, like a shell popping up in the logs. No one could explain why or how it happened.

Aircraft technology is a captivating mix of precision engineering, adaptability, and occasional surprises!

Andreas Klien Ah, that's an intriguing point you raised about loading code later by the airline and the mysterious shell popping up. I'm curious, does this phenomenon occur solely in non-critical entertainment system networks, or does it extend to deeper layers within the aircraft's systems?

Ron Brash In general, there's a model that you can likely find references to these days on the Internet. It consists of several layers and operates on a time-delimited bus. This approach involves cycles that occur every minute. If you're familiar with this concept from the Honeywell technology side, where they utilize similar strategies in airplanes, it becomes clearer. So, when you encounter these networks, it's apparent that you probably won't be able to communicate on them unless you understand how that protocol functions. However, there are these series of layers, akin to rings, and as you move closer to the avionics and critical systems, they become more isolated. They operate on a subscriber model, with several firewalls in between. But when it comes to the fly-by-wire systems, there are definitely some unique elements concealed within.

Andreas Klien So, what other scary things have you seen in airplanes?

Ron Brash Oh gosh, I think pointing back to airworthiness highlights my concerns. These days, I'm more afraid of RF attacks.

Andreas Klien RF stands for radio frequency? 

Ron Brash Yes, radio frequencies. You could reference Ruben Santamarta's research. Although I was somewhat involved in reviewing that, some of it was true, and some of it wasn't at all. But that depended on each airplane and each airplane operator; those things all apply differently. Regarding RF, most planes are not built to be shielded from the interior; they're built to be shielded from the exterior. And shielding adds weight. So that, to me, is a concern every time. It's not so much, "Oh, someone's going to go poisoning the entertainment system." It's not that. It's when someone is going to do something bad and bring a high-powered amplifier onto a plane.

Andreas Klien So, especially in that moment when they tell you to turn off all electronic devices, that would be a good point to turn on your software defined radio, right? 

Ron Brash I used to live near a major airport, right along the landing path. On certain days, a peculiar type of cloud would roll in, a heavy, green-hued mass. Now, I'm no meteorologist, but this cloud signaled trouble. As it loomed, all the planes faced critical decisions: divert to another airport, circle in a holding pattern until the storm subsided, or execute rapid landings.

Let's get to the heart of the matter. When pilots opt to land swiftly or take off urgently, they are in the most perilous part of the operation. It's not just about the pilot and their typical avionics anymore. Instead, a complex web of communication channels comes into play: ultra-high frequency data, GPS coordinates, and VHF transmissions, all converging during takeoff and landing.

Picture this: a low cloud ceiling, visibility reduced to a mere sliver. The aircraft operates blind, relying on precise instructions. This is where my anxiety spikes. What if someone up to software-defined radio (SDR) mischief decides to place obstacles, say balloons, along the flight path? These rogue objects hover dangerously close to the descending aircraft. The stakes are high: stressed pilots, adverse weather, and a procession of double-aisle jumbo jets lining up for touchdown.

Now, it's not a straightforward cyber event. No, it's more insidious: an intricate dance of decision-making under duress. The operators grapple with split-second choices, their nerves taut. In this delicate balance, a single misstep could spell disaster.

Andreas Klien Certainly, that's also a common attacker strategy: initiating a minor cyber-attack to confuse operators into making severe mistakes. It's quite imaginable. And as everywhere in OT and process control, there are always humans sitting in front of SCADA displays, and if they are reading the wrong numbers, they make the wrong decisions, right?

Ron Brash Yeah. Or they'll make the correct, safe decision, you know, press the big red button, but press it prematurely.

Andreas Klien In any case, at the very least, it entails financial damage.

Ron Brash Yes. So, when I think about it … Andreas and I had a great conversation about this topic. I think it was about three years ago, when I met you at an event and we ended up arguing.

Andreas Klien In a panel discussion. Yeah. 

Ron Brash We were discussing how rail systems often utilize components like relays that are also found in other industries, not just rail. This raises questions about the extent to which these devices are shared across various industries such as aircraft, rail, and maritime shipping. Andreas, there's an intriguing anecdote here related to product divergence and attention allocation. In my experience, I've noticed instances where components originating from the industrial sector are repurposed under different names for various industries, potentially sharing a common code base. This can be quite unsettling. It makes me wonder: What unknowns exist in the realm of rail systems?

Andreas Klien Yes, in the aviation industry, automation components are indeed utilized, primarily in ground operations rather than directly on aircraft. These components stem from the automation sector, the largest in terms of volume, and are repurposed and rebranded for use in various industries.

For example, PLC (Programmable Logic Controller) components, originally designed for automation, find application in the power grid industry, particularly in substation devices like protection relays. These relays, often equipped with PLC modules, are then repurposed for railway systems. Despite being marketed under different names and sometimes different brands, these relays may essentially be identical products running the same firmware.

In the railway sector, further adaptation is necessary due to frequency differences across railway networks. While the standard power grid operates at 50 Hz or 60 Hz, certain railway networks, such as those in parts of central Europe, operate at around 16.7 Hz. This requires specialized products with similar firmware but possibly different configurations or slightly modified software, branded under different labels.

However, despite these adaptations, the underlying components often share vulnerabilities and face similar stresses. This raises concerns about whether all vulnerabilities are adequately addressed, particularly given the niche nature of these products within the broader industry landscape.

Ron Brash Absolutely, it's a bit unnerving when you encounter such scenarios. You might address an issue in one setting, yet it persists in another, which can be quite perplexing. Take, for instance, a certain Line Replaceable Unit (LRU) on a Dreamliner. It happens to be identical to a significantly larger and more costly Distributed Control System (DCS) unit. Despite their similarity, you rarely come across a Common Vulnerabilities and Exposures (CVE) entry for the avionics equivalent.

Andreas Klien Are CVEs issued for aviation devices?

Ron Brash Not usually, no. They usually get released in airworthiness reports and quietly get hashed out or fixed as part of a standard operating procedure (SOP).

Andreas Klien I see, addressing software issues in aviation must pose significant challenges. The stringent safety standards require thorough scrutiny of every line of code, almost to the point of mathematical certainty. Interestingly, there have been cases where software bugs were resolved not by updating the code but by modifying the hardware itself. This approach circumvents the need for costly software patches, underscoring the complexity and expense associated with ensuring software integrity in aviation systems.

Ron Brash In the field of medical devices, there exists a parallel challenge concerning the definition of changes. These devices often rely on software-defined functionalities, necessitating frequent updates to both software and hardware components. Interestingly, while software is treated as a tangible entity, complete with signatures stored for potential use on aircraft, the verification of these signatures is not consistently enforced.

The use of various programming languages in medical device development, including Rust, Ada, and their predecessors, introduces differing levels of scientific validation. However, concerns persist regarding the presence of "masquerading" features within the codebase, where unnecessary functionalities may inadvertently remain.

Certification presents another significant hurdle, particularly when a single firmware undergoes certification for multiple product variants or SKUs. This scenario can expose vulnerabilities, as demonstrated by recent cybersecurity incidents like the one observed in Denmark. For example, certain devices may come equipped with 3G Huawei modems despite these not being listed in the hardware Bill of Materials (BOM). Such discrepancies pose risks, especially in environments where specific companies are restricted, underscoring the critical need for comprehensive threat modeling.

Andreas Klien So the feature is basically located in this USB stick. And it isn't mentioned in any SBOM or anything?

Ron Brash No, and what's concerning is that the drivers for these Huawei USB sticks are open source, meaning they're not developed by the company itself but by someone else. When you're compiling a Software Bill of Materials (SBOM), you'd typically just see "Linux kernel" listed. However, the actual driver logic code resides on the Huawei USB stick. While it's often programmable to suit different carriers, this setup poses a significant risk.

Consider this scenario: a company focuses on enhancing features for cellular communication, which can be beneficial for operators. For instance, a small company or town with limited resources might opt for a router with a backup USB LTE gateway as a cost-effective solution. However, the issue arises when such inexpensive features are prone to misuse, potentially compromising the entire system's security.

This dilemma extends beyond telecommunications and applies to various industries, including aviation and others. While there's valuable information available on Critical Infrastructure Protection (CIP), the most significant vulnerabilities often stem from human behavior and the pressures of capitalism. So, for those in the business of managing risks, the challenge lies in devising strategies to mitigate these threats effectively. And that, I suppose, leads us to an intriguing discussion.
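A minimal sketch of the gap Ron describes, where an SBOM lists only "Linux kernel" while a 3G modem and its driver sit on the device undeclared. It assumes you can obtain two lists for a device: the components its bill of materials declares and the components you actually observe (for example, from a teardown or a firmware scan). All component names are hypothetical placeholders, and the comparison is a plain set difference, not any particular SBOM tool's method.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Component:
    kind: str  # "hardware" or "software"
    name: str

def undeclared(declared: set, observed: set) -> set:
    """Components observed on the device but never declared in its BOM."""
    return observed - declared

# Hypothetical declared BOM: the modem driver hides under "linux-kernel".
declared = {
    Component("software", "linux-kernel"),
    Component("hardware", "main-board"),
}
# Hypothetical observed components, e.g. from a teardown or firmware scan.
observed = declared | {
    Component("hardware", "usb-3g-modem"),
    Component("software", "usb-modem-driver"),
}

for c in sorted(undeclared(declared, observed), key=lambda c: (c.kind, c.name)):
    print(f"undeclared {c.kind}: {c.name}")
```

The hard part in practice is building the observed list; once it exists, anything it contains that the BOM does not is a candidate for the kind of threat modeling Ron mentions.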

"Capitalism is the root of all cyber security issues."

Andreas Klien So, the risk here that is causing the human error or the supply chain problem is capitalism, or what did you just say? 

Ron Brash Yeah, capitalism is the root of all cyber security issues. 

Andreas Klien Yeah, well, in a way that's true. It's also the opponent of all security measures, right? Because security is an investment in time, and improving security doesn't really give you a lot of new features. So essentially, it's always the opponent of making things secure.

Ron Brash Yes, the woe of woes. 

Andreas Klien Yeah. So, another question, now that we're touching on supply chain risks. You've been analyzing firmware for so many years now. How can we identify risks in the supply chain?

Ron Brash Supply chain attacks are a complex issue that raise important questions about where the attack originates and how far its impact extends. For instance, consider an attack where a Ukrainian accounting company's software gets infected and is then used as a distribution mechanism, similar to the SolarWinds incident. Is the supplier the target of the attack, or does the impact encompass the entire chain from start to finish?

This uncertainty makes it challenging to identify and mitigate risks associated with supply chain attacks. Different scenarios require different strategies. Is the problem hardware-related, software-related, or both? Is it a matter of knowledge transfer or distribution within the supply chain? These are all crucial considerations that lack a clear taxonomy, as noted by Eric Byres in his insightful talk on the subject.

Commonly, people associate supply chain attacks with the manipulation of open-source components. But for asset owners, gaining visibility into such vulnerabilities and knowing how to respond can be daunting. Even if you're aware that a certain version of OpenSSL is vulnerable and under attack, determining the appropriate course of action isn't straightforward.

Nevertheless, there are steps you can take. Visibility is key, but it's equally important to have the capability to take action based on that visibility. This might involve revising your risk management strategy to incorporate defense-in-depth measures. While technological diversity can incur costs, it's often a necessary investment for resilience.
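A minimal sketch of the visibility step for the OpenSSL example, assuming an asset inventory that records the OpenSSL version found on each device. The affected range (3.0.0 up to, but not including, 3.0.7) and the asset names are illustrative assumptions; real bounds come from the relevant advisory, and version strings with letter suffixes such as 1.1.1w would need extra parsing.

```python
def parse(version: str) -> tuple:
    """Turn '3.0.5' into (3, 0, 5) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

# Assumed affected range: first affected release and first fixed release.
AFFECTED_FROM = parse("3.0.0")
FIXED_IN = parse("3.0.7")

# Hypothetical inventory: asset name -> OpenSSL version found on it.
inventory = {
    "gateway-01": "3.0.5",
    "hmi-02": "3.0.7",
    "rtu-03": "1.1.1",
}

def affected(inv: dict) -> list:
    """Assets whose version lies in [AFFECTED_FROM, FIXED_IN)."""
    return sorted(
        asset for asset, ver in inv.items()
        if AFFECTED_FROM <= parse(ver) < FIXED_IN
    )

print(affected(inventory))  # ['gateway-01']
```

Comparing tuples rather than strings matters: as strings, "3.0.10" would sort before "3.0.7". Knowing that gateway-01 matches is only the visibility half; deciding what to do about it is the actionability half Ron emphasizes.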

Dealing with supply chain risks also requires a shift away from total dependence on specific vendors. Opt for products with long lifespans and ensure that vendors consistently update all components. Treat supply chain risk as you would third-party risk, recognizing the interconnectedness of all relationships within the chain.

Moreover, navigating the complexities of the business world adds another layer of difficulty. Vendors may change names, and product rebranding can obscure their origins. These intricacies further underscore the challenge of managing supply chain risks effectively.

It's a tough question with no easy answers, but by understanding the complexities involved, we can begin to address these challenges more effectively.

Andreas Klien For our listeners who may not be deeply familiar with the aviation industry, could you provide a brief overview of the network structure of an Operational Technology (OT) network around an airport, particularly concerning the connectivity of aircraft to this network?

Ron Brash In a basic airport setup with just a couple of doors for passengers to access the tarmac, there are various interconnected systems at play. Picture this: as a passenger walks in, they encounter kiosks for checking in luggage and obtaining tickets. This initial step marks the beginning of the IT/OT convergence journey.

Once luggage is tagged, it enters a system of automated conveyor belts and image processing devices that guide it through the airport. Along the way, there are security measures like border control and x-ray scanners.

As the suitcase is loaded onto the aircraft, there's a transition where it passes through barcode scanners. Meanwhile, various systems onboard the aircraft communicate with air traffic management in the control tower.

Remarkably, details about a passenger's luggage on the IT side can impact the departure of the aircraft. This highlights the interconnectedness of systems in aviation.

During flight, the plane remains in communication with ground systems, and upon landing, similar interconnected systems facilitate the safe arrival and handling of passengers and luggage. Aviation stands out as one of the most interconnected and safety-critical industries, with a multitude of systems working together seamlessly.

Andreas Klien Interesting. In the power grid, networks are structured more like a tree, with the control center at the top overseeing operations at plants and substations. The control center interfaces with auxiliary networks for IT services and operational technology (OT). From the control center downward, connections follow a top-down approach, reaching Purdue Level 1 where controllers and protection relays manage equipment.
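A rough sketch of the top-down "tree" described here, assuming each asset is tagged with a Purdue level and that, in this simplified model, traffic is expected only between adjacent levels. The asset names and the one-level rule are hypothetical; real zone and conduit policies are more nuanced.

```python
# Hypothetical inventory: asset -> Purdue level.
PURDUE_LEVEL = {
    "control-center-scada": 3,
    "substation-gateway": 2,
    "bay-controller": 1,
    "protection-relay": 1,
}

def level_skipping(flows):
    """Yield (src, dst) pairs whose Purdue levels differ by more than one."""
    for src, dst in flows:
        if abs(PURDUE_LEVEL[src] - PURDUE_LEVEL[dst]) > 1:
            yield src, dst

flows = [
    ("control-center-scada", "substation-gateway"),
    ("substation-gateway", "protection-relay"),
    ("control-center-scada", "protection-relay"),  # bypasses Level 2
]
print(list(level_skipping(flows)))
# [('control-center-scada', 'protection-relay')]
```

A flow that jumps straight from the control center to a relay does not fit the tree; it may be legitimate, but it is the kind of connection worth asking about.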

While the lower levels of the grid are less interconnected, there's significant interconnectivity between the control center and the surrounding demilitarized zones (DMZs). These DMZs handle tasks like exchanging smart meter data and device configuration files, which requires data exchange even during fieldwork.

Transient devices, such as engineers' laptops, serve as important attack vectors at the lower levels of the grid. Despite less interconnectedness, there are more connections than often perceived. For instance, during the installation of intrusion detection systems, it's common to find multiple external TCP/IP connections to Purdue Level 2 or virtual Level 1 devices.

Teams often underestimate the number of connections present, with each team believing their connection is the only one. In reality, there are usually multiple connections, including those from the protection engineering team, SCADA team, and networking team managing switches.

In one notable instance, a large substation had over 60 external IP addresses with permanent connections to devices. This raised questions about necessity and security, particularly as vulnerabilities were discovered in an old Samba server used for dropping configuration files directly into relay directories.

These vulnerabilities highlight the importance of thorough assessment and risk management, especially in critical infrastructure environments where the stakes are high.
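A minimal sketch of how one might count, per device, the distinct external sources seen while installing an intrusion detection system. It assumes flow records reduced to (source IP, destination IP) pairs and an assumed substation address range of 10.20.0.0/16; "external" here simply means outside that range. The addresses and team labels are hypothetical.

```python
import ipaddress
from collections import defaultdict

# Assumed address range of the substation network.
SUBSTATION = ipaddress.ip_network("10.20.0.0/16")

def external_peers(flows):
    """Map each substation device to the set of outside sources talking to it."""
    peers = defaultdict(set)
    for src, dst in flows:
        s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
        if d in SUBSTATION and s not in SUBSTATION:
            peers[dst].add(src)
    return peers

# Hypothetical (source, destination) pairs seen by an IDS.
flows = [
    ("192.168.5.10", "10.20.1.5"),  # protection engineering
    ("192.168.7.22", "10.20.1.5"),  # SCADA team
    ("192.168.9.3", "10.20.1.5"),   # networking team
    ("192.168.5.10", "10.20.1.5"),  # repeat connection, counted once
    ("10.20.1.9", "10.20.1.5"),     # internal traffic, ignored
]

for device, sources in sorted(external_peers(flows).items()):
    print(device, len(sources), sorted(sources))
# 10.20.1.5 3 ['192.168.5.10', '192.168.7.22', '192.168.9.3']
```

Each team may believe its connection is the only one; a tally like this puts all of them side by side.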

Ron Brash Yeah, that's quite common. Whether it's Samba or other protocols, you'll often find similar vulnerabilities. Even older industrial equipment tends to have FTP drops or debuggers enabled as part of the board support packages. This extends to older relays and PACs as well. It leaves them susceptible to malware or other malicious activity, especially when engineering software pulls files from them back into what's perceived as a more secure environment. This scenario is also prevalent with Distributed Control System (DCS) devices, where primary and secondary servers are Windows-based and may accept configurations without proper validation.

Andreas Klien FTP is widely recognized as an outdated and insecure protocol. Seeing an FTP server in use today often raises concerns, as it should ideally be replaced with a more secure alternative like SFTP. However, even protocols native to Operational Technology (OT) environments can pose significant risks, sometimes more so than an unpatched or even patched FTP server.

Take, for example, the MMS protocol used by IEC 61850, which dates back to the 1970s or 1980s. Originally an OSI protocol, it was later adapted to TCP/IP, resulting in a complex architecture with OSI layers stacked on top of the TCP/IP stack. This complexity makes it prone to vulnerabilities and security issues; it resembles a skyscraper with multiple layers of potential weaknesses.
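To make the "skyscraper" concrete: the protocol in question is MMS (Manufacturing Message Specification), which IEC 61850 maps onto TCP/IP. On TCP port 102, MMS does not begin right after the TCP header. RFC 1006 adds a 4-byte TPKT header, followed by an ISO transport (COTP) header, then the OSI session and presentation layers, and only then MMS itself. The sketch below peels off just the first two of those layers. It assumes a single, complete COTP data TPDU per TCP segment, and the two payload bytes are hypothetical placeholders.

```python
import struct

def unwrap_tpkt_cotp(segment: bytes) -> bytes:
    """Strip the RFC 1006 TPKT header and a COTP data (DT) header.

    Returns the bytes handed up to the OSI session layer (and, above
    that, presentation and MMS). Assumes one complete TPKT per segment.
    """
    version, _reserved, length = struct.unpack("!BBH", segment[:4])
    if version != 3 or length != len(segment):
        raise ValueError("not a single, complete TPKT packet")
    cotp = segment[4:]
    length_indicator, pdu_type = cotp[0], cotp[1]
    if (pdu_type & 0xF0) != 0xF0:
        raise ValueError("not a COTP data (DT) TPDU")
    # The length indicator counts the header bytes that follow it.
    return cotp[1 + length_indicator:]

# Hypothetical segment: TPKT + COTP DT header (02 F0 80) + 2 placeholder bytes.
upper_layers = bytes.fromhex("0100")
segment = struct.pack("!BBH", 3, 0, 4 + 3 + len(upper_layers))
segment += bytes([0x02, 0xF0, 0x80]) + upper_layers
print(unwrap_tpkt_cotp(segment).hex())  # 0100
```

Every one of those layers has its own length fields and state, and each is another place where an implementation can disagree with the standard.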

Ron Brash Indeed, within MMS, various dialects and versions exist, each with its own nuances and potential dangers. Just as speaking English doesn't guarantee mutual understanding due to different accents or interpretations, different implementations of MMS can lead to misunderstandings or vulnerabilities.

These variations in MMS can manifest at different layers of the protocol stack, introducing complexities that may not be immediately apparent. Engaging in fuzzing or destructive testing, where deliberate attempts are made to disrupt or manipulate system functions, can reveal instances where the protocol's inherent limitations or ambiguities cause unexpected behavior. In some cases, these issues may stem not only from the interaction between the protocol and the devices it interfaces with but also from inherent weaknesses within the protocol itself.

Andreas Klien I once worked on a team that developed a new IEC 61850 client stack for the MMS protocol. We built it from scratch and released it as an update to a product with a large user base. Despite extensive testing with various devices from different brands worldwide, we encountered unexpected issues upon release.

Around 2014, many users reported that their critical protection relays crashed when our client stack connected to them. Our client stack behaved according to the standard, but the devices had not been developed with a client like ours as a reference, leading to unexpected behavior. For example, the developers of older relays had not anticipated requests being bundled together for efficiency.

Since devices in the field couldn't be patched, we had to modify our client to adapt and avoid crashing them. However, this was challenging due to numerous bugs and vulnerabilities in existing implementations. We introduced soft modes, like an over-cautious mode, to mitigate risks and prevent crashes. Additionally, for certain devices, we implemented a "keep close" mode to maintain stability.

Even today, in our products, we activate these modes when we detect similar behaviors to ensure stability and prevent disruptions.
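
The idea of falling back to more conservative client behavior can be sketched as follows. This is a hypothetical illustration, not OMICRON's implementation: the policy names, `FRAGILE_MODELS`, and the bundle size of 16 are assumptions. Known-fragile devices start in an over-cautious policy that sends one request per message, and any device that drops the connection is downgraded to it.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical model identifiers of relays known to crash on bundled requests.
FRAGILE_MODELS = {"RELAY-OLD-1", "RELAY-OLD-2"}


@dataclass(frozen=True)
class ClientPolicy:
    name: str
    max_requests_per_message: int  # how many reads are bundled into one message


NORMAL = ClientPolicy("normal", max_requests_per_message=16)
CAUTIOUS = ClientPolicy("over-cautious", max_requests_per_message=1)


def choose_policy(device_model: str) -> ClientPolicy:
    """Start fragile models in the cautious policy, everything else in normal."""
    return CAUTIOUS if device_model in FRAGILE_MODELS else NORMAL


def next_policy(current: ClientPolicy, device_dropped_connection: bool) -> ClientPolicy:
    """Downgrade to the cautious policy once a device shows crash-like behavior."""
    return CAUTIOUS if device_dropped_connection else current


def plan_messages(variables: List[str], policy: ClientPolicy) -> List[List[str]]:
    """Split the list of variables to read into messages according to the policy."""
    n = policy.max_requests_per_message
    return [variables[i:i + n] for i in range(0, len(variables), n)]
```

Reading five variables from an unknown model yields one bundled message; from a known-fragile model it yields five single-request messages, trading throughput for stability.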

Ron Brash That's true! Many legacy architectures, especially those designed for serial communication, were not prepared for the complexities of modern network environments. For example, old flow meters and monitoring devices, like Honeywell Mercury meters, were designed for physical access only, lacking consideration for the "cocktail party problem" of multiple speakers.

With the advent of protocols like IEC 61850, which are essentially shimmed onto various other layers such as serial and TCP, the challenge intensifies. These protocols were not originally designed to handle many network devices communicating simultaneously. As a result, old devices may struggle with rapid, concurrent communication from multiple clients.

State machines in these devices were not designed for such fast-paced interactions, leading to unexpected behavior when multiple clients attempt to communicate simultaneously. Edge cases, like handling more than ten clients at once or pipelining packets back-to-back, were often overlooked during testing.
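
The pipelining edge case can be modeled with a toy state machine. `LegacyDevice` is a hypothetical simplification that assumes strict request/response alternation: a second request arriving while it is still busy drives it into a fault state, the kind of unhandled transition that a test suite never exercised.

```python
from enum import Enum
from typing import List, Optional


class State(Enum):
    IDLE = "idle"
    BUSY = "busy"
    FAULT = "fault"


class LegacyDevice:
    """Hypothetical legacy device whose state machine assumes one request
    at a time. A request arriving while BUSY was never anticipated."""

    def __init__(self) -> None:
        self.state = State.IDLE
        self.pending: Optional[str] = None
        self.responses: List[str] = []

    def receive(self, request: str) -> None:
        if self.state is State.IDLE:
            self.state = State.BUSY
            self.pending = request
        elif self.state is State.BUSY:
            # Unhandled edge case: a back-to-back (pipelined) request.
            self.state = State.FAULT
        # In FAULT the device ignores everything until it is rebooted.

    def tick(self) -> None:
        """Finish processing the pending request and send the response."""
        if self.state is State.BUSY:
            self.responses.append(f"resp:{self.pending}")
            self.pending = None
            self.state = State.IDLE


def send_sequential(device: LegacyDevice, requests: List[str]) -> None:
    for r in requests:
        device.receive(r)
        device.tick()  # wait for each response before sending the next request


def send_pipelined(device: LegacyDevice, requests: List[str]) -> None:
    for r in requests:
        device.receive(r)  # fire all requests back-to-back
    device.tick()
```

Sending `["a", "b"]` sequentially yields two responses, while pipelining the same two requests leaves the device in `State.FAULT` with no responses at all.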

In essence, many of these legacy architectures were not engineered with robust testing in mind, leading to vulnerabilities and unexpected behavior in modern network environments. This highlights the importance of thorough testing and consideration of edge cases in the development of new architectures and protocols.

Andreas Klien Yeah, especially when the developers only ever had one kind of OT system in mind. Over the years, the product gets used in different applications, and suddenly other usage scenarios become possible. Which products were you referring to earlier, the boxes with serial implementations?

Ron Brash There are a couple of notable devices, like the Honeywell Mercury devices, which are now part of the full Honeywell brand. I don't know the exact model name, but many of the earlier ones were based on Windows CE. There are also serial devices like the flow meters from the Emerson catalog, known as the Rosemount series. These serial protocols often get pipelined over TCP gateways, which may also use OPC Classic and other protocols like DCOM and SMB. That's very common. Many devices, especially in the DCS space, also have telnet connections that need to be handled carefully. For instance, one device I worked on, a Bachmann MPC240, had been taken from a freighter, where it controlled the water ballast, and sold on the gray market. If telnet connections weren't managed properly, it would crash the telnet server and other peripherals, and the device needed a reboot. So even non-malicious activity can disrupt your devices if proper testing isn't done.

Andreas Klien It happens all the time, in many substations with older devices that are nevertheless IEC 61850-based. A denial-of-service attack there would consist of just ten SYN packets: you open ten connections, and every other client is locked out because no further connection can be opened. So it's a denial of service with ten packets.
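
The ten-connection lockout can be modeled as a fixed table of connection slots. In this hypothetical sketch, `ConnectionTable` stands in for a device's session handling, and the limit of ten comes from the example above. The optional idle timeout shows one common mitigation, with the caveat that an attacker who keeps sending traffic still holds the slots.

```python
from typing import Dict, Optional


class ConnectionTable:
    """Toy model of a device that can serve at most `max_connections` clients.
    Without an idle timeout, whoever grabs the slots first keeps them."""

    def __init__(self, max_connections: int = 10, idle_timeout: Optional[float] = None):
        self.max_connections = max_connections
        self.idle_timeout = idle_timeout
        self.last_seen: Dict[str, float] = {}  # client id -> time of last activity

    def _evict_idle(self, now: float) -> None:
        if self.idle_timeout is None:
            return
        stale = [c for c, t in self.last_seen.items() if now - t > self.idle_timeout]
        for c in stale:
            del self.last_seen[c]

    def accept(self, client: str, now: float) -> bool:
        """Return True if the client gets (or already holds) a connection slot."""
        self._evict_idle(now)
        if client in self.last_seen or len(self.last_seen) < self.max_connections:
            self.last_seen[client] = now
            return True
        return False
```

With the default table, ten idle connections opened at time 0 lock out a legitimate SCADA client indefinitely; with a 30-second idle timeout, the same client is admitted once the idle sessions expire.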

Ron Brash That points to something interesting for people looking for ideas on risk prevention. One of the best places to look for that type of information isn't CVEs. It's the hardware errata notices that come from your vendor. Some vendors constantly publish change logs of what they're fixing in the product, and you'll often notice wording that points to a cybersecurity issue without being declared as one. For example: "a Modbus packet with this range and this code causes the CPU to halt." That's a big issue, because they just told hackers how to write their exploit code. Thank you.

Andreas Klien Exactly, yeah, yeah. 

Ron Brash But this doesn't come out as a CVE.

Andreas Klien Because nobody reported it as a vulnerability. It was just reported as a bug. 

Ron Brash Yeah. So I often look into those things, or the hardware notices, especially if I know the CPU. There are tons of hardware errata out there. If you think Spectre is bad, it's nothing; read all of these hardware notices if you want to stay up late at night. I like to look for signs of unsafe architecture, because you'll spot it very fast.
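
Reading change logs and errata with this mindset can be partly automated. The sketch below flags release-note lines whose wording hints at a crash or denial-of-service condition. The keyword list is an assumption to tune per vendor, and every flagged line still needs an engineer's judgment.

```python
import re
from typing import List

# Assumed keyword list: wording that often hides a security-relevant defect
# behind "bug fix" language in release notes and errata.
SUSPICIOUS = [
    r"\bcrash(es|ed)?\b",
    r"\bhalt(s|ed)?\b",
    r"\bwatchdog\b",
    r"\breboot(s|ed)?\b",
    r"\bhang(s)?\b",
    r"\bunresponsive\b",
    r"\bmalformed\b",
    r"\boverflow\b",
]
PATTERN = re.compile("|".join(SUSPICIOUS), re.IGNORECASE)


def flag_entries(release_notes: str) -> List[str]:
    """Return the release-note lines whose wording hints at a crash or DoS condition."""
    return [line.strip() for line in release_notes.splitlines() if PATTERN.search(line)]


if __name__ == "__main__":
    notes = """Fixed typo in web UI label
Modbus request with out-of-range function code causes CPU to halt
Improved trend display performance
Watchdog reset when more than 10 clients connect"""
    for entry in flag_entries(notes):
        print(entry)
```

On the sample notes, only the Modbus halt and the watchdog reset lines are flagged; both describe a way to take a device offline without ever mentioning security.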

Andreas Klien The same is true across industries: buffer overflows are a major concern. When you examine errata pages or investigate system crashes, you very often find a buffer overflow behind them. These vulnerabilities result from insufficient input validation or bounds checking, and they underlie many system failures and security breaches. Addressing them is therefore key to mitigating cybersecurity risks and keeping systems stable in every industry.
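
The missing check behind a buffer overflow is usually a declared length that is never compared with the buffer size or with the data actually received. Python is memory-safe, so this sketch only illustrates the validation logic; in C firmware, skipping these checks before a `memcpy` is what produces the overflow. `BUFFER_SIZE` and the one-byte length prefix are assumptions.

```python
BUFFER_SIZE = 64  # assumed size of the device's fixed receive buffer


def copy_field(frame: bytes, buffer: bytearray) -> int:
    """Copy a length-prefixed field into a fixed-size buffer.

    Both checks below are what vulnerable firmware typically omits: the declared
    length must fit the buffer AND must not exceed the bytes actually received.
    """
    if len(frame) < 1:
        raise ValueError("frame too short to contain a length byte")
    declared = frame[0]
    if declared > len(buffer):
        raise ValueError(f"declared length {declared} exceeds buffer size {len(buffer)}")
    if declared > len(frame) - 1:
        raise ValueError("declared length exceeds received data")
    buffer[:declared] = frame[1:1 + declared]  # same-length slice: buffer never grows
    return declared
```

A frame declaring 200 bytes is rejected before any copy happens, which in an operator's vocabulary is the difference between a logged error and a device that stops working.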

Ron Brash I like to use words like 'errata' and other engineering terms because most of our operators and facility personnel have backgrounds in mechanical and electrical engineering. So when someone mentions a technical term like 'buffer overflow', I translate it to 'device stops working'. Similarly, 'watchdog' means 'device will reboot'. I don't want these issues in my process, so I prioritize fixing them. It's about adjusting your vocabulary to focus on security, safety, reliability, and productivity. You don't need to be a cybersecurity expert; you just need to understand the potential impact on your devices. For example, knowing the difference between HTTP and HTTPS may not matter much, but understanding that certain vulnerabilities can crash a device is crucial and needs immediate attention.
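
Ron's translation from technical findings into operational impact can be expressed as a simple lookup that sorts findings by urgency. The table entries and priority numbers here are illustrative assumptions based on the examples he gives, not a standard scoring scheme.

```python
from typing import List, Tuple

# Assumed translation table: finding -> (operational impact, priority; 1 = fix first)
IMPACT = {
    "buffer overflow": ("device stops working", 1),
    "watchdog reset": ("device will reboot", 1),
    "denial of service": ("device stops responding", 1),
    "missing https": ("traffic readable on the network", 3),
}


def triage(findings: List[str]) -> List[Tuple[str, str, int]]:
    """Translate findings into operator language, most urgent first.
    Unknown findings default to priority 2 for manual review."""
    rows = []
    for f in findings:
        impact, prio = IMPACT.get(f.lower(), ("needs engineering review", 2))
        rows.append((f, impact, prio))
    return sorted(rows, key=lambda r: r[2])
```

Because the sort is stable, findings with equal priority keep the order in which they were reported.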

Andreas Klien Indeed, the most common reasons for patching in the power grid are bugs and crashes. Patching purely to fix security vulnerabilities is approached cautiously, because the patch itself introduces operational risk. But if there's already an operational issue, updating the firmware becomes a much more immediate consideration.

This brings me to my final question: Do you have any recommendations for cross-industry best practices? Perhaps we can summarize our findings here and identify potential benefits of adopting practices from the aviation industry to enhance security in the power grid.

Ron Brash There's been a lot of discussion around the Purdue model and its security limitations, alongside the useful ISA/IEC 62443 models. Aviation industry documentation for ground systems closely resembles the recommendations for Industrial Control Systems (ICS) and Industrial Automation and Control Systems (IACS), but aviation stands out in certain respects.

One notable difference is the reliance on GPS for time and data, which is not ideal because GPS is a single channel and a single source. Aviation emphasizes redundancy, with multiple channels for timing and communication. That redundancy matters for the power grid too, in the face of events like forest fires that could disrupt it.
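
Redundant timing usually means comparing several independent sources and discarding one that disagrees, for example a jammed or spoofed GPS receiver. This sketch assumes clock offsets (in seconds) from hypothetical sources and a 1 ms agreement threshold. It takes the median, drops outliers, and refuses to pick a time if fewer than three sources agree.

```python
from statistics import median
from typing import Dict, Optional


def select_offset(offsets: Dict[str, float], max_spread: float = 0.001,
                  min_sources: int = 3) -> Optional[float]:
    """Pick a clock offset (seconds) from several independent time sources.

    Sources deviating from the median by more than max_spread are discarded;
    returns None when too few agreeing sources remain to trust the result.
    """
    if len(offsets) < min_sources:
        return None
    mid = median(offsets.values())
    agreeing = [v for v in offsets.values() if abs(v - mid) <= max_spread]
    if len(agreeing) < min_sources:
        return None
    return median(agreeing)
```

With GPS reporting a 0.5 s offset while PTP, NTP, and a terrestrial radio source agree within a fraction of a millisecond, the GPS value is discarded; with GPS as the only source, the function returns `None` instead of trusting it.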

Another valuable lesson from aviation is the strict policy against transient laptops, particularly for technicians and third parties. Airports, often staffed by contractors, prioritize knowing exactly what assets are present and ensuring they are clean and secure.

Aviation also sets a precedent by not viewing aircraft as disposable assets but rather planning for their lifecycle. Critical infrastructure should adopt a similar mindset, recognizing that assets degrade over time and planning for transitions to maintain resilience.

Additionally, aviation's approach to predictive maintenance and control differs from traditional methods, prioritizing the availability of spares and proactive measures. These unconventional recommendations encourage different thinking to enhance resilience and preparedness in critical infrastructure sectors.

Andreas Klien Thank you, Ron. I think we've done a good job of collecting some scary stories about bugs, vulnerabilities, and attack scenarios in aviation, but also in the power industry. Is there anything you would like to add at the end?

Ron Brash It's truly been a pleasure to be here and reconnect with you, Andreas. I'm grateful I didn't have to endure an 11-hour plane ride to join you. If anyone happens to cross paths with me at an event or knows I'm in town, please don't hesitate to reach out. I'd be delighted to share a beverage and catch up.

Andreas Klien Well done, thank you. With that, I'll hand it back over to Scott.

Scott Williams Thank you, Andreas and Ron, for this informative discussion. And a big thanks to our audience for listening to this episode of Energy Talks. We always welcome your questions and feedback. Simply send us an email to podcast@omicronenergy.com. OMICRON has several years of experience in power system testing, data management, and cybersecurity in the power industry and offers you the matching solution for your application. For more information, be sure to visit our website at omicronenergy.com. Please join us to listen to the next episode of Energy Talks. Goodbye for now, everyone! 
