The Dangers of Corrective Auto Remediation in Your Public Cloud

Becca Gomby
Friday, Feb 10th, 2023

We’re hearing more questions than ever about Security Orchestration, Automation and Response (SOAR), or simply “auto remediation”: the ability to automatically execute actions in response to detected security incidents or vulnerabilities. While it has the potential to save time and improve the efficiency of security operations, it can also be risky and cause more headaches for your DevOps teams if not implemented carefully.

How auto remediation works 

The type of auto remediation that most cloud security and cloud DevOps practitioners are familiar with is the kind that automatically fixes posture or configuration errors. The goal of auto remediation is to improve the efficiency of security operations and minimize the impact of security incidents by quickly addressing them without the need for manual intervention.

Here's how it works: 

  • Detection & Response: The first step in auto remediation is the detection of a security incident or vulnerability. This is typically accomplished by using security tools such as firewalls, intrusion detection systems, and vulnerability scanners. Once a security incident or vulnerability has been detected, the next step is to assess the situation and determine the best course of action. This is typically accomplished using automated algorithms that consider various factors, such as the severity of the incident, the impact it could have, and the likelihood of successful remediation. 
  • Remediation: If the assessment determines that remediation is necessary, the next step is to automatically implement the corrective actions. This may involve patching software, reconfiguring systems, or taking other appropriate measures to address the security incident or vulnerability. 
  • Verification (i.e. Detect Again): The final step in auto remediation is to verify that the remediation was successful and that the security incident or vulnerability has been resolved. This may involve automated checks or manual verification by security personnel. 
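
The three steps above can be sketched as a small loop. This is a minimal illustration, not any vendor's implementation: the in-memory `ENVIRONMENT`, the `Finding` type, the 1-to-5 severity scale, and the severity threshold are all hypothetical stand-ins for real scanners and scoring logic.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    resource_id: str
    issue: str
    severity: int  # hypothetical scale: 1 (low) to 5 (critical)


# In-memory stand-in for a cloud environment: resource id -> configuration.
ENVIRONMENT = {
    "bucket-logs": {"public_read": True},
    "bucket-assets": {"public_read": False},
}


def detect(env):
    """Detection: flag every resource whose configuration violates the rule."""
    return [
        Finding(resource_id, "public_read_enabled", severity=4)
        for resource_id, config in env.items()
        if config.get("public_read")
    ]


def should_remediate(finding, threshold=3):
    """Assessment: a plain severity threshold stands in for real scoring."""
    return finding.severity >= threshold


def remediate(env, finding):
    """Remediation: apply the corrective configuration change."""
    env[finding.resource_id]["public_read"] = False


def run_cycle(env):
    """Detect, assess, remediate, then detect again to verify."""
    fixed = []
    for finding in detect(env):
        if should_remediate(finding):
            remediate(env, finding)
            fixed.append(finding.resource_id)
    remaining = detect(env)  # Verification is simply a second detection pass.
    return fixed, remaining


fixed, remaining = run_cycle(ENVIRONMENT)
print("fixed:", fixed)
print("still failing:", remaining)
```

Note that nothing in this loop asks whether the change is safe for the application that depends on the resource, which is where the risks discussed next come from.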

Overall, auto remediation is a powerful tool for improving the efficiency and effectiveness of cloud security operations. However, it's important to approach it with caution and to thoroughly test and validate the tools and scripts before deploying them in a production environment.  

What are the threats associated with applying auto remediations? 

While automating remediation in the cloud is an appealing way to improve efficiency or reduce MTTR (mean time to resolve), it is essential to be aware of the issues that come with it.

  • Availability risks: An auto-remediation script may change a configuration or setting on an application component, causing unforeseen downtime or degradation because legitimate customers can no longer connect.
  • Lack of transparency: With automatic remediation, it can be difficult to understand why certain actions were taken and what the impact was. This lack of transparency can make it difficult to diagnose and resolve problems. 
  • Lack of human oversight: Automated remediation relies on pre-determined rules and algorithms and does not consider the unique context of each security incident. Human security experts are often better equipped to make informed decisions about the best course of action. 
  • Dependence on technology: Auto remediation depends on technology to function, which can be unreliable or subject to bugs or errors. In the event of a technical malfunction, the remediation system could cause unintended consequences. 
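
One common way to soften the transparency and oversight problems above is to put an approval gate and an audit trail in front of every corrective action. The sketch below assumes a hypothetical allow-list (`AUTO_APPROVED`), hypothetical action names, and an in-memory `AUDIT_LOG`; a real system would use durable storage and a proper approval workflow.

```python
from datetime import datetime, timezone

AUDIT_LOG = []  # In-memory for illustration; assume durable storage in practice.

# Hypothetical allow-list of actions considered safe to run unattended.
AUTO_APPROVED = {"enable_bucket_encryption"}


def handle(action, resource_id, reason, apply, approved_by=None):
    """Apply an action only if it is allow-listed or a human approved it.

    Every decision is logged, whether or not the change was applied, so the
    team can later see what happened and why.
    """
    if action in AUTO_APPROVED or approved_by is not None:
        apply(resource_id)
        status = "applied"
    else:
        status = "pending_approval"
    entry = {
        "time": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "resource_id": resource_id,
        "reason": reason,
        "approved_by": approved_by,
        "status": status,
    }
    AUDIT_LOG.append(entry)
    return entry


changed = []  # Records which resources were actually touched.

first = handle("enable_bucket_encryption", "bucket-logs",
               "encryption disabled", changed.append)
second = handle("restrict_ssh_ingress", "sg-web",
                "port 22 open to the internet", changed.append)
third = handle("restrict_ssh_ingress", "sg-web",
               "port 22 open to the internet", changed.append,
               approved_by="on-call engineer")

for entry in AUDIT_LOG:
    print(entry["status"], entry["action"], entry["resource_id"])
```

The second call changes nothing until a person signs off, and all three decisions leave a record that explains the reason for the action.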

How auto remediation creates cloud drift and why your DevOps team won’t like it  

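In cloud security, Infrastructure as Code (IaC) drift refers to the difference between the desired state of the infrastructure as defined in code and the actual state of the infrastructure as it exists in the environment. Auto remediation creates drift because a fix applied directly to a live resource is never written back to the code that defines it, so the environment and the code disagree from that moment on. Auto remediation in the presence of existing IaC drift is also risky because it can lead to unintended consequences and cause further drift in the infrastructure.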
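
To make the definition concrete, here is a minimal sketch that compares a desired state with an actual state and reports the differences. The dictionaries, the `sg-web` resource, and the `compute_drift` helper are hypothetical; real IaC tools compute this from their own state and provider APIs.

```python
def compute_drift(desired, actual):
    """Return {resource: {key: (desired_value, actual_value)}} per mismatch."""
    drift = {}
    for resource_id in sorted(set(desired) | set(actual)):
        want = desired.get(resource_id, {})
        have = actual.get(resource_id, {})
        diffs = {
            key: (want.get(key), have.get(key))
            for key in sorted(set(want) | set(have))
            if want.get(key) != have.get(key)
        }
        if diffs:
            drift[resource_id] = diffs
    return drift


# Desired state as declared in code, and the matching live environment.
desired = {"sg-web": {"ingress_cidr": "0.0.0.0/0", "port": 443}}
actual = {"sg-web": {"ingress_cidr": "0.0.0.0/0", "port": 443}}
before = compute_drift(desired, actual)

# An auto-remediation tool tightens the rule directly in the live
# environment, without touching the code that declares it.
actual["sg-web"]["ingress_cidr"] = "10.0.0.0/8"
after = compute_drift(desired, actual)

print("before:", before)
print("after:", after)
```

Because the code still declares the old value, the next pipeline run would typically see this difference and try to restore the declared configuration, undoing the fix unless someone also updates the code.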

Here are some of the risks associated with auto remediation in relation to IaC drift: 

  • Overcorrection: If the remediation script or tool is not aware of the desired state of the infrastructure as defined in the IaC, it may take actions that are inconsistent with the desired state, causing further drift and potentially even more security incidents.
  • Inconsistent remediation: In some cases, the remediation script or tool may not fully understand the context in which it is being executed, leading to inconsistent remediation across similar incidents. This can lead to confusion and unpredictable behavior in the environment.
  • Loss of control: If the remediation script or tool is not well-designed or thoroughly tested, it can cause unexpected changes to the environment, leading to a loss of control over the infrastructure.
  • Conflicts with other processes: In some cases, auto remediation may interfere with other processes or systems that are also operating in the environment, leading to unexpected behavior and potentially even more security incidents. 
Shay Ulmer, Software Engineer, Panoptica

It's important to approach auto remediation with caution. For your DevOps teams, this will likely mean thoroughly testing and validating the remediation script or tool before deploying it in a production environment, incorporating the desired state of the infrastructure into the remediation process, and closely monitoring the environment for unexpected changes.

Why Panoptica doesn’t use auto remediation but provides Dynamic Remediation 

One of Panoptica’s core differentiators in the market has been its ability to provide Dynamic Remediation for DevOps and Security teams. Panoptica’s Security Orchestration, Automation and Response (SOAR) workloads are built around a DevOps-centric, pipeline-first approach: we generate templates for corrective actions in formats that DevOps engineers are likely to use, and we strongly suggest that they be applied through a CI system rather than blindly executed.

Panoptica generates dedicated guardrails for each account based on that account’s configuration. When a critical attack path is identified, the Panoptica platform offers these guardrails as Infrastructure as Code (IaC) Terraform files that users can download and apply to their environments. Panoptica also lets users customize the guardrails: if specific identities or accounts require their own access policies, the guardrail can be modified so that it is not a blanket policy applied to all.
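
The general idea of a reviewable guardrail with exemptions can be sketched as follows. This is not Panoptica's output format, which the article describes only as Terraform files. As an assumption for illustration, the sketch builds an AWS-style JSON policy document with a deny statement and an `ArnNotLike` condition on `aws:PrincipalArn`; the `build_guardrail` helper, the account ID, and the role name are hypothetical.

```python
import json


def build_guardrail(denied_actions, exempt_principal_arns=()):
    """Build a deny policy that skips the listed principals, if any."""
    statement = {
        "Sid": "GuardrailDeny",
        "Effect": "Deny",
        "Action": sorted(denied_actions),
        "Resource": "*",
    }
    if exempt_principal_arns:
        # Principals matching these ARNs are not covered by the deny.
        statement["Condition"] = {
            "ArnNotLike": {"aws:PrincipalArn": sorted(exempt_principal_arns)}
        }
    return {"Version": "2012-10-17", "Statement": [statement]}


# A blanket guardrail that applies to every identity.
blanket = build_guardrail(["s3:PutBucketAcl", "s3:PutBucketPolicy"])

# The same guardrail, customized so a hypothetical deployment role keeps access.
customized = build_guardrail(
    ["s3:PutBucketAcl", "s3:PutBucketPolicy"],
    exempt_principal_arns=["arn:aws:iam::123456789012:role/deploy-pipeline"],
)

# Render the policy as text to commit and review; nothing is applied here.
rendered = json.dumps(customized, indent=2)
print(rendered)
```

The script only produces a file-ready document. Applying it is left to the team's CI pipeline, where the change can be reviewed like any other code.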

Panoptica’s Dynamic Remediation provides: 

  • Flexibility: Dynamic Remediation allows for more flexibility in the remediation process, as it can be customized and adjusted in real-time based on the specific needs and circumstances of each situation. 
  • Control: Dynamic Remediation gives security teams more control over the remediation process, allowing them to make informed decisions and take appropriate actions. 
  • Complexity: Some security incidents may be too complex to be fully automated and require human intervention to properly resolve. Dynamic Remediation allows for the appropriate level of human involvement to ensure effective remediation. 
  • Accuracy: Dynamic Remediation can result in more accurate remediation as it allows security teams to assess the situation and make decisions based on the specific details of each incident. 
Jonathan Rau, Panoptica

Auto remediation is a nice shortcut in theory, but in practice it needs closer scrutiny. Such solutions should only be considered if applied under the careful eye of expert cloud security practitioners to ensure that no undue damage is done to your cloud environment. Auto remediation could improve your MTTR, but at what cost?

Instead, why not get started with a platform that can reduce the noise in your cloud environment by 95% on average and provide your team with out-of-the-box Dynamic Remediation that can be customized to best serve your environment’s unique requirements?
