Measure cluster performance impact of Amazon GuardDuty EKS Agent
Introduction
Amazon GuardDuty is a threat detection service that continuously monitors your AWS environment for malicious activity and anomalous behavior. Since its launch in 2017, Amazon GuardDuty has expanded its visibility and threat detection coverage. Amazon GuardDuty can analyze tens of billions of events per minute across multiple AWS data sources, such as AWS CloudTrail event logs, Amazon Virtual Private Cloud (Amazon VPC) Flow Logs, DNS query logs, Amazon Simple Storage Service (Amazon S3) data plane events, Amazon Relational Database Service (Amazon RDS) login events, and Amazon Elastic Kubernetes Service (Amazon EKS) audit logs. In addition, as of March 30, 2023, Amazon GuardDuty also analyzes Amazon EKS runtime events.
With the release of Amazon GuardDuty EKS Runtime Monitoring, over 30 new security findings can be generated based on Amazon EKS event data originating from processes inside containers and hosts. These new findings are made possible by an eBPF agent that inspects activities occurring inside the container runtime environment such as process execution, file access, and network connections. Amazon GuardDuty EKS Runtime Monitoring can be enabled for your whole organization with just a few clicks.
How Amazon EKS Runtime Monitoring works
Amazon EKS Runtime Monitoring captures runtime activity from your Amazon EKS workloads through an agent installed on your nodes (that is, your Amazon EC2 instances). The Amazon EKS Runtime Monitoring agent was built using eBPF, a Linux technology that extends the capabilities of the Linux kernel by loading custom programs that run inside the kernel in a safe, sandboxed environment. eBPF was chosen for Amazon GuardDuty EKS Runtime Monitoring due to its simplicity, safety, portability, and the detailed telemetry it can get from the kernel. The Amazon GuardDuty agent is packaged as an Amazon EKS add-on, which makes it easy to deploy and manage. While Amazon GuardDuty supports automated deployment and updates of the add-on across all clusters (i.e., within an AWS organization), the add-on can also be managed manually, allowing you to choose exactly which clusters you’d like protected.
The Amazon EKS Runtime Monitoring agent is deployed as a DaemonSet. The DaemonSet runs an instance of the agent on every matching node in an Amazon EKS cluster. The agent loads an eBPF probe directly into the kernel, where it runs in a sandboxed environment. Once installed, the agent starts capturing data from the underlying kernel, including host-level events and container processes. Data from the kernel is then enriched with additional metadata gathered from userspace, such as the Kubernetes Pod name, the namespace the Pod is running in, and the cluster name.
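The enrichment step described above can be pictured with a small sketch. This is not the agent's actual implementation: the event fields, the `PodMetadata` type, and the lookup table keyed by container ID are all hypothetical names chosen for illustration. It only shows the general idea of joining a kernel-level event with Kubernetes context held in userspace.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class KernelEvent:
    """Raw event as a kernel probe might report it (hypothetical fields)."""
    pid: int
    container_id: Optional[str]  # None for processes running directly on the host
    syscall: str


@dataclass(frozen=True)
class PodMetadata:
    """Kubernetes context known only in userspace (hypothetical type)."""
    pod_name: str
    namespace: str


def enrich(event: KernelEvent,
           pods_by_container: Dict[str, PodMetadata],
           cluster_name: str) -> dict:
    """Attach Pod name, namespace, and cluster name to a kernel event.

    Host-level events have no matching container, so their Pod fields stay None.
    """
    pod = pods_by_container.get(event.container_id) if event.container_id else None
    record = asdict(event)
    record["cluster_name"] = cluster_name
    record["pod_name"] = pod.pod_name if pod else None
    record["namespace"] = pod.namespace if pod else None
    return record


# Illustrative data only; none of these values come from a real cluster.
pods = {"abc123": PodMetadata(pod_name="web-0", namespace="default")}
print(enrich(KernelEvent(4242, "abc123", "execve"), pods, "demo-cluster"))
print(enrich(KernelEvent(1, None, "openat"), pods, "demo-cluster"))
```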
From there, the event data is forwarded to Amazon GuardDuty’s backend through a managed VPC endpoint. To communicate with Amazon GuardDuty, the agent obtains temporary credentials from the Amazon EC2 instance identity role and uses them to securely send the telemetry data to the Amazon GuardDuty endpoint. Finally, Amazon GuardDuty ingests the events from the agent, analyzes them for threat activity, and generates findings as needed.
Monitoring cluster performance impact
Amazon GuardDuty EKS Runtime Monitoring, like all features of Amazon GuardDuty, was designed to have a negligible impact on the performance of your cluster and its workloads. The agent has upper limits of 1000m (one CPU core) and 1 GB for CPU and memory, respectively. Accessing runtime event data requires some presence on the node, but the only observable activity is the eBPF agent collecting data and forwarding it to Amazon GuardDuty for analysis. Customers that would like to observe the impact of the agent on their cluster’s compute resources can explore using Inspektor Gadget and the top command as discussed in the following sections.
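To put those upper limits in context, the following minimal sketch converts Kubernetes-style resource quantities to plain numbers and expresses the agent's limits as a share of a node's capacity. The node size (4 CPU cores, 16Gi of memory) is a made-up example, and writing the memory limit as the decimal quantity "1G" is an assumption; the limit may be expressed differently in your add-on version.

```python
# Multipliers for a subset of Kubernetes memory suffixes (binary and decimal).
_MEMORY_UNITS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30,
    "k": 10**3, "M": 10**6, "G": 10**9,
}


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity to cores: '1000m' is 1.0 core, '2' is 2.0 cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '16Gi' or '1G' to bytes."""
    # Check two-letter suffixes first so 'Gi' is not mistaken for 'G'.
    for suffix in sorted(_MEMORY_UNITS, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * _MEMORY_UNITS[suffix]
    return int(quantity)


# Agent limits from the text; "1G" is an assumed spelling of "1 GB".
agent_cpu_limit = parse_cpu("1000m")
agent_memory_limit = parse_memory("1G")

# Hypothetical node capacity, for illustration only.
node_cpu = parse_cpu("4")
node_memory = parse_memory("16Gi")

print(f"CPU limit share of node:    {agent_cpu_limit / node_cpu:.1%}")
print(f"Memory limit share of node: {agent_memory_limit / node_memory:.1%}")
```

These figures are ceilings, not typical usage; the measurements in the following sections show what the agent actually consumes.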
Inspektor Gadget is an eBPF-based toolset for debugging and inspecting Kubernetes resources and applications. Inspektor Gadget is well integrated with Kubernetes and spins up Pods that inject eBPF programs into the kernel. These programs then extract and display information about the activities occurring within Pods that are running on that node.
The following screenshot shows the usage and performance of eBPF programs running in an Amazon EKS cluster with Amazon GuardDuty Runtime Monitoring enabled and active. The Amazon EKS cluster runs an application that generated a number of Amazon EKS Runtime Monitoring findings.
While this application and the agent were actively running on the Amazon EKS cluster, a 4-second trace was performed using the top ebpf gadget. This trace noted that the Amazon EKS Runtime Monitoring agent was called 525 times and ran for just over 1 millisecond during that 4-second window. Note that workloads are unique, so results vary depending on the nature of your workload and the runtime events it causes. You can find guidance for using Inspektor Gadget in your environment here.
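The numbers from that trace translate into a per-invocation cost and a share of CPU time with simple arithmetic. The sketch below rounds "just over 1 millisecond" down to exactly 1 millisecond, so the results are approximate and slightly understate the measured values.

```python
def ebpf_overhead(run_count: int, total_runtime_s: float, window_s: float):
    """Return (average microseconds per invocation, fraction of one CPU core used)."""
    avg_us_per_call = total_runtime_s / run_count * 1e6
    cpu_fraction = total_runtime_s / window_s
    return avg_us_per_call, cpu_fraction


# Values from the trace described above, with the runtime rounded to 1 ms.
avg_us, cpu_fraction = ebpf_overhead(run_count=525, total_runtime_s=0.001, window_s=4.0)

print(f"Average runtime per invocation: {avg_us:.2f} microseconds")
print(f"Share of one CPU core over the window: {cpu_fraction:.3%}")
```

In other words, each invocation took roughly 2 microseconds on average, and the eBPF programs occupied about 0.025% of a single core during the trace.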
Customers can also measure the CPU and memory usage of the agent by executing the kubectl top command on their Amazon EKS nodes. This command shows resource usage as a percentage of total CPU and memory on the node. The following screenshot was taken on a node within the same cluster discussed earlier after executing the top command.
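If you capture per-Pod output instead (for example with kubectl top pod, which reports absolute millicores and bytes rather than percentages), a few lines of Python can relate it to the agent's 1000m CPU limit. The sample text below is illustrative and not taken from a real cluster; the Pod names and values are invented, and the parser assumes CPU is reported in millicores and memory in Mi.

```python
# Invented sample in the column layout that `kubectl top pod` prints.
SAMPLE_OUTPUT = """\
NAME                        CPU(cores)   MEMORY(bytes)
aws-guardduty-agent-abcde   3m           45Mi
aws-guardduty-agent-fghij   5m           52Mi
"""


def strip_suffix(value: str, suffix: str) -> int:
    """Parse an integer that must carry the given unit suffix."""
    if not value.endswith(suffix):
        raise ValueError(f"expected {value!r} to end with {suffix!r}")
    return int(value[: -len(suffix)])


def parse_top_pods(text: str) -> list:
    """Turn the table into a list of dicts, skipping the header row."""
    rows = []
    for line in text.strip().splitlines()[1:]:
        name, cpu, memory = line.split()
        rows.append({
            "pod": name,
            "cpu_millicores": strip_suffix(cpu, "m"),
            "memory_mib": strip_suffix(memory, "Mi"),
        })
    return rows


CPU_LIMIT_MILLICORES = 1000  # the agent's CPU upper limit stated earlier

for row in parse_top_pods(SAMPLE_OUTPUT):
    share = row["cpu_millicores"] / CPU_LIMIT_MILLICORES
    print(f'{row["pod"]}: {row["cpu_millicores"]}m CPU '
          f'({share:.1%} of limit), {row["memory_mib"]}Mi memory')
```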
Interpreting Amazon EKS Runtime Monitoring findings
Amazon EKS Runtime Monitoring analyzes runtime events in your protected clusters to generate security findings. These findings can be viewed within the Amazon GuardDuty console from the findings tab. Note how the Resource Type criterion was set to EKSCluster, which displays findings related to your Amazon EKS clusters.
Each finding contains the relevant data for addressing the potential threat, which can be viewed within the console by selecting the finding you are interested in. From there, you can quickly see an overview of the finding, including its severity, the impacted Amazon EKS cluster, and when it was last detected. To learn more about the finding and how to remediate it, you can select Info next to the finding summary. To address findings automatically, you can use Amazon EventBridge or the Custom Action feature of AWS Security Hub. A full list of the findings generated by Amazon EKS Runtime Monitoring is available on this page in the Amazon GuardDuty User Guide.
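As a starting point for automated handling through EventBridge, the sketch below builds an event pattern that matches GuardDuty findings on EKS clusters at or above a chosen severity. The `source` and `detail-type` values are the ones GuardDuty uses for finding events; the `detail.resource.resourceType` path and the severity threshold of 7 are assumptions that you should verify against a sample finding event from your own account before relying on them.

```python
import json


def eks_finding_pattern(min_severity: int = 7) -> dict:
    """Build an EventBridge event pattern for GuardDuty findings on EKS clusters.

    The nested field names under "detail" are assumed to follow the GuardDuty
    finding format; confirm them against a real event.
    """
    return {
        "source": ["aws.guardduty"],
        "detail-type": ["GuardDuty Finding"],
        "detail": {
            "severity": [{"numeric": [">=", min_severity]}],
            "resource": {"resourceType": ["EKSCluster"]},
        },
    }


# The JSON string is what you would supply as the rule's event pattern.
print(json.dumps(eks_finding_pattern(), indent=2))
```

The resulting JSON could be pasted into the event pattern editor when creating an EventBridge rule, with a target such as a notification topic or a remediation function attached to that rule.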
Amazon EKS Runtime Monitoring provides detailed threat information with minimal administration and cluster performance impact
Amazon EKS Runtime Monitoring gives security and platform teams runtime-event-specific details about their workloads. Impact to workloads is minimal and can be observed using Inspektor Gadget, as discussed previously. Getting started with Amazon EKS Runtime Monitoring takes just a few choices within the Amazon GuardDuty console.
There is a 30-day free trial for new Amazon GuardDuty users. If you are enabling Amazon GuardDuty for the first time, then Amazon EKS Runtime Monitoring won’t be enabled by default, and needs to be enabled as described here. If you are an existing Amazon GuardDuty user, then you can still use Amazon EKS Runtime Monitoring for 30 days at no additional charge. During your trial period, you can see the estimated cost of Amazon EKS Runtime Monitoring on the usage tab of the Amazon GuardDuty console. When evaluating the cost of Amazon EKS Runtime Monitoring, note that clusters covered by this protection won’t incur Amazon GuardDuty VPC Flow Log Analysis charges, which may result in material overall cost savings. This is because Amazon GuardDuty findings that previously relied on VPC Flow Logs for detection can use the Amazon GuardDuty agent instead. To learn more, see the Amazon GuardDuty pricing page.
For more information, see the Amazon GuardDuty User Guide or reach out to your usual AWS support contacts.