Thursday, May 25, 2017

Event overflow during boot

Today I wanted to explain something that I think needs to be corrected in the RHEL 7 DISA STIG. The DISA STIG is a Technical Guide that describes how to securely configure a system. I was looking through its recommendations and saw something in the audit section that ought to be fixed.

BOOT
When the Linux kernel is booted, there are some default values. One of those is the setting for the backlog limit. The backlog limit describes the size of the internal queue that holds events that are destined for the audit daemon. This queue is a holding area so that if the audit daemon is busy and can't get the event right away, they do not get lost. If this queue ever fills up all the way, then we have to make a decision about what to do. The options are ignore it, syslog that we dropped one, or panic the system. This is controlled by the '-f' option to auditctl.

Have you ever thought about the events in the system that are created before the audit daemon runs? Well, it turns out that when a boot is done with audit=1, then the queue is held until the audit daemon connects. After that happens, the audit daemon drains the queue and it functions normally. If the system does not boot with audit=1, then the events are sent to syslog immediately and are not held.

The backlog limit has a default setting of 64. This means that during boot if audit=1, then it will hold 64 records. Let's take a look at how this plays out in real life.

$ ausearch --start boot --just-one
----
time->Wed May 24 06:55:20 2017
node=x2 type=DAEMON_START msg=audit(1495623320.378:4553): op=start ver=2.7.7 format=enriched kernel=4.10.15-200.fc25.x86_64 auid=4294967295 pid=863 uid=0 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=success

This is the event where the audit daemon started logging. In all likelihood the backlog limit got set during the same second. So, let's gather up a log like this:

ausearch --start boot --end 06:55:20 --format csv > ~/R/audit-data/audit.csv

The STIG calls for the backlog limit to be set to 8192. Assuming that we booted with the STIG suggested value, we can take a quick peek inside the csv file to see if 8192 is in the file. It is in my logs. If its not in yours, then increment the --end second by one and re-run. This assumes that you also have '-b 8192' in your audit rules.

What we want to do is create a stacked area graph that plots the cumulative number of events as one plot. This one shows events that are piling up in the backlog queue. We'll color this one red. Then we want to overlay that area graph with one that shows the size of the backlog queue. Its value will be 64 until the 8192 value comes along and then the size is expanded to 8192. We'll color this one blue.

The following R code creates our graph:

library(ggplot2)

# Read in the logs
audit <- read.csv("~/R/audit-data/audit.csv", header=TRUE)

# Create a running total of events
audit$total <- as.numeric(rownames(audit))

# create a column showing backlog size
# Default value is 64 until auditctl resizes it
# We choose a fill of 500 so the Y axiz doesn't make leakage too small
audit$backlog <- rep(64,nrow(audit))
audit$backlog[which(audit$OBJ_PRIME == 8192):nrow(audit)] = 500

# Now create a stacked area graph showing the leakage
plot1 = ggplot(data=audit) +
  geom_area(fill = "red", aes(x=total, y=total, group=1)) +
  geom_area(fill = "blue", aes(x=total, y=backlog, group=2)) +
  labs(x="Event Number", y="Total")

print(plot1)


What this graph will tell us is if we are losing events. If we are not losing events, then the blue area will completely cover the red area. If we are losing events, then we will see some red. Everybody's system is different. You may have SE Linux AVCs or other things that happen which is different than mine. But for my system, I get the following chart:




Looking at it, we do see red area. Guestimating, it appears to drop about 50 events or so. This is clearly a problem if you wanted to have all of your events. So, what do you do about this?

There is another audit related kernel command line variable that you can set that initializes the backlog limit to something other than 64. If you wanted to match the number that the DISA STIG recommends, then on a grub2 system (such as rhel 7), you need to do the following as root:

  1. vi /etc/default/grub
  2. find "GRUB_CMDLINE_LINUX=" and add somewhere within the quotes and probably right after audit=1 audit_backlog_limit=8192. Save and exit.
  3. rebuild grub2 menu with: grub2-mkconfig -o /boot/grub2/grub.cfg. If UEFI system, then: grub2-mkconfig -o /boot/efi/EFI/<os>/grub.cfg  (Note: <os> should be replaced with redhat, centos, fedora as appropriate.)

That should do it.


Conclusion
The DISA STIG is a valuable resource in figuring out how to securely configure your Linux system. But there are places where it could be better. This blog shows that you can lose events if you don't take additional configuration steps to prevent this from happening. If you see "kauditd hold queue overflow" or something like that on your boot screen or in syslog, then be sure to set audit_backlog_limit on the kernel boot prompt.

No comments: