OK, we are about done with all the prerequisites before we jump into the more interesting analytics. I expect this series of articles will attract some readers with a limited understanding of the Linux audit system and how to set it up. But I would like everyone to be able to recreate the analysis that we are going to do, so we need to go over some basics first. If I show you neat diagrams and explain how I created them, that's nice. But if it also works on your system with your data, then it's much more valuable.
The first thing we need to discuss is all the pieces that are in play or may come up in future discussions. Having a common set of terms will let both the experienced and the newcomer understand what is being presented. Below is a diagram of the various pieces of the audit system and what they do.
First, let me explain the colors. Light blue marks the things that create events, purple the reporting tools, red the controller, gray the logs, and green the real-time components.
Audit events can be created in two ways. There are applications that send events whenever something specific happens. For example, if you log in via sshd, it sends a series of events as the login proceeds. It is considered a trusted application, and it always tries to send events. If the audit system is not enabled, the event is discarded. Otherwise the kernel accepts the event, time stamps it, adds sender information, and queues it for delivery to the audit daemon, auditd. The audit daemon's only job is to reliably dequeue events and write them to the log and to the event dispatcher, audispd.
The other way that events are created is by the kernel observing system activity that matches a rule loaded by auditctl. The kernel is the thing that creates most events...assuming you have loaded rules. It uses a first-matching-rule system to decide whether a syscall is of any interest.
During each system call made by a program, the kernel will look at the rules and compare attributes of the calling program with a rule. These attributes could be the loginuid, uid, executable name, login session, or many other things. You can look at the auditctl man page for a comprehensive list.
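As an illustration, a syscall rule has the following shape (the key name "file-access" and the uid threshold are my own choices for this sketch, not anything the audit package ships):

```
-a always,exit -F arch=b64 -S openat -F auid>=1000 -F auid!=unset -k file-access
```

This records 64-bit openat calls made by real, logged-in users and tags them with a key for later searching. A line like this can go in a file under /etc/audit/rules.d/ or be loaded directly with auditctl -a followed by the same arguments; note that older auditctl versions spell the unset test as -F auid!=4294967295. Because matching is first-match, rule order matters.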
When a rule matches, an event is created that records a lot of information about the process that triggered it. The event is then time stamped and queued to the audit daemon for disposition.
The audit system is configured by a program named auditctl. It can, among other things, enable/disable the audit system, load rules, list rules, and report status.
The audit daemon simply polls the kernel for events. When it sees an event is available, it looks at its configuration to determine what to do with the event. It may or may not write the event to disk. It may or may not send the event to its real-time interface. Normally, it sends events to both.
Once events are written to disk, the reporting tools ausearch, aureport, and aulast can be used. Ausearch specializes in picking events out of the audit trail that match command line options. It is the recommended tool for looking at audit events because it can untangle interleaved events, group the records into a complete event, and interpret the event several ways. Aureport is a program that can summarize kinds of events. The aulast program specializes in showing login sessions.
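To give a feel for the three tools, here are typical invocations (these assume auditd is running and you have permission to read the logs; the message type is just one example of many):

```shell
# Pull today's login events, with numeric fields interpreted as names
ausearch --start today -m USER_LOGIN -i

# Summary counts of events by type
aureport --summary

# Login sessions, in the style of last(1)
aulast
```
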
If the audit daemon has been configured to send events to the real-time interface (the dispatcher), it queues the event to audispd, whose job is to distribute the event to any process wanting to see the events in real time. These could be alerting programs or remote logging programs.
There are only a couple settings I want to discuss for the purposes of getting events ready to do analysis. There are other important settings which I will save for another blog post. I will simply list the settings from /etc/audit/auditd.conf that are important for analysis and data science:
The first setting, local_events, determines whether the audit daemon should record local events. The answer should be "yes" unless the daemon is running in a container that has no access to the audit netlink interface. In that case it can still be used to aggregate logs from other systems by setting this value to "no".
The write_logs setting should be set to "yes" to write events to disk. This is the norm, although some people prefer not writing logs locally and instead immediately send all events to a remote logging system.
Normally, you must be root to see audit logs. This is not always desirable, and it interferes with an easy workflow. The log_group configuration item should be set to a group that you have access to. It could be your own group name, wheel, or perhaps a special-purpose audit group you create to give people access to logs without the need for privileges.
The log_format setting should be set to "enriched" so that extra information is recorded at the time the event is logged, allowing the event to be analyzed on other systems.
The flush setting should be set to INCREMENTAL_ASYNC, which gives the highest performance with a reasonable guarantee that the event made it to disk. The freq setting controls how many records are written before a flush to disk is issued. Set this to 50 for a normal system or 100 if you have a busy system.
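Putting the settings above together, the relevant lines of /etc/audit/auditd.conf would look something like this (the group name is just an example; use whatever group fits your site):

```
local_events = yes
write_logs = yes
log_group = wheel
log_format = ENRICHED
flush = INCREMENTAL_ASYNC
freq = 50
```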
To make sure we capture important system events, we need to configure some rules. If you are on a Fedora system, the first thing to do is delete /etc/audit/rules.d/audit.rules. It has a rule in it that basically turns off the audit system. This was intentionally mandated by Fedora's governing board, FESCo. Anyway, undo the damage. Next, let's copy a couple of rule files to the right place:
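A sketch of those steps is below. The sample-rules path varies with the audit package version (older releases put them under /usr/share/doc/audit*/rules/ instead), so adjust to what your system actually ships:

```shell
# Remove the Fedora default rules file that disables auditing
sudo rm /etc/audit/rules.d/audit.rules

# Copy the base config and stig sample rules into place
# (path varies by audit version -- check /usr/share/doc/audit*/rules/ on older releases)
sudo cp /usr/share/audit/sample-rules/10-base-config.rules /etc/audit/rules.d/
sudo cp /usr/share/audit/sample-rules/30-stig.rules /etc/audit/rules.d/

# Rebuild and load the combined rule set
sudo augenrules --load
```
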
I'd recommend one change to the stig rules. Around line 95 there are six rules that record successful and unsuccessful uses of chmod. They create far too many events. Either delete them or comment them out by placing a '#' at the beginning of each rule.
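Commented out, those lines would look something like the following. The exact wording of the chmod family rules differs between versions of the stig rules, so match on the syscall names rather than my sketch here:

```
#-a always,exit -F arch=b32 -S chmod,fchmod,fchmodat -F auid>=1000 -F auid!=unset -k perm_mod
#-a always,exit -F arch=b64 -S chmod,fchmod,fchmodat -F auid>=1000 -F auid!=unset -k perm_mod
```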
We also want to define the locations of a couple standard directories in your home directory for use in our experiments.
RStudio will create an 'R' directory for you in your home directory when you install libraries. This is where we want to keep all of our R scripts, data, and libraries. Go ahead and create this directory structure with the same names if it does not exist.
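Since I can't see your screen, the names below are illustrative — a minimal sketch that creates an R directory with a data area under it. Substitute whatever names RStudio created for you:

```shell
# Create an R working area with a subdirectory for audit data sets
# (names are illustrative; keep whatever structure RStudio made)
mkdir -p "$HOME/R/audit-data"

# Confirm the structure exists
ls -ld "$HOME/R" "$HOME/R/audit-data"
```
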
Aliases That Help
A few aliases in the .bashrc file can make for less typing later. I'd recommend:
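Here is a sketch of what I mean. The autext alias is the one exercised below; aucsv is a companion I'm assuming here for pulling logs into R later. Both rely on the --format option that newer versions of ausearch provide:

```shell
# Add to ~/.bashrc: render events as readable English text
alias autext='ausearch --format text'

# Render events as CSV, handy for importing into R
# (aucsv is my own name, not something the audit package defines)
alias aucsv='ausearch --format csv'
```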
Log out and log back in. Try 'autext --start today'.
This concludes the last of the prerequisite and setup blogs. Starting next week, we will begin talking about how to use R to do some pretty interesting things with the audit logs. In the meantime, it might be good to brush up on R programming using one of the tutorials I mentioned in my previous post.