Security + Data Science

As discussed in the last blog post, there is a new API in the auparse library that gives a unified view of the audit trail. In this post we will dig deeper into it to understand exactly what it provides.

Ausearch CSV format

Ausearch is the main tool for viewing audit logs. Two of its output formats, text and CSV, use the normalizer interface to output the event. The text format was covered in the last blog post; this time we'll look only at the CSV output.

Ausearch has a lot of options that allow it to pick out the events that you are interested in. But with the CSV mode, we don't have to be quite as picky. We can just grab it all and use big data techniques to subset the information in a variety of ways.

The following discussion applies to the audit-2.7.3 package. Previous and future formats may differ slightly in details and quantity of information. Let's get an event to look at:

$ ausearch --start today --just-one --format csv 2>/dev/null
NODE,EVENT,DATE,TIME,SERIAL_NUM,EVENT_KIND,SESSION,SUBJ_PRIME,SUBJ_SEC,ACTION,RESULT,OBJ_PRIME,OBJ_SEC,OBJ_KIND,HOW
,DAEMON_START,02/25/2017,09:17:38,8866,audit-daemon,unset,system,root,started-audit,success,,,service,
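You can load the sample output above with Python's standard csv module to see how the header names line up with the values. This is a minimal sketch and not part of the audit tools. parse_audit_csv is a hypothetical helper name, and the sample text is copied from the output above.

```python
import csv
import io

# The header and single event printed by the ausearch command above.
SAMPLE = (
    "NODE,EVENT,DATE,TIME,SERIAL_NUM,EVENT_KIND,SESSION,SUBJ_PRIME,SUBJ_SEC,"
    "ACTION,RESULT,OBJ_PRIME,OBJ_SEC,OBJ_KIND,HOW\n"
    ",DAEMON_START,02/25/2017,09:17:38,8866,audit-daemon,unset,system,root,"
    "started-audit,success,,,service,\n"
)


def parse_audit_csv(text):
    """Return one dict per event, keyed by the column names in the header line."""
    return list(csv.DictReader(io.StringIO(text)))


if __name__ == "__main__":
    for row in parse_audit_csv(SAMPLE):
        for name, value in row.items():
            # Empty columns (NODE, OBJ_PRIME, OBJ_SEC and HOW here) print as ''.
            print(f"{name:12} {value!r}")
```

Each event becomes a dictionary keyed by column name, much like a row of an R dataframe. For a real audit.csv file, you can pass csv.DictReader a file opened with open(path, newline="") instead of the StringIO wrapper.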

The first line is a header that specifies the name of each column. This is useful when you import the data into a spreadsheet or an R program. In R it becomes the name of the field within the dataframe variable holding the audit trail. This will be important in the next couple of blog posts. But for now, let's take a look at each field:

NODE - This is the name of the computer that the event comes from. For it to have a value, the admin has to set the name_format and possibly the name options in auditd.conf. Normally this is missing unless you do remote logging, so don't worry if it's empty.

EVENT - This field corresponds to the type field in an event viewed with ausearch. If the event includes a syscall record, in almost all cases it will be called a SYSCALL event.

DATE - This is the date formatted as specified for the locale of your machine.

TIME - This is the time in hours, minutes, and seconds formatted as specified for the locale of your machine.
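DATE and TIME follow the machine's locale, so combining them into a single timestamp needs a format string that matches that locale. The sketch below assumes a US-style locale like the one that produced the sample above: month/day/year and a 24-hour clock. The format constants and the function name are assumptions that you would adjust for other locales.

```python
from datetime import datetime

# Assumed formats, matching the sample event (02/25/2017, 09:17:38).
# Other locales print DATE differently, so adjust these to match your output.
DATE_FORMAT = "%m/%d/%Y"
TIME_FORMAT = "%H:%M:%S"


def event_timestamp(date_field, time_field):
    """Combine the DATE and TIME columns into one datetime (no time zone attached)."""
    return datetime.strptime(
        f"{date_field} {time_field}", f"{DATE_FORMAT} {TIME_FORMAT}"
    )


print(event_timestamp("02/25/2017", "09:17:38"))  # 2017-02-25 09:17:38
```

The result has no time zone attached. The --extra-time switch described below adds a GMT_OFFSET field that could supply one.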

SERIAL_NUM - This is the serial number of the event. It is given to help locate the exact event if you need to find it again.

EVENT_KIND - This is metadata about the previously given EVENT field. This is useful in helping to subset and classify data. It currently has the following values:

unknown - This means the event can't be classified. You should never see this. If you do, please report it on the linux-audit mailing list or file a bug report.
user-space - This is a catch-all for events originating in user space that are not classified any other way.
system-services - These are system and service events. This includes system boot, shutdown, runlevel changes, and service start and stop events.
configuration - This includes user space config changes such as setting the time. It also includes kernel changes such as loading netfilter rules and changes to the audit system.
TTY - These are kernel and user space TTY events.
user-account - This collects up all the events that relate to creating, modifying, and deleting users. It also includes events related to creating, modifying, or deleting groups.
user-login - This includes all the events related to authentication, authorization, assignment of credentials, login, establishing a session, ending a session, disposing of credentials, and logging out.
audit-daemon - These are events related specifically to the audit daemon itself.
mac-decision - This is a Mandatory Access Control policy decision. In terms of SELinux, it would be an AVC event.
anomaly - This is an anomalous event, meaning the event is unusual and should be looked at carefully. This includes promiscuous mode changes for a network interface, program crashes, and programs dereferencing suspicious symlinks. In the future it will include events created by an IDS component as it identifies suspicious behavior.
integrity - These are integrity events coming from the IMA subsystem.
anomaly-response - This is for all events recorded by an intrusion prevention system (IPS) that is responding to anomaly events.
mac - This is for any event related to the configuration of a Mandatory Access Control system.
crypto - This is for user space and kernel cryptography events.
virt - This is for any events related to the management of virtualization or containers.
audit-rule - This is for events that are directly related to the triggering of an audit rule. Normally this is syscall events.
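EVENT_KIND exists to help subset and classify events, so a natural first step is to tally it. The sketch below assumes the CSV has already been read with Python's csv.DictReader, so each event is a dict keyed by the header names. count_by_event_kind and subset_by_event_kind are hypothetical helper names.

```python
from collections import Counter


def count_by_event_kind(rows):
    """Tally parsed CSV rows (dicts from csv.DictReader) by their EVENT_KIND column."""
    return Counter(row["EVENT_KIND"] for row in rows)


def subset_by_event_kind(rows, *kinds):
    """Keep only the rows whose EVENT_KIND is one of the given values."""
    wanted = set(kinds)
    return [row for row in rows if row["EVENT_KIND"] in wanted]
```

For example, count_by_event_kind(rows).most_common() ranks the kinds. A nonzero count for "unknown" is exactly the case worth reporting upstream. subset_by_event_kind(rows, "user-login", "user-account") pulls out everything about accounts and logins.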

SESSION - This is the session number of the user's login. When a user logs in, a unique session identifier is created and inherited by every program in that session. This allows an action to be traced back to an exact login. Sometimes it is "unset", which means it's not related to any login; this indicates that the event comes from a daemon.
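Grouping on SESSION reconstructs what happened during each login. This sketch assumes events were read with csv.DictReader as dicts keyed by the header names, and group_by_session is a hypothetical name. Rows with an "unset" session are kept apart because they are not tied to any login.

```python
from collections import defaultdict


def group_by_session(rows):
    """Split rows into per-login groups keyed by SESSION, plus the 'unset' rows.

    Rows whose SESSION is "unset" are not tied to any login (typically daemons),
    so they are returned separately rather than lumped into one fake session.
    """
    sessions = defaultdict(list)
    no_session = []
    for row in rows:
        if row["SESSION"] == "unset":
            no_session.append(row)
        else:
            sessions[row["SESSION"]].append(row)
    return dict(sessions), no_session
```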

SUBJ_PRIME - This is the main way that the subject is identified. In most cases this is the interpreted value of the auid field. The account name was chosen over the numeric account ID because, across a data center, the same account name may have different numeric IDs on different machines.

SUBJ_SEC - This is the second way of identifying, or an alias to, the subject. Typically this is the interpreted uid field in an audit event. Normally the only time it's different from the prime/auid is when you su or sudo to a different account from your login.
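Comparing SUBJ_PRIME with SUBJ_SEC is one way to spot su or sudo use. The sample event earlier in this post shows system as SUBJ_PRIME and root as SUBJ_SEC for the audit daemon, so this sketch ignores rows whose SESSION is "unset". It assumes dict rows from csv.DictReader, and identity_changes is a hypothetical helper name.

```python
def identity_changes(rows):
    """Return login-session rows where the acting account differs from the login account.

    SUBJ_PRIME comes from the auid (who logged in) and SUBJ_SEC from the uid
    (who is acting now), so a mismatch usually means su or sudo was used.
    Rows with SESSION "unset" are skipped: daemons such as the one in the
    sample event can show a mismatch without any login being involved.
    """
    return [
        row
        for row in rows
        if row["SESSION"] != "unset" and row["SUBJ_PRIME"] != row["SUBJ_SEC"]
    ]
```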

ACTION - This is metadata about what the subject is doing to the object. It is helpful in subsetting or classifying the event. The action can be determined from the event type, the syscall, or the op field in some events; if no action can be determined from the syscall, it is simply "triggered-audit-rule". The list of actions is too long to include in this blog post.

RESULT - This is either success or failed.
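RESULT takes only two values, so filtering on "failed" gives a quick view of what went wrong and for whom. This is a minimal sketch that assumes dict rows from csv.DictReader. The helper name is hypothetical.

```python
from collections import Counter


def failures_by_subject_and_action(rows):
    """Count failed events per (SUBJ_PRIME, ACTION) pair."""
    return Counter(
        (row["SUBJ_PRIME"], row["ACTION"])
        for row in rows
        if row["RESULT"] == "failed"
    )
```

Calling .most_common(10) on the result lists the accounts and actions that fail most often.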

OBJ_PRIME - This is the main way that an object is identified. It can be a file name, account, device, virtual machine, and more. Look at the OBJ_KIND description below for more ideas on what this could be.

OBJ_SEC - This is the secondary way to identify the object. For example, the path to a file is the main way to identify a file, and the secondary identifier is the inode. In some cases this may be a terminal, VM, or remote port.

OBJ_KIND - This is metadata that lists what kind of object the event is about. This can be useful in subsetting or classifying the events to see what's happening to specific kinds of things. Current values are as follows (they are self-explanatory):

unknown
fifo
character-device
directory
block-device
file
file-system
symlink
socket
process
firewall
service
account
user-session
virtual-machine
printer
system
admin-defined-rule
audit-config
mac-config
memory 

HOW - This is how the subject did something to the object. Typically this is the program used to do it. In some cases the program is an interpreter. In that case the normalizer lists the command being run.
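Used together, OBJ_KIND and HOW answer questions such as which files were touched and by which programs. The sketch below assumes dict rows from csv.DictReader. objects_of_kind and programs_touching are hypothetical helper names, and the kind passed in should be one of the OBJ_KIND values listed above.

```python
from collections import Counter


def objects_of_kind(rows, kind):
    """Count how often each object (OBJ_PRIME) of the given OBJ_KIND appears."""
    return Counter(row["OBJ_PRIME"] for row in rows if row["OBJ_KIND"] == kind)


def programs_touching(rows, kind):
    """Count which programs (HOW) acted on objects of the given OBJ_KIND."""
    return Counter(row["HOW"] for row in rows if row["OBJ_KIND"] == kind)
```

For example, programs_touching(rows, "file") shows which programs generated file events, and objects_of_kind(rows, "file") shows which paths they were about.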

Extra Fields

The ausearch program has a couple of command-line switches that cause even more fields to be emitted.

--extra-time - This causes ausearch to dump the following extra fields: YEAR,MONTH,DAY,WEEKDAY,HOUR,GMT_OFFSET

--extra-labels - This causes ausearch to add SUBJ_LABEL and OBJ_LABEL, which come from the Mandatory Access Control system.

--extra-keys - This causes ausearch to dump the KEY value which is only found in syscall events.
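csv.DictReader keys each value by its header name, so code that reads the CSV does not need to know where ausearch places the extra columns. This sketch assumes dict rows from csv.DictReader, and the helper names are hypothetical. The HOUR values are counted as the raw strings that ausearch prints rather than converted to numbers.

```python
from collections import Counter


def rows_with_key(rows, key):
    """Return rows whose KEY column (added by --extra-keys) equals the given rule key.

    Lookups are by column name, so the position of the extra column does not
    matter. Without --extra-keys there is no KEY column and nothing matches.
    """
    return [row for row in rows if row.get("KEY") == key]


def events_per_hour(rows):
    """Count events per HOUR value; rows lacking HOUR (no --extra-time) are skipped."""
    return Counter(row["HOUR"] for row in rows if "HOUR" in row)
```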

Malformed Events

If we create an audit.csv file as follows:

$ ausearch --start today --format csv > audit.csv 

And open it with LibreOffice, you will notice that some rows do not have complete information. This is because not all events have the proper and required fields. One or two such events come from user space, but the majority come from the kernel. Events that are known to be malformed are:

NETFILTER_CFG
CONFIG_CHANGE
MAC_STATUS
MAC_CONFIG_CHANGE
MAC_POLICY_LOAD

The fix is to update these events to include all the required fields. Until then, any analysis that includes them will have faults and will not work correctly. We can use techniques to pull these events out of the analysis, but in the long run they should be fixed.
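Until those event types are fixed, one way to pull them out of an analysis is to split them off by their EVENT value. This sketch assumes dict rows from csv.DictReader. MALFORMED_EVENTS and split_malformed are hypothetical names, and the set simply mirrors the list above for audit-2.7.3.

```python
# Event types listed above as known to be malformed in audit-2.7.3.
MALFORMED_EVENTS = frozenset({
    "NETFILTER_CFG",
    "CONFIG_CHANGE",
    "MAC_STATUS",
    "MAC_CONFIG_CHANGE",
    "MAC_POLICY_LOAD",
})


def split_malformed(rows):
    """Separate rows whose EVENT is a known-malformed type from the rest."""
    clean, malformed = [], []
    for row in rows:
        (malformed if row["EVENT"] in MALFORMED_EVENTS else clean).append(row)
    return clean, malformed
```

The malformed rows are kept rather than discarded, which makes it easy to check how many events an analysis is leaving out.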

This wraps up the overview of the fields that you will find in the CSV output of ausearch. Now that we have background material about what we will be looking at, the next posts will start into analytical programs.

Sunday, February 26, 2017

Audit log Normalization Part 2 - CSV format