Security + Data Science: Audit log Normalization Part 1

In this posting I am interested in explaining more about the audit log normalizer. It represents a fundamental shift in what information that is available from the audit logs. If you take a look at a typical event as it comes from the kernel, you will see things in many places. For example, consider this event:

type=PROCTITLE msg=audit(1487184592.084:894): proctitle=7573657264656C007374657665
type=PATH msg=audit(1487184592.084:894): item=4 name="/etc/gshadow" inode=17040896 dev=08:32 mode=0100000 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:shadow_t:s0 nametype=CREATE
type=PATH msg=audit(1487184592.084:894): item=3 name="/etc/gshadow" inode=17040557 dev=08:32 mode=0100000 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:shadow_t:s0 nametype=DELETE
type=PATH msg=audit(1487184592.084:894): item=2 name="/etc/gshadow+" inode=17040896 dev=08:32 mode=0100000 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:shadow_t:s0 nametype=DELETE
type=PATH msg=audit(1487184592.084:894): item=1 name="/etc/" inode=17039361 dev=08:32 mode=040755 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:etc_t:s0 nametype=PARENT
type=PATH msg=audit(1487184592.084:894): item=0 name="/etc/" inode=17039361 dev=08:32 mode=040755 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:etc_t:s0 nametype=PARENT
type=CWD msg=audit(1487184592.084:894): cwd="/root"
type=SYSCALL msg=audit(1487184592.084:894): arch=c000003e syscall=82 success=yes exit=0 a0=7fff7a36b340 a1=562ee0c337c0 a2=7fff7a36b2b0 a3=20 items=5 ppid=2350 pid=6755 auid=4325 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=4 comm="userdel" exe="/usr/sbin/userdel" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key="identity"

How do you know what it means?

The Requirements
All audit events have a basic requirement stated in common criteria and re-stated in other security guides such as PCI-DSS. The events have to capture:

* Who did something (the subject)
* What did they do
* To what was it done (the object)
* What was the result
* When did it occur

The Implementation
The way that the Linux kernel operates is that there are hardwired events and configurable events. The configurable events are based on a syscall filtering. You write a rule that says whenever this action occurs, let me know about it and label it with a key to remind what its about. For example, this rule triggered the above event:

-w /etc/gshadow -p wa -k identity

This rule says, notify me whenever the file /etc/gshadow is written to. There are a list of syscalls that are considered to have written to a file. We can see the list here:

https://github.com/linux-audit/audit-kernel/blob/master/include/asm-generic/audit_dir_write.h

and

https://github.com/linux-audit/audit-kernel/blob/master/include/asm-generic/audit_write.h

In our case the rename syscall, which is mentioned on line 2 of audit_dir_write.h, is the one that triggered the event. When this happens, the kernel grabs a bunch of information about the syscall and places it in the event as a syscall record. This occurs in the syscall filter where everything about the process and syscall is readily available. The act of doing the rename occurs in several places throughout the kernel. In those places it might be too invasive or have incomplete information available that prevents writing a good event. What the kernel does is add information about what is being acted upon as auxiliary records. These are things like PATH, CWD, SOCKADDR, IPC_OBJ, etc.

Translating the output
What this means is that identifying a subject is pretty straight forward, but identifying the object takes a lot of specialized knowledge.

To understand the pieces of the event, you would need to look in the SYSCALL record at the auid and uid fields to find the subject. What they are doing is determined by the actual syscall. In this case, rename. Now, which of the 4 path records is the one we want? In this case we want to know the existing file that disappeared. Its the one that got renamed or acted upon. This would be the PATH record with the nametype=DELETE. Then the "name" field in the same record tells us that its /etc/gshadow+. Easy, right? :-)

You might say there has to be an easier way. Especially given that this is just one sysycall and there are over 300 sycalls which are very different from one another.

The audit normalizer
The easier way has just been created with the audit-2.7 release and its the auparse_normalizer API.It builds on top of the auparse library functionality. The auparse library has a concept of an internal cursor that walks across events, then records within an event, and then individual fields within a record. So, if you are analyzing a log, you would be iterating through events using the auparse_next_event() function call.

To get a normalized view of the current event, a programmer simply calls auparse_normalize(). This function looks up all the information about subject, object, action and assembles it into an opaque data structure which is associated with the event.

To get at the information, there are some accessor functions such as, auparse_normalize_subject_primary(). This function places the internal cursor on top of the field that was determined to be the subject. Then you can use the normal field accessor functions such as, auparse_interpret_field(), which in our event, takes the auid and converts it to the account name.

There are other accessor functions in the auparse_normalize API. The ones that return an int are ones that position the internal cursor. The ones that return a char * point to internal metadata that does not exist in the actual record. These can be used directly.

Parlor Tricks
While working on the normalizer, I realized that it might be possible to turn logs into English sentences. I was thinking that we could create the sentence in the following form:

On 1-node at 2-time 3-subj 4-acting-as 5-results 6-action 7-what 8-using

Which maps to:

1) node
2) time
3) auid, failed logins=remote system
4) uid (only when uid != auid)
5) res - successfully / unsuccessfully
6) op, syscall, type, key
7) path, address, account, policy, rule
8) exe,comm

This layout forms the basis for the output from "ausearch --format text". Using this commandline option, our event turns into this:

At 13:49:52 02/15/2017 sgrubb, acting as root, successfully renamed /etc/gshadow+ using /usr/sbin/userdel

This works out pretty well in most cases. I am still looking over events and making sure that they really do state the truth about an event. I haven't made any significant changes in a couple weeks, so I think we are close to final.

Next post we will dive in deeper with the normalizer and discuss how the CSV output is created. It fully uses the auparse_normalizer API to give you everything it knows about an event.

Security + Data Science

Wednesday, February 22, 2017

Audit log Normalization Part 1

No comments:

Blog Archive