Security + Data Science: Audit Record Fields Visualized

Before we move on from Dendrograms, I wanted to write a post that combines what we have learned about the auparse library and R visualizations. Sometimes what you want to do requires writing a helper script that extracts exactly the information you need. Once the data is prepared, we can take it through visualization. What I want to demonstrate in this post is how to create a graph of the audit record grammar.

Audit Records
The first step is to create a program based on auparse that will iterate over every event, then over every record, and then over every field. We want to graph this using a Dendrogram, which means that things that are alike share common nodes and things that are different branch away. We want to label each record with its record type. Since every record type is different, starting the line with the record type would make every record branch at the root and nothing would share a node. Instead, we will place it at the end, just in case two records are identical except for the record type.

Once our program is at the beginning of a record, we will iterate over every field and output its name. Since we know that we need a delimiter for the tree map, we can insert that at the time we create the output. Sound like a plan? Then go ahead and put this into fields-csv.c

#include <stdio.h>
#include <ctype.h>
#include <sys/stat.h>
#include <auparse.h>

static int is_pipe(int fd)
{
    struct stat st;

    if (fstat(fd, &st) == 0) {
        if (S_ISFIFO(st.st_mode))
            return 1;
    }
    return 0;
}

int main(int argc, char *argv[])
{
    auparse_state_t *au;

    if (is_pipe(0))
            au = auparse_init(AUSOURCE_DESCRIPTOR, 0);
    else if (argc == 2)
            au = auparse_init(AUSOURCE_FILE, argv[1]);
    else
            au = auparse_init(AUSOURCE_LOGS, NULL);
    if (au == NULL) {
            /* Report errors on stderr so they do not end up in the csv */
            fprintf(stderr, "Failed to access audit material\n");
            return 1;
    }

    auparse_first_record(au);
    do {
        do {
            int count = 0;
            char buf[32];
            const char *type = auparse_get_type_name(au);
            if (type == NULL) {
                snprintf(buf, sizeof(buf), "%d",
                            auparse_get_type(au));
                type = buf;
            }
            do {
                const char *name;

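                count++;
                /* The first field is skipped; the record type is
                 * printed at the end of the line instead */
                if (count == 1)
                    continue;
                name = auparse_get_field_name(au);
                if (name == NULL)
                    continue;
                /* Skip the numbered syscall arguments a0, a1, ... */
                if (name[0] == 'a' && isdigit((unsigned char)name[1]))
                    continue;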
                if (count == 2)
                    printf("%s", name);
                else
                    printf(",%s", name);
            } while (auparse_next_field(au) > 0);
            printf(",%s\n", type);
        } while (auparse_next_record(au) > 0);
    } while (auparse_next_event(au) > 0);

    auparse_destroy(au);

    return 0;
}

Then compile it like:

gcc -o fields-csv fields-csv.c -lauparse
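
If you want to check the output format without having audit logs at hand, the line-building step can be tried in isolation. The following is only a sketch under assumptions: format_record_line() and is_numbered_arg() are hypothetical helpers that are not part of auparse, and the field names are supplied by the caller instead of being read from a log. It drops the first field and the numbered arguments, joins the rest with commas, and puts the record type in the last column.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 for numbered argument fields such as a0, a1, ...
 * (the same test that fields-csv.c uses). */
static int is_numbered_arg(const char *name)
{
    return name[0] == 'a' && isdigit((unsigned char)name[1]);
}

/* Hypothetical helper, not an auparse function.
 * Joins the field names with commas, skipping names[0] (the first
 * field of the record) and any numbered arguments, then appends the
 * record type as the final column.
 * Returns 0 on success, -1 if the output buffer is too small. */
static int format_record_line(char *out, size_t outlen,
                              const char *const *names, size_t n,
                              const char *type)
{
    size_t used = 0;
    size_t i;
    int len;

    if (outlen == 0)
        return -1;
    out[0] = '\0';

    for (i = 1; i < n; i++) {
        if (is_numbered_arg(names[i]))
            continue;
        len = snprintf(out + used, outlen - used, "%s,", names[i]);
        if (len < 0 || (size_t)len >= outlen - used)
            return -1;
        used += (size_t)len;
    }

    /* The record type goes last so that similar records share a prefix */
    len = snprintf(out + used, outlen - used, "%s", type);
    if (len < 0 || (size_t)len >= outlen - used)
        return -1;
    return 0;
}
```

Called with the sample names type, arch, syscall, success, exit, a0, a1, items, pid and the type SYSCALL, it produces the line arch,syscall,success,exit,items,pid,SYSCALL. Because each name is written with its comma after it, a skipped field can never leave a stray leading comma.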

Next, let's collect our audit data. We are absolutely going to have duplicate records. So, let's use the sort and uniq shell script tools to winnow out the duplicates.

ausearch --start this-year --raw | ./fields-csv | sort | uniq > ~/R/audit-data/year.csv

Let's go into RStudio and turn this into a chart. If you feel you understand the dendrogram programming from the last post, go ahead and dive into it. The program, when finished, should be 6 actual lines of code. This program has one additional problem that we have to solve.

The issue is that we did not actually create a normalized csv file where every record has the same number of fields. R will normalize it anyway: read.csv() pads the shorter rows with empty fields so that all rows have the same number of columns. When we paste the columns together with "/", those empty fields turn into a run of trailing slashes, and we don't want that. So, what we will do is use the gsub() function to trim the trailing slashes off. Other than that, you'll find the code remarkably similar to the previous blog post.

library(data.tree)
library(networkD3)

# Load in the data
a <- read.csv("~/R/audit-data/year.csv", header=FALSE, stringsAsFactors = FALSE)

# Create a / separated string list which maps the fields
a$pathString <- do.call(paste, c("record", sep="/", a))

# Previous step normalized the path based on record with most fields.
# Need to remove trailing '/' to fix it. gsub() returns a new vector,
# so the result has to be assigned back.
a$pathString <- gsub('/+$', '', a$pathString)

# Now convert to tree structure
l <- as.Node(a, pathDelimiter = "/")

# And now as a hierarchical list
b <- ToListExplicit(l, unname = TRUE)

# And visualize it
diagonalNetwork(List = b, fontSize = 12, linkColour = "black", height = 4500, width = 2200)
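
To make the trailing-slash cleanup above concrete, here is the same idea sketched in C, outside of R. It is only an illustration under assumptions: build_path() and trim_trailing_slashes() are hypothetical helpers, and the empty strings in the input stand in for the padding that read.csv() adds to short rows. The first function joins the cells the way paste() with sep="/" would, and the second does what the '/+$' pattern does for a single string.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: join the cells of one row with '/' under a
 * "record" root. Empty cells represent the padding of short rows.
 * Returns 0 on success, -1 if the output buffer is too small. */
static int build_path(char *out, size_t outlen,
                      const char *const *cells, size_t ncols)
{
    size_t used;
    size_t i;
    int len = snprintf(out, outlen, "record");

    if (len < 0 || (size_t)len >= outlen)
        return -1;
    used = (size_t)len;

    for (i = 0; i < ncols; i++) {
        len = snprintf(out + used, outlen - used, "/%s", cells[i]);
        if (len < 0 || (size_t)len >= outlen - used)
            return -1;
        used += (size_t)len;
    }
    return 0;
}

/* Remove every '/' at the end of s, in place. */
static void trim_trailing_slashes(char *s)
{
    size_t len = strlen(s);

    while (len > 0 && s[len - 1] == '/')
        s[--len] = '\0';
}
```

A row holding pid, uid, USER_LOGIN and two padding cells becomes record/pid/uid/USER_LOGIN// after build_path(), and record/pid/uid/USER_LOGIN after trim_trailing_slashes(). Without the trim, the empty path components would be carried into the tree conversion.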

When I run the program with my logs, I get the following diagram. It's too big to paste into this blog, but just follow the link to see it full sized:

http://people.redhat.com/sgrubb/audit/record-fields.html

Conclusion
The Dendrogram is a useful tool to show similarity and structure of things. We were able to apply lessons from two previous blogs to produce something new. This can be applied in other ways where it's simply easier to collect the right data and shape it during collection to make visualizing easier. Sometimes you may want to do data fusion to combine external information with the audit events. In that case you can do it at collection time, or do an inner join like we did when creating Sankey diagrams. Now, go write some neat tools.

Friday, May 5, 2017