Before we move on from Dendrograms, I wanted to write about a post that combines what we have learned about the auparse library and R visualizations. Sometimes what you want to do requires writing a helper script that gets exactly the information you want. Then once we have prepared data, we can then take it through visualization. What I want to demonstrate in this post is how to create a graph of the audit record grammar.
The first step is to create a program based on auparse that will iterate over every event, and then over every record, and then over every field. We want to graph this by using a Dendrogram which means that things that are alike share common nodes and things that are different branch away. We want to label each record with its record type. Since we know that every record type is different, it would not be a good idea to start the line off with the record type. We will place it at the end just incase two records are identical except the record type.
Once our program is at the beginning of a record, we will iterate over every field and output its name. Since we know that we need a delimiter for the tree map, we can insert that at the time we create the output. Sound like a plan? Then go ahead and put this into fields-csv.c
Then compile it like:
Next, let's collect our audit data. We are absolutely going to have duplicate records. So, let's use the sort and uniq shell script tools to winnow out the duplicates.
Let's go into RStudio and turn this into a chart. If you feel you understand the dendrogram programming from the last post, go ahead and dive into it. The program, when finished, should be 6 actual lines of code. This program will has one additional problem that we have to solve.
The issue is that we did not actually create a normalized csv file where every record has the same number of fields. What R will do is normalize it so that all rows have the same number of columns. We don't want that. So, what we will do is use the gsub() function to trim the trailing slashes off. Other than that, you'll find the code remarkably similar to the previous blog post.
When I run the program, with my logs I get the following diagram. Its too big to paste into this blog, but just follow the link to see it full sized:
The Dendrogram is a useful tool to show similarity and structure of things. We were able to apply lessons from two previous blogs to produce something new. This can be applied in other ways where its simply easier to collect the right data and shape it during collection to make visualizing easier. Sometimes you may want to do data fusion at collection time to combine external information with the audit events and in that case you can do it at collection time of do an inner join like we did when creating sankey diagrams. Now, go write some neat tools.