Wednesday, July 12, 2017

Interactive R Programs

In the past, we have looked at using R to analyze audit data. Those programs were more like batch processing: whatever they do is predefined, and you can't change it without modifying the source code. Today we are going to take a look at how to make R applications that respond to user input.


Shiny
The developers at RStudio created a way to marry web programming with R so that you have a web presentation layer and an R backend that responds to changes. This brings a much needed capability because sometimes you want to see the data differently right away.

The Shiny interface brings with it a number of controls such as radio buttons, drop-down selectors, sliders, charts, and boxes for grouping. You can take a look at a gallery of controls here.

To create a basic shiny app, open RStudio. Click on File|New File and then select "Shiny Web App". That brings up a dialog asking some basic questions. It asks for the application's name; I put in Test. Then it asks if you want 1 file or 2; I selected 1. (If you choose 2, it makes one file for the UI and one file for the back end.) The last thing is to select the directory for the file. When you click on Create, it opens a file fully populated with a simple working app.

If you click "Run App", then you should have a program that looks something like this:




Moving the slider causes the histogram to change. Let's look at the code.

library(shiny)

# Define UI for application that draws a histogram
ui <- fluidPage(

   # Application title
   titlePanel("Old Faithful Geyser Data"),

   # Sidebar with a slider input for number of bins
   sidebarLayout(
      sidebarPanel(
         sliderInput("bins",
                     "Number of bins:",
                     min = 1,
                     max = 50,
                     value = 30)
      ),

      # Show a plot of the generated distribution
      mainPanel(
         plotOutput("distPlot")
      )
   )
)

# Define server logic required to draw a histogram
server <- function(input, output) {

   output$distPlot <- renderPlot({
      # generate bins based on input$bins from ui.R
      x    <- faithful[, 2]
      bins <- seq(min(x), max(x), length.out = input$bins + 1)

      # draw the histogram with the specified number of bins
      hist(x, breaks = bins, col = 'darkgray', border = 'white')
   })
}

# Run the application
shinyApp(ui = ui, server = server)



There are 2 parts to this program. The first part is the GUI. There is a call to fluidPage that takes a variable number of arguments that describe the widgets on the page. Each widget is itself a function call that takes parameters or other objects created by other functions. In the basic design, we have a title, a slider, and a plot.

On the server side we have a server object created by a function that takes input and output objects. To make the GUI change, we define a distPlot sub-variable on output. We can call this anything; it just has to match what's on the GUI side. This variable is initialized by the renderPlot function, which takes a few parameters describing what to plot. It knows what to plot based on a sub-variable of the input argument, bins. This could also be named anything, but it has to match the ID of the slider control or nothing will happen.
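To make the pairing concrete, here is a minimal sketch of my own (not part of the generated app) where both sides are renamed. The IDs "nbins" and "myPlot" are made up for illustration; the only requirement is that the UI and the server agree on them:

library(shiny)

# Minimal sketch: "nbins" and "myPlot" are arbitrary names.
# They only have to match between the UI and the server.
ui <- fluidPage(
   sliderInput("nbins", "Number of bins:", min = 1, max = 50, value = 30),
   plotOutput("myPlot")
)

server <- function(input, output) {
   output$myPlot <- renderPlot({
      x <- faithful[, 2]
      hist(x, breaks = seq(min(x), max(x), length.out = input$nbins + 1))
   })
}

shinyApp(ui = ui, server = server)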

The server side and GUI side are tied together with a call to shinyApp at the bottom. This is what runs the program. Under the hood, RStudio starts up a small web server that hosts your app in a live R session. On the front end it opens a little web browser and connects to that server on localhost. The server sends a web page to draw, and whenever you change anything in the web page, the new input values are sent back to the R session. This immediately triggers the server code, which responds with updated output.

There is a nice and detailed tutorial video created by the RStudio developers if you want to learn more. I found it very helpful when learning Shiny. You can also browse around the widget gallery mentioned earlier; in it you can see the source code for all of these little examples.

Now let's do a simple program that does something with audit data. A long time ago, we learned how to do bar charts. That was a pretty simple program. Let's re-fit that code to run as a shiny app so that we can tell it how to group the audit data.

library(shiny)
library(ggplot2)

# Read in the data and don't let strings become factors
audit <<- read.csv("~/R/audit-data/audit.csv", header=TRUE, stringsAsFactors = FALSE)
fnames <<- colnames(audit)
fnames[5] <<- "HOUR" # Change serial number to HOUR
audit$one <<- rep(1,nrow(audit))
# Create time series data frame for aggregating
audit$posixDate=as.POSIXct(paste(audit$DATE, audit$TIME), format="%m/%d/%Y %H:%M:%S")
# Create a column of hour and date to aggregate an hourly total.
audit$HOUR <- format(audit$posixDate, format = '%Y-%m-%d %H')
ourColors <<- c("red", "blue", "green", "cyan", "yellow", "orange", "black", "gray", "purple" )

# Define UI for application
ui <- shinyUI(fluidPage(
  # Application title
  titlePanel("Audit Barcharts"),

  sidebarLayout(
    sidebarPanel(
      selectInput("groupBy", "Group By", fnames, selected = "HOUR"),
      selectInput("lowColor", "Low Color", ourColors, selected = "blue"),
      selectInput("highColor", "High Color", ourColors, selected = "red"),
      width = 3
    ),
    # Show a plot of the generated distribution
    mainPanel(
      plotOutput("barPlot", width = "auto", height = "600px"),
      width = 9
    )
  )
))


# Define our server side code

server <- shinyServer(function(input, output) {
  observeEvent(c(input$groupBy, input$lowColor, input$highColor), {
    # Now summarize it
    grp <- input$groupBy

    temp <- aggregate(audit$one, by = audit[grp], FUN = length)
    temp$t <- as.character(temp[,grp])

    if (grp == "HOUR") {
      # Time based needs special handling
      final = data.frame(date=as.POSIXct(temp$t, format="%Y-%m-%d %H", tz="GMT"))
      final$num <- temp$x
      final$day <- weekdays(as.Date(final$date))
      final$oday <- factor(final$day, levels = unique(final$day))
      final$hour <- as.numeric(format(final$date, "%H"))

      output$barPlot<-renderPlot({
        pl <- ggplot(final, aes(x=final[,1], y=final$num, fill=final$num)) +
          geom_bar(stat="identity") + ggtitle(paste("Events by", grp)) +
          scale_x_datetime() + xlab("") + labs(x=grp, y="Number of Events") +
          scale_fill_gradient(low=input$lowColor, high = input$highColor, name=paste("Events/", grp, sep=""))
        print(pl)
      })
    } else {
      # non-time conversion branch
      final <- temp[,1:2]
      colnames(final) = c("factors", "num")
      final$factors <- abbreviate(final$factors, minlength = 20, strict = TRUE)

      # We will rotate based on how dense the labels are
      rot <- 90
      if (nrow(final) < 20)
        rot <- 60
      if (nrow(final) < 10)
        rot <- 45

      # Plot it
      output$barPlot<-renderPlot({
        pl <- ggplot(final, aes(x=final[,1], y=final$num, fill=final$num)) +
          geom_bar(stat="identity") + ggtitle(paste("Events by", grp)) +
          scale_x_discrete() + xlab("") + labs(x=grp, y="Number of Events") +
          scale_fill_gradient(low=input$lowColor, high = input$highColor, name=paste("Events/", grp, sep="")) +
          theme(axis.text.x = element_text(angle = rot, hjust = 1, size = 18))
        print(pl)
      })
    }
  })
})

# Run the application
shinyApp(ui = ui, server = server)



Make sure you have ~/R/audit-data/audit.csv filled with audit data. Save the above code as app.R and run it. You should see something like this:




Also notice that you can change the selection in the drop downs and the chart is immediately redrawn. Briefly, the way this works is that we set up some global data in the R environment. Next we define a GUI that has 3 selector inputs. All of the hard work is in the server function. What it does is wait for any of the 3 inputs to change and, if so, redraw the screen. We split the charting into 2 branches: time and everything else. The main difference is that time variables need special handling. Basically we format the data into what's expected by the plotting function and pass it in. On the non-time side of things, we can get very dense groups, so we rotate the text labels on the bottom if we start running out of room to fit more in.
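As a side note, render functions in Shiny re-execute on their own whenever an input they read changes, so the observeEvent() wrapper is not strictly required. Here is a minimal sketch of that alternative using the same input and output IDs as above; it is only a skeleton, not a drop-in replacement for the full app:

# Sketch only: renderPlot() re-runs whenever input$groupBy, input$lowColor,
# or input$highColor changes, simply because it reads them.
server <- shinyServer(function(input, output) {
  output$barPlot <- renderPlot({
    grp  <- input$groupBy
    temp <- aggregate(audit$one, by = audit[grp], FUN = length)
    # ... build the same ggplot as above using input$lowColor and input$highColor
    barplot(temp$x, names.arg = temp[[grp]], main = paste("Events by", grp))
  })
})

The observeEvent() form used in the app makes the trigger list explicit, which is a reasonable choice once the body grows as large as it does here.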

Conclusion
This shows the basics of how a shiny app works. You can create very elaborate and complicated programs using this API. Now that we've been over Shiny basics, I'll talk about Audit Explorer next time.

Wednesday, July 5, 2017

Getting Torch running on Fedora 25

In this blog post we will setup the Torch AI framework so that it can be used on Fedora. This builds on the previous blog post which shows you how to setup a CUDA development environment for Fedora.


Torch
Torch is a Deep Learning AI framework that is written in Lua. This makes it very fast because there is little between the script and the pure C code that is performing the work. Both Facebook and Twitter are major contributors and have probably derived their in-house versions from the open source version.

The first thing I would do is set up an account just for AI. The reason I suggest this is because we are going to be installing a bunch of software without rpm. All of this will be going into the home directory. So, if one day you want to delete it all, it's as simple as deleting the account and home directory. Assuming you made the account and logged into it...

$ git clone https://github.com/torch/distro.git ~/torch --recursive
$ cd torch/
$ export CMAKE_CXX_FLAGS="-std=c++03"
$ ./install.sh


The Torch community says that they only support Torch built this way. I have tried to package Torch as an rpm and it simply does not work. I get some strange errors related to math. There are probably compile options that fix this, but I'm done hunting it down. It's easier to use their method from an account just for this. But I digress...

After about 25 minutes, the build asks "Do you want to automatically prepend the Torch install location to PATH and LD_LIBRARY_PATH in your /home/ai/.bashrc? (yes/no)"

I typed "yes" to have it update ~/.bashrc. I logged out and back in. Test to see if the GPU based Torch is working:

luajit -lcutorch
luajit -lcunn


This should produce errors if it's not working. To exit the shell, type:

os.exit()


At this point only a couple of extras are needed. We may want to play with machine vision at some point, so get the camera module. Also, a lot of models seem to be trained using the Caffe Deep Learning framework, which means we need to load that format, so let's grab the loadcaffe module.

During the build of Torch, you got a copy of luarocks, which is a package manager for Lua modules. We can use this to pull in the modules so that Torch can use them.

$ luarocks install camera
$ luarocks install loadcaffe


If you run the webcam from an account that is not your login account, then you need to go into /etc/group, find the video group, and add the ai account to it as a supplemental group.


Quick Art Test
OK. Now let's see if Torch is working right. There is a famous project that can take a picture and transfer the artistic style of a work of art onto it. It's really quite astonishing to see. Let's use that as our test for Torch.

The project page is here:

https://github.com/jcjohnson/neural-style


To download it:

$ git clone https://github.com/jcjohnson/neural-style.git


Now download the caffe models:

$ cd neural-style/models
$ sh ./download_models.sh
$ cd ..


We need a picture and a work of art. I have a picture of a circuit board:




Let's see if we can make art from it. The boxiness of the circuit kind of suggests cubism to me. There is a web site called wikiart that curates a collection of art by style and genre. Let's grab a cubist style painting and see how well it works.

$ wget https://uploads7.wikiart.org/images/albert-gleizes/portrait-de-jacques-nayral-1911.jpg
$ mv portrait-de-jacques-nayral-1911.jpg cubist.jpg


To render the art:

$ th neural_style.lua -backend cudnn -style_image cubist.jpg -content_image circuit.jpg -output_image art.jpg


Using a 1050Ti GPU, it takes about 4 minutes and this is the result:




One thing you have to pay attention to is that if the picture is too big, you will run out of GPU memory; the video card only has so much working memory. You can use any image editing tool to re-scale the picture. The number of pixels is what matters rather than the size of the file. Something in the 512 - 1080 pixel range usually fits on a 4 GB video card.


Conclusion
At some point we may come back to Torch to do some experimenting on security data. But I find it fun to play around with the art programs written for it. If you like this, look around. There are a number of apps written for Torch. The main point, though, is to show how to leverage the CUDA development environment we previously set up to get one of the main Deep Learning frameworks installed and running on a modern Fedora system.

Thursday, June 29, 2017

Setting up a CUDA development environment on Fedora 25

The aim of this blog is to explore Linux security topics using a data science approach to things. Many people don't like the idea of putting proprietary blobs of code on their nice open source system. But I am pragmatic about things and have to admit that Nvidia is the king of GPU right now. And GPUs have been the way to accelerate Deep Learning for the last few years. So, today I'll go over what it takes to correctly set up a CUDA development environment for Fedora 25. This is a continuation of the earlier post about how to get an Nvidia GPU card set up in Fedora. That step is a prerequisite to this blog post.

CUDA
CUDA is the name that NVidia has given to a development environment for creating high performance GPU-accelerated applications. CUDA libraries enable acceleration across multiple domains such as linear algebra, image and video processing, deep learning, and graph analytics. These libraries offload work normally done on a CPU to the GPU. Any program created by the CUDA toolkit is tied to the Nvidia family of GPUs.


Setting it up
The first step is to go get the toolkit. This is not shipped by any distribution. You have to get it directly from Nvidia. You can find the toolkit here:

https://developer.nvidia.com/cuda-downloads

Below is a screenshot of the web site. All the dark boxes are the options that I selected. I like the local rpm option because it installs all the CUDA rpms into a local repo that you can then install from as needed.



Download it. Even though it says F23, it still works fine on F25.

The day I downloaded it, 8.0.44 was the current release. Today it's different. So, I'll continue by using my version numbers and you'll have to make the appropriate substitutions. So, let's continue the setup as root...

rpm -ivh ~/Download/cuda-repo-fedora23-8-0-local-8.0.44-1.x86_64.rpm



This installs a local repo of cuda developer rpms. The repo is located in /var/cuda-repo-8-0-local/. You can list the directory to see all the rpms. Let's install the core libraries that are necessary for Deep Learning:

dnf install /var/cuda-repo-8-0-local/cuda-misc-headers-8-0-8.0.44-1.x86_64.rpm
dnf install /var/cuda-repo-8-0-local/cuda-core-8-0-8.0.44-1.x86_64.rpm
dnf install /var/cuda-repo-8-0-local/cuda-samples-8-0-8.0.44-1.x86_64.rpm


Next, we need to make sure that the utilities provided, such as the GPU software compiler nvcc, are in our path and that the libraries can be found. The easiest way to do this is by creating a bash profile file that gets included when you start a shell.

edit /etc/profile.d/cuda.sh (which is a new file you are creating now):

export PATH="/usr/local/cuda-8.0/bin${PATH:+:${PATH}}"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
export EXTRA_NVCCFLAGS="-Xcompiler -std=c++03"


The reason CUDA is aimed at F23 rather than 25 is that NVidia is not testing against the newest gcc. So, they put something in the headers to make it fail.

I spoke with people from Nvidia at the GTC conference about why they don't support newer gcc. Off the record, they said they do extensive testing on everything they support and that it's just not something they developed with when creating CUDA 8, but newer gcc will probably be supported in CUDA 9.

It's easy enough to fix by altering one line in the header that tests the gcc version. Since we have gcc-6.3, we can change the header to only fail for gcc 7 or later. To do this:

edit /usr/local/cuda-8.0/targets/x86_64-linux/include/host_config.h

On line 119 change from:

#if __GNUC__ > 5

to:

#if __GNUC__ > 6


This will allow things to compile with current gcc. There is one more thing that we need to fix in the headers so that Theano can compile GPU code later. The error looks like this:

math_functions.h(8901): error: cannot overload functions distinguished by return type alone

This is because gcc also defines the function, and it conflicts with the one NVidia ships. The solution, as best I can tell, is simply to:

edit /usr/local/cuda-8.0/targets/x86_64-linux/include/math_functions.h

and around lines 8897 and 8901 you will find:

/* GCC 6.1 uses ::isnan(double x) for isnan(double x) */
__DEVICE_FUNCTIONS_DECL__ __cudart_builtin__ int isnan(double x) throw();
__DEVICE_FUNCTIONS_DECL__ __cudart_builtin__ constexpr bool isnan(long double x);
__DEVICE_FUNCTIONS_DECL__ __cudart_builtin__ constexpr bool isinf(float x);
/* GCC 6.1 uses ::isinf(double x) for isinf(double x) */
__DEVICE_FUNCTIONS_DECL__ __cudart_builtin__ int isinf(double x) throw();

__DEVICE_FUNCTIONS_DECL__ __cudart_builtin__ constexpr bool isinf(long double x);

What I did is to comment out both lines that immediately follow the comment about gcc 6.1.

OK. Next we need to fix the cuda install paths just a bit. As root:

# cd /usr/local/
# ln -s /usr/local/cuda-8.0/targets/x86_64-linux/ cuda
# cd cuda
# ln -s /usr/local/cuda-8.0/targets/x86_64-linux/lib/ lib64



cuDNN setup
One of the goals of this blog is to explore Deep Learning. You will need the cuDNN libraries for that. So, let's put that in place while we are setting up the system. For some reason this is not shipped as an rpm, which leads to a manual installation that I don't like.

You'll need cuDNN version 5. Go to:

https://developer.nvidia.com/cudnn

To get this you have to have a membership in the Nvidia Developer Program. It's free to join.

Look for "Download cuDNN v5 (May 27, 2016), for CUDA 8.0". Get the Linux one. I moved it to /var/cuda-repo-8-0-local. Assuming you did, too...as root:

# cd /var/cuda-repo-8-0-local
# tar -xzvf cudnn-8.0-linux-x64-v5.0-ga.tgz
# cp cuda/include/cudnn.h /usr/local/cuda/include/
# cp cuda/lib64/libcudnn.so.5.0.5 /usr/local/cuda/lib
# cd /usr/local/cuda/lib
# ln -s /usr/local/cuda/lib/libcudnn.so.5.0.5 libcudnn.so.5
# ln -s /usr/local/cuda/lib/libcudnn.so.5.0.5 libcudnn.so



Testing it
To verify the setup, we will build some sample programs shipped with the toolkit. I had you install them quite a few steps ago. The following instructions assume that you have used my recipe for an rpm build environment. As a normal user:

cd working/BUILD
mkdir cuda-samples
cd cuda-samples
cp -rp /usr/local/cuda-8.0/samples/* .
make


When it's done (and hopefully it's successful):

cd 1_Utilities/deviceQuery
./deviceQuery


You should get something like:

  CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1050 Ti"
  CUDA Driver Version / Runtime Version          8.0 / 8.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 4038 MBytes (4234608640 bytes)
  ( 6) Multiprocessors, (128) CUDA Cores/MP:     768 CUDA Cores
  GPU Max Clock rate:                            1468 MHz (1.47 GHz)
  Memory Clock rate:                             3504 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 1048576 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024


<snip>

 You can also check the device bandwidth as follows:

cd ../bandwidthTest
./bandwidthTest



You should see something like:

[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: GeForce GTX 1050 Ti
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432            6354.8

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432            6421.6

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432            94113.5

Result = PASS


At this point you are done. I will refer back to these instructions in the future. If you see anything wrong or anything that needs updating, please comment on this article.

Wednesday, June 28, 2017

Updated RStudio srpm available

Due to the unexpected update to R 3.4 on Fedora 25, which is incompatible with the version of RStudio that I previously wrote about in this blog, I have spent the time to create a new srpm with an updated RStudio that runs on the new R 3.4. The release notes are here:

https://www.rstudio.com/products/rstudio/release-notes/

If you had previously built the version I blogged about, that would correspond with the 0.99a release. So, you can see in the release notes what new things have been added since then.

The source
R-studio-desktop-1.0.146-1.fc25.src.rpm

Building
The build process is very similar to the original instructions. Please review them if you are new to building rpms. In essence you download the srpm. Then:

rpm -ivh R-studio-desktop-1.0.146-1.fc25.src.rpm
rpmbuild -bb working/R-studio-desktop/R-studio-desktop.spec

Then install. This assumes you followed the directory layout suggested in an earlier post.

RStudio picked up one new dependency on qt5-qtwebchannel-devel. You may need to install it first.

This version seems to work with R-3.4 and I've had some time to do limited testing. The only issue I see so far is that audit-explorer (which I've yet to blog about) seems to have a bug that needs fixing.

One note about R upgrades: you have to re-install all of your packages. So, if you have upgraded R and RStudio, you'll need to re-run install.packages() for everything you use in the console portion of RStudio prior to running any programs.
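For example, to get back the packages used throughout this blog, something like the following in the RStudio console would do it (adjust the list to whatever you actually had installed):

# Re-install the packages used in earlier posts; edit the list to taste.
install.packages(c("ggplot2", "shiny", "data.tree", "networkD3"))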

Tuesday, June 27, 2017

PSA: R3.4 upgrade

If you have built your own version of RStudio from my instructions and srpm, do not upgrade to R 3.4. If you do, you will see a message like this:


R graphics engine version 12 is not supported by this version of RStudio. The Plots tab will be disabled until a newer version of RStudio is installed.

At some point I need to create a newer build of RStudio to take care of this problem. But in the meantime you might want to put an exclude statement in /etc/yum.conf or /etc/dnf/dnf.conf to prevent "R" from updating.

Update June 29, 2017. You can upgrade to the new R 3.4 if you then update your RStudio package as I mention in my next blog post.

Monday, June 26, 2017

Using auparse in python

A while back we took a look at how to write a basic auparse program. The audit libraries have python bindings that let you write scripts that do things with audit events. Today, we will take a look at the example programs previously given for "C" and see how to recreate them in python. I will avoid the lengthy discussion of the how's and why's from the original article; please refer back to it if explanation is needed.

Now in Python
I was going to publish this blog post about 2 weeks ago. In writing the code, I discovered that the python bindings for auparse had bugs and outright errors in them. These were all corrected in the last release, audit-2.7.7. I held up publishing this to give time for various distributions to get this update pushed out. The following code is not guaranteed to work unless you are on 2.7.7 or later.

We started the article off by showing the basic application construct to loop through all the logs. This is the equivalent of the first example:

#!/usr/bin/env python3

import sys
import auparse
import audit

aup = auparse.AuParser(auparse.AUSOURCE_LOGS);
aup.first_record()
while True:
    while True:
        while True:
            aup.get_field_name()
            if not aup.next_field(): break
        if not aup.next_record(): break
    if not aup.parse_next_event(): break
aup = None
sys.exit(0)


Just as stated in the original article, it's not too useful, but it shows the basic structure of how to iterate through the logs. We start by importing both audit libraries. Then we call the equivalent of auparse_init, which is auparse.AuParser. The auparse state is captured in the variable aup. After that, all functions in auparse are called similarly to the C version except you do not need the auparse_ prefix on the function name. When done with the state variable, it is destroyed by setting it to None.

Now let's recreate example 2 which is a small program that loops through the logs and prints the record type and the field names contained in each record that follows:

#!/usr/bin/env python3

import sys
import auparse
import audit

aup = auparse.AuParser(auparse.AUSOURCE_LOGS);
aup.first_record()
while True:
    while True:
        mytype = aup.get_type_name()
        print("Record type: %s" % mytype, "- ", end='')
        while True:
            print("%s," % aup.get_field_name(), end='')
            if not aup.next_field(): break
        print("\b")
        if not aup.next_record(): break
    if not aup.parse_next_event(): break
aup = None
sys.exit(0)



I don't think there is anything new to mention here. Running it should give some output such as:

Record type: PROCTITLE - type,proctitle,
Record type: SYSCALL - type,arch,syscall,success,exit,a0,a1,a2,a3,items,ppid,pid,auid,uid,gid,euid,suid,fsuid,egid,sgid,fsgid,tty,ses,comm,exe,subj,key,
Record type: CWD - type,cwd,
Record type: PATH - type,item,name,inode,dev,mode,ouid,ogid,rdev,obj,nametype,
Record type: PROCTITLE - type,proctitle,
Record type: SYSCALL - type,arch,syscall,success,exit,a0,a1,a2,a3,items,ppid,pid,auid,uid,gid,euid,suid,fsuid,egid,sgid,fsgid,tty,ses,comm,exe,subj,key,


Now, let's take a quick look at how to use output from the auparse normalizer. I will not repeat the explanation of how auparse_normalize works. Please refer to the original article for a deeper explanation. The next program takes its input from stdin. So, run ausearch --raw and pipe that into the following program.


#!/usr/bin/env python3

import sys
import auparse
import audit

aup = auparse.AuParser(auparse.AUSOURCE_DESCRIPTOR, 0);
if not aup:
    print("Error initializing")
    sys.exit(1)

while aup.parse_next_event():
    print("---")
    mytype = aup.get_type_name()
    print("event: ", mytype)

    if aup.aup_normalize(auparse.NORM_OPT_NO_ATTRS):
        print("Error normalizing")
        continue

    try:
        evkind = aup.aup_normalize_get_event_kind()
    except RuntimeError:
        evkind = ""
    print("  event-kind:", evkind)

    if aup.aup_normalize_session():
        print("  session:", aup.interpret_field())

    if aup.aup_normalize_subject_primary():
        subj = aup.interpret_field()
        field = aup.get_field_name()
        if subj == "unset":
            subj = "system"
        print("  subject.primary:", field, "=", subj)

    if aup.aup_normalize_subject_secondary():
        subj = aup.interpret_field()
        field = aup.get_field_name()
        print("  subject.secondary:", field, "=", subj)

    try:
        action = aup.aup_normalize_get_action()
    except RuntimeError:
        action = ""
    print("  action:", action)

    if aup.aup_normalize_object_primary():
        field = aup.get_field_name()
        print("  object.primary:", field, "=", aup.interpret_field())

    if aup.aup_normalize_object_secondary():
        field = aup.get_field_name()
        print("  object.secondary:", field, "=", aup.interpret_field())

    try:
        str = aup.aup_normalize_object_kind()
    except RuntimeError:
       str = ""
    print("  object-kind:", str)

    try:
        how = aup.aup_normalize_how()
    except RuntimeError:
        how = ""
    print("  how:", how)

aup = None
sys.exit(0)



There is one thing about the function names that I wanted to point out. The auparse_normalizer functions are all prefixed with aup_. There were some unfortunate naming collisions that necessitated the change in names.

Another thing to notice is that the normalizer metadata functions can throw exceptions. They are always a RuntimeError whenever the function would have returned NULL as a C function. The above program also shows how to read a file from stdin which is descriptor 0. Below is some sample output:

ausearch --start today --raw | ./test3.py

---
event:  SYSCALL
  event-kind: audit-rule
  session: 4
  subject.primary: auid = sgrubb
  subject.secondary: uid = sgrubb
  action: opened-file
  object.primary: name = /etc/audit/auditd.conf
  object-kind: file
  how: /usr/sbin/ausearch
---
event:  SYSCALL
  event-kind: audit-rule
  session: 4
  subject.primary: auid = sgrubb
  subject.secondary: uid = sgrubb
  action: opened-file
  object.primary: name = /etc/audit/auditd.conf
  object-kind: file
  how: /usr/sbin/ausearch



Conclusion
The auparse python bindings can be used whenever you want to manipulate audit data via python. This might be preferable in some cases where you want to create a Jupyter notebook with some reports inside. Another possibility is that you can go straight to Keras, Theano, or TensorFlow in the same application. We will eventually cover machine learning and the audit logs. It'll take some time to get there because there are a lot of prerequisite setups that you would need to do.

Friday, May 26, 2017

Installing a Nvidia Graphics Card on Fedora

So, maybe you have decided to get involved in this new Deep Learning wave of open source projects. The neural networks are kind of slow on a traditional computer. They have to do a lot of matrix math across thousands of neurons.

The traditional CPU is really a latency engine...run everything ASAP. The GPU, on the other hand, is a bandwidth engine. It may be slow getting started, but it can far exceed the CPU in parallelism once it's running. The typical consumer CPU is 4 cores + hyperthreading, which gets you about 8 threads (virtual cores). Meanwhile, an entry level Pascal based GeForce 1050 will give you 768 CUDA cores. Very affordable and only 75 watts of power. You can go bigger, but even the smallest is huge compared to a CPU.

I've looked around the internet and haven't found good and complete instructions on how to set up an Nvidia video card on a current version of Fedora. (The instructions at rpmfusion are misleading and old.) So, this post is dedicated to setting up a Fedora 25 system with a recent Nvidia card.

The Setup
With your old card installed and booted up...

1) Blacklist nouveau
# vi /etc/modprobe.d/disable-nouveau.conf
add the next line:
blacklist nouveau

2) Edit boot options
# vi /etc/default/grub
On the GRUB_CMDLINE_LINUX line
add: nomodeset
remove: rhgb
save, exit, and then run either
# grub2-mkconfig -o /boot/grub2/grub.cfg
Or if a UEFI system:
# grub2-mkconfig -o /boot/efi/EFI/<os>/grub.cfg
(Note: <os> should be replaced with redhat, centos, fedora as appropriate.)

3) Setup rpmfusion-nonfree:
# wget https://download1.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-25.noarch.rpm
# rpm -ivh rpmfusion-nonfree-release-25.noarch.rpm


4) Enable rpmfusion-nonfree
# vi /etc/yum.repos.d/rpmfusion-nonfree.repo
# vi /etc/yum.repos.d/rpmfusion-nonfree-updates.repo

In each, change to:
enabled=1

5) Update the repos
# dnf --refresh check-update

See if new release package
# dnf update rpmfusion-nonfree-release.noarch

6) Start by getting rid of nouveau
# dnf remove xorg-x11-drv-nouveau

7) Install the current nvidia drivers:
# dnf install xorg-x11-drv-nvidia-kmodsrc xorg-x11-drv-nvidia xorg-x11-drv-nvidia-libs xorg-x11-drv-nvidia-cuda akmod-nvidia kernel-devel --enablerepo=rpmfusion-nonfree-updates-testing

8) Install video accelerators:
# dnf install vdpauinfo libva-vdpau-driver libva-utils

9) Do any other system updates:
# dnf update

10) Shutdown and change out the video card. (Note that shutdown might take a few minutes as akmods builds a new kernel module for your current kernel.) Reboot and cross your fingers.

Conclusion
This should get you up and running with video acceleration. This is not a CUDA environment for software development. That will require additional steps which involve registering and getting the Nvidia CUDA SDK. I'll leave that for another post when I get closer to doing AI experiments with the audit trail.

Thursday, May 25, 2017

Event overflow during boot

Today I wanted to explain something that I think needs to be corrected in the RHEL 7 DISA STIG. The DISA STIG is a Security Technical Implementation Guide that describes how to securely configure a system. I was looking through its recommendations and saw something in the audit section that ought to be fixed.

BOOT
When the Linux kernel is booted, there are some default values. One of those is the setting for the backlog limit. The backlog limit describes the size of the internal queue that holds events that are destined for the audit daemon. This queue is a holding area so that if the audit daemon is busy and can't get the event right away, they do not get lost. If this queue ever fills up all the way, then we have to make a decision about what to do. The options are ignore it, syslog that we dropped one, or panic the system. This is controlled by the '-f' option to auditctl.

Have you ever thought about the events in the system that are created before the audit daemon runs? Well, it turns out that when a boot is done with audit=1, then the queue is held until the audit daemon connects. After that happens, the audit daemon drains the queue and it functions normally. If the system does not boot with audit=1, then the events are sent to syslog immediately and are not held.

The backlog limit has a default setting of 64. This means that during boot if audit=1, then it will hold 64 records. Let's take a look at how this plays out in real life.

$ ausearch --start boot --just-one
----
time->Wed May 24 06:55:20 2017
node=x2 type=DAEMON_START msg=audit(1495623320.378:4553): op=start ver=2.7.7 format=enriched kernel=4.10.15-200.fc25.x86_64 auid=4294967295 pid=863 uid=0 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=success

This is the event where the audit daemon started logging. In all likelihood the backlog limit got set during the same second. So, let's gather up a log like this:

ausearch --start boot --end 06:55:20 --format csv > ~/R/audit-data/audit.csv

The STIG calls for the backlog limit to be set to 8192. Assuming that we booted with the STIG suggested value, we can take a quick peek inside the csv file to see if 8192 is in the file. It is in my logs. If it's not in yours, then increment the --end second by one and re-run. This assumes that you also have '-b 8192' in your audit rules.
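If you would rather check from R than eyeball the csv, a quick test like this works. It keys on the OBJ_PRIME column, which is the same column the graphing code below uses to locate the resize:

audit <- read.csv("~/R/audit-data/audit.csv", header=TRUE)
# TRUE if the backlog limit change to 8192 made it into this slice of the log
any(audit$OBJ_PRIME == 8192, na.rm = TRUE)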

What we want to do is create a stacked area graph that plots the cumulative number of events as one layer. This layer shows events piling up in the backlog queue; we'll color it red. Then we want to overlay that area graph with one that shows the size of the backlog queue. Its value will be 64 until the 8192 value comes along, and then the size is expanded to 8192. We'll color this one blue.

The following R code creates our graph:

library(ggplot2)

# Read in the logs
audit <- read.csv("~/R/audit-data/audit.csv", header=TRUE)

# Create a running total of events
audit$total <- as.numeric(rownames(audit))

# create a column showing backlog size
# Default value is 64 until auditctl resizes it
# Cap the resized value at 500 instead of 8192 so the Y axis scale doesn't make the leakage too small to see
audit$backlog <- rep(64,nrow(audit))
audit$backlog[which(audit$OBJ_PRIME == 8192):nrow(audit)] = 500

# Now create a stacked area graph showing the leakage
plot1 = ggplot(data=audit) +
  geom_area(fill = "red", aes(x=total, y=total, group=1)) +
  geom_area(fill = "blue", aes(x=total, y=backlog, group=2)) +
  labs(x="Event Number", y="Total")

print(plot1)


What this graph will tell us is whether we are losing events. If we are not losing events, then the blue area will completely cover the red area. If we are losing events, then we will see some red. Everybody's system is different. You may have SELinux AVCs or other things happening that are different from mine. But for my system, I get the following chart:




Looking at it, we do see red area. Guesstimating, it appears to drop about 50 events or so. This is clearly a problem if you wanted to have all of your events. So, what do you do about this?
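Before getting to the fix, if you would rather not guess the loss from the picture, a rough count can be pulled from the same data frame. This sketch simply counts the events logged while the limit was still 64 and subtracts the 64 slots that the default queue provides, which approximates what the red sliver shows:

# Rough estimate of the overflow: events logged before the resize,
# minus the 64 slots the default queue could actually hold.
overflow <- sum(audit$backlog == 64) - 64
max(overflow, 0)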

There is another audit related kernel command line variable that you can set to initialize the backlog limit to something other than 64. If you wanted to match the number that the DISA STIG recommends, then on a grub2 system (such as RHEL 7), you need to do the following as root:

  1. vi /etc/default/grub
  2. find "GRUB_CMDLINE_LINUX=" and add somewhere within the quotes and probably right after audit=1 audit_backlog_limit=8192. Save and exit.
  3. rebuild grub2 menu with: grub2-mkconfig -o /boot/grub2/grub.cfg. If UEFI system, then: grub2-mkconfig -o /boot/efi/EFI/<os>/grub.cfg  (Note: <os> should be replaced with redhat, centos, fedora as appropriate.)

That should do it.


Conclusion
The DISA STIG is a valuable resource in figuring out how to securely configure your Linux system. But there are places where it could be better. This blog shows that you can lose events if you don't take additional configuration steps to prevent this from happening. If you see "kauditd hold queue overflow" or something like that on your boot screen or in syslog, then be sure to set audit_backlog_limit on the kernel boot prompt.

Tuesday, May 9, 2017

Day 1 at GTC 2017



This is a review of Day 1 of the Nvidia GTC 2017 Conference. Frankly, there is so much going on in GPU and Deep Learning as it relates to every industry, it's crazy...and it's infectious. What I'm doing is looking down the road to the next steps for the audit system. What I'm investigating is how best to analyze the logs. How do you weed out the mundane, normal system operation from the things that you had better pay attention to? Oh, and in real time. And at scale.

What I can tell you is I'm amazed at all the AI technology on display here. I took labs and built neural networks for data analysis. I can see the future of security situational awareness all around me, but it's in pieces needing to be assembled for a single purpose. I don't think I'll go too deep in this blog into what I'm seeing. But in the coming months I will be doing some experiments applying different kinds of analysis to the audit trail. This will include looking at LSTMs, RNNs, and decision trees. What I'll do in this blog is just show you some posters that were in the main hall. All of these caught my eye for problems I'm currently thinking about.













Friday, May 5, 2017

Audit Record Fields Visualized

Before we move on from Dendrograms, I wanted to write a post that combines what we have learned about the auparse library and R visualizations. Sometimes what you want to do requires writing a helper script that gets exactly the information you want. Then, once we have the prepared data, we can take it through visualization. What I want to demonstrate in this post is how to create a graph of the audit record grammar.

Audit Records
The first step is to create a program based on auparse that will iterate over every event, then over every record, and then over every field. We want to graph this using a Dendrogram, which means that things that are alike share common nodes and things that are different branch away. We want to label each record with its record type. Since we know that every record type is different, starting the line off with the record type would give the records nothing in common at the root. So we will place it at the end, just in case two records are identical except for the record type.

Once our program is at the beginning of a record, we will iterate over every field and output its name. Since we know that we need a delimiter for the tree map, we can insert that at the time we create the output. Sound like a plan? Then go ahead and put this into fields-csv.c

#include <stdio.h>
#include <ctype.h>
#include <sys/stat.h>
#include <auparse.h>

static int is_pipe(int fd)
{
    struct stat st;

    if (fstat(fd, &st) == 0) {
        if (S_ISFIFO(st.st_mode))
            return 1;
    }
    return 0;
}

int main(int argc, char *argv[])
{
    auparse_state_t *au;

    if (is_pipe(0))
            au = auparse_init(AUSOURCE_DESCRIPTOR, 0);
    else if (argc == 2)
            au = auparse_init(AUSOURCE_FILE, argv[1]);
    else
            au = auparse_init(AUSOURCE_LOGS, NULL);
    if (au == NULL) {
            printf("Failed to access audit material\n");
            return 1;
    }

    auparse_first_record(au);
    do {
        do {
            int count = 0;
            char buf[32];
            const char *type = auparse_get_type_name(au);
            if (type == NULL) {
                snprintf(buf, sizeof(buf), "%d",
                            auparse_get_type(au));
                type = buf;
            }
            do {
                const char *name;

                count++;
                if (count == 1)
                    continue;
                name = auparse_get_field_name(au);
                if (name[0] == 'a' && isdigit(name[1]))
                    continue;
                if (count == 2)
                    printf("%s", name);
                else
                    printf(",%s", name);
            } while (auparse_next_field(au) > 0);
            printf(",%s\n", type);>
        } while (auparse_next_record(au) > 0);
    } while (auparse_next_event(au) > 0);

    auparse_destroy(au);

    return 0;
}


Then compile it like:

gcc -o fields-csv fields-csv.c -lauparse


Next, let's collect our audit data. We are absolutely going to have duplicate records. So, let's use the sort and uniq shell script tools to winnow out the duplicates.

ausearch --start this-year --raw | ./fields-csv | sort | uniq > ~/R/audit-data/year.csv


Let's go into RStudio and turn this into a chart. If you feel you understood the dendrogram programming from the last post, go ahead and dive into it. The program, when finished, should be 6 actual lines of code. This program has one additional problem that we have to solve.

The issue is that we did not actually create a normalized csv file where every record has the same number of fields. What R will do is normalize it so that all rows have the same number of columns. We don't want that. So, what we will do is use the gsub() function to trim the trailing slashes off. Other than that, you'll find the code remarkably similar to the previous blog post.

library(data.tree)
library(networkD3)

# Load in the data
a <- read.csv("~/R/audit-data/year.csv", header=FALSE, stringsAsFactors = FALSE)


# Create a / separated string list which maps the fields
a$pathString <- do.call(paste, c("record", sep="/", a))

# Previous step normalized the path based on record with most fields.
# Need to remove trailing '/' to fix it.
a$pathString <- gsub('/+$', '', a$pathString)

# Now convert to tree structure
l <- as.Node(a, pathDelimiter = "/")

# And now as a hierarchial list
b <- ToListExplicit(l, unname = TRUE)

# And visualize it
diagonalNetwork(List = b, fontSize = 12, linkColour = "black", height = 4500, width = 2200)



When I run the program with my logs, I get the following diagram. It's too big to paste into this blog, but just follow the link to see it full sized:

http://people.redhat.com/sgrubb/audit/record-fields.html


Conclusion
The Dendrogram is a useful tool to show similarity and structure of things. We were able to apply lessons from two previous blogs to produce something new. This can be applied in other ways where it's simply easier to collect the right data and shape it during collection to make visualizing easier. Sometimes you may want to do data fusion at collection time to combine external information with the audit events; in that case you can do the fusion at collection time or do an inner join like we did when creating sankey diagrams.
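To make that last point a little more concrete, here is a small hypothetical sketch of such a join. The inventory table and its columns are invented purely for illustration, and it assumes your ausearch csv carries a NODE column; merge() is base R's way of doing an inner join:

# Hypothetical sketch: enrich audit events with an invented host inventory table.
audit <- read.csv("~/R/audit-data/audit.csv", header=TRUE)
inventory <- data.frame(NODE     = c("x2", "vps1"),
                        LOCATION = c("lab", "hosted"),
                        OWNER    = c("sgrubb", "security"))

# merge() without any all= arguments performs an inner join on the shared column
enriched <- merge(audit, inventory, by = "NODE")

Now, go write some neat tools.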

Thursday, May 4, 2017

Dendrograms

In this post we will resume our exploration of data science by learning a new kind of visualization technique. Sometimes we need to show exact relationships that naturally fall out as a tree. You have a root and branches that show how something is similar and then how it becomes different as you know more. This kind of diagram is known as a Dendrogram.

I See Trees
The Linux file system is a great introductory data structure that lends itself to being drawn as a tree because...it is. You have a root directory, then subdirectories, and then subdirectories under those. Turning this into a program that draws your directory structure is remarkably simple. First, let's collect some data to explore. I want to choose the directories under the /usr directory, but not too many. So, run the following commands:

[sgrubb@x2 dendrogram]$ echo "pathString" > ~/R/audit-data/dirs.csv
[sgrubb@x2 dendrogram]$ find /usr -type d | head -20 >> ~/R/audit-data/dirs.csv
[sgrubb@x2 dendrogram]$ cat ~/R/audit-data/dirs.csv
pathString
/usr
/usr/bin
/usr/games
/usr/lib64
/usr/lib64/openvpn
/usr/lib64/openvpn/plugins
/usr/lib64/perl5
/usr/lib64/perl5/PerlIO
/usr/lib64/perl5/machine
/usr/lib64/perl5/vendor_perl
/usr/lib64/perl5/vendor_perl/Compress
/usr/lib64/perl5/vendor_perl/Compress/Raw
/usr/lib64/perl5/vendor_perl/threads
/usr/lib64/perl5/vendor_perl/Params
/usr/lib64/perl5/vendor_perl/Params/Validate
/usr/lib64/perl5/vendor_perl/Digest
/usr/lib64/perl5/vendor_perl/MIME
/usr/lib64/perl5/vendor_perl/Net
/usr/lib64/perl5/vendor_perl/Net/SSLeay
/usr/lib64/perl5/vendor_perl/Scalar



OK. We have our data. It turns out that there is a package in R that is a perfect fit for this kind of data. It's called data.tree. If you do not have it installed into RStudio, go ahead and run

install.packages("data.tree")


The way that this package works is that it wants to see things represented as a string separated by a delimiter. It wants this mapping to be under a column called pathString. So, this is why we created the CSV file the way we did. The paths found by the find command have '/' as a delimiter. So, this makes our program dead simple:


library(data.tree)
library(networkD3)

# Read in the paths
f <- read.csv("~/R/audit-data/dirs.csv", header=TRUE)

# Now convert to tree structure
l <- as.Node(f, pathDelimiter = "/")

# And now as a hierarchial list
b <- ToListExplicit(l, unname = TRUE)

# And visualize it
diagonalNetwork(List = b, fontSize = 10)



On my system you get a picture something like this:




Programming in R is a lot like cheating. (Quote me on that.) Four actual lines of code to produce this. How many lines of C code would it take to do this? We can see grouping of how things are similar by how many nodes they share. Something that is very different branches away on the first node. Things that are closely related share many of the same nodes.

So, let's use this to visualize how closely related some audit events are. Let's collect some audit data:

ausearch --start today --format csv > ~/R/audit-data/audit.csv


What we want to do with this is fashion the data into the same layout that we had with the directory paths. The audit data has 15 columns. To show how events are related to one another, we only want the EVENT_KIND and the EVENT columns. So our first step is to create a new data frame with just those. Next we need to take each row and turn it into a string with each column delimited by some character. We'll choose '/' again to keep it simple. But this time we need to create our own pathString column and fill it with the string. We will use the paste function to glue it all together. And the rest of the program is just like the previous one.


library(data.tree)
library(networkD3)

audit <- read.csv("~/R/audit-data/audit.csv", header=TRUE)

# Subset to the fields we need
a <- audit[, c("EVENT_KIND", "EVENT")]

# Prepare to convert from data frame to tree structure by making a map
a$pathString <- paste("report", a$EVENT_KIND, a$EVENT, sep="/")

# Now convert to tree structure
l <- as.Node(a, pathDelimiter = "/")

# And now as a hierarchial list
b <- ToListExplicit(l, unname = TRUE)

# And visualize it
diagonalNetwork(List = b, fontSize = 10)



With my audit logs, I get a picture like this:




You can clearly see the grouping of events that are related to a user login, events related to system services, and Mandatory Access Control (MAC) events, among others. You can try this on bigger data sets.

So let's do one last dendrogram and call it a day. Using the same audit csv file, suppose you wanted to show user, user session, time of day, and action. Can you guess how to do it? Look at the above program. You only have to change 2 lines. Can you guess which ones?


Here they are:

a <- audit[, c("SUBJ_PRIME", "SESSION", "TIME", "ACTION")]
a$pathString <- paste("report", a$SUBJ_PRIME, a$SESSION, a$TIME, a$ACTION, sep="/")


Which yields a picture like this from my data:





Conclusion
The dendrogram is a good diagram to show how things are alike and how they differ. Events tend to differ based on time and this can be useful in showing order of time series data. Creating a dendrogram is dead simple and is typically 6 lines of actual code. This adds one more tool to our toolbox for exploring security data.

Monday, May 1, 2017

Updating R

It's been a while since we talked about R. There is one important point that I wanted to raise since this is a security blog: R and its packages must be updated just like any other package on your Linux system.

How To Update
The R packages use things like curl to pull remote content and thus need maintaining as curl is updated to fix its own CVEs. Some of the R packages are very complex, have several layers of dependencies, and pull a lot of source code to recompile. I'll show you the easy way to handle all this.

First, start up RStudio. Then find the menu item "Tools" and click on it. You will see a menu item that says "Check for package updates".




Click on it and it will "think" for a couple of seconds while it fetches update information and compares it with your local library. When it's done, the dialog box will look something like this.




Then you click on "Select All" and then "Install updates" and it will start downloading source. It will recompile the R packages and install them to the runtime R repository off of your home directory. When it finishes, you are done. You can restart Rstudio to reload the packages with new ones and go back to doing data science things.

Conclusion
It's really simple to keep R updated. It has vulnerable packages just like anything else on your system, and occasionally there are feature enhancements. The hardest part is just developing the habit of updating R periodically.

Tuesday, April 25, 2017

Aide made easy

The aide program is a good tool for seeing if anything important has changed on your system. It works by creating a baseline, which at some future point you compare with the current system to see what changed. It can track added files, deleted files, and changed files. For changed files, it can tell you which attribute changed, such as owner, group, other permissions, size, time, or extended attributes, or whether the file contents changed, yielding a new SHA256 hash of the file.

The only wrinkle is that it requires you to actually move the database after you create it. From the command line that is a bit cumbersome because you need to figure out where things go.

I have a couple of little scripts that simplify using the aide program: aide-init and aide-check. You can install these into /root/bin for ease of use. This is aide-init:

#!/bin/sh
echo "Creating temp copy of aide database"
rm -f /var/lib/aide/*
aide -i
echo "Overwriting stored aide database."
mv /var/lib/aide/aide.db.new.gz /var/lib/aide/aide.db.gz
echo "Aide is ready for use."


and aide-check:

#!/bin/sh
aide -C


The only reason I have aide-check is just for symmetry. If you have an init, you should have a check.

So, the way to use these is to run aide-init to establish a baseline. Then the next time you login, run aide-check to see if anything has changed. If so, investigate. If you are satisfied that all changes are explained, run aide-init again. Also, if you see an update that needs to be installed, do an aide-init immediately after the update so that you have all changes rolled up into the database.

I have one interesting side note on this. I am leasing a VPS system for security research. A VPS system is a Virtual Private Server which is based on container technology. The interesting thing is that I can see when the host updates under me (since I share the base image with other servers).

Anyways...just thought I'd pass this along in case anyone finds it useful.

Monday, April 24, 2017

Sending email when audisp program sees event

We've been working our way up to writing an audispd plugin that will send an email whenever a specific event occurs. To do this, we will need a plan of action. The way it should work is that we need a way of identifying the event that we are interested in. We will use this to trigger the sending of email.

Needle in a Haystack
The first issue is that we need to find the event. The easiest way to do this is to add a key to the audit rule. Then what we can do is write an audispd plugin that has a callback function which uses the auparse_normalize API to normalize the event. Then we have the subject readily available. The normalizer API also hands us the key without needing to search for it. But that leads us to the question of what we are looking for.

Remember that audisp plugins can take a command line argument. So instead of opening a config file and parsing it, we'll just ask that whatever the key is be passed as an argument on the command line. And just in case you want to match on multiple but related keys, we'll do substring matching using the strstr function. A more professional program might have a config file with many things or pairs of things to match against before alerting. But I'm going to keep it simple while illustrating the point.

So, the initial code looks like this:

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/select.h>
#include <string.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>
#include <libaudit.h>
#include <auparse.h>

const char *needle = NULL;

// send_alert goes here

static void handle_event(auparse_state_t *au,
        auparse_cb_event_t cb_event_type, void *user_data)
{
    char msg[256], *name = NULL, *key = NULL;
    if (cb_event_type != AUPARSE_CB_EVENT_READY)
        return;

    /* create a message */
    if (!auparse_normalize(au, NORM_OPT_NO_ATTRS)) {
        if (auparse_normalize_key(au) == 1)
            key = auparse_interpret_field(au);
            if (key && strstr(needle, key)) {
                if (auparse_normalize_subject_primary(au) == 1)
                    name = strdup(auparse_interpret_field(au));
                /* send a message */
                printf("Alert, %s triggered our rule\n", name);
                //send_alert(name);
                free(name);
        }
    }
}

int main(int argc, char *argv[])
{
    auparse_state_t *au = NULL;
    char tmp[MAX_AUDIT_MESSAGE_LENGTH+1], bus[32];

    if (argc != 2) {
        fprintf(stderr, "Missing key to look for\n");
        return 1;
    }
    needle = argv[1];

    /* Initialize the auparse library */
    au = auparse_init(AUSOURCE_FEED, 0);
    auparse_add_callback(au, handle_event, NULL, NULL);

    do {
        int retval;
        fd_set read_mask;

        FD_ZERO(&read_mask);
        FD_SET(0, &read_mask);

        do {
            retval = select(1, &read_mask, NULL, NULL, NULL);
        } while (retval == -1 && errno == EINTR);

        /* Now the event loop */
         if (retval > 0) {
            if (fgets_unlocked(tmp, MAX_AUDIT_MESSAGE_LENGTH,
                stdin)) {
                auparse_feed(au, tmp,
                    strnlen(tmp, MAX_AUDIT_MESSAGE_LENGTH));
            }
        } else if (retval == 0)
            auparse_flush_feed(au);
        if (feof(stdin))
            break;
    } while (1);

    /* Flush any accumulated events from queue */
    auparse_flush_feed(au);
    auparse_destroy(au);

    return 0;
}


We can compile it like this:

gcc -o audisp-example4 audisp-example4.c -lauparse -laudit


And let's test it by triggering any time the "w" command is used. Let's add the following audit rule:

auditctl -a always,exit -F path=/usr/bin/w -F perm=x -F key=alert

Then run "w" and let's collect the audit log for testing.

$ w
 16:15:00 up 16 min,  1 user,  load average: 0.34, 0.40, 0.37
USER     TTY        LOGIN@   IDLE   JCPU   PCPU WHAT
sgrubb   tty2      16:04   16:04   2:10  49.27s /usr/lib64/firefox/firefox -con

$ ausearch --start recent --raw > test.log


Now we have some data to test the plugin.

$ cat test.log | ./audisp-example4 alert
Alert, sgrubb triggered our rule


Sendmail
The next trick is that we need to send an email notifying someone that something we're interested in happened. It turns out that there is a program called sendmail which is not exactly the MTA daemon sendmail, but a command line helper program. It's found at /usr/lib/sendmail, which seems a bit odd, but this is the historic place where it has been located since the 1990s. I also think all MTAs provide their own stub for compatibility with this ancient standard. Fedora uses postfix, so let's just verify that we have a sendmail stub.

$ file /usr/lib/sendmail
/usr/lib/sendmail: symbolic link to /etc/alternatives/mta-sendmail
$ file /etc/alternatives/mta-sendmail
/etc/alternatives/mta-sendmail: symbolic link to /usr/lib/sendmail.postfix
$ file /usr/lib/sendmail.postfix
/usr/lib/sendmail.postfix: symbolic link to ../sbin/sendmail.postfix
$ file ../sbin/sendmail.postfix
../sbin/sendmail.postfix: cannot open `../sbin/sendmail.postfix' (No such file or directory)
$ file /sbin/sendmail.postfix
/sbin/sendmail.postfix: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=d82eedab7c1ad5c7b01a2a82a3b07919fc1b9089, stripped

OK. There it is. After 4 symlinks we land on the real program.

The way that you use it is to send via stdin an email composed line by line like you would a normal email. You have to have a To, From, Subject, and then the body of the email. It knows the email is finished when it sees a period followed by 2 consecutive newlines. So we might want an email like this:

To: security-team@company
From: root
Subject: Audit Alert
Just wanted to let you know that %s triggered the rule

.



This is pretty much what would show up in your inbox. OK, so let's add this to our program. The trick here is that we open sendmail with popen() writable and then pass our pre-formatted email to it and then close. It will take care of the rest assuming that you substitute a valid email address that is routable for our stand-in on the "To" line.

The code for send alert looks like this:

/* Destination for the alerts; substitute a routable address for your site */
static const char *mail_acct = "security-team@company";

static void send_alert(const char *name)
{
    FILE *mail;
    mail = popen("/usr/lib/sendmail", "w");
    if (mail) {
        fprintf(mail, "To: %s\n", mail_acct);
        fprintf(mail, "From: root\n");
        fprintf(mail, "Subject: Audit Alert\n\n");
        fprintf(mail, "Just wanted to let you know %s triggered the audit rule\n",
             name);
        fprintf(mail, ".\n\n");         // Close it up...
        pclose(mail);                   // streams opened with popen must be closed with pclose
    }
}


So, copy and paste this into the program above handle_event. Delete the printf in handle_event and uncomment the call to send_alert. You should still be able to test this by cat'ing the test.log file into stdin of the program.

If you are happy with this, then you need to install the program by copying it to /sbin and then set up the configuration file so that audispd starts it and passes the right key for the program to look for. The conf file is almost identical to the one from the audispd plugin blog post.

active = yes
direction = out
path = /sbin/audisp-example4
type = always
args = alert
format = string


Copy this to /etc/audisp/plugins.d/audisp-example4.conf and restart the audit daemon. You may need to disable SELinux to get this to run. SELinux wraps the audit daemon, and anything that is a child of it gets the same type as audit. So, the child is bound by auditd's SELinux policy. Auditd can send emails, so there is hope. Ideally you would put this program in either a permissive domain or write a simple policy for it if you used this long term.

To test the program, run the "w" program again and see if you get email. A more professional program would also build in some hysteresis or rate limiting so it cannot flood the recipient of the alert.


Conclusion
Once you have a basic audispd plugin, you can alert in many ways. You might even prefer to create an SNMP trap. The beauty of this is that the logs can be tested in real time as something occurs so that you can react to it right away. The technique that we used in this post was to give the event we are interested in a special key and then filter for that key. When we see our event, we send an email. This is one of the tricks you can use to spot things for alerting.