Security + Data Science: July 2017

In the past, we have looked at using R to analyze audit data. The programs are kind of like batch processing. Whatever they do is predefined and you can't tell it to change without modifying the source code. Today we are going to take a look at how to make R application that respond to user input.

Shiny
The developers at RStudio created a way to marry web programming with R so that you have a web presentation layer and an R backend that responds to the changes. This brings a much needed capability because sometimes you want to see the data differently right away.

The shiny interface does bring with it a number of controls like Radio Buttons, drop down text boxes, sliders, charts, and boxes for grouping. You can take a look at a gallery of controls here.

To create a basic shiny app, open RStudio. Click on File|New File and then select "Shiny Web App". That brings up a dialog asking some basic questions. It asks what the application's name is. I put in Test. Then it asks if you want 1 file or 2. I select 1. If you choose 2, then it makes one file for the UI and one file for the back end. The last thing is to select the directory for the file. When you click on Create, it will open a file fully populated with a simple working app.

If you click "Run App", then you should have a program that looks something like this:

Moving the slider causes the histogram to change. Let's look at the code.

library(shiny)

# Define UI for application that draws a histogram
ui <- fluidPage(

   # Application title
   titlePanel("Old Faithful Geyser Data"),

   # Sidebar with a slider input for number of bins
   sidebarLayout(
      sidebarPanel(
         sliderInput("bins",
                     "Number of bins:",
                     min = 1,
                     max = 50,
                     value = 30)
      ),

      # Show a plot of the generated distribution
      mainPanel(
         plotOutput("distPlot")
      )
   )
)

# Define server logic required to draw a histogram
server <- function(input, output) {

   output$distPlot <- renderPlot({
      # generate bins based on input$bins from ui.R
      x    <- faithful[, 2]
      bins <- seq(min(x), max(x), length.out = input$bins + 1)

      # draw the histogram with the specified number of bins
      hist(x, breaks = bins, col = 'darkgray', border = 'white')
   })
}

# Run the application
shinyApp(ui = ui, server = server)

There are 2 parts to this program. The first part is the GUI. There is a call to fluid page that takes an undefined number of arguments that describe the widgets on the page. Each widget is itself a function call that takes parameters or other objects created by other functions. In the basic design, we have a title, a slider, and a plot.

On the server side we have a server object created by a function that has input and output objects. To make the GUI change, we define a distPlot sub-variable to output. We can call this anything. It just has to match what's on the GUI side. This variable is initialized by a renderPlot function which takes a few parameters to describe what to plot. It knows what to plot based on a sub-variable from the input argument, bins. This could be named anything but it has to match what the slide control has or nothing will happen.

The server side and GUI side are tied together with a function call to ShinyApp at the bottom. This is what runs the program. Under the hood, RStudio starts up a little web server that runs a cgi-bin application with an R environment that your app gets loaded into. On the front end it opens a little web browser and connects to the web server on localhost. The cgi-bin starts your session and sends a web page to draw. When you change anything in the web page, it sends a post to the cgi-bin with a new copy of all the variables in the GUI. This immediately triggers the server code and it responds with an updated web page.

There is a nice and detailed tutorial video created by the RStudio developers if you wanted to learn more. I found it very helpful when learning Shiny. You can also browse around the widget gallery mentioned earlier. In it you can see the source code for all of these little examples.

Now let's do a simple program that does something with audit data. A long time ago, we learned how to do bar charts. That was a pretty simple program. Let's re-fit that code to run as a shiny app so that we tell it how to group the audit data.

library(shiny)
library(ggplot2)

# Read in the data and don't let strings become factors
audit <<- read.csv("~/R/audit-data/audit.csv", header=TRUE, stringsAsFactors = FALSE)
fnames <<- colnames(audit)
fnames[5] <<- "HOUR" # Change serial number to HOUR
audit$one <<- rep(1,nrow(audit))
# Create time series data frame for aggregating
audit$posixDate=as.POSIXct(paste(audit$DATE, audit$TIME), format="%m/%d/%Y %H:%M:%S")
# Create a column of hour and date to aggregate an hourly total.
audit$HOUR <- format(audit$posixDate, format = '%Y-%m-%d %H')
ourColors <<- c("red", "blue", "green", "cyan", "yellow", "orange", "black", "gray", "purple" )

# Define UI for application
ui <- shinyUI(fluidPage(
# Application title
titlePanel("Audit Barcharts"),

sidebarLayout(
    sidebarPanel(
      selectInput("groupBy", "Group By", fnames, selected = "HOUR"),
      selectInput("lowColor", "Low Color", ourColors, selected = "blue"),
      selectInput("highColor", "High Color", ourColors, selected = "red"),
      width = 3
    ),
    # Show a plot of the generated distribution
    mainPanel(
      plotOutput("barPlot", width = "auto", height = "600px"),
      width = 9
    )
)
))

# Define our server side code
server <- shinyServer(function(input, output) {
observeEvent(c(input$groupBy, input$lowColor, input$highColor), {
    # Now summarize it
    grp <- input$groupBy

    temp <- aggregate(audit$one, by = audit[grp], FUN = length)
    temp$t <- as.character(temp[,grp])

    if (grp == "HOUR") {
      # Time based needs special handling
      final = data.frame(date=as.POSIXct(temp$t, format="%Y-%m-%d %H", tz="GMT"))
      final$num <- temp$x
      final$day <- weekdays(as.Date(final$date))
      final$oday <- factor(final$day, levels = unique(final$day))
      final$hour <- as.numeric(format(final$date, "%H"))

      output$barPlot<-renderPlot({
        pl <- ggplot(final, aes(x=final[,1], y=final$num, fill=final$num)) +
          geom_bar(stat="identity") + ggtitle(paste("Events by", grp)) +
          scale_x_datetime() + xlab("") + labs(x=grp, y="Number of Events") +
          scale_fill_gradient(low=input$lowColor, high = input$highColor, name=paste("Events/", grp, sep=""))
        print(pl)
      })
    } else {
      # non-time conversion branch
      final <- temp[,1:2]
      colnames(final) = c("factors", "num")
      final$factors <- abbreviate(final$factors, minlength = 20, strict = TRUE)

      # We will rotate based on how dense the labels are
      rot <- 90
      if (nrow(final) < 20)
        rot <- 60
      if (nrow(final) < 10)
        rot <- 45

      # Plot it
      output$barPlot<-renderPlot({
        pl <- ggplot(final, aes(x=final[,1], y=final$num, fill=final$num)) +
          geom_bar(stat="identity") + ggtitle(paste("Events by", grp)) +
          scale_x_discrete() + xlab("") + labs(x=grp, y="Number of Events") +
          scale_fill_gradient(low=input$lowColor, high = input$highColor, name=paste("Events/", grp, sep="")) +
          theme(axis.text.x = element_text(angle = rot, hjust = 1, size = 18))
        print(pl)
      })
    }
})
})

# Run the application
shinyApp(ui = ui, server = server)

Make sure you have ~/R/audit-data/audit.csv filled with audit data. Save the above code as app.R and run it. You should see something like this:

Also notice that you can change the selection in the text drop downs and the chart is immediately redrawn. Briefly, the way this works is we setup some global data in the R environment. Next we define a GUI that has 3 selector inputs. All of the hard work is in the server function. What it does is wait for either of the 3 variables to change and if so re-draws the screen. We split the charting into 2 branches, time and everything else. The main difference is time variables need special handling. Basically we format the data to what's expected by the plotting function and pass it in. On the non-time side of things, we can get very dense groups. So what we do is rotate the text labels on the bottom if we start running out of room to fit more in.

Conclusion
This shows the basics of how a shiny app works. You can create very elaborate and complicate programs using this API. Now that we've been over Shiny basics, I'll talk about Audit Explorer next time.

In this blog post we will setup the Torch AI framework so that it can be used on Fedora. This builds on the previous blog post which shows you how to setup a CUDA development environment for Fedora.

Torch
Torch is a Deep Learning AI framework that is written in LUA. This makes it very fast because there is little between the script and the pure C code that is performing the work. Both Facebook and Twitter are major contributors to this and have probably derived their in-house version from the open source version.

The first thing I would do is setup an account just for AI. The reason I suggest this is because we are going to be installing a bunch of software without rpm. All of this will be going into the home directory. So, if one day you want to delete it all, its as simple as deleting the account and home directory. Assuming you made the account and logged into it...

$ git clone https://github.com/torch/distro.git ~/torch --recursive
$ cd torch/
$ export CMAKE_CXX_FLAGS="-std=c++03"
$ ./install.sh

The Torch community say that they only support Torch built this way. I have tried to package Torch in rpm and it simply does not work. I get some strange errors related to math. There are probably compile options that fix this but I'm done with hunting this down. It's easier to use their method from an account just for this. But I digress...

After about 25 minutes, the build asks "Do you want to automatically prepend the Torch install location to PATH and LD_LIBRARY_PATH in your /home/ai/.bashrc? (yes/no)"

I typed "yes" to have it update ~/.bashrc. I logged out and back in. Test to see if the GPU based Torch is working:

luajit -lcutorch
luajit -lcunn

This should produce errors if its not working. To exit the shell, type:

os.exit()

At this point only one last thing is needed. We may want to play with machine vision at some point so get the camera module. And a lot of models seem to be trained using the Caffe Deep Learning framework. This means we need load it from that format so let's grab the loadcaffe module.

During the build of Torch, you got a copy of luarocks which is a package manager for LUA modules. We can use this to pull in the modules so that Torch can use them.

$ luarocks install camera
$ luarocks install loadcaffe

If you run the webcam from another account that is not your login account, then you need to go into /etc/group and find the video group and add the ai account as a supplemental group.

Quick Art Test
OK. Now lets see if Torch is working right. There is a famous project that can take a picture and transfer the artistic style of a work of art onto your picture. Its really quite astonishing to see. Let's use that as our test for Torch.

The project page is here:

https://github.com/jcjohnson/neural-style

To download it:

$ git clone https://github.com/jcjohnson/neural-style.git

Now download the caffe models:

$ cd neural-style/models
$ sh ./download_models.sh
$ cd ..

We need a picture and a work of art. I have a picture of a circuit board:

Let's see if we can make art from it. The boxiness of the circuit kind of suggests cubism to me. There is a web site called wikiart that curates a collection of art by style and genre. Let's grab a cubist style painting and see how well it works.

$ wget https://uploads7.wikiart.org/images/albert-gleizes/portrait-de-jacques-nayral-1911.jpg
$ mv portrait-de-jacques-nayral-1911.jpg cubist.jpg

To render the art:

$ th neural_style.lua -backend cudnn -style_image cubist.jpg -content_image circuit.jpg -output_image art.jpg

Using a 1050Ti GPU, it takes about 4 minutes and this is the results:

One thing you have to pay attention to is that if the picture is too big, you will run out of GPU memory. The video card only has so much working memory. You can use any image editing tool to re-scale the picture. The number of pixels is what matters rather than the size of the file. Something in the 512 - 1080 pixel range usually fits in a 4Gb video card.

Conclusion
At some point we may come back to Torch to do some experimenting on security data. But I find it to be fun to play around with the art programs written for it. If you like this, look around. There are a number of apps written for Torch. The main point, though, is to show how to leverage the CUDA development environment we previously setup to get one of the main Deep Learning frameworks installed and running on a modern Fedora system.

Security + Data Science

Wednesday, July 12, 2017

Interactive R programs

Wednesday, July 5, 2017

Getting Torch running on Fedora 25

Blog Archive