Thursday, June 29, 2017

Setting up a CUDA development environment on Fedora 25

The aim of this blog is to explore Linux security topics using a data science approach to things. Many people don't like the idea of putting proprietary blobs of code on their nice open source system. But I am pragmatic about things and have to admit that Nvidia is the king of GPU right now. And GPU is the approach to accelerate Deep Learning for the last few years. So, today I'll go over what it takes to correctly setup a CUDA development environment for Fedora 25. This is a continuation of the earlier post about how to get an Nvidia GPU card setup in Fedora. That step is a prerequisite to this blog post.

CUDA
CUDA is the name that NVidia has given to a development environment for creating high performance GPU-accelerated applications. CUDA libraries enable acceleration across multiple domains such as linear algebra, image and video processing, deep learning and graph analytics.These libraries offload work normally done on a CPU to the GPU. And any program created by the CUDA toolkit  is tied to the Nvidia family of GPU's.


Setting it up
The first step is to go get the toolkit. This is not shipped by any distribution. You have to get it directly from Nvidia. You can find the toolkit here:

https://developer.nvidia.com/cuda-downloads

Below is a screenshot of the web site. All the dark boxes are the options that I selected. I like the local rpm option because that installs all CUDA rpms in a local repo that you can then install as you need.



Download it. Even though it says F23, it still works fine on F25.

The day I downloaded it, 8.0.44 was the current release. Today its different. So, I'll continue by using my version numbers and you'll have to make the appropriate substitutions. So, let's continue the setup as root...

rpm -ivh ~/Download/cuda-repo-fedora23-8-0-local-8.0.44-1.x86_64.rpm



This installs a local repo of cuda developer rpms. The repo is located in /var/cuda-repo-8-0-local/. You can list the directory to see all the rpms. Let's install the core libraries that are necessary for Deep Learning:

dnf install /var/cuda-repo-8-0-local/cuda-misc-headers-8-0-8.0.44-1.x86_64.rpm
dnf install /var/cuda-repo-8-0-local/cuda-core-8-0-8.0.44-1.x86_64.rpm
dnf install /var/cuda-repo-8-0-local/cuda-samples-8-0-8.0.44-1.x86_64.rpm


Next, we need to make sure that utilities provided such as the GPU software compiler, nvcc, are in our path and that the libraries can be found. The easiest way to do this by creating a bash profile file that gets included when you start a shell.

edit /etc/profile.d/cuda.sh (which is a new file you are creating now):

export PATH="/usr/local/cuda-8.0/bin${PATH:+:${PATH}}"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64 ${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
export EXTRA_NVCCFLAGS="-Xcompiler -std=c++03"


The reason CUDA is aimed at F23 rather than 25 is that NVidia is not testing against the newest gcc. So, they put something in the headers to make it fail.

I spoke with people from Nvidia at the GTC conference about why they don't support new gcc. Off the record they said they do extensive testing on everything they support and that its just not something they developed with when creating CUDA 8, but newer gcc will probably be support in CUDA 9.

Its easy enough to fix by altering one line in the header to test for the gcc version. Since we have gcc-6.3, we can fix the header to test for gcc 7 or later and then fail. To do this:

edit /usr/local/cuda-8.0/targets/x86_64-linux/include/host_config.h

On line 119 change from:

#if __GNUC__ > 5

to:

#if __GNUC__ > 6


This will allow things to compile with current gcc. There is one more thing that we need to fix in the headers so that Theano can compile GPU code later. The error looks like this:

math_functions.h(8901): error: cannot overload functions distinguished by return type alone

This is because gcc defines the function also and conflicts with the one NVidia ships. The solution as best I can tell is simply to:

edit /usr/local/cuda-8.0/targets/x86_64-linux/include/math_functions.h

and around lines 8897 and 8901 you will find:

/* GCC 6.1 uses ::isnan(double x) for isnan(double x) */
__DEVICE_FUNCTIONS_DECL__ __cudart_builtin__ int isnan(double x) throw();
__DEVICE_FUNCTIONS_DECL__ __cudart_builtin__ constexpr bool isnan(long double x);
__DEVICE_FUNCTIONS_DECL__ __cudart_builtin__ constexpr bool isinf(float x);
/* GCC 6.1 uses ::isinf(double x) for isinf(double x) */
__DEVICE_FUNCTIONS_DECL__ __cudart_builtin__ int isinf(double x) throw();

__DEVICE_FUNCTIONS_DECL__ __cudart_builtin__ constexpr bool isinf(long double x);

What I did is to comment out both lines that immediately follow the comment about gcc 6.1.

OK. Next we need to fix the cuda install paths just a bit. As root:

# cd /usr/local/
# ln -s /usr/local/cuda-8.0/targets/x86_64-linux/ cuda
# cd cuda
# ln -s /usr/local/cuda-8.0/targets/x86_64-linux/lib/ lib64



cuDNN setup
One of the goals of this blog is to explore Deep Learning. You will need the cuDNN libraries for that. So, let's put that in place while we are setting up the system. For some reason this is not shipped in an rpm and this leads to a manual installation that I don't like.

You'll need cuDNN version 5. Go to:

https://developer.nvidia.com/cudnn

To get this you have to have a membership in the Nvidia Developer Program. Its free to join.

Look for "Download cuDNN v5 (May 27, 2016), for CUDA 8.0". Get the Linux one. I moved it to /var/cuda-repo-8-0-local. Assuming you did, too...as root:

# cd /var/cuda-repo-8-0-local
# tar -xzvf cudnn-8.0-linux-x64-v5.0-ga.tgz
# cp cuda/include/cudnn.h /usr/local/cuda/include/
# cp cuda/lib64/libcudnn.so.5.0.5 /usr/local/cuda/lib
# cd /usr/local/cuda/lib
# ln -s /usr/local/cuda/lib/libcudnn.so.5.0.5 libcudnn.so.5
# ln -s /usr/local/cuda/lib/libcudnn.so.5.0.5 libcudnn.so



Testing it
To verify setup, we will make some sample program shipped with the toolkit. I had you to install them quite a few steps ago. The following instructions assume that you have used my recipe for a rpm build environment. As a normal user:

cd working/BUILD
mkdir cuda-samples
cd cuda-samples
cp -rp /usr/local/cuda-8.0/samples/* .
make


When its done (and hopefully its successful):

cd 1_Utilities/deviceQuery
./deviceQuery


You should get something like:

  CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1050 Ti"
  CUDA Driver Version / Runtime Version          8.0 / 8.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 4038 MBytes (4234608640 bytes)
  ( 6) Multiprocessors, (128) CUDA Cores/MP:     768 CUDA Cores
  GPU Max Clock rate:                            1468 MHz (1.47 GHz)
  Memory Clock rate:                             3504 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 1048576 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024


<snip>

 You can also check the device bandwidth as follows:

cd ../bandwidthTest
./bandwidthTest



You should see something like:

[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: GeForce GTX 1050 Ti
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432            6354.8

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432            6421.6

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432            94113.5

Result = PASS


At this point you are done. I will refer back to these instructions in the future. If you see anything wrong or needs updating, please comment on this article.

Wednesday, June 28, 2017

Updated Rstudio srpm available


Due to the unexpected update to R 3.4 on Fedora 25 which is incompatible with the version of RStudio that I wrote about in this blog, I have spent the time to create a new srpm with an updated RStudio which runs on the new R 3.4. The release notes are here:

https://www.rstudio.com/products/rstudio/release-notes/

If you had previously built the version I blogged about, that would correspond with the 0.99a release. So, you can see in the release notes what new things have been added since then.

The source  (updated 08/11/2017)
https://people.redhat.com/sgrubb/files/Rstudio/

Building
The build process is very similar to the original instructions. Please review them if you are new to building rpms. In essence you download the srpm. Then:

rpm -ivh R-studio-desktop-1.0.146-1.fc25.src.rpm
rpmbuild -bb working/R-studio-desktop/R-studio-desktop.spec

Then install. This assumes you followed the directory layout suggested in an earlier post.

RStudio picked up one new dependency for qt5-qtwebchannel-devel. You may need to install it first.

This version seems to work with R-3.4 and I've had some time to do limited testing. The only issue I see so far is that audit-explorer (which I'm yet to blog about) seems to have a bug that needs fixing.

One note about R upgrades...you have to re-install all of your packages. So, if you have upgraded R and RStudio, you'll need to start running the install.packages("") command in the console portion of RStudio prior to running any programs.

Tuesday, June 27, 2017

PSA: R3.4 upgrade

If you have built your own version of RStudio from my instructions and srpm, do not upgrade to R 3.4. If you do, you will see a message like this:


R graphics engine version 12 is not supported by this version of RStudio. The Plots tab will be disabled until a newer version of RStudio is installed.

At some point I need to create a newer build of RStudio to take care of this problem. But in the mean time you might want to put an exclude statement in /etc/yum.conf or /etc/dnf/dnf.conf to prevent "R" from updating.

Update June 29, 2017. You can upgrade to the new R 3.4 if you then update your RStudio package as I mention in my next blog post.

Monday, June 26, 2017

Using auparse in python

A while back we took a look at how to write a basic auparse program. The audit libraries have python bindings so that can let you write scripts that do things with audit events. Today, we will take a look at previously given example programs for "C" and see how to recreate them in python. I will avoid the lengthy discussion of the how's and why's from the original article, please refer back to it if explanation is needed.

Now in Python
I was going to publish this blog post about 2 weeks ago. In writing the code, I discovered that the python bindings for auparse had bugs and outright errors in them. These were all corrected in the last release, audit-2.7.7. I held up publishing this to give time for various distributions to get this update pushed out. The following code is not guaranteed to work unless you are on 2.7.7 or later.

We started the article off by showing the basic application construct to loop through all the logs. This is the equivalent of the first example:

#!/usr/bin/env python3

import sys
import auparse
import audit

aup = auparse.AuParser(auparse.AUSOURCE_LOGS);
aup.first_record()
while True:
    while True:
        while True:
            aup.get_field_name()
            if not aup.next_field(): break
        if not aup.next_record(): break
    if not aup.parse_next_event(): break
aup = None
sys.exit(0)


Just as stated in the original article...it's not too useful but it shows the basic structure of how to iterate through logs. We start by importing both audit libraries. Then we call the equivalent of auparse_init which is auparse.AuParser. The auparse state is caught in the variable aup. After that, all functions in auparse are called similarly to the C version except you do not need the auparse_ part of the function name. When done with the state variable, it is destroyed by setting it to None.

Now let's recreate example 2 which is a small program that loops through the logs and prints the record type and the field names contained in each record that follows:

#!/usr/bin/env python3

import sys
import auparse
import audit

aup = auparse.AuParser(auparse.AUSOURCE_LOGS);
aup.first_record()
while True:
    while True:
        mytype = aup.get_type_name()
        print("Record type: %s" % mytype, "- ", end='')
        while True:
            print("%s," % aup.get_field_name(), end='')
            if not aup.next_field(): break
        print("\b")
        if not aup.next_record(): break
    if not aup.parse_next_event(): break
aup = None
sys.exit(0)



I don't think there is anything new to mention here. Running it should give some output such as:

Record type: PROCTITLE - type,proctitle,
Record type: SYSCALL - type,arch,syscall,success,exit,a0,a1,a2,a3,items,ppid,pid,auid,uid,gid,euid,suid,fsuid,egid,sgid,fsgid,tty,ses,comm,exe,subj,key,
Record type: CWD - type,cwd,
Record type: PATH - type,item,name,inode,dev,mode,ouid,ogid,rdev,obj,nametype,
Record type: PROCTITLE - type,proctitle,
Record type: SYSCALL - type,arch,syscall,success,exit,a0,a1,a2,a3,items,ppid,pid,auid,uid,gid,euid,suid,fsuid,egid,sgid,fsgid,tty,ses,comm,exe,subj,key,


Now, let's take a quick look at how to use output from the auparse normalizer. I will not repeat the explanation of how auparse_normalize works. Please refer to the original article for a deeper explanation. The next program takes its input from stdin. So, run ausearch --raw and pipe that into the following program.


#!/usr/bin/env python3

import sys
import auparse
import audit

aup = auparse.AuParser(auparse.AUSOURCE_DESCRIPTOR, 0);
if not aup:
    print("Error initializing")
    sys.exit(1)

while aup.parse_next_event():
    print("---")
    mytype = aup.get_type_name()
    print("event: ", mytype)

    if aup.aup_normalize(auparse.NORM_OPT_NO_ATTRS):
        print("Error normalizing")
        continue

    try:
        evkind = aup.aup_normalize_get_event_kind()
    except RuntimeError:
        evkind = ""
    print("  event-kind:", evkind)

    if aup.aup_normalize_session():
        print("  session:", aup.interpret_field())

    if aup.aup_normalize_subject_primary():
        subj = aup.interpret_field()
        field = aup.get_field_name()
        if subj == "unset":
            subj = "system"
        print("  subject.primary:", field, "=", subj)

    if aup.aup_normalize_subject_secondary():
        subj = aup.interpret_field()
        field = aup.get_field_name()
        print("  subject.secondary:", field, "=", subj)

    try:
        action = aup.aup_normalize_get_action()
    except RuntimeError:
        action = ""
    print("  action:", action)

    if aup.aup_normalize_object_primary():
        field = aup.get_field_name()
        print("  object.primary:", field, "=", aup.interpret_field())

    if aup.aup_normalize_object_secondary():
        field = aup.get_field_name()
        print("  object.secondary:", field, "=", aup.interpret_field())

    try:
        str = aup.aup_normalize_object_kind()
    except RuntimeError:
       str = ""
    print("  object-kind:", str)

    try:
        how = aup.aup_normalize_how()
    except RuntimeError:
        how = ""
    print("  how:", how)

aup = None
sys.exit(0)



There is one thing about the function names that I wanted to point out. The auparse_normalizer functions are all prefixed with aup_. There were some unfortunate naming collisions that necessitated the change in names.

Another thing to notice is that the normalizer metadata functions can throw exceptions. They are always a RuntimeError whenever the function would have returned NULL as a C function. The above program also shows how to read a file from stdin which is descriptor 0. Below is some sample output:

ausearch --start today --raw | ./test3.py

---
event:  SYSCALL
  event-kind: audit-rule
  session: 4
  subject.primary: auid = sgrubb
  subject.secondary: uid = sgrubb
  action: opened-file
  object.primary: name = /etc/audit/auditd.conf
  object-kind: file
  how: /usr/sbin/ausearch
---
event:  SYSCALL
  event-kind: audit-rule
  session: 4
  subject.primary: auid = sgrubb
  subject.secondary: uid = sgrubb
  action: opened-file
  object.primary: name = /etc/audit/auditd.conf
  object-kind: file
  how: /usr/sbin/ausearch



Conclusion
The auparse python bindings can be used whenever you want to manipulate audit data via python. This might be preferable in some cases where you want to create a Jupyter notebook with some reports inside. Another possibility is that you can go straight to Keras, Theano, or TensorFlow in the same application. We will eventually cover machine learning and the audit logs. It'll take some time to get there because there are a lot of prerequisite setups that you would need to do.