Monday, February 5, 2018

New Rstudio version 1.1 SRPM available

Today I have uploaded a new RStudio v1.1.423 SRPM for people to build. You can download it from here:
http://people.redhat.com/sgrubb/files/Rstudio/

If you are on Fedora 26 or later, you will need to install the compat-openssl10-devel package.


$ dnf install compat-openssl10-devel --allowerasing

This will delete openssl-devel, but you can re-install it later after Rstudio is built. If you are building it for the first time, there are some instructions here.
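
If you have built SRPMs before, the overall flow looks roughly like this (a sketch only; the SRPM filename below is a guess, and the linked instructions are authoritative). Note that dnf builddep comes from dnf-plugins-core.

# dnf builddep rstudio-1.1.423-1.fc27.src.rpm
$ rpmbuild --rebuild rstudio-1.1.423-1.fc27.src.rpm

The builddep step pulls in the build requirements, and the finished rpms land under your rpmbuild topdir (~/rpmbuild/RPMS by default), from where you can dnf install them.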

So, what's new? The folks over at Rstudio blogged about some changes in the 1.1 release. The SRPM that I am providing also includes the bug fixes that are in their v1.1a preview release.

Monday, January 15, 2018

How to train your own fast neural style model

In the last post about getting Torch up and running, I introduced you to fast neural style. It's a fun program to play with. But what if you wanted it to use art of your choosing? This post will show you how to make new models.

The training phase requires a whole lot of data for its input. It learns how to apply the style of a picture you provide by training against a huge database of images. This database of images is roughly 20GB in size.

First, let's install some packages as root that will be used to create the database file. The training program is written in Python 2, so we need the Python 2 versions.

# dnf install hdf5-devel
# dnf install python2-h5py
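
Before going further, a quick sanity check that the Python 2 bindings actually import:

$ python2 -c "import h5py; print(h5py.version.version)"

If that prints a version number, you're good.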

Then as your ai user:

$ luarocks install totem
$ luarocks install https://raw.githubusercontent.com/deepmind/torch-hdf5/master/hdf5-0-0.rockspec
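
And a similar check on the Torch side, assuming the rock built cleanly:

$ luajit -e "require 'hdf5'; print('torch-hdf5 OK')"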

Now, we need the training and validation images. The fast-neural-style page says that all training was done using the 2014 COCO dataset. The project's home page is located here:

http://cocodataset.org

That's worth a look in case you want to read about their work. The files are large: about 18GB to download and 20GB uncompressed.

On to our work...

$ cd ~/fast-neural-style
$ mkdir -p data/coco/images/
$ cd data/coco/images
$ wget http://images.cocodataset.org/zips/train2014.zip
$ unzip train2014.zip
$ wget http://images.cocodataset.org/zips/val2014.zip
$ unzip val2014.zip
$ cd ~/fast-neural-style

The next step is to make an hdf5 file out of the training and validation images.

$ python2 scripts/make_style_dataset.py \
  --output_file data/coco/dataset.h5
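
This takes a while. When it finishes, you can sanity-check the result with the hdf5 command line tools that were pulled in with hdf5-devel:

$ h5ls data/coco/dataset.h5

You should see the training and validation image datasets listed.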

If you're tight for disk space, you don't need the uncompressed jpg files since they are all in the hdf5 file. You can delete them if you want. The next step is to get a neural net model that we can train.

$ bash models/download_vgg16.sh

One more thing: we need some art. In a previous post about neural style transfer, I pointed the reader to the wikiart web site. It has thousands of pieces of art. For our purposes, we do not need anything high resolution; anything bigger than 256 pixels in both directions is fine. The art that seems to work best is art with a very strong style. If you pick something like Monet's Impression, Sunrise, it picks up the colors but can't find the style because it's too subtle. For this post we will use the following picture:


You can read a little about it here:

https://www.wikiart.org/en/kazuo-nakamura/inner-view-3-1955

Now let's grab it:

$ mkdir art
$ cd art
$ wget http://use2-uploads2.wikiart.org/images/kazuo-nakamura/inner-view-3-1955.jpg
$ cd ..
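
It's worth double-checking that what you grabbed is big enough. ImageMagick's identify prints the resolution (dnf install ImageMagick if you don't have it):

$ identify art/inner-view-3-1955.jpg

As long as both dimensions are 256 or more, you're fine.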

Now we are ready for the main event.

th train.lua \
 -style_image_size 384 \
 -content_weights 1.0 \
 -style_weights 5.0 \
 -checkpoint_name checkpoint \
 -gpu 0 \
 -h5_file data/coco/dataset.h5 \
 -style_image art/inner-view-3-1955.jpg

And...wait...for...it. This...will...take...some...time.

I have a beefy system with a GTX 1080Ti. I used the time command to see how long the training took. These are the results:

real    116m21.518s
user    140m40.911s
sys     41m21.145s


The nvidia-smi program said that this consumed 5.3 GiB of video memory. If you have a 4GiB video board, it might run significantly slower. The 1080Ti has 3584 CUDA cores and the tool said it was using 99% of them. You can estimate the full run time by timing the first 1000 iterations and multiplying that by 40.
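
If you'd rather get that estimate without watching the console, train.lua accepts (at least in the version I used) a -num_iterations flag, so you can time a short run and scale it up:

$ time th train.lua \
   -num_iterations 1000 \
   -h5_file data/coco/dataset.h5 \
   -style_image art/inner-view-3-1955.jpg \
   -gpu 0

Multiply the real time by 40 to approximate the full training run.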

OK. Let's take our new style for a spin.

$ mkdir models/my-styles
$ mv checkpoint.t7 models/my-styles/inner-view-3-1955.t7
$ rm checkpoint.json
$ qlua webcam_demo.lua -gpu 0 -models models/my-styles/inner-view-3-1955.t7





Pretty cool, eh?
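
The webcam is the fun part, but the repo also includes fast_neural_style.lua for stylizing a single image. Something along these lines should work; the flag names here are from memory, so check the script's help output if it complains:

$ th fast_neural_style.lua \
   -model models/my-styles/inner-view-3-1955.t7 \
   -input_image images/content/chicago.jpg \
   -output_image stylized.png \
   -gpu 0

Point -input_image at any picture of your own if that sample content image isn't in your checkout.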

One other tip: if you pass a height and width that are too big for your webcam, you will get a warning message and the program will crash. The warning gives you the maximum resolution that your camera supports. For example, I get:

Warning: camera resolution changed to 720x960

When I re-run with that resolution, the image looks much smoother and less pixelated.
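
For reference, the re-run looks something like this (I'm reading the warning as height x width; swap the numbers if your camera reports otherwise):

$ qlua webcam_demo.lua -gpu 0 \
   -models models/my-styles/inner-view-3-1955.t7 \
   -height 720 -width 960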

Training your own model is not hard. It just takes time. I read in the original paper that the work was supported by Nvidia, which provided the authors with the latest hardware. This would have been back around 2016. The paper said that it took them 4 hours to train a model. Have fun playing with this now that you have the recipe.

Thursday, January 11, 2018

Getting Torch running on F27 with CUDA 9

In a previous post, I showed how to set up Torch on Fedora 25. You may be able to migrate to Fedora 26 with just some rebuilding (because libraries have changed), but moving to F27 is a problem. This is because the Nvidia drivers for Xorg were updated to CUDA 9. This update causes all the AI stuff we previously set up to no longer work.

This post is an updated version of the original post. The original has been copied and corrected, so if you are finding this blog for the first time, you do not need to go back and read it. We will do a different experiment with the final results, though, so maybe you do want to read the old article. I'll also include hints about what to do if you are migrating from CUDA 8 to CUDA 9. They will be in italics to distinguish migration steps from a fresh install. And if you see any mistakes or omissions, contact me or comment so I can fix them. OK, let's jump into it.

----


In this blog post we will set up the Torch AI framework so that it can be used on Fedora 27 with CUDA 9. This builds on the previous blog post, which shows you how to set up a CUDA 9 development environment for Fedora.


Torch
Torch is a Deep Learning AI framework that is written in Lua. This makes it very fast because there is little between the script and the pure C code that is performing the work. Both Facebook and Twitter are major contributors to it and have probably derived their in-house versions from the open source version.

The first thing I would do is set up an account just for AI. The reason I suggest this is because we are going to be installing a bunch of software without rpm. All of it will be going into the home directory, so if one day you want to delete it all, it's as simple as deleting the account and home directory. Assuming you made the account (see the sketch below) and logged into it...
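
If you haven't made that account yet, it's the usual dance (a sketch, as root):

# useradd -m ai
# passwd ai

And when you are done with all of this someday, "userdel -r ai" wipes the account and everything under its home directory.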

If you are migrating and have a torch directory from F25, go ahead and delete it.

rm -rf torch


Then we do the following:

$ git clone https://github.com/torch/distro.git ~/torch --recursive
$ cd torch/
$ export CMAKE_CXX_FLAGS="-std=c++03"
$ export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__"
$ ./install.sh


The Torch community says that they only support Torch built this way. I have tried to package Torch as an rpm and it simply does not work. I get some strange errors related to math. There are probably compile options that would fix this, but I'm done hunting it down. It's easier to use their method from an account dedicated to this. But I digress...

After about 20 minutes, the build asks "Do you want to automatically prepend the Torch install location to PATH and LD_LIBRARY_PATH in your /home/ai/.bashrc? (yes/no)"

I typed "yes" to have it update ~/.bashrc. I logged out and back in. Test to see if the GPU based Torch is working:

luajit -lcutorch
luajit -lcunn


These should produce errors if it's not working. To exit the shell, type:

os.exit()
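
If you want a slightly stronger check than "it loads without errors", this one-liner actually allocates a tensor on the GPU and sums it:

$ luajit -lcutorch -e "print(torch.CudaTensor(3,3):fill(1):sum())"

It should print 9.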


At this point only one last thing is needed. We may want to play with machine vision at some point, so get the camera module. Also, a lot of models seem to be trained using the Caffe Deep Learning framework, which means we need to load models in that format, so let's grab the loadcaffe module as well.

During the build of Torch, you got a copy of luarocks, which is a package manager for Lua modules. We can use it to pull in the modules so that Torch can use them.

If you do not have an opencv development environment setup:

dnf install opencv-devel

There is a packaging bug in opencv on F27. To fix it, open /usr/include/opencv/cv.h and at line 68 add:

#include "opencv2/videoio/videoio_c.h"

I'll probably file a bz on this to get it corrected. So, in the future you may not need to do this fixup.
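
If you'd rather script that edit than open an editor, a sed one-liner does it (this inserts the include just before line 68; double-check the line number against your opencv-devel first):

# sed -i '68i #include "opencv2/videoio/videoio_c.h"' /usr/include/opencv/cv.h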

$ luarocks install camera
$ luarocks install loadcaffe


Plug in your webcam

If you run the webcam from an account other than your login account, you need to add the ai account to the video group, either by editing /etc/group or with the command below.
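
# usermod -aG video ai

Log out and back in as the ai user for the new group to take effect.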


Quick Webcam Art Test
OK. Now let's see if Torch is working right. There is a famous project that can transfer the artistic style of a work of art onto a picture or a realtime video. It's really quite astonishing to see. Let's use that as our test for Torch.

The project page is here:


https://github.com/jcjohnson/fast-neural-style/

To download it:

$ git clone https://github.com/jcjohnson/fast-neural-style.git


Now download the Caffe models:

$ cd fast-neural-style
$ bash models/download_style_transfer_models.sh

Now it's time to see it work in realtime.

$ qlua webcam_demo.lua -models models/instance_norm/candy.t7 -gpu 0

I won't post a picture or movie. You can see some at the fast neural style project page.


YOLO/Darknet
Another fun program is YOLO/Darknet. It has nothing to do with Torch, but since we have the webcam out, let's give it a try. YOLO is an object detection model that runs on the Darknet framework. When it sees something that it recognizes, it draws a box around it and labels it. To build it, do the following:

$ cd ~
$ git clone https://github.com/pjreddie/darknet
$ cd darknet
$ vi Makefile


Change the following lines to match this:

GPU=1
CUDNN=1
OPENCV=1
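
If you'd rather skip the editor, a sed one-liner handles it (assuming the stock Makefile still defaults all three to 0):

$ sed -i 's/^GPU=0/GPU=1/; s/^CUDNN=0/CUDNN=1/; s/^OPENCV=0/OPENCV=1/' Makefile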


Next build it.
$ make


Now we need a pretrained model:

$ wget https://pjreddie.com/media/files/yolo.weights

And to run the webcam demo:

$ ./darknet detector demo cfg/coco.data cfg/yolo.cfg yolo.weights

Point the webcam at various things and see what it thinks it is.  Again, I won't post a picture or movie here but you can see some at the project page. I will, however, tell an anecdote.

My daughter brought her small dog over to my house. She let it run around. When I was testing this program out, the dog ran past the webcam. If you've been around dogs, you probably know that when dogs are on alert, they keep their tail raised straight up. When the dog ran into the field of view, all you could see was its tail standing up. YOLO correctly boxed the tail and labeled it "dog". I was impressed.


Live Demo
There is an upcoming Fedora Developer's Conference in Brno, Czech Republic. On the first day of the conference there will be an AI table where both of these programs will be demonstrated. If you are there, look for it and say hi.


Conclusion
At some point we may come back to Torch to do some experimenting on security data. But I find it to be fun to play around with the art programs written for it. If you like this, look around. There are a number of apps written for Torch. The main point, though, is to show how to leverage the CUDA development environment we previously setup to get one of the main Deep Learning frameworks installed and running on a Fedora 27 system.

Wednesday, January 10, 2018

Setting up CUDA 9 on Fedora 27

In a previous post, I showed how to set up CUDA 8 on Fedora 25. You can migrate to Fedora 26 OK, but moving to F27 is a problem. This is because the Nvidia drivers for Xorg were updated to CUDA 9. This update causes all the AI stuff we set up to no longer work.

This post is an updated version of the original post. The original has been copied and corrected, so if you are finding this blog for the first time, you do not need to go back and read it. I'll also include hints about what to do if you are migrating from CUDA 8 to CUDA 9. They will be in italics to distinguish migration steps from a fresh install. And if you see any mistakes or omissions, contact me or comment so I can fix them. OK, let's jump into it.

----

The aim of this blog is to explore Linux security topics using a data science approach to things. Many people don't like the idea of putting proprietary blobs of code on their nice open source system, but I am pragmatic about things and have to admit that Nvidia is the king of GPUs right now, and GPUs have been the way to accelerate Deep Learning for the last few years. So, today I'll go over what it takes to correctly set up a CUDA 9 development environment for Fedora 27. This is a continuation of the earlier post about how to get an Nvidia GPU card set up in Fedora. That step is a prerequisite to this blog post.

CUDA
CUDA is the name that Nvidia has given to a development environment for creating high performance GPU-accelerated applications. CUDA libraries enable acceleration across multiple domains such as linear algebra, image and video processing, deep learning, and graph analytics. These libraries offload work normally done on a CPU to the GPU. Any program created by the CUDA toolkit is tied to the Nvidia family of GPUs.


Setting it up
The first step is to go get the toolkit. This is not shipped by Fedora. You have to get it directly from Nvidia. You can find the toolkit here:

https://developer.nvidia.com/cuda-downloads

Below is a screenshot of the web site. All the dark boxes are the options that I selected. I like the local rpm option because that installs all CUDA rpms in a local repo that you can then install as you need.



Download it. Even though it says F25, it still works fine on F27.

If you are migrating from the F25 setup for CUDA 8, then you need to get rid of the old CUDA environment. Just uninstalling the license rpm is all that it takes to remove all CUDA rpms.

dnf remove cuda-license-8-0


Then check whether an old repo rpm is still installed:

rpm -qa | grep cuda-repo


If it lists one, remove it with dnf. If not, just delete the old local repo directory:

rm -rf /var/cuda-repo-8-0-local/


The day I downloaded it, 9.1.85 was the current release. Since you are possibly reading this after it's been updated again, you'll have to make the appropriate substitutions. So, let's continue the setup as root...

rpm -ivh cuda-repo-fedora25-9-1-local-9.1.85-1.x86_64.rpm


This installs a local repo of cuda developer rpms. The repo is located in /var/cuda-repo-9-1-local/. You can list the directory to see all the rpms. Let's install the core libraries that are necessary for Deep Learning:

dnf install /var/cuda-repo-9-1-local/cuda-misc-headers-9-1-9.1.85-1.x86_64.rpm
dnf install /var/cuda-repo-9-1-local/cuda-core-9-1-9.1.85-1.x86_64.rpm
dnf install /var/cuda-repo-9-1-local/cuda-samples-9-1-9.1.85-1.x86_64.rpm


Next, we need to make sure that the provided utilities, such as the GPU compiler nvcc, are in our path and that the libraries can be found. The easiest way to do this is by creating a bash profile file that gets sourced when you start a shell.

edit /etc/profile.d/cuda.sh (which is a new file you are creating now):

export PATH="/usr/local/cuda-9.1/bin${PATH:+:${PATH}}"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
export EXTRA_NVCCFLAGS="-Xcompiler -std=c++03"

If you are migrating, the PATH variable just needs changing from 8.0 to 9.1.
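
Once that file is in place, either log out and back in or source it, and then make sure the toolkit is actually found:

$ source /etc/profile.d/cuda.sh
$ which nvcc
$ nvcc --version

which should point at /usr/local/cuda-9.1/bin/nvcc, and the version banner should report release 9.1.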

The reason CUDA is aimed at F25 rather than 27 is that NVidia is not testing against the newest gcc. So, they put something in the headers to make it fail for environments that are ahead of them. That's the safest thing to do. But we can still get it to work with some effort.

I spoke with people from Nvidia at the GTC conference about why they don't support newer gcc. Off the record, they said they do extensive testing on everything they support, and that newer gcc just was not something they developed against when creating CUDA 8, but that it would probably be supported in CUDA 9. At the time, CUDA 8 was the current version, and they did make good on this: CUDA 9.1 supports up to gcc 6, which fixes a number of problems we used to work around, but gives us new ones.

OK, back to gcc support... it's easy enough to fix by altering the one line in the header that tests the gcc version. Since we have gcc 7.2, we can change the test so it only fails for gcc versions newer than 7. To do this:

edit /usr/local/cuda-9.1/targets/x86_64-linux/include/crt/host_config.h

On line 119 change from:

#if __GNUC__ > 6

to:

#if __GNUC__ > 7


This will allow things to compile with current gcc. Next, there is a problem with 128 bit floats...so in /usr/include/bits/floatn.h

Around line 37, add:

#if CUDART_VERSION
#undef __HAVE_FLOAT128
#define __HAVE_FLOAT128 0
#endif


NOTE: This will have to be fixed each time glibc gets updated.

Next, we need the development headers for xorg-x11-drv-nvidia-libs.

# dnf install xorg-x11-drv-nvidia-devel

Next we need to update the cuda paths just a little bit. If you are migrating, get rid of the cuda symlink:

# cd /usr/local/
# rm cuda


Then make a new one.

# cd /usr/local/
# ln -s /usr/local/cuda-9.1/targets/x86_64-linux/ cuda
# cd cuda
# ln -s /usr/local/cuda-9.1/targets/x86_64-linux/lib/ lib64




cuDNN setup
One of the goals of this blog is to explore Deep Learning. You will need the cuDNN libraries for that. So, let's put that in place while we are setting up the system. For some reason this is not shipped in an rpm and this leads to a manual installation that I don't like.

You'll need cuDNN version 5. (Yes, it's ancient, but torch and others have not migrated to a new version.) Go to:

https://developer.nvidia.com/cudnn

To get this you have to have a membership in the Nvidia Developer Program. It's free to join.

Go to archives at the bottom and open 'Download cuDNN v5 (May 27, 2016), for CUDA 8.0'. Then click on 'get cuDNN v5 Library for Linux'. This should start the download.

I moved it to /var/cuda-repo-9-1-local. Assuming you did, too...as root:
# cd /var/cuda-repo-9-1-local
# tar -xzvf ~/cudnn-8.0-linux-x64-v5.0-ga.tgz
# cp cuda/include/cudnn.h /usr/local/cuda/include/
# cp cuda/lib64/libcudnn.so.5.0.5 /usr/local/cuda/lib
# cd /usr/local/cuda/lib
# ln -s /usr/local/cuda/lib/libcudnn.so.5.0.5 libcudnn.so.5
# ln -s /usr/local/cuda/lib/libcudnn.so.5 libcudnn.so
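
A quick check that the links resolve before moving on:

# ls -lL /usr/local/cuda/lib/libcudnn.so

If the symlink chain is broken, ls will complain; otherwise you'll see the details of the real libcudnn.so.5.0.5 file.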



Testing it
To verify the setup, we will build some of the sample programs shipped with the toolkit. I had you install them quite a few steps ago. The following instructions assume that you have used my recipe for an rpm build environment. As a normal user:

cd working/BUILD
mkdir cuda-samples
cd cuda-samples
cp -rp /usr/local/cuda-9.1/samples/* .
make -j 8


When it's done (and hopefully successful):

cd 1_Utilities/deviceQuery
./deviceQuery


You should get something like:


deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1080 Ti"
  CUDA Driver Version / Runtime Version          9.1 / 9.1
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 11172 MBytes (11714691072 bytes)
  (28) Multiprocessors, (128) CUDA Cores/MP:     3584 CUDA Cores
  GPU Max Clock rate:                            1671 MHz (1.67 GHz)
  Memory Clock rate:                             5505 Mhz
  Memory Bus Width:                              352-bit
  L2 Cache Size:                                 2883584 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024


<snip>


 You can also check the device bandwidth as follows:

cd ../bandwidthTest
./bandwidthTest



You should see something like:

[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: GeForce GTX 1080 Ti
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432            6247.0

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432            6416.3

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432            344483.6

Result = PASS
 


At this point you are done. I will refer back to these instructions in the future. If you see anything that is wrong or needs updating, please comment on this article.