Thursday, January 11, 2018

Getting Torch running on F27 with CUDA 9

In a previous post, I showed how to set up Torch on Fedora 25. You may be able to migrate to Fedora 26 with just some rebuilding because libraries have changed, but moving to F27 is a problem. This is because the Nvidia drivers for Xorg were updated to CUDA 9. This update causes all the AI stuff we previously set up to stop working.

This post is an updated version of the original post. The original post has been copied and corrected. This way, if you are finding this blog for the first time, you do not need to go back and read the original. We will do a different experiment with the final results, though, so maybe you do want to read the old article. I'll also include hints about what to do if you are migrating from CUDA 8 to CUDA 9. They will be in italics to distinguish migration steps from a fresh install. And if you see any mistakes or omissions, contact me or comment so I can fix them. OK, let's jump into it.

----


In this blog post we will set up the Torch AI framework so that it can be used on Fedora 27 with CUDA 9. This builds on the previous blog post, which shows you how to set up a CUDA 9 development environment for Fedora.


Torch
Torch is a deep learning framework that is written in Lua. This makes it very fast because there is little between the script and the pure C code that is performing the work. Both Facebook and Twitter are major contributors to it and have probably derived their in-house versions from the open source version.

The first thing I would do is set up an account just for AI. The reason I suggest this is that we are going to be installing a bunch of software without rpm. All of it will go into the home directory. So, if one day you want to delete it all, it's as simple as deleting the account and its home directory. Assuming you made the account and logged into it...

If you are migrating and have a torch directory from F25, go ahead and delete it.

rm -rf torch


Then we do the following:

$ git clone https://github.com/torch/distro.git ~/torch --recursive
$ cd ~/torch
$ export CMAKE_CXX_FLAGS="-std=c++03"
$ export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__"
$ ./install.sh
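
If install.sh fails early in the CUDA parts, one thing worth ruling out is the toolkit version that nvcc reports. The following is only a sketch and not part of the Torch instructions. It assumes nvcc is on your PATH and that its version banner contains a "release X.Y" string. The helper name cuda_release is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: pull the "release X.Y" number out of nvcc's version banner.
# It reads the banner on stdin, so it can be tried without a GPU present.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9.]*\).*/\1/p'
}

# Only query nvcc if it is actually installed and on PATH.
if command -v nvcc >/dev/null 2>&1; then
    echo "CUDA toolkit release: $(nvcc --version | cuda_release)"
else
    echo "nvcc not found on PATH; check the CUDA development setup first"
fi
```

For this post you would expect the reported release to start with 9.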


The Torch community says that they only support Torch built this way. I have tried to package Torch as an rpm and it simply does not work. I get some strange errors related to math. There are probably compile options that fix this, but I'm done with hunting this down. It's easier to use their method from an account just for this. But I digress...

After about 20 minutes, the build asks "Do you want to automatically prepend the Torch install location to PATH and LD_LIBRARY_PATH in your /home/ai/.bashrc? (yes/no)"

I typed "yes" to have it update ~/.bashrc, then logged out and back in. Test to see if the GPU-based Torch is working:

luajit -lcutorch
luajit -lcunn


Each command loads a module and starts an interactive LuaJIT shell. You should see errors if it's not working. To exit the shell, type:

os.exit()
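
If you would rather not drop into the interactive shell each time, the same check can be scripted. This is a minimal sketch, assuming luajit is on your PATH after the .bashrc update. The helper name torch_module_ok is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if LuaJIT can load the named module.
# -l requires the module and -e runs a one-line script, so no interactive shell starts.
torch_module_ok() {
    command -v luajit >/dev/null 2>&1 || return 1
    luajit -l"$1" -e 'os.exit(0)' >/dev/null 2>&1
}

# Check the two GPU modules used in this post.
for mod in cutorch cunn; do
    if torch_module_ok "$mod"; then
        echo "$mod: loaded"
    else
        echo "$mod: failed to load"
    fi
done
```

The sketch hides the error text, so if a module fails to load, run the plain luajit command above to see the actual message.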


At this point only one last thing is needed. We may want to play with machine vision at some point, so get the camera module. Also, a lot of models seem to be trained using the Caffe deep learning framework. This means we need to load them from that format, so let's grab the loadcaffe module.

During the build of Torch, you got a copy of luarocks, which is a package manager for Lua modules. We can use this to pull in the modules so that Torch can use them.

If you do not have an OpenCV development environment set up, install it as root:

dnf install opencv-devel

There is a packaging bug in opencv on F27. To fix it, open
/usr/include/opencv/cv.h and add the following at line 68:

#include "opencv2/videoio/videoio_c.h"

I'll probably file a bz on this to get it corrected. So, in the future you may not need to do this fixup.
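
If you prefer not to edit the header by hand, the fixup can be scripted. This is a sketch only: it assumes GNU sed (the default on Fedora) and that line 68 is still the right spot in your copy of cv.h. The helper name add_videoio_include is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: insert the missing include at line 68 of the given header.
# Skips the edit if the include is already there, so it is safe to run twice.
# Relies on GNU sed for -i and the one-line form of the "i" command.
add_videoio_include() {
    header="$1"
    if grep -q 'opencv2/videoio/videoio_c.h' "$header"; then
        echo "include already present in $header"
    else
        sed -i.bak '68i #include "opencv2/videoio/videoio_c.h"' "$header"
        echo "patched $header (backup at $header.bak)"
    fi
}

# Usage, as root:
#   add_videoio_include /usr/include/opencv/cv.h
```

The .bak copy lets you restore the packaged header if a later opencv update ships the fix.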

$ luarocks install camera
$ luarocks install loadcaffe


Plug in your webcam

If you run the webcam from an account other than your login account, that account needs access to the video devices. Go into /etc/group, find the video group, and add the ai account to it as a supplementary group.
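
Instead of editing /etc/group by hand, usermod can make the same change. The sketch below assumes the account is named ai and that the group is called video. The helper name user_in_group is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given user is a member of the given group.
user_in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}

# To add the membership (as root), the usual command is:
#   usermod -aG video ai
# The ai account has to log out and back in before the new group takes effect.

if user_in_group ai video; then
    echo "ai is in the video group"
else
    echo "ai is not in the video group (or the account does not exist)"
fi
```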


Quick Webcam Art Test
OK. Now let's see if Torch is working right. There is a famous project that can take a picture and transfer the artistic style of a work of art onto your picture or a realtime video. It's really quite astonishing to see. Let's use that as our test for Torch.

The project page is here:


https://github.com/jcjohnson/fast-neural-style/

To download it:

$ git clone https://github.com/jcjohnson/fast-neural-style.git


Now download the pretrained style transfer models:

$ cd fast-neural-style
$ bash models/download_style_transfer_models.sh

Now it's time to see it work in realtime.

$ qlua webcam_demo.lua -models models/instance_norm/candy.t7 -gpu 0

I won't post a picture or movie. You can see some at the fast neural style project page.


YOLO/Darknet
Another fun program is YOLO/Darknet. It has nothing to do with Torch, but since we have the webcam out, let's give it a try. YOLO is an object detection model that runs on Darknet. When it sees something that it recognizes, it draws a box around it and labels it. To build it, do the following:

$ cd ~
$ git clone https://github.com/pjreddie/darknet
$ cd darknet
$ vi Makefile


Change the following lines to match this:

GPU=1
CUDNN=1
OPENCV=1
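
If you would rather script the change than open vi, sed can flip the three options. This is a minimal sketch, assuming GNU sed and that each option sits at the start of its own line as NAME=value in the Makefile. The helper name enable_darknet_options is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: switch on the GPU, cuDNN and OpenCV options in a darknet Makefile.
# Other options in the file are left untouched; a .bak copy of the original is kept.
enable_darknet_options() {
    sed -i.bak \
        -e 's/^GPU=.*/GPU=1/' \
        -e 's/^CUDNN=.*/CUDNN=1/' \
        -e 's/^OPENCV=.*/OPENCV=1/' \
        "$1"
}

# Usage, from inside the darknet checkout:
#   enable_darknet_options Makefile
#   grep -E '^(GPU|CUDNN|OPENCV)=' Makefile
```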


Next, build it:

$ make


Now we need a pretrained model:

$ wget https://pjreddie.com/media/files/yolo.weights

And to run the webcam demo:

$ ./darknet detector demo cfg/coco.data cfg/yolo.cfg yolo.weights

Point the webcam at various things and see what it thinks they are. Again, I won't post a picture or movie here, but you can see some at the project page. I will, however, tell an anecdote.

My daughter brought her small dog over to my house. She let it run around. When I was testing this program out, the dog ran past the webcam. If you've been around dogs, you probably know that when dogs are on alert, they keep their tail raised straight up. When the dog ran into the field of view, all you could see was its tail standing up. YOLO correctly boxed the tail and labeled it "dog". I was impressed.


Live Demo
There is an upcoming Fedora Developer's Conference in Brno, Czech Republic. The first day of the conference there will be an AI table where both of these programs will be demonstrated. If you are there, look for it and say hi.


Conclusion
At some point we may come back to Torch to do some experimenting on security data. But I find it fun to play around with the art programs written for it. If you like this, look around. There are a number of apps written for Torch. The main point, though, is to show how to leverage the CUDA development environment we previously set up to get one of the main deep learning frameworks installed and running on a Fedora 27 system.
