Wednesday, January 10, 2018

Setting up CUDA 9 on Fedora 27

In a previous post, I showed how to set up CUDA 8 on Fedora 25. That setup survives a move to Fedora 26, but moving to F27 is a problem. This is because the Nvidia drivers for Xorg were updated to CUDA 9, and that update breaks all the AI software we set up.

This post is an updated version of the original. The original post has been copied and corrected, so if you are finding this blog for the first time, you do not need to go back and read it. I'll also include hints about what to do if you are migrating from CUDA 8 to CUDA 9; they will be in italics to distinguish migration steps from the fresh install. If you see any mistakes or omissions, contact me or comment so I can fix them. OK, let's jump into it.

----

The aim of this blog is to explore Linux security topics using a data science approach. Many people don't like the idea of putting proprietary blobs of code on their nice open source system, but I am pragmatic and have to admit that Nvidia is the king of GPUs right now, and GPUs have been the way to accelerate Deep Learning for the last few years. So, today I'll go over what it takes to correctly set up a CUDA 9 development environment on Fedora 27. This is a continuation of the earlier post about how to get an Nvidia GPU card set up in Fedora. That step is a prerequisite to this blog post.

CUDA
CUDA is the name that Nvidia has given to a development environment for creating high performance GPU-accelerated applications. CUDA libraries enable acceleration across multiple domains such as linear algebra, image and video processing, deep learning, and graph analytics. These libraries offload work normally done on a CPU to the GPU. Note that any program created with the CUDA toolkit is tied to the Nvidia family of GPUs.


Setting it up
The first step is to go get the toolkit. This is not shipped by Fedora. You have to get it directly from Nvidia. You can find the toolkit here:

https://developer.nvidia.com/cuda-downloads

Below is a screenshot of the web site. All the dark boxes are the options that I selected. I like the local rpm option because that installs all CUDA rpms in a local repo that you can then install as you need.



Download it. Even though it says F25, it still works fine on F27.

If you are migrating from the F25 setup for CUDA 8, then you need to get rid of the old CUDA environment. Just uninstalling the license rpm is all that it takes to remove all CUDA rpms.

dnf remove cuda-license-8-0


Next, check whether the old repo rpm is installed:

rpm -qa | grep cuda-repo

If this shows a repo rpm, remove that too. If not, just delete the local repo directory:

rm -rf /var/cuda-repo-8-0-local/


The day I downloaded it, 9.1.85 was the current release. Since you are possibly reading this after it's been updated again, you'll have to make the appropriate substitutions. So, let's continue the setup as root...

rpm -ivh cuda-repo-fedora25-9-1-local-9.1.85-1.x86_64.rpm


This installs a local repo of cuda developer rpms. The repo is located in /var/cuda-repo-9-1-local/. You can list the directory to see all the rpms. Let's install the core libraries that are necessary for Deep Learning:

dnf install /var/cuda-repo-9-1-local/cuda-misc-headers-9-1-9.1.85-1.x86_64.rpm
dnf install /var/cuda-repo-9-1-local/cuda-core-9-1-9.1.85-1.x86_64.rpm
dnf install /var/cuda-repo-9-1-local/cuda-samples-9-1-9.1.85-1.x86_64.rpm


Next, we need to make sure that the utilities provided, such as the GPU software compiler, nvcc, are in our path and that the libraries can be found. The easiest way to do this is by creating a bash profile file that gets included when you start a shell.

edit /etc/profile.d/cuda.sh (which is a new file you are creating now):

export PATH="/usr/local/cuda-9.1/bin${PATH:+:${PATH}}"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
export EXTRA_NVCCFLAGS="-Xcompiler -std=c++03"

If you are migrating, the PATH variable just needs changing from 8.0 to 9.1.
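As an aside, the ${PATH:+:${PATH}} idiom used above appends the old value only when the variable was already set, which avoids leaving a stray trailing colon (an empty PATH element means the current directory). A quick demonstration with a throwaway variable:

```shell
# ${VAR:+:${VAR}} expands to ":<old value>" only when VAR is set and
# non-empty, so the first assignment does not pick up a trailing colon.
unset DEMO
DEMO="/new/path${DEMO:+:${DEMO}}"
echo "$DEMO"    # /new/path
DEMO="/newer/path${DEMO:+:${DEMO}}"
echo "$DEMO"    # /newer/path:/new/path
```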

The reason CUDA is aimed at F25 rather than F27 is that Nvidia does not test against the newest gcc. So, they put a check in the headers to make the build fail on environments that are ahead of what they tested. That's the safest thing for them to do. But we can still get it to work with some effort.

I spoke with people from Nvidia at the GTC conference about why they don't support newer gcc. Off the record, they said they do extensive testing on everything they support, and that newer gcc just wasn't something they developed with when creating CUDA 8, but it would probably be supported in CUDA 9. At the time, CUDA 8 was the supported version. They did make good on this: CUDA 9.1 supports up to gcc 6, which fixes a number of problems we used to work around, but gives us new ones.

OK, back to gcc support...it's easy enough to fix by altering the one line in the header that tests the gcc version. Since we have gcc 7.2, we can change the header so that it only fails for gcc versions later than 7. To do this:

edit /usr/local/cuda-9.1/targets/x86_64-linux/include/crt/host_config.h

On line 119 change from:

#if __GNUC__ > 6

to:

#if __GNUC__ > 7

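If you would rather script this than open an editor, the same one-line change can be made with sed. Below is a sketch demonstrated on a small excerpt standing in for the header (the #error text is paraphrased); to do the real fix, point the same sed expression at /usr/local/cuda-9.1/targets/x86_64-linux/include/crt/host_config.h:

```shell
# Stand-in excerpt for the version check in host_config.h.
cat > /tmp/host_config_excerpt.h <<'EOF'
#if __GNUC__ > 6
#error unsupported GNU version
#endif
EOF
# Bump the ceiling so only gcc 8 or later trips the error.
sed -i 's/#if __GNUC__ > 6/#if __GNUC__ > 7/' /tmp/host_config_excerpt.h
grep '__GNUC__' /tmp/host_config_excerpt.h    # #if __GNUC__ > 7
```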

This will allow things to compile with current gcc. Next, there is a problem with 128-bit floats, so we need to edit /usr/include/bits/floatn.h.

Around line 37, add:

#if CUDART_VERSION
#undef __HAVE_FLOAT128
#define __HAVE_FLOAT128 0
#endif


NOTE: This will have to be fixed each time glibc gets updated.

Next, we need the development headers for xorg-x11-drv-nvidia-libs.

# dnf install xorg-x11-drv-nvidia-devel

Next we need to update the cuda paths just a little bit. If you are migrating, get rid of the cuda symlink:

# cd /usr/local/
# rm cuda


Then make a new one.

# cd /usr/local/
# ln -s /usr/local/cuda-9.1/targets/x86_64-linux/ cuda
# cd cuda
# ln -s /usr/local/cuda-9.1/targets/x86_64-linux/lib/ lib64




cuDNN setup
One of the goals of this blog is to explore Deep Learning. You will need the cuDNN libraries for that. So, let's put that in place while we are setting up the system. For some reason this is not shipped in an rpm and this leads to a manual installation that I don't like.

You'll need cuDNN version 5. (Yes, it's ancient, but torch and others have not migrated to a newer version.) Go to:

https://developer.nvidia.com/cudnn

To get this, you need a membership in the Nvidia Developer Program. It's free to join.

Go to archives at the bottom and open 'Download cuDNN v5 (May 27, 2016), for CUDA 8.0'. Then click on 'get cuDNN v5 Library for Linux'. This should start the download.

I moved it to /var/cuda-repo-9-1-local. Assuming you did, too...as root:
# cd /var/cuda-repo-9-1-local
# tar -xzvf ~/cudnn-8.0-linux-x64-v5.0-ga.tgz
# cp cuda/include/cudnn.h /usr/local/cuda/include/
# cp cuda/lib64/libcudnn.so.5.0.5 /usr/local/cuda/lib
# cd /usr/local/cuda/lib
# ln -s /usr/local/cuda/lib/libcudnn.so.5.0.5 libcudnn.so.5
# ln -s /usr/local/cuda/lib/libcudnn.so.5 libcudnn.so
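The symlink chain above exists because builds and running programs look for different names: the linker resolves libcudnn.so (from -lcudnn) at build time, while the runtime loader asks for the soname libcudnn.so.5. Here is the same chain in miniature, using a scratch directory as a stand-in for /usr/local/cuda/lib:

```shell
# Recreate the linker-name -> soname -> real-file chain in a temp dir.
demo=$(mktemp -d)
cd "$demo"
touch libcudnn.so.5.0.5                  # the actual library file
ln -s libcudnn.so.5.0.5 libcudnn.so.5    # soname, used by the loader
ln -s libcudnn.so.5 libcudnn.so          # linker name, used at build time
readlink libcudnn.so      # libcudnn.so.5
readlink libcudnn.so.5    # libcudnn.so.5.0.5
```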



Testing it
To verify the setup, we will build some sample programs shipped with the toolkit. I had you install them quite a few steps ago. The following instructions assume that you have used my recipe for an rpm build environment. As a normal user:

cd working/BUILD
mkdir cuda-samples
cd cuda-samples
cp -rp /usr/local/cuda-9.1/samples/* .
make -j 8


When it's done (and hopefully it's successful):

cd 1_Utilities/deviceQuery
./deviceQuery


You should get something like:


deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1080 Ti"
  CUDA Driver Version / Runtime Version          9.1 / 9.1
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 11172 MBytes (11714691072 bytes)
  (28) Multiprocessors, (128) CUDA Cores/MP:     3584 CUDA Cores
  GPU Max Clock rate:                            1671 MHz (1.67 GHz)
  Memory Clock rate:                             5505 Mhz
  Memory Bus Width:                              352-bit
  L2 Cache Size:                                 2883584 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024


<snip>


You can also check the device bandwidth as follows:

cd ../bandwidthTest
./bandwidthTest



You should see something like:

[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: GeForce GTX 1080 Ti
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432            6247.0

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432            6416.3

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432            344483.6

Result = PASS
 


At this point you are done. I will refer back to these instructions in the future. If you see anything wrong or anything that needs updating, please comment on this article.
