- Build essentials
- Miniconda 3 (or Anaconda)
- Python 3 with pip, numpy, wheel, six etc
- CUDA Toolkit 7.5
- cuDNN v5.0 RC (for CUDA 7.5)
- Java 8
- Bazel
- TensorFlow 0.8
ProTip
I recommend installing everything from the default ubuntu account unless a step says otherwise (sudo), as I ran into tons of permission issues, especially with Bazel. Also, when creating the AWS EC2 instance, increase the disk size from 8 GB to around 25-30 GB, as you *will* run out of space otherwise. Consider using Miniconda to manage Python 3 packages, as I wasted a lot of time trying to get it to work with the Ubuntu pip/apt-get packages. The default pip/apt-get installs work fine for Python 2 packages but not for Python 3.
Dependencies
Installing various packages
This is probably the easiest part: update the package lists and install everything we need directly.
sudo apt-get update
sudo apt-get upgrade -y
sudo apt-get install -y build-essential git swig default-jdk zip zlib1g-dev
Ubuntu installs Nouveau by default, and it seems to conflict with the NVIDIA driver installation, so we will blacklist the Nouveau drivers.
echo -e "blacklist nouveau\nblacklist lbm-nouveau\noptions nouveau modeset=0\nalias nouveau off\nalias lbm-nouveau off\n" | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
sudo update-initramfs -u
You can refer to the Ubuntu Manual if you run into any trouble at this point. Now we're going to install extra drivers that were left out of the base kernel package and are required by the NVIDIA drivers.
sudo apt-get install -y linux-image-extra-virtual
You might want to sudo reboot now to stay on the safe side.
Next up, we'll install the latest Linux headers for the NVIDIA drivers.
sudo apt-get install -y linux-source linux-headers-`uname -r`
If you get locale warning messages, you can fix them by switching back to US English.
export LANGUAGE="en_US.UTF-8"
export LANG="en_US.UTF-8"
export LC_ALL="en_US.UTF-8"
sudo locale-gen "en_US.UTF-8"
sudo dpkg-reconfigure locales
Before moving to the next section, another sudo reboot is recommended.
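After the reboot it can be worth confirming that the Nouveau blacklist actually took effect. The sketch below is only an illustration: `is_blacklisted` and `module_listed` are hypothetical helper names, not part of any package, and the config path is the one written in the blacklist step.

```shell
# Hypothetical helper: succeeds if the modprobe config file $2 contains
# an exact "blacklist <module>" line for the module named in $1.
is_blacklisted() {
    local module="$1" conf="$2"
    [ -f "$conf" ] && grep -qxF "blacklist $module" "$conf"
}

# Hypothetical helper: reads lsmod-style text on stdin and succeeds if
# the module named in $1 appears in the first column.
module_listed() {
    local module="$1"
    awk -v m="$module" '$1 == m { found = 1 } END { exit !found }'
}
```

For example, `is_blacklisted nouveau /etc/modprobe.d/blacklist-nouveau.conf && echo "blacklisted"` checks the file, and `lsmod | module_listed nouveau && echo "nouveau is still loaded"` checks the running kernel.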
Installing Miniconda
We will install Miniconda to manage Python 3 packages.
wget -O ~/miniconda.sh https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash ~/miniconda.sh -b -p $HOME/miniconda
export PATH="$HOME/miniconda/bin:$PATH"
We will now install the pip and numpy packages via Miniconda (conda).
conda install python=3.4 pip numpy -y
If you get errors saying that conda is not available, exit the terminal and try again, or better yet sudo reboot if you haven't already.
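A missing conda command usually just means the Miniconda bin directory is not on PATH in the current shell. Here is a minimal sketch for checking and fixing that without adding the entry twice. `path_contains` and `ensure_on_path` are hypothetical helper names, and `$HOME/miniconda/bin` is the install prefix used above.

```shell
# Hypothetical helper: succeeds if directory $1 is an entry of the
# colon-separated list $2 (defaults to the current $PATH).
path_contains() {
    local dir="${1%/}" list="${2:-$PATH}"
    case ":$list:" in
        *":$dir:"*|*":$dir/:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: prepends directory $1 to PATH only if it is missing.
ensure_on_path() {
    local dir="$1"
    path_contains "$dir" || PATH="$dir:$PATH"
    export PATH
}
```

Running `ensure_on_path "$HOME/miniconda/bin"` followed by `command -v conda` should then print the path of the conda executable.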
Mounting Root
To make sure that we don't run out of space when building TensorFlow via Bazel, we are going to move /tmp onto /mnt by replacing it with a symlink. Also, *don't* place anything important on /mnt, as it will not be saved when building an AMI.
sudo mkdir /mnt/tmp
sudo chmod 777 /mnt/tmp
sudo rm -rf /tmp
sudo ln -s /mnt/tmp /tmp
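Before kicking off a long build, it may help to confirm that /tmp really points at /mnt/tmp and that there is room left. This is a rough sketch under my own assumptions: `links_to`, `free_gb` and `has_free_gb` are hypothetical helper names, and the 10 GB threshold in the usage line is an arbitrary illustration, not a measured requirement.

```shell
# Hypothetical helper: succeeds if $1 is a symlink whose target is exactly $2.
links_to() {
    [ -L "$1" ] && [ "$(readlink "$1")" = "$2" ]
}

# Hypothetical helper: prints the free space, in whole GB, of the
# filesystem that holds path $1 (df -Pk reports 1K blocks in column 4).
free_gb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Hypothetical helper: succeeds if the filesystem holding $1 has at
# least $2 GB free.
has_free_gb() {
    [ "$(free_gb "$1")" -ge "$2" ]
}
```

For example: `links_to /tmp /mnt/tmp && has_free_gb /mnt/tmp 10 || echo "check /tmp before building"`.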
Installing CUDA 7.5
Visit the CUDA download page and get the "deb (network)" version of the installer. Other versions might work, but I haven't tested them.
cd /mnt/tmp
wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/cuda-repo-ubuntu1404_7.5-18_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1404_7.5-18_amd64.deb
sudo apt-get update
sudo apt-get install -y cuda
sudo modprobe nvidia
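Once the module is loaded, the driver exposes its version under /proc, which makes for a quick sanity check. The function below is a sketch with a hypothetical name (`nvidia_driver_loaded`). The optional argument exists only so the check can be tried against a sample file on a machine without a GPU.

```shell
# Hypothetical helper: succeeds if the NVIDIA kernel module reports a version.
# $1 is optional and defaults to the file the driver exposes once loaded.
nvidia_driver_loaded() {
    local version_file="${1:-/proc/driver/nvidia/version}"
    [ -r "$version_file" ] && grep -q "NVRM version" "$version_file"
}
```

If `nvidia_driver_loaded` succeeds, running `nvidia-smi` should list the GPU. If it fails, the Troubleshooting section covers the usual cause.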
Installing cuDNN for CUDA 7.5
NVIDIA doesn't allow us to download cuDNN directly to our machine from their site; you'll get Access Denied if you try. So we will have to download cuDNN in the browser and upload it to Dropbox, Google Drive or somewhere else online. On the download page NVIDIA will ask you to fill out a survey. It's not compulsory; just click the "Proceed To Downloads" button if you are not in a good state of mind to answer a survey rationally at this point. Upload the file somewhere, get a direct link to it and replace DROPBOX/GDRIVE_CUDNN_DOWNLOAD_LINK with it.
cd /mnt/tmp
wget -O "cudnn-7.5-linux-x64-v5.0-rc.tgz" "DROPBOX/GDRIVE_CUDNN_DOWNLOAD_LINK"
tar -xzf cudnn-7.5-linux-x64-v5.0-rc.tgz
sudo cp /mnt/tmp/cuda/lib64/* /usr/local/cuda/lib64
sudo cp /mnt/tmp/cuda/include/* /usr/local/cuda/include
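A failed download link tends to produce a tiny HTML file instead of the archive, so it can be worth confirming that the copy step really put cuDNN in place. This is a minimal sketch: `cudnn_installed` is a hypothetical helper name, and the default directory is the /usr/local/cuda location used in the copy commands.

```shell
# Hypothetical helper: succeeds if the cuDNN header and at least one
# libcudnn library file exist under the CUDA directory $1
# (defaults to /usr/local/cuda).
cudnn_installed() {
    local cuda_home="${1:-/usr/local/cuda}"
    [ -f "$cuda_home/include/cudnn.h" ] || return 1
    ls "$cuda_home"/lib64/libcudnn* >/dev/null 2>&1
}
```

For example: `cudnn_installed || echo "cuDNN files are missing, check the download"`.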
Installing Java 8
The latest version of Bazel requires Java 8, but as of now sudo apt-get install openjdk-8-jdk doesn't seem to lead anywhere. So we will have to get it from a PPA.
sudo add-apt-repository ppa:openjdk-r/ppa
sudo apt-get update
sudo apt-get install -y openjdk-8-jdk
Now we have to change the default Java version to the one we've just installed.
sudo update-alternatives --config java
sudo update-alternatives --config javac
Now select the latest Java version from the list; usually it's option #2.
  Selection    Path                                             Priority   Status
------------------------------------------------------------
  0            /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java   1071       auto mode
  1            /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java   1071       manual mode
* 2            /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java   1069       manual
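To confirm that the selection stuck, you can look at what java -version reports. The sketch below pulls the major version out of that output. `java_major_version` is a hypothetical helper name, and it assumes the usual `version "1.8.0_..."` style string.

```shell
# Hypothetical helper: reads `java -version` style text on stdin and prints
# the major Java version ("1.8.0_91" gives 8). Fails if no version is found.
java_major_version() {
    sed -n 's/.*version "\([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1 \2/p' | head -n 1 | {
        read -r first second || exit 1
        # Old-style strings start with "1.", so the second field is the major version
        if [ "$first" = "1" ]; then echo "$second"; else echo "$first"; fi
    }
}
```

Since java -version writes to stderr, call it as `java -version 2>&1 | java_major_version`. Anything lower than 8 means the alternatives selection needs another look.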
Installing Bazel
Bazel is Google's own build tool that will help us compile TensorFlow. It should be straightforward, but watch out for permission issues.
cd /mnt/tmp
git clone https://github.com/bazelbuild/bazel.git
cd bazel
./compile.sh
sudo cp /mnt/tmp/bazel/output/bazel /usr/bin
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
export CUDA_HOME=/usr/local/cuda
If for some reason you get the following error
JDK version (1.7) is lower than 1.8, please set $JAVA_HOME.
you can try to explicitly specify the Java path with
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
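If the JDK ended up somewhere other than the path above, JAVA_HOME can be derived from wherever the java binary actually lives. This is only a sketch: `java_home_from_binary` is a hypothetical helper name, and it assumes the binary sits under `bin/java` or `jre/bin/java` inside the JDK directory, as in the update-alternatives listing.

```shell
# Hypothetical helper: prints a JAVA_HOME candidate for the resolved
# path of a java binary given in $1. Fails if the path does not end in /bin/java.
java_home_from_binary() {
    local bin="$1" home
    case "$bin" in
        */bin/java) ;;
        *) return 1 ;;
    esac
    home="${bin%/bin/java}"   # drop the trailing /bin/java
    home="${home%/jre}"       # drop a nested jre directory if there is one
    echo "$home"
}
```

On Ubuntu this can be used as `export JAVA_HOME="$(java_home_from_binary "$(readlink -f "$(command -v java)")")"`, where `readlink -f` resolves the update-alternatives symlinks.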
Installing TensorFlow
Finally, let's compile and install TensorFlow. This might take a while to finish.
cd /mnt/tmp
git clone --recurse-submodules https://github.com/tensorflow/tensorflow
cd tensorflow
TF_UNOFFICIAL_SETTING=1 ./configure
Leave everything at its default except for the prompt below. The GPUs on the AWS instance need a CUDA compute capability of 3.0, so specify it here instead of accepting the default.
Please specify a list of comma-separated Cuda compute capabilities you want to build with. You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus. Please note that each additional compute capability significantly increases your build time and binary size. [Default is: "3.5,5.2"]: 3.0
Now let the waiting game begin (and hopefully you won't run into disk space issues).
bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install --upgrade /tmp/tensorflow_pkg/*.whl
Now let's check if everything is working fine.
cd /mnt/tmp/tensorflow/tensorflow/models/image/cifar10/
python cifar10_multi_gpu_train.py
I hope you found this useful and that it saved you a couple of hours. Please leave me a message if there are any issues.
Troubleshooting
When you're testing TensorFlow, if it says that there's no GPU, try installing the Linux headers again with
sudo apt-get install linux-headers-$(uname -r)
Also do the same if you're getting any of the errors below:
cat /proc/driver/nvidia/version
cat: /proc/driver/nvidia/version: No such file or directory
sudo nvidia-modprobe
modprobe: ERROR: ../libkmod/libkmod-module.c:809 kmod_module_insert_module() could not find module by name='nvidia_352'
modprobe: ERROR: could not insert 'nvidia_352': Function not implemented
If you're still having issues with the NVIDIA driver, please refer to NVIDIA's Documentation.
After every restart of the machine, you'll have to set the environment variables again.
export PATH="$HOME/miniconda/bin:$PATH"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
export CUDA_HOME=/usr/local/cuda
#export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/ #you might not need this
Or you can permanently add them to ~/.bashrc (the single quotes keep the variables unexpanded until ~/.bashrc is read):
echo 'export PATH="$HOME/miniconda/bin:$PATH"' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"' >> ~/.bashrc
echo 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc
#echo 'export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/' >> ~/.bashrc #you might not need this
If you didn't increase the disk size from the default 8 GB and ran out of space, you can try something along these lines, but at that point I'd rather restart the whole process.
#mv ~/.cache /mnt/tmp
#ln -s /mnt/tmp/.cache ~/
#sudo chown -R ubuntu:ubuntu ~/.cache/bazel/
#rm -rf /home/ubuntu/.cache
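One caveat about the ~/.bashrc step above: running the echo lines more than once appends duplicate entries. A small sketch of an append that is safe to repeat follows. `append_once` is a hypothetical helper name, not something provided by bash.

```shell
# Hypothetical helper: appends line $1 to file $2 unless an identical
# line is already present, so repeated runs leave a single copy.
append_once() {
    local line="$1" file="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For example: `append_once 'export CUDA_HOME=/usr/local/cuda' ~/.bashrc`. The single quotes matter for lines that contain `$PATH` or `$HOME`, since they stop the shell from expanding the variables before the line is written.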
References
- http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/
- http://conda.pydata.org/docs/help/silent.html
- http://developer.download.nvidia.com/compute/machine-learning/cudnn/secure/v5/rc/cudnn_install.txt?autho=1461438353_9d110e75f21326804dcc3b7194b8c689&file=cudnn_install.txt
- https://www.tensorflow.org/get_started
- https://devtalk.nvidia.com/default/topic/920308/how-to-install-cuda-7-5-with-the-newest-nvidia-driver-361-28-/
- https://devtalk.nvidia.com/default/topic/884586/linux/failed-to-install-cuda-7-5-in-ubuntu-14-04-lts/
In the last part of the installation:
`pip install --upgrade /tmp/tensorflow_pkg/*.whl`
I encountered an error related to easy-install of setuptools. Something like:
"Unable to remove non-existing file..."
This was easily fixed by adding `--ignore-installed` flag:
`pip install --ignore-installed --upgrade /tmp/tensorflow_pkg/*.whl`