Ubuntu¶
Documentation in Progress
Check back soon for more updates.
System¶
Edit /etc/hosts
:
127.0.0.1 hostname
Install JDK, Scala and Git:
sudo apt install default-jdk scala git -y
Install Poetry:
curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python3
Install Oracle JDK:
sudo apt update
sudo add-apt-repository ppa:webupd8team/java
sudo apt update
sudo apt install oracle-java8-installer oracle-java8-set-default
Spark¶
This guide provides more information on how to setup Spark on Ubuntu.
Start by downloading Spark 2.4.5 for Hadoop 2.7.
curl -O https://www.apache.org/dyn/closer.lua/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz
Extract the archive.
tar xvf spark-2.4.5-bin-hadoop2.7.tgz
Move it to /opt/spark
.
sudo mv spark-2.4.5-bin-hadoop2.7/ /opt/spark
Update the environment variables by adding the following to your shell profile.
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
export PYSPARK_PYTHON=/usr/bin/python3
Alternatively, add it to your profile using echo
.
echo "export SPARK_HOME=/opt/spark" >> ~/.profile
echo "export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin" >> ~/.profile
echo "export PYSPARK_PYTHON=/usr/bin/python3" >> ~/.profile
This assumes your profile is in .profile
. It may also be in another location like ~/.bashrc
or ~/.zshrc
. Activate your changes as follows.
source ~/.bashrc
Start a stand-alone server.
start-master.sh
The process will listen on 8080.
ss -tunelp | grep 8080
tcp LISTEN 0 1 *:8080
Start a worker process.
start-slave.sh spark://ubuntu:7077
You can stop the processes using the following commands.
stop-slave.sh
stop-master.sh
TensorFlow¶
Ubuntu 18.04 ships with Python 3 by default
sudo apt install python3-venv
Note
If you have a dedicated NVIDIA GPU and want to take advantage of its processing power, instead of tensorflow install the tensorflow-gpu package which includes GPU support.
GPU Support¶
Check the following links to more information on GPU support.
# Add NVIDIA package repositories
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo dpkg -i cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update
# Install NVIDIA driver
sudo apt-get install --no-install-recommends nvidia-driver-430
# Reboot. Check that GPUs are visible using the command: nvidia-smi
# Install development and runtime libraries (~4GB)
sudo apt-get install --no-install-recommends \
cuda-10-1 \
libcudnn7=7.6.4.38-1+cuda10.1 \
libcudnn7-dev=7.6.4.38-1+cuda10.1
# Install TensorRT. Requires that libcudnn7 is installed above.
sudo apt-get install -y --no-install-recommends libnvinfer6=6.0.1-1+cuda10.1 \
libnvinfer-dev=6.0.1-1+cuda10.1 \
libnvinfer-plugin6=6.0.1-1+cuda10.1