
Windows

Documentation in Progress

Check back soon for more updates.

Overview

There are multiple options for installing Spark, Hadoop, TensorFlow, and other big data and deep learning software on Windows 10. While it is possible to install and use these packages on Windows, I strongly urge you to heed the warning of François Chollet, author of Deep Learning with Python:

Whether you’re running locally or in the cloud, it’s better to be using a Unix workstation. Although it’s technically possible to use Keras on Windows (all three Keras backends support Windows), we don’t recommend it. In the installation instructions in appendix A, we’ll consider an Ubuntu machine. If you’re a Windows user, the simplest solution to get everything running is to set up an Ubuntu dual boot on your machine. It may seem like a hassle, but using Ubuntu will save you a lot of time and trouble in the long run.

Prerequisites

Install this software prior to setting up your environment.

Clone GitHub Repository

Using GitHub Desktop, clone the bellevue-university/dsc650 repository by going to https://github.com/bellevue-university/dsc650 and selecting the Open In Desktop option.

[Screenshot: Clone DSC 650]

Clone the repository to your local system by selecting the appropriate local directory. GitHub Desktop then shows the progress of the clone. Depending on your connection, cloning can take anywhere from minutes to hours, so wait until it completes before continuing.

[Screenshot: Select Local Repo]

[Screenshot: Cloning]

Import Environment

Next, open Anaconda Navigator.

[Screenshot: Anaconda Navigator]

Select the Channels option and add the conda-forge channel.

[Screenshot: Channels]

[Screenshot: Add Conda Forge]

After you have added the conda-forge channel, select the Environments tab and then the Import option to import a new environment.

[Screenshot: Environments]

Import the environment.yaml file from the dsc650 repository and name the new environment dsc650. This creates an Anaconda environment with the required dependencies.
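
To confirm the import worked, you can check from the environment's Python interpreter that the packages used by the examples later in this guide can be found. This is a minimal sketch that assumes those packages are `tensorflow` and `pyspark`; `REQUIRED_PACKAGES` and `missing_packages` are illustrative names, not part of the dsc650 repository, so adjust the list to match `environment.yaml`.

```python
import importlib.util
import sys

# Assumption: these are the packages the TensorFlow and PySpark examples need.
# Adjust the list to match environment.yaml in the dsc650 repository.
REQUIRED_PACKAGES = ["tensorflow", "pyspark"]


def missing_packages(names):
    """Return the top-level package names that cannot be found (nothing is imported)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print(f"Interpreter: {sys.executable}")
    missing = missing_packages(REQUIRED_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

Run it with the dsc650 interpreter (for example from the Anaconda Prompt after `conda activate dsc650`). Because `find_spec` only locates top-level packages without importing them, the check stays quick even for large libraries such as TensorFlow.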

Open PyCharm Project

Open PyCharm. You should see the Welcome to PyCharm screen.

[Screenshot: Welcome to PyCharm]

Select Open and choose the directory where you cloned the dsc650 repository.

[Screenshot: Open]

When you first open the project, PyCharm may report an issue with Windows Defender (typically a warning that real-time scanning can slow down the IDE). Click the Fix option and follow the prompts.

[Screenshot: Initial PyCharm]

[Screenshot: Fix Windows Defender]

PyCharm should automatically use the dsc650 environment you created earlier. If it does not, open File > Settings, go to the Project Interpreter page, and add the existing dsc650 Conda environment.

[Screenshot: Settings]

[Screenshot: Add Environment]
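
If you are unsure which interpreter PyCharm picked, you can inspect `sys.prefix` from a script or the Python console. This sketch assumes the environment was created by name, so it lives in an `envs` folder under your Anaconda installation (for example `C:\Users\<you>\anaconda3\envs\dsc650`); `is_named_conda_env` is a hypothetical helper, not a PyCharm or conda API.

```python
import sys
from pathlib import Path


def is_named_conda_env(prefix, env_name="dsc650"):
    """Return True if `prefix` looks like <conda root>/envs/<env_name>.

    Hypothetical helper; assumes a named environment in the default envs folder.
    """
    path = Path(prefix)
    return path.name == env_name and path.parent.name == "envs"


if __name__ == "__main__":
    # sys.prefix is the root directory of the running interpreter's environment.
    print(f"sys.prefix: {sys.prefix}")
    print("Running in the dsc650 environment:", is_named_conda_env(sys.prefix))
```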

Set Sources Root

After opening the project, right-click the dsc650 directory and select Mark Directory as > Sources Root.

[Screenshot: Sources Root]
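
Marking a directory as a sources root makes PyCharm add it to `PYTHONPATH` when it runs code from the project (controlled by the Add source roots to PYTHONPATH option in a run configuration), so modules inside it can be imported directly. Outside PyCharm, such as in a plain terminal, you can get a similar effect for a single process by prepending the directory to `sys.path`. This is a sketch only; `add_source_root` is an illustrative helper, and the path you pass should be wherever you cloned the repository.

```python
import sys
from pathlib import Path


def add_source_root(directory):
    """Prepend `directory` to sys.path once and return its resolved path.

    Illustrative helper: affects only the current Python process, similar to
    what PyCharm does for a directory marked as a sources root.
    """
    root = str(Path(directory).resolve())
    if root not in sys.path:
        sys.path.insert(0, root)
    return root
```

For example, `add_source_root(r"C:\path\to\dsc650\dsc650")`, where the path is a placeholder. Setting the `PYTHONPATH` environment variable instead applies to every Python process started from that shell.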

Set Java Home

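Set the JAVA_HOME environment variable to the directory where JDK 8 is installed. The value below is an example; the version number in your path may differ.

| Variable | Value |
| --- | --- |
| JAVA_HOME | C:\Program Files\Java\jdk1.8.0_251 |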

Open Edit the system environment variables from the Control Panel (you can also find it by searching the Start menu).

[Screenshot: Environment Variable Menu]

In the System Properties dialog, on the Advanced tab, select Environment Variables.

[Screenshot: System Properties]

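Under the user variables for your account, select New (or Edit, if JAVA_HOME already exists) and enter the variable name and value from the table above.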

[Screenshot: Environment Variables]

Exit PyCharm and reopen it so that it picks up the new environment variables.
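
PySpark starts a Java virtual machine and, when `JAVA_HOME` is set, uses it to locate Java, so it is worth confirming the variable is visible after reopening PyCharm. This sketch checks that `JAVA_HOME` is set and contains `bin\java.exe` (or `bin/java` on other systems); `find_java_executable` is an illustrative helper, not part of the course repository.

```python
import os
import sys
from pathlib import Path


def find_java_executable(environ=None):
    """Return the java executable under JAVA_HOME, or raise RuntimeError.

    Illustrative helper; `environ` defaults to the real environment and can be
    replaced with a plain dict for testing.
    """
    env = os.environ if environ is None else environ
    java_home = env.get("JAVA_HOME")
    if not java_home:
        raise RuntimeError("JAVA_HOME is not set")
    exe_name = "java.exe" if sys.platform == "win32" else "java"
    java = Path(java_home) / "bin" / exe_name
    if not java.is_file():
        raise RuntimeError(f"{exe_name} not found in {java.parent}")
    return java


if __name__ == "__main__":
    try:
        print("Found Java at", find_java_executable())
    except RuntimeError as error:
        print("JAVA_HOME problem:", error)
```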

Run Examples

Run the TensorFlow example.

[Screenshot: Mnist Example]

Run the PySpark example.

[Screenshot: Pi]

Or you can run the PySpark example in the terminal.

[Screenshot: Pi]
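
The Pi example estimates π by random sampling. Assuming it follows the same approach as the `pi.py` example bundled with Spark, it draws points uniformly from the square [-1, 1] × [-1, 1] and counts how many fall inside the unit circle; because the circle covers π/4 of the square's area, four times the hit fraction approximates π. The plain-Python sketch below shows only that core logic, and its function names are illustrative.

```python
import random


def sample_is_inside(rng):
    """Draw one point in the square [-1, 1] x [-1, 1]; return 1 if it is inside the unit circle."""
    x = rng.uniform(-1.0, 1.0)
    y = rng.uniform(-1.0, 1.0)
    return 1 if x * x + y * y <= 1.0 else 0


def estimate_pi(num_samples, seed=None):
    """Estimate pi: the hit fraction approximates (circle area / square area) = pi / 4."""
    rng = random.Random(seed)
    hits = sum(sample_is_inside(rng) for _ in range(num_samples))
    return 4.0 * hits / num_samples


if __name__ == "__main__":
    print(estimate_pi(100_000, seed=42))
```

In PySpark the sampling is distributed instead: `spark.sparkContext.parallelize(range(n), partitions).map(f).reduce(operator.add)` applies a per-sample function `f` across the workers and sums the hits, and the driver then computes `4.0 * hits / n`. More samples give a more accurate estimate.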

Package Manager (Optional)

If you are using Windows as your primary development environment, I recommend using a package manager like Chocolatey.

A package manager is a tool that automates installing, updating, configuring, and removing software. Package managers are commonly used on Unix and Linux distributions: Debian-based systems such as Ubuntu use APT (apt or apt-get), and Red Hat and Fedora systems use yum (or its successor, dnf). MacPorts and Homebrew are two popular package managers for macOS.

Follow the Chocolatey installation guide to install the package manager on your system. Once it is installed, you can install new software by running PowerShell as an administrator and using the choco command. For example, the following commands install the latest versions of Adobe Acrobat Reader, Google Chrome, and Firefox.

choco install adobereader
choco install googlechrome
choco install firefox

You can upgrade all packages using choco upgrade all or upgrade individual packages using choco upgrade firefox. Similarly, you can uninstall packages using choco uninstall.

The following table lists software you might find useful for this course, along with its Chocolatey package name.

| Software | Package Name |
| --- | --- |
| Anaconda Distribution (Python 3.x) | anaconda3 |
| Git (Install) | git.install |
| GitHub Desktop | github-desktop |
| Graphviz | graphviz |
| Hadoop | hadoop |
| Java Development Kit 8 | jdk8 |
| JetBrains Toolbox App | jetbrainstoolbox |
| JetBrains DataGrip | datagrip |
| JetBrains PyCharm | pycharm |
| JetBrains PyCharm Educational | pycharm-edu |
| JetBrains PyCharm (Community Edition)¹ | pycharm-community |
| MiKTeX | miktex |
| Pandoc | pandoc |
| Pandoc CrossRef | pandoc-crossref |
| PostgreSQL | postgresql |
| Protocol Buffers | protoc |
| Scala | scala |
| VirtualBox | virtualbox |

To see what other packages are available, browse the Chocolatey packages page, which lists community-maintained packages.
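
If you prefer to script your setup, `choco install` accepts several package names at once, and `-y` answers the confirmation prompts. The sketch below builds such a command from package names in the table above and, by default, only prints it as a dry run. `COURSE_PACKAGES` is an example selection rather than a required list, the helper names are illustrative, and actually running the command still requires an elevated (administrator) shell.

```python
import shutil
import subprocess

# Example selection from the table above (an assumption; pick what you need).
COURSE_PACKAGES = ["git.install", "github-desktop", "anaconda3", "jdk8", "pycharm-community"]


def choco_install_command(packages, assume_yes=True):
    """Build one `choco install` command line for several packages."""
    command = ["choco", "install", *packages]
    if assume_yes:
        command.append("-y")  # confirm all prompts without asking
    return command


def install(packages, dry_run=True):
    """Print the command (dry run) or run it; a real run needs an administrator shell."""
    command = choco_install_command(packages)
    if dry_run:
        print(" ".join(command))
        return None
    if shutil.which("choco") is None:
        raise FileNotFoundError("choco was not found on PATH; install Chocolatey first")
    return subprocess.run(command, check=True)


if __name__ == "__main__":
    install(COURSE_PACKAGES)  # dry run: prints the command only
```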


  1. While you can use the community version of PyCharm, JetBrains offers free educational licenses for students and teachers. See educational licenses for more details. 


Last update: March 12, 2023