Development environment for machine learning

One of the first problems AI students face is how to set up a development environment for machine learning. It is a thankless task: there are many methods and tools available, and sometimes you simply don’t know which to choose or where to start. On top of that come the questions of which libraries to install, which IDE to use and whether to use a GPU.

Personally, for learning and experimenting, I favor the set of packages offered by Anaconda, the conda manager and Jupyter Notebook.

From this post you will learn:

  • What are Anaconda and conda, and what are the alternatives?
  • How to install Anaconda?
  • How to create a virtual development environment for machine learning using conda?
  • What packages to install to get started with machine learning?
  • How to install the TensorFlow library in different versions?
  • How to install TensorFlow on GPU?

Differences between Anaconda, conda and pip

Anaconda is an open-source platform for machine learning and data science, available on most operating systems. It contains over 1,500 libraries and tools, including conda and pip, which will be useful for building our environments. With Anaconda we have everything at hand: building an environment is quick and simple, and the platform keeps maintaining the libraries and the dependencies between them. Unfortunately, it also has drawbacks: the more advanced features are paid, and installing TensorFlow 2.0 is not yet supported by conda (October 2019), so you need to use the pip manager for that.

Conda is a manager of virtual environments and packages for Python (and also for other languages). It ships with Anaconda and lets us create separate environments in which we use different libraries or different versions of them, e.g. tensorflow 1.x, tensorflow 2.x, tensorflow-gpu, etc.

Pip is also a package manager. However, it does not provide the ability to create separate environments. If you would like to use pip and still work in an isolated environment, you can use venv, the virtual environment manager for Python. One advantage of pip over conda (at least as of October 2019) is that TensorFlow 2.0 is available through pip and is even the default version. Conda does not support TensorFlow 2.0 yet, so if you want to build an environment based on version 2.0, you need a somewhat odd workaround, which is discussed below.
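For readers who want to try the venv route, here is a minimal sketch that drives the standard-library venv module from Python. The directory name my_venv is only an example, and with_pip=False is an assumption made to keep the sketch fast; pass True if you want pip bootstrapped into the new environment.

```python
import os
import tempfile
import venv


def create_env(target_dir, with_pip=False):
    """Create a virtual environment in target_dir and return the path of its config file."""
    venv.create(target_dir, with_pip=with_pip)
    # Every environment created by venv has a pyvenv.cfg file at its root.
    return os.path.join(target_dir, "pyvenv.cfg")


if __name__ == "__main__":
    # "my_venv" is a hypothetical name; the temporary parent directory keeps the demo tidy.
    with tempfile.TemporaryDirectory() as parent:
        cfg = create_env(os.path.join(parent, "my_venv"))
        print("environment created:", os.path.exists(cfg))
```

In everyday use you would create the environment in a permanent directory and then activate it from the console before running pip.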

It is worth adding that plenty of great Python environments are available in the cloud: just search for “cloud python ide”. In addition, if you like Jupyter Notebook as much as I do, Google offers Colaboratory, which is nothing more than a free Jupyter notebook running in the cloud, preconfigured or requiring only minimal setup. Interestingly, it also lets you use a GPU for free.

Despite this, I think a local development environment for machine learning is always worth having, so let’s move on to its configuration.

Development environment for machine learning – step no 1

We download and install Anaconda for Python 3.7. Installation is carried out with the default settings. Only this screen may raise a question:

[Image: Building development environment for machine learning using Anaconda]

As you can see, Anaconda does not recommend adding its directory to the PATH variable. Consequently, to run e.g. conda we will have to go to the condabin subdirectory of the folder where Anaconda is installed, or use the Anaconda Prompt program (on Windows). Adding the Anaconda directory to PATH is not recommended because in more complex setups it can cause conflicts. Hence, I leave the recommended settings and start the installation.
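If you are unsure whether conda is reachable from your current console, a small check like the one below can tell you. This is only a sketch using the standard-library shutil.which; the helper name find_conda is made up for this example.

```python
import shutil


def find_conda():
    """Return the full path of the conda executable visible on PATH, or None."""
    return shutil.which("conda")


if __name__ == "__main__":
    path = find_conda()
    if path is None:
        # Expected when the installer's recommended PATH setting was kept.
        print("conda is not on PATH; use Anaconda Prompt or the condabin directory")
    else:
        print("conda found at", path)
```

Run from a plain console after a default installation, this should report that conda is not on PATH, while Anaconda Prompt should find it.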

Development environment for machine learning – step no 2

We open the console (on Windows, with the cmd command), go to the condabin directory (alternatively: run Anaconda Prompt) and check the conda version:

>conda -V

If we want to make sure that we are using the latest version of conda, we can run the following command:

>conda update -n base -c defaults conda

Let’s create a new virtual environment. I call it my_env here, but any name will do:

>conda create --name my_env

To start working with the newly created environment, we run the following command:

>conda activate my_env

Switching to this environment is important; otherwise we will operate in the context of the base environment and there will be no environment separation. To see which packages are installed in the current environment, we issue the command:

>conda list

Since we did not indicate any packages or libraries when creating the environment, the list should be empty at the moment. To install packages we can use the command:

>conda install numpy pandas matplotlib pillow jupyter

We have listed only the few packages that are key for us, but conda, by examining dependencies, will install many more, including Python in the appropriate version. This is one of the biggest advantages of managers such as conda or pip. For those who would like to read a little more about the possibilities of conda, I recommend the Conda Cheat Sheet.
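To confirm that the installation worked, you can try importing the packages from the activated environment. The sketch below is one possible way to do it; the helper name installed_versions is an assumption of this example, and note that pillow is imported under the name PIL.

```python
import importlib


def installed_versions(module_names):
    """Map each module name to its __version__ string, or None if it cannot be imported."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Not every module defines __version__, so fall back to a placeholder.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    report = installed_versions(["numpy", "pandas", "matplotlib", "PIL"])
    for name, version in report.items():
        print(name, version or "NOT INSTALLED")
```

Any package reported as not installed points to a problem with the conda install step or to the wrong environment being active.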

At this point, we can pause the installation for a moment and think about how to install TensorFlow:

  1. If we want one of the latest stable 1.x versions, we can install it using conda. This is the most recommended way, because our environment will still be managed by only one package manager: conda.
  2. If we would like to install version 2.0 today (October 2019), it is unfortunately not yet offered by conda and we need to use the pip manager.
  3. The situation is different again if we can use a GPU.

To make it easy to follow each path, we will now clone our environment: we deactivate it and use the clone option:

>conda deactivate
>conda create -n my_env-20 --clone my_env
>conda create -n my_env-gpu --clone my_env

As a result, we have three twin environments: my_env, my_env-20 and my_env-gpu, and on each of them separately we can proceed with one of the installation types mentioned above. We simply switch between these environments with the deactivate and activate commands.
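To double-check that all three environments exist, you can inspect the output of conda env list --json. The sketch below assumes that this command prints a JSON object with an "envs" list of directory paths, which is how recent conda versions behave; the function name env_names and the sample paths are invented for illustration.

```python
import json
import re


def env_names(env_list_json):
    """Return the last path component of every environment reported by conda."""
    paths = json.loads(env_list_json)["envs"]
    # Split on both separators so Windows and POSIX paths are handled alike.
    return [re.split(r"[\\/]", p.rstrip("\\/"))[-1] for p in paths]


if __name__ == "__main__":
    # Hypothetical output of `conda env list --json`; real paths will differ.
    sample = json.dumps({"envs": [
        "/opt/anaconda3",
        "/opt/anaconda3/envs/my_env",
        "/opt/anaconda3/envs/my_env-20",
        "/opt/anaconda3/envs/my_env-gpu",
    ]})
    print(env_names(sample))
```

The first entry is the base environment, so its name is simply the Anaconda installation folder.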

Development environment for machine learning – step no 3A

We activate the my_env environment and install tensorflow in the latest version available in the Anaconda repository:

>conda activate my_env
>conda install tensorflow

To check which version of Python has been installed:

>python --version
>>>Python 3.7.4

If we want to see the TensorFlow version, I suggest doing it from a Jupyter notebook (this also confirms that Jupyter works correctly):

>jupyter notebook
# And in a notebook:

import tensorflow as tf
print(tf.__version__)
>>> 1.14.0

Development environment for machine learning – step no 3B

Let’s switch to the my_env-20 environment and try to install TensorFlow version 2.0. Unfortunately, in this situation we cannot use conda, because as of today (October 2019) it does not have this version of TensorFlow in its repository. This is where another package manager, pip, comes into play.

>conda deactivate
>conda activate my_env-20
>conda install pip
>>># All requested packages already installed.

As you can see, pip is already installed, because it was pulled in as one of the dependencies when installing the base packages. We install TensorFlow using pip:

>pip install tensorflow
>python --version
>python -c "import tensorflow as tf; print(tf.__version__);"

As a result, we get an environment with Python 3.7.4 and TensorFlow 2.0.0. Unfortunately, it is not an ideal environment. The whole idea of building an environment with conda assumes that this package manager tracks installations and all dependencies, so that the environment stays consistent and can be extended and updated at any time. When pip enters the stage as well, conda partially loses this information and with it control over the environment. The general rule in such situations is to carry out pip installations at the very end and to perform no further conda installations afterwards; otherwise the environment may become unstable. More about the potential problems of this configuration and good practices in this area can be found in this article.
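One way to keep an eye on what pip has added to a conda environment is to look at the output of conda list --json. The sketch below assumes that each entry has "name" and "channel" fields and that pip-installed packages are reported with the channel "pypi", as recent conda versions do; the function name and the sample entries are made up for illustration.

```python
import json


def pip_installed(conda_list_json):
    """Return the sorted names of packages that conda reports as coming from PyPI."""
    packages = json.loads(conda_list_json)
    return sorted(p["name"] for p in packages if p.get("channel") == "pypi")


if __name__ == "__main__":
    # Illustrative data only; real `conda list --json` output has more fields.
    sample = json.dumps([
        {"name": "numpy", "channel": "pkgs/main"},
        {"name": "tensorflow", "channel": "pypi"},
    ])
    print(pip_installed(sample))
```

Packages that show up in this list are the ones conda cannot fully manage, so they are worth reviewing before any later conda install or update.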

I will add one more important difference between TensorFlow installed using conda and using pip: TensorFlow from conda can be up to 8 times more efficient, due to the way the package is built in the Anaconda repository.

Development environment for machine learning – step no 3C

The last option for building your development environment for machine learning is to configure it so that you can use the power of the GPU, which dramatically speeds up the training process. First, the main issue: only those who have a CUDA-enabled graphics card can use the GPU. You can check it at this address. Note: many people also check here, but that list is not up to date as of today (October 2019). For example, my GeForce GTX 1660 Ti graphics card is not on it, and it can undoubtedly be used by TensorFlow.

Installation with conda is currently very simple and in no way resembles the very complicated process from a few months ago, mainly because when installing tensorflow-gpu, conda also fetches the cudatoolkit and cudnn packages by itself, which used to be a laborious manual task. First we switch to the appropriate environment clone, and then we install the tensorflow-gpu package:

>conda deactivate
>conda activate my_env-gpu
>conda install tensorflow-gpu

And that’s it 🙂 . All you need to do now is make sure that our environment actually “sees” and uses the GPU. It’s best to use a Jupyter notebook:

>jupyter notebook
# In the notebook:
import tensorflow as tf
print(tf.__version__)
>>> 1.14.0
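The version number alone does not prove that the GPU is visible, so it is worth asking TensorFlow which devices it sees. The sketch below uses device_lib.list_local_devices(), which is available in TensorFlow 1.14; the helper names gpu_names and local_gpus are invented for this example, and the code returns None when TensorFlow is not installed.

```python
def gpu_names(devices):
    """Return the names of all devices whose device_type is 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]


def local_gpus():
    """List GPUs visible to TensorFlow, or return None if TensorFlow is not installed."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return gpu_names(device_lib.list_local_devices())


if __name__ == "__main__":
    gpus = local_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif gpus:
        print("GPUs visible to TensorFlow:", gpus)
    else:
        print("TensorFlow is installed but sees no GPU")
```

An empty list in the my_env-gpu environment usually means a driver problem or a card that is not CUDA-enabled.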

That’s all for today 🙂 . I hope you will be able to easily build your development environment for machine learning. Good luck in your learning process!
