If you are a Python programmer working with AWS and Anaconda, sooner or later you will need to run a Python script as a cron job on Amazon Linux inside an Anaconda environment. This shouldn’t be difficult, right? Hmmm, unfortunately it is. I spent some time configuring cron on an Amazon Linux EC2 instance so that it would use an Anaconda virtual environment, and it wasn’t trivial, so I would like to share a simple and quick way to do it.
There is, of course, quite a lot of content on the web about configuring Python cron jobs. Some of it even applies to EC2 and Amazon Linux, but none of these posts completely solved all the problems I encountered.
OK, let’s define the individual steps needed to configure a Python cron job on AWS:
- We need to have SSH access to Amazon Linux running on AWS EC2 – a trivial task, I will not describe it here.
- Anaconda installed and initialized: installation is also simple. At the end of the process, the installer asks whether to initialize conda; select “yes”. As a result, the base environment is always active after logging in via SSH. This can be annoying for some, but it simplifies many things if the operating system user is mainly used to execute Python scripts. After logging in, the shell prompt should start with (base).
- First, we create a conda virtual environment (let’s name it test-env), which takes only a few simple commands. You can technically run a script in the base environment, but this is strongly discouraged. It’s worth creating a dedicated environment and installing the modules we need into it.
- Then we create the Python script (let’s name it test.py) that we want to run from cron. For testing purposes it prints the name of the active conda environment. Wait, print? In a cron job? Of course, a cron job has no terminal, but its output will be redirected to a file, where you can inspect the result of the run.
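import os
# Print the active conda environment; use a marker instead of a KeyError if none is active.
print(os.environ.get('CONDA_DEFAULT_ENV', '<no conda environment active>'))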
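Once the job runs under cron, it can be handy to have the script itself confirm that it landed in the right environment. Below is a minimal sketch of such a check. The function name check_conda_env is my own illustrative choice, not something conda provides; the only thing it relies on is the CONDA_DEFAULT_ENV variable already used above.

```python
import os


def check_conda_env(expected, environ=None):
    """Return True if the active conda environment matches `expected`.

    `environ` defaults to os.environ; passing a plain dict makes the
    function easy to try out without activating anything.
    """
    if environ is None:
        environ = os.environ
    # CONDA_DEFAULT_ENV is absent when no environment has been activated,
    # so .get() returns None and the comparison is simply False.
    return environ.get("CONDA_DEFAULT_ENV") == expected


# "test-env" is the environment name used in this post.
print(check_conda_env("test-env"))
```

In test.py you could call this at the top and stop early with a message if it returns False, so that a misconfigured cron job shows up clearly in the log.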
- Now the most important step, not so obvious, although ultimately trivial. We create a shell script (let’s name it test.sh) that activates the test-env environment and runs test.py. The first line is crucial: cron does not read ~/.bashrc, where conda init places its setup code, so the script has to source Anaconda’s activate script itself before conda activate will work. The other lines are quite obvious. The script assumes the default Amazon Linux user (ec2-user) and an Anaconda installation in the anaconda3 directory. Remember to check the script’s Unix permissions and, if necessary, make it executable.
source /home/ec2-user/anaconda3/bin/activate
conda activate test-env
# If you want to check in the script which environment is active: echo $CONDA_DEFAULT_ENV
python test.py
conda deactivate
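The permission check mentioned above is normally a one-liner in the shell (chmod u+x test.sh), but since we are in Python land anyway, here is a rough equivalent as a sketch. The helper name ensure_executable is hypothetical, and the sketch only touches the owner’s execute bit.

```python
import os
import stat


def ensure_executable(path):
    """Add the owner-execute bit to `path` if it is missing.

    Returns True if the mode was changed, False if it was already executable.
    """
    mode = os.stat(path).st_mode
    if mode & stat.S_IXUSR:
        return False
    # Keep all existing permission bits and add execute for the owner only.
    os.chmod(path, mode | stat.S_IXUSR)
    return True
```

Strictly speaking, a script started as bash test.sh does not need the execute bit at all; it only matters if you call the script directly, for example as ./test.sh.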
- Almost at the end, we create a file with the cron job definition (let’s name it test.cron). The following entry runs our test.sh (and thus test.py) every day at 10:00 server time. Note that this means the instance’s system time zone, which on Amazon Linux defaults to UTC unless you have changed it, not the local time of the AWS region. Cron starts the job in the user’s home directory, so the relative paths to test.sh and test.py resolve there. Output and errors are appended to the test.log file, where the result of the run can be viewed.
00 10 * * * bash test.sh >> /home/ec2-user/test.log 2>&1
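If the five leading fields look cryptic, the following sketch splits the entry above into named parts. It is only an illustration of the crontab line layout (minute, hour, day of month, month, day of week, then the command); CronEntry and parse_cron_line are names I made up for this example.

```python
from typing import NamedTuple


class CronEntry(NamedTuple):
    minute: str
    hour: str
    day_of_month: str
    month: str
    day_of_week: str
    command: str


def parse_cron_line(line):
    """Split one crontab line into its five time fields and the command."""
    # maxsplit=5 keeps the command (which contains spaces) in one piece.
    parts = line.split(None, 5)
    if len(parts) != 6:
        raise ValueError(f"expected 5 time fields and a command, got: {line!r}")
    return CronEntry(*parts)


entry = parse_cron_line("00 10 * * * bash test.sh >> /home/ec2-user/test.log 2>&1")
print(entry.hour, entry.minute)  # prints: 10 00
print(entry.command)
```

So minute 00 and hour 10 with asterisks everywhere else means “at 10:00, every day of every month, regardless of weekday”.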
- The last task is to register the job with the system cron. Note that this command replaces the user’s entire existing crontab with the contents of test.cron.
crontab test.cron
- The current cron entries can be checked with the command:
crontab -l
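If you would rather verify the registration from Python, for example as part of a deployment script, something like the sketch below could work. The helper names read_crontab and find_jobs are hypothetical; the only external thing used is the crontab -l command shown above.

```python
import subprocess


def read_crontab():
    """Return the current user's crontab as text (empty if none is installed)."""
    # `crontab -l` exits non-zero when the user has no crontab, so we do not
    # pass check=True and simply return whatever was printed to stdout.
    result = subprocess.run(["crontab", "-l"], capture_output=True, text=True)
    return result.stdout


def find_jobs(crontab_text, needle):
    """Return the active (non-comment, non-blank) lines that mention `needle`."""
    jobs = []
    for line in crontab_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and needle in stripped:
            jobs.append(stripped)
    return jobs


# Sample text standing in for real `crontab -l` output.
sample = "# daily test job\n00 10 * * * bash test.sh >> /home/ec2-user/test.log 2>&1\n"
print(find_jobs(sample, "test.sh"))
```

On the server itself you would call find_jobs(read_crontab(), "test.sh") and expect exactly one line back.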
And that’s all. I invite you to read my other posts and recommend my website – thanks.