How to install Airflow 2 on Synology NAS

Vincent Levorato
Jan 10, 2021 · 8 min read
Photo by Alex Cheung on Unsplash

The Apache Airflow 2.0 release was launched on December 17th, 2020, and I wanted to try this new version and run some tests. But thinking about it, if I wanted to go to the end of the “process” and test it over the long term, I had to deploy it on a server that could schedule jobs, mainly data science jobs: as a matter of fact, one of my profiles is data scientist. Creating virtual machines in the Cloud is easy, but it is not free. In my case, I’d like an environment that stays up all the time to test the Airflow release with small data science projects, so an on-premise solution is more interesting. Problem: I don’t have any machine that could be used as a server… unless I can use my new DS220+ Synology NAS to fulfill that goal (no embedded marketing here), replacing my old DS212j.

DiskStation Manager

The OS installed on a Synology NAS is DSM (DiskStation Manager), a custom, very light Linux distribution with specific command-line tools. For instance, starting/stopping a package is done via the synopkg command.
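
For example, you can check or restart a package from an SSH session as below (the MariaDB10 package identifier is an assumption that may differ depending on your DSM version):

sudo synopkg status MariaDB10
sudo synopkg stop MariaDB10
sudo synopkg start MariaDB10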

Disclaimer: the best practice to isolate your environment would be to use containers (and Docker is available in DSM). I didn’t choose this option as it involves more setup time (multiple runs of the docker build command before getting a “fine” image can take a while…). But I think my next achievement will be a clean, containerized management of Airflow on my NAS.

Disclaimer-bis: this article assumes the reader has basic knowledge of Apache Airflow. For more information, go to https://airflow.apache.org.

Steps

Here are the different steps to have a nice Airflow 2 installation running on your NAS:

  1. Create a dedicated user with “user home service” and activate SSH service
  2. Install Python3 Synology package
  3. Install Miniconda distribution, create conda env + install Airflow 2
  4. Install MariaDB Synology package + configure DB
  5. Configure Airflow
  6. Register Airflow services into DSM services

Create a dedicated user

First, you’ll need to activate the user home service, which creates the home folder you land in when connecting through SSH: go to Control Panel → User, open the Advanced tab and check Enable user home service at the bottom of the page.

Enable user home service

Then create a new user.

Create a new user in DSM

During creation, set User Groups to users AND administrators (mandatory to connect via SSH).

Enable SSH service

Open Control Panel → Terminal & SNMP and check Enable SSH service.

Enable SSH service

Install Python 3

Go to Package Center, find the Python 3 package and just install it.

Package Center with Python 3 installed

To check that all is OK, connect to your NAS through SSH and check the Python version.

ssh user@synology_nas
user@synology_nas:~$ python3
Python 3.8.2 (tags/Contacts-1.0.0-0232-200617:57e5f51, Jun 29 2020, 09:34:04)
[GCC 4.9.3 20150311 (prerelease)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

Install Miniconda distribution

Miniconda is a very light installation of the conda tools, packaged with Python and only a few standard packages. Connect via SSH to the NAS and download the latest Miniconda distribution:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

Install it (run the script with bash, as the downloaded file is not executable), follow the instructions, and accept the conda init command at the end of the installation; it will modify your .bashrc file to set up conda on login:

bash Miniconda3-latest-Linux-x86_64.sh

Welcome to Miniconda3 py38_4.9.2
In order to continue the installation process, please review the license agreement.
Please, press ENTER to continue
>>>
...
INSTALL PART
...
installation finished.
Do you wish the installer to initialize Miniconda3 by running conda init? [yes|no]
[no] >>> yes

Log out and log back in, and you should get a new prompt with the base conda environment activated.

(base) user@synology_nas:~$

When you launch Python, you now get the Anaconda build installed with Miniconda:

(base) user@synology_nas:~$ python
Python 3.8.5 (default, Sep  4 2020, 07:30:14)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

Create an isolated conda environment (named airflow, for instance), activate it and install Airflow inside this environment:

(base) user@synology_nas:~$ conda create -n airflow python=3.8
...
(base) user@synology_nas:~$ conda activate airflow
(airflow) user@synology_nas:~$ pip install apache-airflow==2.0
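
Note: if pip struggles to resolve dependencies, the Airflow documentation recommends pinning the install with a constraints file; a possible variant, assuming Python 3.8 in the environment:

pip install "apache-airflow==2.0.0" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.0.0/constraints-3.8.txt"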

To create the default folder for Airflow, initialize the installation by running the command:

airflow db init

That will create a folder named airflow containing different files:

  • airflow.cfg → Airflow configuration that we’ll tweak later
  • airflow.db → SQLite DB created by default (we won’t use it)
  • logs/ → will contain all logs about job executions (scheduler, DAGs)
  • unittests.cfg → to use the unit test mode
  • webserver_config.py → Flask App config for airflow webserver

Let’s now configure the database service to set up a DB dedicated to Airflow (preferred over the default SQLite).

Install MariaDB

Like the other packages, you need to install the MariaDB package through the DSM Package Center. Choose the most recent version (v10 in my case).

Install the MariaDB 10 package if not already installed

Open the package and set the root password to connect to the database.

Then connect to the database by launching the client:

(base) user@synology_nas:~$ cd /usr/local/mariadb10/bin/
(base) user@synology_nas:~$ ./mysql -u root -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 362184
Server version: 10.3.21-MariaDB Source distribution
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]>

You’ll need to create a user and a database for the Airflow metadata. Note that the MariaDB password policy requires at least 10 characters, with mixed case, numeric and special characters.

CREATE DATABASE airflow;
CREATE USER airflow IDENTIFIED BY 'RFlowPa$$w0rd';
GRANT ALL PRIVILEGES ON airflow.* TO 'airflow';
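
To quickly check that the account works, exit the root session and reconnect with the new user (it should land directly in the airflow database):

./mysql -u airflow -p airflow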

Configure Airflow

Edit airflow.cfg in the ~/airflow folder to change the executor type and set up the MariaDB parameters. In the [core] section, find these three lines:

executor = SequentialExecutor
sql_alchemy_conn = sqlite:////var/services/homes/user/airflow/airflow.db
load_examples = True

There are different kinds of executors (check the documentation for further information); the SequentialExecutor is meant only for local tests since, as its explicit name suggests, it can only launch one task at a time. Change that value to LocalExecutor to use multi-processing (even if your NAS has only 2 CPUs). Also set the database connection information, written with a SQLAlchemy syntax (note the port 3307, which is the default network port of Synology’s MariaDB 10 package). Feel free to deactivate the examples or not (in my case, I switch them off). Here are the modifications:

executor = LocalExecutor
sql_alchemy_conn = mysql+pymysql://airflow:RFlowPa$$w0rd@localhost:3307/airflow
load_examples = False

I also changed some specific Airflow configuration values to avoid a heavy load on the NAS, as mine only has 2 CPUs.

[core]
parallelism = 4
dag_concurrency = 2
max_active_runs_per_dag = 1
[webserver]
workers = 1
[scheduler]
min_file_process_interval = 5
catchup_by_default = False

Some explanations (even if the comments inside the config file are pretty clear):

  • parallelism = 4 → maximum number of tasks running simultaneously across the whole Airflow installation (I only have 2 CPUs, but I prefer to set parallelism to 4 to speed up de-queuing)
  • dag_concurrency = 2 → maximum number of tasks running simultaneously per workflow (DAG)
  • max_active_runs_per_dag = 1 → for a given workflow (DAG), number of instances of this workflow allowed to run simultaneously
  • workers = 1 → number of workers for the Gunicorn web server
  • min_file_process_interval = 5 → minimum interval, in seconds, at which the Airflow scheduler re-parses the workflow (DAG) files on the filesystem
  • catchup_by_default = False → if set to True, this mechanism catches up on all the workflow runs that should have happened in the past, up to the latest one, according to the workflow’s schedule.

Once done, activate your conda airflow environment, install the PyMySQL package, and re-launch the airflow db init command:

pip install PyMySQL
airflow db init
DB: mysql+pymysql://airflow:***@localhost:3307/airflow
[2021-01-08 22:32:47,702] {db.py:678} INFO - Creating tables
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
...
[2021-01-08 22:33:18,235] {manager.py:727} WARNING - No user yet created, use flask fab command to do it.
...
Initialization done

As displayed during the initialization, Airflow now comes as a real Flask app, which requires creating a user to connect to the Airflow UI for security purposes.

airflow users create -e john.doe@nowhere.com -f John -l Doe -p meetJ0hnd0E -u johndoe -r Admin

You can test right now whether everything is correctly set up by launching the command:

airflow webserver

You won’t be able to run workflows yet, but you can check that the UI is up by going to the URL:

http://synology_nas:8080
Airflow login splash screen

Enter your credentials, et voilà! If you kept the examples in the Airflow configuration, you should see something like below:

Airflow UI

Stop the Airflow web server (Ctrl+C), as we’ll now register both core Airflow services, the webserver and the scheduler, as NAS services.

Register Airflow services

Synology NAS services are defined in /etc/init (Upstart-style job files). You need to create two conf files, one per Airflow service. Using sudo, create a file named airflow-webserver.conf (with vim, for instance) as below:

description "start airflow webserver"
author "VLE"
console log
respawn
respawn limit 5 10
script
export PATH=/var/services/homes/user/miniconda3/envs/airflow/bin:${PATH}
export PYTHONPATH=/var/services/homes/user/miniconda3/envs/airflow/bin
export AIRFLOW_HOME=/var/services/homes/user/airflow
exec /var/services/homes/user/miniconda3/envs/airflow/bin/airflow webserver >> /var/log/airflow-webserver.log 2>&1
end script

Create a second conf file named airflow-scheduler.conf, with the same configuration but replacing every webserver keyword with scheduler, which corresponds to 2 lines (the description and the exec command), as shown below.
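
For reference, the scheduler job file should look like this (same content as above, with only those two lines changed):

description "start airflow scheduler"
author "VLE"
console log
respawn
respawn limit 5 10
script
export PATH=/var/services/homes/user/miniconda3/envs/airflow/bin:${PATH}
export PYTHONPATH=/var/services/homes/user/miniconda3/envs/airflow/bin
export AIRFLOW_HOME=/var/services/homes/user/airflow
exec /var/services/homes/user/miniconda3/envs/airflow/bin/airflow scheduler >> /var/log/airflow-scheduler.log 2>&1
end script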

You can now launch the two services:

sudo start airflow-scheduler
sudo start airflow-webserver

Go back to your NAS URL http://synology_nas:8080 and you should see the Airflow web UI. If something went wrong, you can check the logs to investigate.
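
A quick way to check the services and the logs from an SSH session (the status command comes from the same Upstart tooling used to start the services; the log paths are the ones defined in the conf files):

sudo status airflow-scheduler
sudo status airflow-webserver
sudo tail /var/log/airflow-scheduler.log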

To run your own projects, I suggest installing the Git Server Synology package, which provides the git command. If you do so, don’t forget to add any specific paths (if needed) to the Airflow services, like adding the path of your code project to the $PYTHONPATH environment variable, as sketched below.
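
For instance, assuming a project checked out in a hypothetical folder /var/services/homes/user/projects/my_project (the path is only an illustration), the PYTHONPATH export in both conf files could become:

export PYTHONPATH=/var/services/homes/user/miniconda3/envs/airflow/bin:/var/services/homes/user/projects/my_project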

Conclusion

You now have a nice Airflow 2 installation running on your NAS! At the time of writing this article, I noticed a CPU usage drawback with the Airflow webserver: it takes about 50% of my NAS’s CPU resources… It can be shut down if it takes too much resource, letting the scheduler keep running its jobs, and restarted when needed.

Airflow web server process CPU usage

Apart from this, everything works perfectly, and I’m now able to test some tiny projects with this stack, which will be the subject of a future article. Feel free to comment and give your feedback about using Airflow on your Synology NAS.


Vincent Levorato

Lead Data Scientist @ Prisma Media. Freelance consultant in data science and AI architectures. Computer science PhD. https://www.linkedin.com/in/vlevorato/