🐳

Airflow's Official Docker Image and Docker Compose

Published 2021/08/15

What the official Docker image and Docker Compose are

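The Airflow project provides an official Docker image and a Docker Compose file (with related scripts), which let you run Airflow and its related components in containers.
(As discussed later, unofficial images exist as well.)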

Docker and Docker Compose start the following components:

  • Airflow itself (Webserver, Scheduler, Worker)
  • Flower
  • PostgreSQL (Airflow's metadata database)
  • Redis (Airflow's queue)

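Trying it out

Installation

The steps are documented on the official page, so all you need to do is follow them.
The one thing to watch is that the ports the services use (8080 for the webserver, 5437 for PostgreSQL, 6379 for Redis) are not already taken by other containers or local processes.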
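The port check mentioned above can be scripted. The following is a minimal sketch, assuming bash (it relies on bash's `/dev/tcp` pseudo-device) and assuming that a successful connection to 127.0.0.1 means the port is taken. The function name `check_ports` is made up for this example.

```shell
#!/usr/bin/env bash
# Report whether something is already listening on each given local port.
# Uses bash's /dev/tcp pseudo-device, so run this with bash, not plain sh.
check_ports() {
  local port
  for port in "$@"; do
    # The subshell opens a TCP connection; it succeeds only if a listener exists.
    if (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null; then
      echo "port ${port}: in use"
    else
      echo "port ${port}: free"
    fi
  done
}

# Ports named in this article: webserver, PostgreSQL, Redis.
check_ports 8080 5437 6379
```

It only probes IPv4 localhost, so treat a "free" result as a hint rather than a guarantee.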

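curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.1.2/docker-compose.yaml'
# Create the directories that are mounted into the containers.
mkdir logs dags plugins
# Run the containers with the current host user's UID.
echo -e "AIRFLOW_UID=$(id -u)\nAIRFLOW_GID=0" > .env
# Initialize the metadata database, then start all services in the background.
docker-compose up airflow-init
docker-compose up -d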
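The `.env` line relies on `echo -e`, which behaves differently depending on the shell (plain `sh` on some systems prints the `-e` literally). As a hedged alternative, the same file can be written with `printf`. The function name `write_airflow_env` and its optional path argument are assumptions made for this sketch.

```shell
# Write the .env file with printf, which interprets "\n" in its format string
# the same way in every POSIX shell.
# The target path is a parameter only for illustration; it defaults to ./.env.
write_airflow_env() {
  printf 'AIRFLOW_UID=%s\nAIRFLOW_GID=0\n' "$(id -u)" > "${1:-.env}"
}

write_airflow_env
cat .env
```

The resulting file should contain exactly two lines, `AIRFLOW_UID=<your uid>` and `AIRFLOW_GID=0`.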

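Placing and running DAGs

Files placed in the local dags/ directory are picked up by Airflow in the container, because that directory is mounted into the containers.
(Note that several example DAGs are already loaded, including one named tutorial, so take care that your DAG ID does not collide with them when you try the tutorial DAG and the like.)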
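The article does not show the DAG source, so the following is only a sketch of what a DAG such as `container_tutorial` might look like, modelled on the official tutorial DAG and on the task log shown later (a `BashOperator` named `print_date` running `date`, with one retry). The schedule, start date, and `default_args` values are assumptions, and the real DAG evidently has further downstream tasks that are omitted here.

```shell
mkdir -p dags
# Quoted 'EOF' keeps the shell from expanding anything inside the Python source.
cat > dags/container_tutorial.py <<'EOF'
from datetime import timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.utils.dates import days_ago

# Assumed settings: owner and retries are guesses based on the task log.
default_args = {
    "owner": "airflow",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="container_tutorial",  # must not clash with the bundled "tutorial" DAG
    default_args=default_args,
    schedule_interval=timedelta(days=1),
    start_date=days_ago(2),
) as dag:
    # Prints the current date inside the worker container.
    print_date = BashOperator(task_id="print_date", bash_command="date")
EOF
```

Once the scheduler has parsed the file, the DAG should show up in the UI and in `dags list` under the ID `container_tutorial`.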

Logs are saved in the logs/ directory (they can also be viewed from the Airflow UI and the CLI).

cat logs/container_tutorial/print_date/2021-08-15T00\:51\:28.845692+00\:00/1.log
[2021-08-15 00:51:35,348] {taskinstance.py:896} INFO - Dependencies all met for <TaskInstance: container_tutorial.print_date 2021-08-15T00:51:28.845692+00:00 [queued]>
[2021-08-15 00:51:35,379] {taskinstance.py:896} INFO - Dependencies all met for <TaskInstance: container_tutorial.print_date 2021-08-15T00:51:28.845692+00:00 [queued]>
[2021-08-15 00:51:35,382] {taskinstance.py:1087} INFO -
--------------------------------------------------------------------------------
[2021-08-15 00:51:35,384] {taskinstance.py:1088} INFO - Starting attempt 1 of 2
[2021-08-15 00:51:35,389] {taskinstance.py:1089} INFO -
--------------------------------------------------------------------------------
[2021-08-15 00:51:35,433] {taskinstance.py:1107} INFO - Executing <Task(BashOperator): print_date> on 2021-08-15T00:51:28.845692+00:00
[2021-08-15 00:51:35,467] {standard_task_runner.py:76} INFO - Running: ['***', 'tasks', 'run', 'container_tutorial', 'print_date', '2021-08-15T00:51:28.845692+00:00', '--job-id', '4', '--pool', 'default_pool', '--raw', '--subdir', 'DAGS_FOLDER/container_tutorial.py', '--cfg-path', '/tmp/tmpnwsppqa7', '--error-file', '/tmp/tmpsrwekq1d']
[2021-08-15 00:51:35,446] {standard_task_runner.py:52} INFO - Started process 2628 to run task
[2021-08-15 00:51:35,470] {standard_task_runner.py:77} INFO - Job 4: Subtask print_date
[2021-08-15 00:51:35,616] {logging_mixin.py:104} INFO - Running <TaskInstance: container_tutorial.print_date 2021-08-15T00:51:28.845692+00:00 [running]> on host aef7889d614a
[2021-08-15 00:51:36,125] {taskinstance.py:1302} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_EMAIL=***@example.com
AIRFLOW_CTX_DAG_OWNER=***
AIRFLOW_CTX_DAG_ID=container_tutorial
AIRFLOW_CTX_TASK_ID=print_date
AIRFLOW_CTX_EXECUTION_DATE=2021-08-15T00:51:28.845692+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2021-08-15T00:51:28.845692+00:00
[2021-08-15 00:51:36,129] {subprocess.py:52} INFO - Tmp dir root location:
 /tmp
[2021-08-15 00:51:36,131] {subprocess.py:63} INFO - Running command: ['bash', '-c', 'date']
[2021-08-15 00:51:36,154] {subprocess.py:74} INFO - Output:
[2021-08-15 00:51:36,160] {subprocess.py:78} INFO - Sun Aug 15 00:51:36 UTC 2021
[2021-08-15 00:51:36,165] {subprocess.py:82} INFO - Command exited with return code 0
[2021-08-15 00:51:36,242] {taskinstance.py:1211} INFO - Marking task as SUCCESS. dag_id=container_tutorial, task_id=print_date, execution_date=20210815T005128, start_date=20210815T005135, end_date=20210815T005136
[2021-08-15 00:51:36,317] {taskinstance.py:1265} INFO - 2 downstream tasks scheduled from follow-on schedule check
[2021-08-15 00:51:36,340] {local_task_job.py:149} INFO - Task exited with return code 0
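The log path above follows the layout `logs/<dag_id>/<task_id>/<execution_date>/<try_number>.log`. A small helper can build it, which also avoids escaping the colons by hand. The name `task_log_path` is made up for this sketch, and the layout assumes the default log filename template.

```shell
# Build the path of a task log under logs/, following the layout seen above:
#   logs/<dag_id>/<task_id>/<execution_date>/<try_number>.log
# The try number is optional and defaults to 1.
task_log_path() {
  printf 'logs/%s/%s/%s/%s.log\n' "$1" "$2" "$3" "${4:-1}"
}

task_log_path container_tutorial print_date '2021-08-15T00:51:28.845692+00:00'
```

For example, `cat "$(task_log_path container_tutorial print_date '2021-08-15T00:51:28.845692+00:00')"` reads the same file as the command above.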

Getting inside

WebUI

Open http://localhost:8080 in your local browser to reach everyone's favorite Airflow UI. Both the username and the password are airflow.
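Before opening the browser, it can help to confirm that the webserver is actually up. A minimal sketch, assuming `curl` is installed and using the webserver's `/health` endpoint, follows. The function name `airflow_health` is hypothetical.

```shell
# Query the webserver's /health endpoint.
# The base URL is optional and defaults to the address used in this article.
airflow_health() {
  # -f: fail on HTTP errors, -sS: silent except for error messages
  curl -fsS "${1:-http://localhost:8080}/health"
}
```

Calling `airflow_health` once the containers are running should print a small JSON document with the status of the metadata database and the scheduler. A connection error most likely means the webserver is still starting.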

CLI

Let's install the wrapper script.

curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.1.2/airflow.sh'
chmod +x airflow.sh

Running bash on the worker

./airflow.sh bash

Running the Airflow CLI

./airflow.sh dags list
Creating airflow_airflow-worker_run ... done
dag_id                                  | filepath                                                                                                         | owner   | paused
========================================+==================================================================================================================+=========+=======
container_tutorial                      | container_tutorial.py                                                                                            | airflow | None
example_bash_operator                   | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_bash_operator.py                   | airflow | True
example_branch_datetime_operator_2      | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_branch_datetime_operator.py        | airflow | True
example_branch_dop_operator_v3          | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_branch_python_dop_operator_3.py    | airflow | True
example_branch_labels                   | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_branch_labels.py                   | airflow | True
example_branch_operator                 | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_branch_operator.py                 | airflow | True
example_complex                         | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_complex.py                         | airflow | True
example_dag_decorator                   | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_dag_decorator.py                   | airflow | True
example_external_task_marker_child      | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_external_task_marker_dag.py        | airflow | True
example_external_task_marker_parent     | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_external_task_marker_dag.py        | airflow | True
example_kubernetes_executor             | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_kubernetes_executor.py             | airflow | True
example_kubernetes_executor_config      | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_kubernetes_executor_config.py      | airflow | True
example_nested_branch_dag               | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_nested_branch_dag.py               | airflow | True
example_passing_params_via_test_command | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_passing_params_via_test_command.py | airflow | True
example_python_operator                 | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_python_operator.py                 | airflow | True
example_short_circuit_operator          | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_short_circuit_operator.py          | airflow | True
example_skip_dag                        | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_skip_dag.py                        | airflow | True
example_subdag_operator                 | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_subdag_operator.py                 | airflow | True
example_subdag_operator.section-1       | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_subdag_operator.py                 | airflow | True
example_subdag_operator.section-2       | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_subdag_operator.py                 | airflow | True
example_task_group                      | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_task_group.py                      | airflow | True
example_task_group_decorator            | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_task_group_decorator.py            | airflow | True
example_trigger_controller_dag          | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_trigger_controller_dag.py          | airflow | True
example_trigger_target_dag              | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_trigger_target_dag.py              | airflow | True
example_weekday_branch_operator         | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_branch_day_of_week_operator.py     | airflow | True
example_xcom                            | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_xcom.py                            | airflow | True
example_xcom_args                       | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_xcomargs.py                        | airflow | True
example_xcom_args_with_operators        | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_xcomargs.py                        | airflow | True
latest_only                             | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_latest_only.py                     | airflow | True
latest_only_with_trigger                | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_latest_only_with_trigger.py        | airflow | True
test_utils                              | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/test_utils.py                              | airflow | True
tutorial                                | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/tutorial.py                                | airflow | True
tutorial_etl_dag                        | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/tutorial_etl_dag.py                        | airflow | True
tutorial_taskflow_api_etl               | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/tutorial_taskflow_api_etl.py               | airflow | True
tutorial_taskflow_api_etl_virtualenv    | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/tutorial_taskflow_api_etl_virtualenv.py    | airflow | True
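With this many bundled examples, your own DAGs are easy to miss in the listing. One way to narrow it down, sketched here under the assumption that the examples always live under `airflow/example_dags/` as in the output above, is a small filter. The name `own_dags_only` is made up for this example.

```shell
# Drop rows whose file path points into the bundled example_dags package.
# Reads the `dags list` table on stdin and writes the remaining rows to stdout.
own_dags_only() {
  grep -v '/airflow/example_dags/'
}
```

Usage: `./airflow.sh dags list | own_dags_only`. The header and separator rows pass through unchanged, since they contain no file path.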

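Odds and ends

Installing packages

I was wondering how to install Linux and Python packages, and found an answer on Stack Overflow that fit exactly.

  • Place a Dockerfile in the same directory as docker-compose.yaml
  • In docker-compose.yaml, change image to build (shown below)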

Before

  image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.1.2}

After

  build: .
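The Dockerfile itself is not shown in this article. The following is a minimal sketch of one that extends the official image. The packages `vim` and `lxml` are placeholders chosen for illustration, not something the setup requires.

```shell
# Write a Dockerfile next to docker-compose.yaml.
# "vim" and "lxml" are placeholder packages; substitute the ones you need.
cat > Dockerfile <<'EOF'
FROM apache/airflow:2.1.2

# Linux packages need root.
USER root
RUN apt-get update \
    && apt-get install -y --no-install-recommends vim \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Python packages are installed as the airflow user.
USER airflow
RUN pip install --no-cache-dir --user lxml
EOF
```

After that, `docker-compose build` followed by `docker-compose up -d` should rebuild the image and restart the services on top of it.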

Differences from unofficial Docker images and tools

TODO: write a comparison with Whirl and others.
