🐳
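Airflow's Official Docker Image and Docker Compose
What the official Docker image and Docker Compose setup are
The Airflow project provides an official Docker image together with a Docker Compose file (and related scripts), so you can run Airflow and its supporting components in containers.
(As discussed later, there are also non-official images.)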
Docker and Docker Compose start the following services:
- Airflow itself (Webserver, Scheduler, Worker)
- Flower
- PostgreSQL (Airflow's metadata database)
- Redis (Airflow's task queue, i.e. the Celery broker)
Trying it out
Installation
The steps are documented on the official page, so all you need to do is follow them.
The one thing to watch out for is that the ports the services use (8080 for the webserver, 5432 for PostgreSQL, 6379 for Redis) are not already taken by other containers or local processes.
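Before running the installation commands, the port check can be sketched roughly as follows. This is a minimal sketch, not part of the official procedure: `port_in_use` and `report_ports` are hypothetical helper names, the check relies on bash's `/dev/tcp` pseudo-device (so it needs bash, not plain `sh`), and the ports passed at the bottom are only examples. Add any other port your `docker-compose.yaml` publishes to the host.

```shell
#!/usr/bin/env bash

# Return 0 if something is already listening on the given local TCP port.
# Hypothetical helper; relies on bash's /dev/tcp pseudo-device.
port_in_use() {
  local port="$1"
  # Opening /dev/tcp/HOST/PORT attempts a TCP connection.
  # If the connection succeeds, some process already owns the port.
  (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null
}

# Print one status line per port given as an argument.
report_ports() {
  local port
  for port in "$@"; do
    if port_in_use "$port"; then
      echo "port ${port}: in use"
    else
      echo "port ${port}: free"
    fi
  done
}

# Example ports: the webserver and Redis ports mentioned in the text.
report_ports 8080 6379
```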
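curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.1.2/docker-compose.yaml'
mkdir logs dags plugins
# Run the containers with the host user's UID so mounted files get the right owner
echo -e "AIRFLOW_UID=$(id -u)\nAIRFLOW_GID=0" > .env
# Initialize the metadata database and create the default account
docker-compose up airflow-init
# Start all services in the background
docker-compose up -d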
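Deploying and running DAGs
Any DAG file you put in the local dags/ directory is picked up by Airflow inside the containers.
(Note that a number of DAGs are already registered out of the box, so take care when you experiment with the tutorial DAG and the like.)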
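If the preloaded DAGs get in the way, one option is to turn them off in the Compose file. The sketch below assumes that your downloaded `docker-compose.yaml` contains the line `AIRFLOW__CORE__LOAD_EXAMPLES: 'true'`; check the file first, since this is an assumption about its contents. `disable_example_dags` is a hypothetical helper name.

```shell
# Flip AIRFLOW__CORE__LOAD_EXAMPLES from 'true' to 'false' in a Compose file.
# Assumption: the file contains that setting with the value quoted as 'true'.
disable_example_dags() {
  compose_file="${1:-docker-compose.yaml}"
  # -i.bak edits in place and keeps a backup copy so the change is easy to revert.
  sed -i.bak "s/\(AIRFLOW__CORE__LOAD_EXAMPLES:\) *'true'/\1 'false'/" "$compose_file"
}

# Usage (then recreate the containers so the new setting takes effect):
#   disable_example_dags docker-compose.yaml
#   docker-compose up -d
```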
Logs are saved under the logs/ directory (you can also view them from the Airflow UI and the CLI).
cat logs/container_tutorial/print_date/2021-08-15T00\:51\:28.845692+00\:00/1.log
[2021-08-15 00:51:35,348] {taskinstance.py:896} INFO - Dependencies all met for <TaskInstance: container_tutorial.print_date 2021-08-15T00:51:28.845692+00:00 [queued]>
[2021-08-15 00:51:35,379] {taskinstance.py:896} INFO - Dependencies all met for <TaskInstance: container_tutorial.print_date 2021-08-15T00:51:28.845692+00:00 [queued]>
[2021-08-15 00:51:35,382] {taskinstance.py:1087} INFO -
--------------------------------------------------------------------------------
[2021-08-15 00:51:35,384] {taskinstance.py:1088} INFO - Starting attempt 1 of 2
[2021-08-15 00:51:35,389] {taskinstance.py:1089} INFO -
--------------------------------------------------------------------------------
[2021-08-15 00:51:35,433] {taskinstance.py:1107} INFO - Executing <Task(BashOperator): print_date> on 2021-08-15T00:51:28.845692+00:00
[2021-08-15 00:51:35,467] {standard_task_runner.py:76} INFO - Running: ['***', 'tasks', 'run', 'container_tutorial', 'print_date', '2021-08-15T00:51:28.845692+00:00', '--job-id', '4', '--pool', 'default_pool', '--raw', '--subdir', 'DAGS_FOLDER/container_tutorial.py', '--cfg-path', '/tmp/tmpnwsppqa7', '--error-file', '/tmp/tmpsrwekq1d']
[2021-08-15 00:51:35,446] {standard_task_runner.py:52} INFO - Started process 2628 to run task
[2021-08-15 00:51:35,470] {standard_task_runner.py:77} INFO - Job 4: Subtask print_date
[2021-08-15 00:51:35,616] {logging_mixin.py:104} INFO - Running <TaskInstance: container_tutorial.print_date 2021-08-15T00:51:28.845692+00:00 [running]> on host aef7889d614a
[2021-08-15 00:51:36,125] {taskinstance.py:1302} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_EMAIL=***@example.com
AIRFLOW_CTX_DAG_OWNER=***
AIRFLOW_CTX_DAG_ID=container_tutorial
AIRFLOW_CTX_TASK_ID=print_date
AIRFLOW_CTX_EXECUTION_DATE=2021-08-15T00:51:28.845692+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2021-08-15T00:51:28.845692+00:00
[2021-08-15 00:51:36,129] {subprocess.py:52} INFO - Tmp dir root location:
/tmp
[2021-08-15 00:51:36,131] {subprocess.py:63} INFO - Running command: ['bash', '-c', 'date']
[2021-08-15 00:51:36,154] {subprocess.py:74} INFO - Output:
[2021-08-15 00:51:36,160] {subprocess.py:78} INFO - Sun Aug 15 00:51:36 UTC 2021
[2021-08-15 00:51:36,165] {subprocess.py:82} INFO - Command exited with return code 0
[2021-08-15 00:51:36,242] {taskinstance.py:1211} INFO - Marking task as SUCCESS. dag_id=container_tutorial, task_id=print_date, execution_date=20210815T005128, start_date=20210815T005135, end_date=20210815T005136
[2021-08-15 00:51:36,317] {taskinstance.py:1265} INFO - 2 downstream tasks scheduled from follow-on schedule check
[2021-08-15 00:51:36,340] {local_task_job.py:149} INFO - Task exited with return code 0
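The path passed to `cat` above follows the pattern `logs/<dag_id>/<task_id>/<execution date>/<try number>.log`. As a small sketch, assuming that layout is unchanged in your setup, the path can be built like this (`task_log_path` is a hypothetical helper name):

```shell
# Build the log file path for one task attempt.
#   $1 = dag_id, $2 = task_id, $3 = execution date (ISO 8601), $4 = try number (default 1)
task_log_path() {
  printf 'logs/%s/%s/%s/%s.log\n' "$1" "$2" "$3" "${4:-1}"
}

task_log_path container_tutorial print_date '2021-08-15T00:51:28.845692+00:00'
# prints: logs/container_tutorial/print_date/2021-08-15T00:51:28.845692+00:00/1.log
```

Quoting the result, as in `cat "$(task_log_path ...)"`, avoids having to escape the colons by hand.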
Getting inside
Web UI
Open http://localhost:8080 in your local browser and you get everyone's favorite Airflow UI. Both the username and the password are airflow.
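The webserver can take a while to come up after the containers start. One way to wait for it is to poll its `/health` endpoint until it answers. The following is a minimal sketch: `wait_for_url` is a hypothetical helper name, and the retry count and delay are arbitrary defaults.

```shell
# Poll a URL until it returns a successful HTTP status, or give up.
#   $1 = URL, $2 = max attempts (default 30), $3 = seconds between attempts (default 5)
wait_for_url() {
  url="$1"
  attempts="${2:-30}"
  delay="${3:-5}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP errors such as 4xx/5xx.
    if curl --silent --fail --output /dev/null "$url"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Usage:
#   wait_for_url http://localhost:8080/health && echo "webserver is up"
```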
CLI
First, install the wrapper script.
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.1.2/airflow.sh'
chmod +x airflow.sh
Running bash on the worker
./airflow.sh bash
Running the Airflow CLI
./airflow.sh dags list
Creating airflow_airflow-worker_run ... done
dag_id | filepath | owner | paused
========================================+==================================================================================================================+=========+=======
container_tutorial | container_tutorial.py | airflow | None
example_bash_operator | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_bash_operator.py | airflow | True
example_branch_datetime_operator_2 | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_branch_datetime_operator.py | airflow | True
example_branch_dop_operator_v3 | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_branch_python_dop_operator_3.py | airflow | True
example_branch_labels | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_branch_labels.py | airflow | True
example_branch_operator | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_branch_operator.py | airflow | True
example_complex | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_complex.py | airflow | True
example_dag_decorator | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_dag_decorator.py | airflow | True
example_external_task_marker_child | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_external_task_marker_dag.py | airflow | True
example_external_task_marker_parent | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_external_task_marker_dag.py | airflow | True
example_kubernetes_executor | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_kubernetes_executor.py | airflow | True
example_kubernetes_executor_config | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_kubernetes_executor_config.py | airflow | True
example_nested_branch_dag | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_nested_branch_dag.py | airflow | True
example_passing_params_via_test_command | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_passing_params_via_test_command.py | airflow | True
example_python_operator | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_python_operator.py | airflow | True
example_short_circuit_operator | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_short_circuit_operator.py | airflow | True
example_skip_dag | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_skip_dag.py | airflow | True
example_subdag_operator | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_subdag_operator.py | airflow | True
example_subdag_operator.section-1 | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_subdag_operator.py | airflow | True
example_subdag_operator.section-2 | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_subdag_operator.py | airflow | True
example_task_group | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_task_group.py | airflow | True
example_task_group_decorator | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_task_group_decorator.py | airflow | True
example_trigger_controller_dag | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_trigger_controller_dag.py | airflow | True
example_trigger_target_dag | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_trigger_target_dag.py | airflow | True
example_weekday_branch_operator | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_branch_day_of_week_operator.py | airflow | True
example_xcom | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_xcom.py | airflow | True
example_xcom_args | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_xcomargs.py | airflow | True
example_xcom_args_with_operators | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_xcomargs.py | airflow | True
latest_only | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_latest_only.py | airflow | True
latest_only_with_trigger | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/example_latest_only_with_trigger.py | airflow | True
test_utils | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/test_utils.py | airflow | True
tutorial | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/tutorial.py | airflow | True
tutorial_etl_dag | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/tutorial_etl_dag.py | airflow | True
tutorial_taskflow_api_etl | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/tutorial_taskflow_api_etl.py | airflow | True
tutorial_taskflow_api_etl_virtualenv | /home/airflow/.local/lib/python3.6/site-packages/airflow/example_dags/tutorial_taskflow_api_etl_virtualenv.py | airflow | True
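As the output shows, your own DAGs are easy to lose among the bundled examples. A small filter, sketched here under the assumption that the examples always live under a path containing `/example_dags/` (as they do in the listing above), keeps only the remaining rows. `filter_own_dags` is a hypothetical helper name.

```shell
# Read `dags list` output on stdin and drop rows whose file path
# points into the bundled example_dags directory.
filter_own_dags() {
  grep -v '/example_dags/'
}

# Usage:
#   ./airflow.sh dags list | filter_own_dags
```

The header and separator rows pass through unchanged, since they do not contain the example path.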
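Loose ends
Installing packages
I was wondering how to install Linux and Python packages, and found an answer on Stack Overflow that fit exactly.
- Place a Dockerfile in the same directory as docker-compose.yaml
- In docker-compose.yaml, change image to build (see below)
Before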
image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.1.2}
After
build: .
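For reference, a Dockerfile for this setup might look like the one generated below. This is only a sketch: `write_dockerfile` is a hypothetical helper name, and `vim` and `lxml` are placeholder packages standing in for whatever you actually need. It assumes the base image tag matches the one in the original `image:` line.

```shell
# Write an example Dockerfile that extends the official image.
#   $1 = target directory (default: current directory)
write_dockerfile() {
  target_dir="${1:-.}"
  cat > "${target_dir}/Dockerfile" <<'EOF'
FROM apache/airflow:2.1.2

# OS packages have to be installed as root (vim is a placeholder).
USER root
RUN apt-get update \
 && apt-get install -y --no-install-recommends vim \
 && apt-get clean \
 && rm -rf /var/lib/apt/lists/*

# Switch back to the airflow user before installing Python packages (lxml is a placeholder).
USER airflow
RUN pip install --no-cache-dir lxml
EOF
}

# Usage:
#   write_dockerfile .
#   docker-compose build
#   docker-compose up -d
```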
Differences from non-official Docker images and tools
TODO: write a comparison with Whirl and others.