Setting up an AWS Glue development environment
I followed the article below and tweaked it slightly to get it running.
version: '3.5'
services:
  localstack:
    container_name: glue-docker-sample-localstack
    image: localstack/localstack:0.12.8
    environment:
      - SERVICES=s3
      - AWS_DEFAULT_REGION=ap-northeast-1
      - AWS_DEFAULT_OUTPUT=json
      - AWS_ACCESS_KEY_ID=test
      - AWS_SECRET_ACCESS_KEY=test
    networks:
      - glue-network
  glue:
    container_name: glue-docker-sample-glue
    image: amazon/aws-glue-libs:glue_libs_3.0.0_image_01
    volumes:
      - ./:/home/glue_user/workspace/jupyter_workspace
      - ./spark.conf:/home/glue_user/spark/conf/spark-defaults.conf
    environment:
      - DISABLE_SSL=true
      - AWS_REGION=ap-northeast-1
      - AWS_OUTPUT=json
      - AWS_ACCESS_KEY_ID=test
      - AWS_SECRET_ACCESS_KEY=test
    ports:
      - 8888:8888
      - 4040:4040
    networks:
      - glue-network
    command: /home/glue_user/jupyter/jupyter_start.sh
networks:
  glue-network:
    name: glue-network
I collected the commands in a Makefile.
up:
	docker compose up -d
build:
	docker compose build --no-cache --force-rm
# init is not defined in the Makefile shown here
remake:
	@make destroy
	@make init
stop:
	docker compose stop
down:
	docker compose down --remove-orphans
restart:
	@make down
	@make up
destroy:
	docker compose down --rmi all --volumes --remove-orphans
destroy-volumes:
	docker compose down --volumes --remove-orphans
ps:
	docker compose ps
logs:
	docker compose logs
localstack:
	docker compose exec localstack bash
glue:
	docker compose exec glue bash
jupyter:
	open http://127.0.0.1:8888/lab
- GitHub repository
Trying it out
- Start the containers
make up
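Because `docker compose up -d` returns as soon as the containers have been started, LocalStack may not be accepting connections yet when the next step runs. As a rough sketch, a small helper could poll the port from inside the glue container before continuing. `wait_for_port` and its parameters are names made up for illustration and are not part of the repository, and an open port only suggests, rather than guarantees, that the S3 service is ready.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once host:port accepts a TCP connection, False on timeout.

    Hypothetical helper: it only checks that the port is open,
    not that the S3 service behind it is fully initialised.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Succeeds as soon as something is listening on host:port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait a little and retry.
            time.sleep(interval)
    return False
```

From inside the glue container this could be called as `wait_for_port("localstack", 4566)`, using the service name and port that appear in the commands of the next step.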
- Add a file to localstack
make glue
cd jupyter_workspace
aws s3 mb s3://awsglue-datasets --endpoint-url http://localstack:4566
aws s3 cp ./persons.json s3://awsglue-datasets/examples/us-legislators/all/ --endpoint-url http://localstack:4566
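The same two steps can also be scripted from Python. The following is a minimal sketch, assuming boto3 is importable in the glue container. `make_client` and `upload_sample` are hypothetical helper names, while the endpoint, bucket, key prefix, region and dummy credentials are copied from the commands and the compose file above.

```python
import os

ENDPOINT_URL = "http://localstack:4566"
REGION = "ap-northeast-1"
BUCKET = "awsglue-datasets"
KEY_PREFIX = "examples/us-legislators/all/"


def make_client():
    """Create an S3 client that talks to LocalStack instead of AWS."""
    import boto3  # imported lazily; assumed to be installed in the container

    return boto3.client(
        "s3",
        endpoint_url=ENDPOINT_URL,
        region_name=REGION,
        aws_access_key_id="test",
        aws_secret_access_key="test",
    )


def upload_sample(s3_client, local_path, bucket=BUCKET, prefix=KEY_PREFIX):
    """Create the bucket and upload one file; return the object key."""
    key = prefix + os.path.basename(local_path)
    # Equivalent of `aws s3 mb`; outside us-east-1 the region must be given.
    s3_client.create_bucket(
        Bucket=bucket,
        CreateBucketConfiguration={"LocationConstraint": REGION},
    )
    # Equivalent of `aws s3 cp <file> s3://<bucket>/<prefix>`.
    s3_client.upload_file(local_path, bucket, key)
    return key
```

Calling `upload_sample(make_client(), "./persons.json")` from `jupyter_workspace` would mirror the CLI commands. Note that `create_bucket` is expected to raise an error if the bucket already exists, so a second run would need to handle that.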
- Run pytest
$ python3 -m pytest
================================================================================================ test session starts ================================================================================================
platform linux -- Python 3.7.10, pytest-6.2.3, py-1.11.0, pluggy-0.13.1
rootdir: /home/glue_user/workspace/jupyter_workspace
plugins: anyio-3.6.1
collected 1 item
tests/test_sample.py . [100%]
================================================================================================= warnings summary ==================================================================================================
tests/test_sample.py::test_counts
/home/glue_user/spark/python/pyspark/sql/context.py:79: DeprecationWarning: Deprecated in 3.0.0. Use SparkSession.builder.getOrCreate() instead.
DeprecationWarning)
-- Docs: https://docs.pytest.org/en/stable/warnings.html
=========================================================================================== 1 passed, 1 warning in 16.83s ===========================================================================================
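The contents of `tests/test_sample.py` are not shown here. As a hedged sketch of what a count test such as `test_counts` could be built on, the code below reads the uploaded JSON through a `GlueContext` and returns the record count. `count_records` and `build_glue_context` are hypothetical names, not the repository's actual code, and the S3 path is assumed from the upload step above.

```python
S3_PATH = "s3://awsglue-datasets/examples/us-legislators/all/persons.json"


def build_glue_context():
    """Create a GlueContext; only works where pyspark and awsglue exist."""
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext

    return GlueContext(SparkContext.getOrCreate())


def count_records(glue_context, path=S3_PATH):
    """Load the JSON file at `path` as a DynamicFrame and count its records."""
    dynamic_frame = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": [path]},
        format="json",
    )
    return dynamic_frame.count()
```

Inside the glue container, a pytest test could then be as short as `assert count_records(build_glue_context()) > 0`. Passing the context in as an argument keeps the counting logic easy to exercise with a stand-in object when Spark is not available.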
Next, let's open Jupyter.
make jupyter
I won't go into how to use Jupyter, but clicking PySpark under Notebook lets you try PySpark interactively.