Setting Up an AWS Glue Development Environment

Published 2023/01/10

I followed the two articles below and made a few adjustments to get a local AWS Glue environment running.

https://docs.aws.amazon.com/ja_jp/glue/latest/dg/aws-glue-programming-etl-libraries.html

https://future-architect.github.io/articles/20220428a/

The Compose file defines two services, localstack and glue, on a shared network:

version: '3.5'
services:
  localstack:
    container_name: glue-docker-sample-localstack
    image: localstack/localstack:0.12.8
    environment:
      - SERVICES=s3
      - AWS_DEFAULT_REGION=ap-northeast-1
      - AWS_DEFAULT_OUTPUT=json
      - AWS_ACCESS_KEY_ID=test
      - AWS_SECRET_ACCESS_KEY=test
    networks:
      - glue-network
  glue:
    container_name: glue-docker-sample-glue
    image: amazon/aws-glue-libs:glue_libs_3.0.0_image_01
    volumes:
      - ./:/home/glue_user/workspace/jupyter_workspace
      - ./spark.conf:/home/glue_user/spark/conf/spark-defaults.conf
    environment:
      - DISABLE_SSL=true
      - AWS_REGION=ap-northeast-1
      - AWS_OUTPUT=json
      - AWS_ACCESS_KEY_ID=test
      - AWS_SECRET_ACCESS_KEY=test
    ports:
      - 8888:8888
      - 4040:4040
    networks:
      - glue-network
    command: /home/glue_user/jupyter/jupyter_start.sh
networks:
  glue-network:
    name: glue-network
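
The glue service mounts ./spark.conf over spark-defaults.conf, but the article does not show that file. The sketch below is one plausible way to generate it, assuming its job is to point Spark's S3A filesystem at the localstack container. The helper name render_spark_conf and the exact set of properties are illustrative assumptions, not the repository's actual contents.

```python
# Hypothetical helper: renders spark-defaults style lines that route
# S3A traffic to localstack. The default endpoint reuses the service
# name and port that the aws CLI calls in this article point at.


def render_spark_conf(endpoint="http://localstack:4566"):
    props = {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        # localstack serves buckets by path, not by virtual host name
        "spark.hadoop.fs.s3a.path.style.access": "true",
        # plain HTTP inside the compose network (assumption)
        "spark.hadoop.fs.s3a.connection.ssl.enabled": "false",
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    }
    return "\n".join(f"{key} {value}" for key, value in props.items()) + "\n"


if __name__ == "__main__":
    # Redirect the output to ./spark.conf to produce the mounted file.
    print(render_spark_conf(), end="")
```

If the real spark.conf differs, only the props dictionary needs to change; the mount in the Compose file stays the same.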

The commands are collected in a Makefile:

up:
	docker compose up -d
build:
	docker compose build --no-cache --force-rm
remake:
	@make destroy
	@make up
stop:
	docker compose stop
down:
	docker compose down --remove-orphans
restart:
	@make down
	@make up
destroy:
	docker compose down --rmi all --volumes --remove-orphans
destroy-volumes:
	docker compose down --volumes --remove-orphans
ps:
	docker compose ps
logs:
	docker compose logs
localstack:
	docker compose exec localstack bash
glue:
	docker compose exec glue bash
jupyter:
	open http://127.0.0.1:8888/lab

  • GitHub repository

https://github.com/tokku5552/glue-docker-sample

Trying It Out

  • Start the containers
make up
  • Add a file to localstack (run inside the glue container)
make glue
cd jupyter_workspace
aws s3 mb s3://awsglue-datasets --endpoint-url http://localstack:4566
aws s3 cp ./persons.json  s3://awsglue-datasets/examples/us-legislators/all/ --endpoint-url http://localstack:4566
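
The same two steps can also be scripted from Python. This is a minimal sketch, assuming boto3 is available inside the glue container and that credentials come from the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY variables set in the Compose file. The function names object_key and upload are hypothetical and not part of the repository.

```python
import os

ENDPOINT = "http://localstack:4566"  # same endpoint as the aws CLI calls
BUCKET = "awsglue-datasets"
PREFIX = "examples/us-legislators/all/"


def object_key(prefix, local_path):
    """Mimic `aws s3 cp FILE s3://bucket/prefix/`: keep the file name under the prefix."""
    return prefix.rstrip("/") + "/" + os.path.basename(local_path)


def upload(local_path="./persons.json", region="ap-northeast-1"):
    """Create the bucket on localstack and upload one file to it."""
    import boto3  # imported lazily so this module loads without boto3 installed

    s3 = boto3.client("s3", endpoint_url=ENDPOINT, region_name=region)
    # Outside us-east-1, create_bucket expects an explicit LocationConstraint.
    s3.create_bucket(
        Bucket=BUCKET,
        CreateBucketConfiguration={"LocationConstraint": region},
    )
    key = object_key(PREFIX, local_path)
    s3.upload_file(local_path, BUCKET, key)
    return key
```

Calling upload() from the jupyter_workspace directory should leave persons.json at the same key as the aws s3 cp command above.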
  • Run pytest
$ python3 -m pytest
================================================================================================ test session starts ================================================================================================
platform linux -- Python 3.7.10, pytest-6.2.3, py-1.11.0, pluggy-0.13.1
rootdir: /home/glue_user/workspace/jupyter_workspace
plugins: anyio-3.6.1
collected 1 item

tests/test_sample.py .                                                                                                                                                                                        [100%]

================================================================================================= warnings summary ==================================================================================================
tests/test_sample.py::test_counts
  /home/glue_user/spark/python/pyspark/sql/context.py:79: DeprecationWarning: Deprecated in 3.0.0. Use SparkSession.builder.getOrCreate() instead.
    DeprecationWarning)

-- Docs: https://docs.pytest.org/en/stable/warnings.html
=========================================================================================== 1 passed, 1 warning in 16.83s ===========================================================================================

Next, open Jupyter.

make jupyter

I won't cover how to use Jupyter here, but clicking PySpark under Notebook opens a notebook where you can try PySpark interactively.
