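SageMaker Studio Setup: A Guide to Using Terraform
Introduction
I'm 阿河.
SageMaker Studio is feature-rich and full of possibilities, though a little hard to grasp at first sight. Rather than being overwhelmed by it, this guide uses Terraform to build the environment smoothly and with solid security.
Table of Contents
- Setting up Terraform (in Cloud9)
- Deploying SageMaker Studio
- Verification
The guide assumes a "VPC Only" notebook that can reach the internet through a NAT Gateway.
The code is also available on GitHub.
1. Setting up Terraform (in Cloud9)
Launch Cloud9 and set up Terraform.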
$ git clone https://github.com/tfutils/tfenv.git ~/.tfenv
$ sudo ln -s ~/.tfenv/bin/* /usr/local/bin
$ tfenv -v
tfenv 3.0.0-49-g39d8c27
# List the versions available for installation
$ tfenv list-remote
1.8.0-beta1
1.8.0-alpha20240228
1.8.0-alpha20240216
1.8.0-alpha20240214
1.8.0-alpha20240131
1.7.4
... (omitted) ...
0.1.0
# Install a specific version of Terraform
$ tfenv install 1.7.4
Installing Terraform v1.7.4
############################################################################################################################################################################### 100.0%
Installation of terraform v1.7.4 successful. To make this your default version, run 'tfenv use 1.7.4'
# List the installed versions
$ tfenv list
1.7.4
No default set. Set with 'tfenv use <version>'
# Switch the Terraform version in use
$ tfenv use 1.7.4
$ terraform -v
Terraform v1.7.4
on linux_amd64
Next, create the directory structure.
$ pwd
/home/ec2-user/environment
$ mkdir terraform && cd terraform
I used the following tree structure.
$ tree
.
├── common
│   └── network
│       ├── main.tf
│       ├── outputs.tf
│       └── variables.tf
└── labs
    └── labo1
        ├── main.tf
        ├── network
        │   ├── main.tf
        │   ├── outputs.tf
        │   └── variables.tf
        ├── provider.tf
        ├── sagemaker
        │   ├── studio-env.tf
        │   └── variables.tf
        ├── terraform.tfstate
        └── terraform.tfstate.backup
First, create main.tf and provider.tf.
/labs/labo1/main.tf
module "network" {
  source = "./network"
}

module "sagemaker" {
  source               = "./sagemaker"
  vpc_id               = module.network.vpc_id
  private_subnet_1a_id = module.network.private_subnet_1a_id
}
/labs/labo1/provider.tf
provider "aws" {
  region = "ap-northeast-1"
}
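Because tfenv only selects the Terraform version on the local machine, it can help to record the expected versions in the configuration as well. The following is a minimal sketch, not part of the original repository: the file name `versions.tf` and the AWS provider constraint `~> 5.0` are assumptions. The 5.x assumption is based on the `domain = "vpc"` argument of `aws_eip` used later in this article, which to my understanding requires a 5.x provider.

```hcl
# labs/labo1/versions.tf (hypothetical file, placed next to provider.tf)
terraform {
  # Accept 1.7.x releases from 1.7.4 onward, matching the version chosen with tfenv.
  required_version = "~> 1.7.4"

  required_providers {
    aws = {
      source = "hashicorp/aws"
      # Assumption: any 5.x release of the AWS provider.
      version = "~> 5.0"
    }
  }
}
```

With a block like this in place, `terraform init` fails early when the Terraform binary or the provider falls outside the stated range, instead of failing later during `plan` or `apply`.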
Next, build the network that serves as the foundation.
/common/network/main.tf
#-----------------------------------------------------------------
# VPC
#-----------------------------------------------------------------
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_support   = true
  enable_dns_hostnames = true

  tags = {
    Name = var.vpc_name
  }
}

#-----------------------------------------------------------------
# Public Subnet
#-----------------------------------------------------------------
resource "aws_subnet" "public_subnet_1a" {
  vpc_id            = aws_vpc.main.id
  availability_zone = "ap-northeast-1a"
  cidr_block        = var.public_subnet_a_cidr

  tags = {
    Name = var.public_subnet_name_a
  }
}

resource "aws_subnet" "public_subnet_1c" {
  vpc_id            = aws_vpc.main.id
  availability_zone = "ap-northeast-1c"
  cidr_block        = var.public_subnet_c_cidr

  tags = {
    Name = var.public_subnet_name_c
  }
}

#-----------------------------------------------------------------
# Private Subnet
#-----------------------------------------------------------------
resource "aws_subnet" "private_subnet_1a" {
  vpc_id            = aws_vpc.main.id
  availability_zone = "ap-northeast-1a"
  cidr_block        = var.private_subnet_a_cidr

  tags = {
    Name = var.private_subnet_name_a
  }
}

resource "aws_subnet" "private_subnet_1c" {
  vpc_id            = aws_vpc.main.id
  availability_zone = "ap-northeast-1c"
  cidr_block        = var.private_subnet_c_cidr

  tags = {
    Name = var.private_subnet_name_c
  }
}

#-----------------------------------------------------------------
# RouteTable
#-----------------------------------------------------------------
resource "aws_route_table" "public_rt" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = var.public_rt_name
  }
}

resource "aws_route_table_association" "public_rt_1a" {
  subnet_id      = aws_subnet.public_subnet_1a.id
  route_table_id = aws_route_table.public_rt.id
}

resource "aws_route_table_association" "public_rt_1c" {
  subnet_id      = aws_subnet.public_subnet_1c.id
  route_table_id = aws_route_table.public_rt.id
}

resource "aws_route_table" "private_rt" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = var.private_rt_name
  }
}

resource "aws_route_table_association" "private_rt_1a" {
  subnet_id      = aws_subnet.private_subnet_1a.id
  route_table_id = aws_route_table.private_rt.id
}

resource "aws_route_table_association" "private_rt_1c" {
  subnet_id      = aws_subnet.private_subnet_1c.id
  route_table_id = aws_route_table.private_rt.id
}

#-----------------------------------------------------------------
# Gateway
#-----------------------------------------------------------------
resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = var.igw_name
  }
}

resource "aws_route" "public_rt_igw_r" {
  route_table_id         = aws_route_table.public_rt.id
  destination_cidr_block = "0.0.0.0/0"
  gateway_id             = aws_internet_gateway.igw.id
}

resource "aws_eip" "nat_eip_A" {
  domain = "vpc"
}

resource "aws_nat_gateway" "ngwA" {
  allocation_id = aws_eip.nat_eip_A.id
  subnet_id     = aws_subnet.public_subnet_1a.id

  tags = {
    Name = var.ngw_name
  }
}

resource "aws_route" "private_rt_nat_a_r" {
  route_table_id         = aws_route_table.private_rt.id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.ngwA.id
}

#-----------------------------------------------------------------
# S3 VPC Endpoint
#-----------------------------------------------------------------
resource "aws_vpc_endpoint" "vpc_endpoint_s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.ap-northeast-1.s3"
  vpc_endpoint_type = "Gateway"

  tags = {
    "Name" = var.vpc_endpoint_name_to_s3
  }
}

resource "aws_vpc_endpoint_route_table_association" "private_s3" {
  vpc_endpoint_id = aws_vpc_endpoint.vpc_endpoint_s3.id
  route_table_id  = aws_route_table.private_rt.id
}
/common/network/variables.tf
variable "vpc_name" {
  type = string
}

variable "public_subnet_name_a" {
  type = string
}

variable "public_subnet_name_c" {
  type = string
}

variable "private_subnet_name_a" {
  type = string
}

variable "private_subnet_name_c" {
  type = string
}

variable "vpc_cidr" {
  type = string
}

variable "public_subnet_a_cidr" {
  type = string
}

variable "public_subnet_c_cidr" {
  type = string
}

variable "private_subnet_a_cidr" {
  type = string
}

variable "private_subnet_c_cidr" {
  type = string
}

variable "public_rt_name" {
  type = string
}

variable "private_rt_name" {
  type = string
}

variable "igw_name" {
  type = string
}

variable "ngw_name" {
  type = string
}

variable "vpc_endpoint_name_to_s3" {
  type = string
}
/common/network/outputs.tf
output "vpc_id" {
  value = aws_vpc.main.id
}

output "public_subnet_1a" {
  value = aws_subnet.public_subnet_1a.id
}

output "public_subnet_1c" {
  value = aws_subnet.public_subnet_1c.id
}

output "private_subnet_1a" {
  value = aws_subnet.private_subnet_1a.id
}

output "private_subnet_1c" {
  value = aws_subnet.private_subnet_1c.id
}

output "igw" {
  value = aws_internet_gateway.igw.id
}

output "ngw" {
  value = aws_nat_gateway.ngwA.id
}

output "vpc_endpoint_s3" {
  value = aws_vpc_endpoint.vpc_endpoint_s3.id
}
labs/labo1/network/main.tf
module "network" {
  source                  = "../../../common/network"
  vpc_name                = var.vpc_name
  public_subnet_name_a    = var.public_subnet_name_a
  public_subnet_name_c    = var.public_subnet_name_c
  private_subnet_name_a   = var.private_subnet_name_a
  private_subnet_name_c   = var.private_subnet_name_c
  vpc_cidr                = var.vpc_cidr
  public_subnet_a_cidr    = var.public_subnet_a_cidr
  public_subnet_c_cidr    = var.public_subnet_c_cidr
  private_subnet_a_cidr   = var.private_subnet_a_cidr
  private_subnet_c_cidr   = var.private_subnet_c_cidr
  public_rt_name          = var.public_rt_name
  private_rt_name         = var.private_rt_name
  igw_name                = var.igw_name
  ngw_name                = var.ngw_name
  vpc_endpoint_name_to_s3 = var.vpc_endpoint_name_to_s3
}
labs/labo1/network/variables.tf
variable "vpc_name" {
  type    = string
  default = "labo_vpc"
}

variable "public_subnet_name_a" {
  type    = string
  default = "labo_public_subnet_1a"
}

variable "public_subnet_name_c" {
  type    = string
  default = "labo_public_subnet_1c"
}

variable "private_subnet_name_a" {
  type    = string
  default = "labo_private_subnet_1a"
}

variable "private_subnet_name_c" {
  type    = string
  default = "labo_private_subnet_1c"
}

variable "vpc_cidr" {
  type    = string
  default = "10.0.0.0/16"
}

variable "public_subnet_a_cidr" {
  type    = string
  default = "10.0.1.0/24"
}

variable "public_subnet_c_cidr" {
  type    = string
  default = "10.0.2.0/24"
}

variable "private_subnet_a_cidr" {
  type    = string
  default = "10.0.3.0/24"
}

variable "private_subnet_c_cidr" {
  type    = string
  default = "10.0.4.0/24"
}

variable "public_rt_name" {
  type    = string
  default = "labo_public_rt"
}

variable "private_rt_name" {
  type    = string
  default = "labo_private_rt"
}

variable "igw_name" {
  type    = string
  default = "labo_igw"
}

variable "ngw_name" {
  type    = string
  default = "labo_ngw"
}

variable "vpc_endpoint_name_to_s3" {
  type    = string
  default = "labo_vpc_endpoint_to_s3"
}
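Since every variable in this wrapper module has a default, the root `main.tf` can call it with no arguments, as it does earlier. If a second lab needed a different address range, the same module could be called with a few overrides. The sketch below is only an illustration: the name `labo2_vpc` and the `10.1.x.x` ranges are hypothetical values, and any variable not listed keeps its default.

```hcl
# labs/labo1/main.tf (variant): override selected defaults of the wrapper module
module "network" {
  source = "./network"

  # Hypothetical values for illustration; the subnet CIDRs must fall inside vpc_cidr.
  vpc_name              = "labo2_vpc"
  vpc_cidr              = "10.1.0.0/16"
  public_subnet_a_cidr  = "10.1.1.0/24"
  public_subnet_c_cidr  = "10.1.2.0/24"
  private_subnet_a_cidr = "10.1.3.0/24"
  private_subnet_c_cidr = "10.1.4.0/24"
}
```

Keeping the defaults in the lab-level module and the resource definitions in `common/network` means a new lab only has to restate the values that differ.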
labs/labo1/network/outputs.tf
output "vpc_id" {
  value = module.network.vpc_id
}

output "public_subnet_1a_id" {
  value = module.network.public_subnet_1a
}

output "public_subnet_1c_id" {
  value = module.network.public_subnet_1c
}

output "private_subnet_1a_id" {
  value = module.network.private_subnet_1a
}

output "private_subnet_1c_id" {
  value = module.network.private_subnet_1c
}

output "igw_id" {
  value = module.network.igw
}

output "ngw_id" {
  value = module.network.ngw
}

output "vpc_endpoint_s3_id" {
  value = module.network.vpc_endpoint_s3
}
That completes the definition of the underlying network.
3. Deploying SageMaker Studio
Now set up SageMaker Studio.
labs/labo1/sagemaker/studio-env.tf
#-----------------------------------------------------------------
# Domain
#-----------------------------------------------------------------
resource "aws_sagemaker_domain" "labo_domain" {
  domain_name             = "labo-sagemaker-studio-domain"
  app_network_access_type = "VpcOnly"
  auth_mode               = "IAM"
  vpc_id                  = var.vpc_id
  subnet_ids              = [var.private_subnet_1a_id]

  default_user_settings {
    execution_role  = aws_iam_role.sagemaker_execution_role.arn
    security_groups = [aws_security_group.sagemaker_studio_sg.id]

    jupyter_server_app_settings {
      default_resource_spec {
        instance_type = "system"
      }
    }
  }
}

#-----------------------------------------------------------------
# User Profile
#-----------------------------------------------------------------
resource "aws_sagemaker_user_profile" "labo_user_profile" {
  domain_id         = aws_sagemaker_domain.labo_domain.id
  user_profile_name = "labo-sagemaker-user"
}

#-----------------------------------------------------------------
# SageMaker Role
#-----------------------------------------------------------------
resource "aws_iam_role" "sagemaker_execution_role" {
  name = "sagemaker_execution_role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "sagemaker.amazonaws.com"
        }
      },
    ]
  })
}

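# aws_iam_role_policy_attachment attaches the policy to this role only.
# aws_iam_policy_attachment manages a policy's attachments exclusively across
# the account, so it can detach the managed policy from other roles.
resource "aws_iam_role_policy_attachment" "sagemaker_full_access" {
  role       = aws_iam_role.sagemaker_execution_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess"
}
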
#-----------------------------------------------------------------
# SageMaker Security Group
#-----------------------------------------------------------------
resource "aws_security_group" "sagemaker_studio_sg" {
  name        = "labo-sagemaker-studio-sg"
  description = "Security group for SageMaker Studio in labo environment"
  vpc_id      = var.vpc_id

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "labo-sagemaker-studio-sg"
  }
}
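The security group above accepts inbound HTTPS from any address. Because the Studio apps sit in a private subnet and only make outbound connections through the NAT Gateway, a tighter variant may be enough. The following is a sketch of one such variant, not the configuration the article deployed or tested. It assumes that nothing outside the security group needs to open connections to the Studio apps, and it allows inbound traffic only from members of the same group via `self = true`.

```hcl
# Variant of the security group in studio-env.tf (an untested sketch).
resource "aws_security_group" "sagemaker_studio_sg" {
  name        = "labo-sagemaker-studio-sg"
  description = "Security group for SageMaker Studio in labo environment"
  vpc_id      = var.vpc_id

  # Outbound: unchanged, so traffic can still leave through the NAT Gateway.
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Inbound: only from resources that carry this same security group.
  ingress {
    from_port = 0
    to_port   = 0
    protocol  = "-1"
    self      = true
  }

  tags = {
    Name = "labo-sagemaker-studio-sg"
  }
}
```

If the environment later needs access from another source, such as a VPC interface endpoint or a bastion host, that source would have to be added as its own ingress rule.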
labs/labo1/sagemaker/variables.tf
variable "vpc_id" {}
variable "private_subnet_1a_id" {}
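The `sagemaker` module as written exposes no outputs, so the domain ID has to be looked up in the console after deployment. A small sketch of how it could be surfaced from Terraform instead follows. The file `outputs.tf` and all three output names are hypothetical additions; `id` and `url` are attributes exported by `aws_sagemaker_domain`.

```hcl
# labs/labo1/sagemaker/outputs.tf (hypothetical file)
output "domain_id" {
  value = aws_sagemaker_domain.labo_domain.id
}

output "domain_url" {
  value = aws_sagemaker_domain.labo_domain.url
}

# labs/labo1/main.tf (addition): re-export from the root module so that
# `terraform output` shows the value after `terraform apply`.
output "sagemaker_domain_url" {
  value = module.sagemaker.domain_url
}
```

Outputs defined only inside a child module are not printed by `terraform output`, which is why the root module re-exports the value.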
Once the files are edited, run the deployment.
$ cd /home/ec2-user/environment/terraform/labs/labo1/
$ terraform init
$ terraform plan
$ terraform apply
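Confirm that the network resources (VPC, subnets, route tables, internet gateway, NAT Gateway, and S3 gateway endpoint) have been deployed.
If the deployment finished without errors, proceed to the next step.
4. Verification
In the Management Console, go to the SageMaker page.
Select "Studio" from the sidebar.
The domain and user profile have been registered, so try logging in.
Login works, so create a JupyterLab space next.
Leave the settings unchanged and click "Run Space".
It appears to be usable without problems.
The Studio UI also shows that the instance has started.
Conclusion
That completes the automated setup.
I hope this is helpful to someone.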