
SageMaker Studio Setup: A Guide to Using Terraform

Published 2024/03/15

Introduction

Hi, this is 阿河.

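SageMaker Studio is feature-rich and full of possibilities, but it can be a little hard to grasp at first. Instead of being overwhelmed by it, this post uses Terraform to build the environment smoothly and with solid security.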

Contents

  1. Terraform setup (in Cloud9)
  2. Deploying SageMaker Studio
  3. Verification

The setup assumes notebooks that run in "VPC Only" mode and reach the internet through a NAT Gateway.

The code is also available on GitHub:

https://github.com/agamemnon-ai/Terraform-Samples/tree/main/setup-sagemaker-studio

1. Terraform setup (in Cloud9)

Launch a Cloud9 environment and set up Terraform with tfenv:

https://github.com/tfutils/tfenv

$ git clone https://github.com/tfutils/tfenv.git ~/.tfenv
$ sudo ln -s ~/.tfenv/bin/* /usr/local/bin

$ tfenv -v
tfenv 3.0.0-49-g39d8c27

# List the versions available for installation
$ tfenv list-remote
1.8.0-beta1
1.8.0-alpha20240228
1.8.0-alpha20240216
1.8.0-alpha20240214
1.8.0-alpha20240131
1.7.4
... (omitted) ...
0.1.0

# Install a specific Terraform version
$ tfenv install 1.7.4
Installing Terraform v1.7.4
############################################################################################################################################################################### 100.0%
Installation of terraform v1.7.4 successful. To make this your default version, run 'tfenv use 1.7.4'

# List the installed versions
$ tfenv list
  1.7.4
No default set. Set with 'tfenv use <version>'

# Switch the active Terraform version
$ tfenv use 1.7.4
$ terraform -v
Terraform v1.7.4
on linux_amd64

Next, create the working directory.

$ pwd
/home/ec2-user/environment
$ mkdir terraform && cd terraform

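The finished layout looks like this (terraform.tfstate and terraform.tfstate.backup are generated by terraform apply):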

$ tree
.
├── common
│   └── network
│       ├── main.tf
│       ├── outputs.tf
│       └── variables.tf
└── labs
    └── labo1
        ├── main.tf
        ├── network
        │   ├── main.tf
        │   ├── outputs.tf
        │   └── variables.tf
        ├── provider.tf
        ├── sagemaker
        │   ├── studio-env.tf
        │   └── variables.tf
        ├── terraform.tfstate
        └── terraform.tfstate.backup
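
The tree above keeps state in local terraform.tfstate files, which is fine for a solo lab. If the state needs to be shared or kept off the Cloud9 disk, a remote backend is one option. A minimal sketch, assuming an S3 bucket that already exists; the bucket name and key are placeholders and not part of the original setup:

```hcl
# Hypothetical: store state in S3 instead of labs/labo1/terraform.tfstate.
# The bucket must already exist; its name below is a placeholder.
terraform {
  backend "s3" {
    bucket = "example-terraform-state-bucket" # hypothetical bucket name
    key    = "labs/labo1/terraform.tfstate"
    region = "ap-northeast-1"
  }
}
```

After adding a backend, running terraform init -migrate-state copies the existing local state into it.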

First, create main.tf and provider.tf.

/labs/labo1/main.tf

module "network" {
  source = "./network"
}

module "sagemaker" {
  source = "./sagemaker"
  vpc_id = module.network.vpc_id
  private_subnet_1a_id = module.network.private_subnet_1a_id
}

/labs/labo1/provider.tf

provider "aws" {
  region  = "ap-northeast-1"
}
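
tfenv pins the CLI on Cloud9, but the configuration itself does not record which Terraform and AWS provider versions it expects. The following sketch shows a terraform block that could sit alongside the provider in provider.tf. The constraints are assumptions based on the 1.7.4 install above, so adjust them to the versions you actually test with:

```hcl
terraform {
  # Accept any 1.7.x release, matching the tfenv-installed 1.7.4
  required_version = "~> 1.7.0"

  required_providers {
    aws = {
      source = "hashicorp/aws"
      # Assumption: the 5.x provider series (aws_eip here uses the domain argument)
      version = "~> 5.0"
    }
  }
}
```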

Next, build the network that everything else sits on.

/common/network/main.tf

#-----------------------------------------------------------------
# VPC
#-----------------------------------------------------------------

resource "aws_vpc" "main" {
  cidr_block = var.vpc_cidr
  enable_dns_support = true
  enable_dns_hostnames = true
  tags = {
    Name = var.vpc_name
  }
}

#-----------------------------------------------------------------
# Public Subnet
#-----------------------------------------------------------------

resource "aws_subnet" "public_subnet_1a" {
  vpc_id     = aws_vpc.main.id
  availability_zone       = "ap-northeast-1a"
  cidr_block = var.public_subnet_a_cidr
  tags = {
    Name = var.public_subnet_name_a
  }
}

resource "aws_subnet" "public_subnet_1c" {
  vpc_id     = aws_vpc.main.id
  availability_zone       = "ap-northeast-1c"
  cidr_block = var.public_subnet_c_cidr
  tags = {
    Name = var.public_subnet_name_c
  }
}

#-----------------------------------------------------------------
# Private Subnet
#-----------------------------------------------------------------

resource "aws_subnet" "private_subnet_1a" {
  vpc_id     = aws_vpc.main.id
  availability_zone       = "ap-northeast-1a"
  cidr_block = var.private_subnet_a_cidr
  tags = {
    Name = var.private_subnet_name_a
  }
}

resource "aws_subnet" "private_subnet_1c" {
  vpc_id     = aws_vpc.main.id
  availability_zone       = "ap-northeast-1c"
  cidr_block = var.private_subnet_c_cidr
  tags = {
    Name = var.private_subnet_name_c
  }
}

#-----------------------------------------------------------------
# RouteTable
#-----------------------------------------------------------------

resource "aws_route_table" "public_rt" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name    = var.public_rt_name
  }
}

resource "aws_route_table_association" "public_rt_1a" {
  subnet_id      = aws_subnet.public_subnet_1a.id
  route_table_id = aws_route_table.public_rt.id
}

resource "aws_route_table_association" "public_rt_1c" {
  subnet_id      = aws_subnet.public_subnet_1c.id
  route_table_id = aws_route_table.public_rt.id
}


resource "aws_route_table" "private_rt" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name    = var.private_rt_name
  }
}

resource "aws_route_table_association" "private_rt_1a" {
  subnet_id      = aws_subnet.private_subnet_1a.id
  route_table_id = aws_route_table.private_rt.id
}

resource "aws_route_table_association" "private_rt_1c" {
  subnet_id      = aws_subnet.private_subnet_1c.id
  route_table_id = aws_route_table.private_rt.id
}

#-----------------------------------------------------------------
# Gateway
#-----------------------------------------------------------------

resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name    = var.igw_name
  }
}

resource "aws_route" "public_rt_igw_r" {
  route_table_id         = aws_route_table.public_rt.id
  destination_cidr_block = "0.0.0.0/0"
  gateway_id             = aws_internet_gateway.igw.id
}

resource "aws_eip" "nat_eip_A" {
  domain = "vpc"
}


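resource "aws_nat_gateway" "ngwA" {
  allocation_id = aws_eip.nat_eip_A.id
  subnet_id     = aws_subnet.public_subnet_1a.id

  tags = {
    Name    = var.ngw_name
  }

  # The NAT Gateway relies on the VPC's Internet Gateway, so create that first
  depends_on = [aws_internet_gateway.igw]
}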

resource "aws_route" "private_rt_nat_a_r" {
  route_table_id         = aws_route_table.private_rt.id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.ngwA.id
}

#-----------------------------------------------------------------
# S3 VPC Endpoint
#-----------------------------------------------------------------

resource "aws_vpc_endpoint" "vpc_endpoint_s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.ap-northeast-1.s3"
  vpc_endpoint_type = "Gateway"

  tags = {
    "Name" = var.vpc_endpoint_name_to_s3
  }
}

resource "aws_vpc_endpoint_route_table_association" "private_s3" {
  vpc_endpoint_id = aws_vpc_endpoint.vpc_endpoint_s3.id
  route_table_id  = aws_route_table.private_rt.id
}

/common/network/variables.tf

variable "vpc_name" {
    type = string
}

variable "public_subnet_name_a" {
    type = string
}

variable "public_subnet_name_c" {
    type = string
}

variable "private_subnet_name_a" {
    type = string
}

variable "private_subnet_name_c" {
    type = string
}

variable "vpc_cidr" {
    type = string
}

variable "public_subnet_a_cidr" {
    type = string
}

variable "public_subnet_c_cidr" {
    type = string
}

variable "private_subnet_a_cidr" {
    type = string
}

variable "private_subnet_c_cidr" {
    type = string
}

variable "public_rt_name" {
    type = string
}

variable "private_rt_name" {
    type = string
}

variable "igw_name" {
    type = string
}

variable "ngw_name" {
    type = string
}

variable "vpc_endpoint_name_to_s3" {
    type = string
}
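
Every variable in this module is a bare string, so a typo in a CIDR only surfaces when the AWS API rejects it. A validation block is one way to catch it at plan time. The following sketch would replace the existing vpc_cidr declaration above, and the same pattern applies to the subnet CIDR variables:

```hcl
variable "vpc_cidr" {
  type = string

  validation {
    # cidrhost() errors on a malformed prefix, so can() returns false
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "The vpc_cidr value must be a valid CIDR block, such as 10.0.0.0/16."
  }
}
```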

/common/network/outputs.tf

output "vpc_id" {
  value = aws_vpc.main.id
}

output "public_subnet_1a" {
  value = aws_subnet.public_subnet_1a.id
}

output "public_subnet_1c" {
  value = aws_subnet.public_subnet_1c.id
}

output "private_subnet_1a" {
  value = aws_subnet.private_subnet_1a.id
}

output "private_subnet_1c" {
  value = aws_subnet.private_subnet_1c.id
}

output "igw" {
  value = aws_internet_gateway.igw.id
}

output "ngw" {
  value = aws_nat_gateway.ngwA.id
}

output "vpc_endpoint_s3" {
  value = aws_vpc_endpoint.vpc_endpoint_s3.id
}

labs/labo1/network/main.tf

module "network" {
  source = "../../../common/network"
  
  vpc_name    = var.vpc_name
  
  public_subnet_name_a = var.public_subnet_name_a
  public_subnet_name_c = var.public_subnet_name_c
  private_subnet_name_a = var.private_subnet_name_a
  private_subnet_name_c = var.private_subnet_name_c
  
  vpc_cidr = var.vpc_cidr
  public_subnet_a_cidr = var.public_subnet_a_cidr
  public_subnet_c_cidr = var.public_subnet_c_cidr
  private_subnet_a_cidr = var.private_subnet_a_cidr
  private_subnet_c_cidr = var.private_subnet_c_cidr
  
  public_rt_name = var.public_rt_name
  private_rt_name = var.private_rt_name
  igw_name = var.igw_name
  ngw_name = var.ngw_name
  vpc_endpoint_name_to_s3 = var.vpc_endpoint_name_to_s3
}

labs/labo1/network/variables.tf

variable "vpc_name" {
    type = string
    default = "labo_vpc"
}

variable "public_subnet_name_a" {
    type = string
    default = "labo_public_subnet_1a"
}

variable "public_subnet_name_c" {
    type = string
    default = "labo_public_subnet_1c"
}

variable "private_subnet_name_a" {
    type = string
    default = "labo_private_subnet_1a"
}

variable "private_subnet_name_c" {
    type = string
    default = "labo_private_subnet_1c"
}

variable "vpc_cidr" {
    type = string
    default = "10.0.0.0/16"
}

variable "public_subnet_a_cidr" {
    type = string
    default = "10.0.1.0/24"
}

variable "public_subnet_c_cidr" {
    type = string
    default = "10.0.2.0/24"
}

variable "private_subnet_a_cidr" {
    type = string
    default = "10.0.3.0/24"
}

variable "private_subnet_c_cidr" {
    type = string
    default = "10.0.4.0/24"
}

variable "public_rt_name" {
    type = string
    default = "labo_public_rt"
}

variable "private_rt_name" {
    type = string
    default = "labo_private_rt"
}

variable "igw_name" {
    type = string
    default = "labo_igw"
}

variable "ngw_name" {
    type = string
    default = "labo_ngw"
}

variable "vpc_endpoint_name_to_s3" {
    type = string
    default = "labo_vpc_endpoint_to_s3"
}

labs/labo1/network/outputs.tf

output "vpc_id" {
  value = module.network.vpc_id
}

output "public_subnet_1a_id" {
  value = module.network.public_subnet_1a
}

output "public_subnet_1c_id" {
  value = module.network.public_subnet_1c
}

output "private_subnet_1a_id" {
  value = module.network.private_subnet_1a
}

output "private_subnet_1c_id" {
  value = module.network.private_subnet_1c
}

output "igw_id" {
  value = module.network.igw
}

output "ngw_id" {
  value = module.network.ngw
}

output "vpc_endpoint_s3_id" {
  value = module.network.vpc_endpoint_s3
}

That completes the base network definition.

2. Deploying SageMaker Studio

Now set up SageMaker Studio.

labs/labo1/sagemaker/studio-env.tf

#-----------------------------------------------------------------
# Domain
#-----------------------------------------------------------------

resource "aws_sagemaker_domain" "labo_domain" {
  domain_name = "labo-sagemaker-studio-domain"
  app_network_access_type = "VpcOnly"
  auth_mode   = "IAM"
  vpc_id      = var.vpc_id
  subnet_ids  = [var.private_subnet_1a_id]

  default_user_settings {
    execution_role = aws_iam_role.sagemaker_execution_role.arn
    security_groups = [aws_security_group.sagemaker_studio_sg.id]

    jupyter_server_app_settings {
      default_resource_spec {
        instance_type = "system"
      }
    }
  }
}
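
One thing to watch when tearing the lab down is that deleting a domain keeps its home EFS volume by default. If the environment is disposable, a retention_policy block can be added inside aws_sagemaker_domain.labo_domain. The following fragment is not part of the original configuration. Note that "Delete" also removes anything stored in users' home directories:

```hcl
  # Add inside resource "aws_sagemaker_domain" "labo_domain" { ... }
  # "Delete" removes the home EFS volume when the domain is destroyed (default: "Retain")
  retention_policy {
    home_efs_file_system = "Delete"
  }
```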

#-----------------------------------------------------------------
# User Profile
#-----------------------------------------------------------------

resource "aws_sagemaker_user_profile" "labo_user_profile" {
  domain_id      = aws_sagemaker_domain.labo_domain.id
  user_profile_name = "labo-sagemaker-user"
}
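
The lab creates a single user profile. If several people share the domain, for_each can create one profile per name. The following sketch would replace the resource above. The second profile name is hypothetical, and the moved block keeps Terraform from recreating the existing profile under its new address:

```hcl
# Hypothetical variant: one user profile per name in the set
resource "aws_sagemaker_user_profile" "labo_user_profile" {
  for_each = toset(["labo-sagemaker-user", "labo-sagemaker-user-2"])

  domain_id         = aws_sagemaker_domain.labo_domain.id
  user_profile_name = each.value
}

# Map the existing single profile onto its for_each key instead of replacing it
moved {
  from = aws_sagemaker_user_profile.labo_user_profile
  to   = aws_sagemaker_user_profile.labo_user_profile["labo-sagemaker-user"]
}
```

Any other reference to this resource then needs a key, for example aws_sagemaker_user_profile.labo_user_profile["labo-sagemaker-user"].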

#-----------------------------------------------------------------
# SageMaker Role
#-----------------------------------------------------------------

resource "aws_iam_role" "sagemaker_execution_role" {
  name = "sagemaker_execution_role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "sagemaker.amazonaws.com"
        }
      },
    ]
  })
}

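# aws_iam_role_policy_attachment attaches the policy to this role only;
# aws_iam_policy_attachment would manage the policy's attachments account-wide
resource "aws_iam_role_policy_attachment" "sagemaker_full_access" {
  role       = aws_iam_role.sagemaker_execution_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess"
}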
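
AmazonSageMakerFullAccess is broad. As far as I know, its S3 object permissions mostly cover buckets with "sagemaker" in their name. To give notebooks access to a specific data bucket, one approach is an inline policy on the execution role. The following sketch uses a placeholder bucket name:

```hcl
# Hypothetical: read/write access to one data bucket for the execution role.
# "example-labo-data-bucket" is a placeholder name.
resource "aws_iam_role_policy" "sagemaker_s3_access" {
  name = "labo-sagemaker-s3-access"
  role = aws_iam_role.sagemaker_execution_role.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        # Listing applies to the bucket ARN itself
        Effect   = "Allow"
        Action   = ["s3:ListBucket"]
        Resource = "arn:aws:s3:::example-labo-data-bucket"
      },
      {
        # Object operations apply to keys inside the bucket
        Effect   = "Allow"
        Action   = ["s3:GetObject", "s3:PutObject"]
        Resource = "arn:aws:s3:::example-labo-data-bucket/*"
      },
    ]
  })
}
```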

#-----------------------------------------------------------------
# SageMaker Security Group
#-----------------------------------------------------------------

resource "aws_security_group" "sagemaker_studio_sg" {
  name        = "labo-sagemaker-studio-sg"
  description = "Security group for SageMaker Studio in labo environment"
  vpc_id = var.vpc_id

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "labo-sagemaker-studio-sg"
  }
}
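
About the security group: ingress on 443 from 0.0.0.0/0 is wide, and it is worth revisiting for a private subnet. The SageMaker documentation for VPC-only domains also asks that the security group allow traffic between its own members, which Studio apps use to talk to each other. The following fragment would add that rule. It goes inside aws_security_group.sagemaker_studio_sg and stays inline, because the AWS provider warns against mixing inline rules with standalone rule resources:

```hcl
  # Add inside resource "aws_security_group" "sagemaker_studio_sg" { ... }
  # Allow all traffic between network interfaces that share this security group
  ingress {
    from_port = 0
    to_port   = 0
    protocol  = "-1"
    self      = true
  }
```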

labs/labo1/sagemaker/variables.tf

variable "vpc_id" {}
variable "private_subnet_1a_id" {}
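
The sagemaker module currently exposes no outputs. If you want apply to print the domain ID and URL, something like the following would work. Both pieces are hypothetical additions to the original layout:

```hcl
# labs/labo1/sagemaker/outputs.tf (hypothetical)
output "domain_id" {
  value = aws_sagemaker_domain.labo_domain.id
}

output "domain_url" {
  value = aws_sagemaker_domain.labo_domain.url
}

# Addition to labs/labo1/main.tf (hypothetical): re-export from the root module
output "sagemaker_domain_url" {
  value = module.sagemaker.domain_url
}
```

After apply, terraform output sagemaker_domain_url prints the value.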

Once the files are written, deploy:

$ cd /home/ec2-user/environment/terraform/labs/labo1/
$ terraform init
$ terraform plan
$ terraform apply

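Confirm that the network resources (VPC, subnets, route tables, Internet Gateway, NAT Gateway, S3 gateway endpoint) and the SageMaker resources (domain, user profile, execution role, security group) were created.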

If the deployment completes without errors, move on to the next step.

3. Verification

In the management console, open the "SageMaker" page.
Choose "Studio" from the sidebar.

The domain and user profile are registered, so try signing in.

Sign-in works, so create a JupyterLab space.

Leave the settings at their defaults and choose "Run Space".
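
Here the space is created by hand in the UI. It can also be managed in Terraform with aws_sagemaker_space. The following sketch assumes an AWS provider release recent enough to support JupyterLab spaces (the app_type argument), so please check the provider documentation for the version you pin. The space name is hypothetical, and this only defines the space, so running it is still a separate step:

```hcl
# Sketch: a private JupyterLab space owned by the lab's user profile
resource "aws_sagemaker_space" "labo_jupyterlab" {
  domain_id  = aws_sagemaker_domain.labo_domain.id
  space_name = "labo-jupyterlab-space" # hypothetical name

  # Private spaces need an owning user profile
  ownership_settings {
    owner_user_profile_name = aws_sagemaker_user_profile.labo_user_profile.user_profile_name
  }

  space_sharing_settings {
    sharing_type = "Private"
  }

  space_settings {
    app_type = "JupyterLab"
  }
}
```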

The space is up and usable without problems.

The Studio UI also confirms that the instance has started.

Conclusion

That completes the automated setup.
I hope this helps someone.

MEGAZONE株式会社 Tech Blog
