🙆‍♀️

ChatGPTでDDL to TFしよう! ~Snowflake x Terraform x ChatGPT~

2023/04/06に公開

DDLをHCLに変換したい!

最初のモチベーションはSnowflakeで作成するリソース(テーブル、ビュー、ストアドプロシージャなど)をTerraformのHCL形式で宣言的に管理するために、既存のDDLをHCL形式に変換することでした。
現在、このようなDDLをHCL形式に変換するツールは存在していません(こんなIssueもあります)。DDLを手動でHCL形式に変換するのは面倒です。
まず考えられるアプローチとしてはDDLを解析してHCL形式のコードに変換することが考えられますがなかなか大変そうです。
ここで考えました。ChatGPTにお願いできないか。

ChatGPTでDDLをHCLに変換してみる

試しにChatGPTを利用してDDLをTerraformのHCL形式に変換できるか試してみました。

入力したPrompt

SnowflakeのDDL
create or replace table dcr_samp_provider_db.admin.request_log ( party_account varchar(1000), request_id varchar(1000), request_ts timestamp, request variant, query_hash varchar(1000), template_name varchar(1000), epsilon double, sensitivity int, app_instance_id varchar(1000), processed_ts timestamp, approved boolean, error varchar(1000) ) comment = '{"origin":"sf_ps_wls","name":"dcr","version":{"major":5, "minor":5},"attributes":{"component":"dcr",“role”:“provider”}}';
をTerraformのHCLに変換して。Schemaは以下の通りです。
## Schema

### Required

- `column` (Block List, Min: 1) Definitions of a column to create in the table. Minimum one required. (see [below for nested schema](#nestedblock--column))
- `database` (String) The database in which to create the table.
- `name` (String) Specifies the identifier for the table; must be unique for the database and schema in which the table is created.
- `schema` (String) The schema in which to create the table.

### Optional

- `change_tracking` (Boolean) Specifies whether to enable change tracking on the table. Default false.
- `cluster_by` (List of String) A list of one or more table columns/expressions to be used as clustering key(s) for the table
- `comment` (String) Specifies a comment for the table.
- `data_retention_days` (Number, Deprecated) Specifies the retention period for the table so that Time Travel actions (SELECT, CLONE, UNDROP) can be performed on historical data in the table. Default value is 1, if you wish to inherit the parent schema setting then pass in the schema attribute to this argument.
- `primary_key` (Block List, Max: 1, Deprecated) Definitions of primary key constraint to create on table (see [below for nested schema](#nestedblock--primary_key))
- `tag` (Block List, Deprecated) Definitions of a tag to associate with the resource. (see [below for nested schema](#nestedblock--tag))

### Read-Only

- `id` (String) The ID of this resource.
- `owner` (String) Name of the role that owns the table.

<a id="nestedblock--column"></a>
### Nested Schema for `column`

Required:

- `name` (String) Column name
- `type` (String) Column type, e.g. VARIANT

Optional:

- `comment` (String) Column comment
- `default` (Block List, Max: 1) Defines the column default value; note due to limitations of Snowflake's ALTER TABLE ADD/MODIFY COLUMN updates to default will not be applied (see [below for nested schema](#nestedblock--column--default))
- `identity` (Block List, Max: 1) Defines the identity start/step values for a column. **Note** Identity/default are mutually exclusive. (see [below for nested schema](#nestedblock--column--identity))
- `masking_policy` (String) Masking policy to apply on column
- `nullable` (Boolean) Whether this column can contain null values. **Note**: Depending on your Snowflake version, the default value will not suffice if this column is used in a primary key constraint.

<a id="nestedblock--column--default"></a>
### Nested Schema for `column.default`

Optional:

- `constant` (String) The default constant value for the column
- `expression` (String) The default expression value for the column
- `sequence` (String) The default sequence to use for the column


<a id="nestedblock--column--identity"></a>
### Nested Schema for `column.identity`

Optional:

- `start_num` (Number) The number to start incrementing at.
- `step_num` (Number) Step size to increment by.



<a id="nestedblock--primary_key"></a>
### Nested Schema for `primary_key`

Required:

- `keys` (List of String) Columns to use in primary key

Optional:

- `name` (String) Name of constraint


<a id="nestedblock--tag"></a>
### Nested Schema for `tag`

Required:

- `name` (String) Tag name, e.g. department.
- `value` (String) Tag value, e.g. marketing_info.

Optional:

- `database` (String) Name of the database that the tag was created in.
- `schema` (String) Name of the schema that the tag was created in.


 またHCLの例は以下の通りです。

resource "snowflake_schema" "schema" {
  database            = "database"
  name                = "schema"
  data_retention_days = 1
}

resource "snowflake_sequence" "sequence" {
  database = snowflake_schema.schema.database
  schema   = snowflake_schema.schema.name
  name     = "sequence"
}

resource "snowflake_table" "table" {
  database            = snowflake_schema.schema.database
  schema              = snowflake_schema.schema.name
  name                = "table"
  comment             = "A table."
  cluster_by          = ["to_date(DATE)"]
  data_retention_days = snowflake_schema.schema.data_retention_days
  change_tracking     = false

  column {
    name     = "id"
    type     = "int"
    nullable = true

    default {
      sequence = snowflake_sequence.sequence.fully_qualified_name
    }
  }

  column {
    name     = "identity"
    type     = "NUMBER(38,0)"
    nullable = true

    identity {
      start_num = 1
      step_num  = 3
    }
  }

  column {
    name     = "data"
    type     = "text"
    nullable = false
  }

  column {
    name = "DATE"
    type = "TIMESTAMP_NTZ(9)"
  }

  column {
    name    = "extra"
    type    = "VARIANT"
    comment = "extra data"
  }

  primary_key {
    name = "my_key"
    keys = ["data"]
  }
}

ChatGPTの出力

以下は、TerraformのHCLでSnowflakeのDDLを変換した例です。必要に応じて適宜修正してください。

hcl
Copy code
resource "snowflake_table" "request_log" {
  database = "dcr_samp_provider_db"
  schema   = "admin"
  name     = "request_log"
  comment  = "{\"origin\":\"sf_ps_wls\",\"name\":\"dcr\",\"version\":{\"major\":5, \"minor\":5},\"attributes\":{\"component\":\"dcr\",\"role\":\"provider\"}}"

  column {
    name = "party_account"
    type = "VARCHAR(1000)"
  }

  column {
    name = "request_id"
    type = "VARCHAR(1000)"
  }

  column {
    name = "request_ts"
    type = "TIMESTAMP_NTZ"
  }

  column {
    name = "request"
    type = "VARIANT"
  }

  column {
    name = "query_hash"
    type = "VARCHAR(1000)"
  }

  column {
    name = "template_name"
    type = "VARCHAR(1000)"
  }

  column {
    name = "epsilon"
    type = "DOUBLE"
  }

  column {
    name = "sensitivity"
    type = "INT"
  }

  column {
    name = "app_instance_id"
    type = "VARCHAR(1000)"
  }

  column {
    name = "processed_ts"
    type = "TIMESTAMP_NTZ"
  }

  column {
    name = "approved"
    type = "BOOLEAN"
  }

  column {
    name = "error"
    type = "VARCHAR(1000)"
  }
}

素晴らしいですね!これは使えそうです。
DDLをHCLに変換できるツールとしてCLI化してみました。

import click
import re
import requests

from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.schema import (
    HumanMessage,
    SystemMessage
)


system_settings = """以下はsnowflake terraform providerのresourceの一覧です

snowflake_account
snowflake_account_grant
snowflake_account_parameter
snowflake_api_integration
snowflake_database
snowflake_database_grant
snowflake_external_function
snowflake_external_oauth_integration
snowflake_external_table
snowflake_external_table_grant
snowflake_failover_group
snowflake_file_format
snowflake_file_format_grant
snowflake_function
snowflake_function_grant
snowflake_integration_grant
snowflake_managed_account
snowflake_masking_policy
snowflake_masking_policy_grant
snowflake_materialized_view
snowflake_materialized_view_grant
snowflake_network_policy
snowflake_network_policy_attachment
snowflake_notification_integration
snowflake_oauth_integration
snowflake_object_parameter
snowflake_pipe
snowflake_pipe_grant
snowflake_procedure
snowflake_procedure_grant
snowflake_resource_monitor
snowflake_resource_monitor_grant
snowflake_role
snowflake_role_grants
snowflake_role_ownership_grant
snowflake_row_access_policy
snowflake_row_access_policy_grant
snowflake_saml_integration
snowflake_schema
snowflake_schema_grant
snowflake_scim_integration
snowflake_sequence
snowflake_sequence_grant
snowflake_session_parameter
snowflake_share
snowflake_stage
snowflake_stage_grant
snowflake_storage_integration
snowflake_stream
snowflake_stream_grant
snowflake_table
snowflake_table_constraint
snowflake_table_grant
snowflake_tag
snowflake_tag_association
snowflake_tag_grant
snowflake_tag_masking_policy_association
snowflake_task
snowflake_task_grant
snowflake_user
snowflake_user_grant
snowflake_user_ownership_grant
snowflake_user_public_keys
snowflake_view
snowflake_view_grant
snowflake_warehouse
snowflake_warehouse_grant"""

chat = ChatOpenAI(temperature=0)


@click.command()
@click.option('--ddl', prompt='Enter DDL', help='変換したいDDL')
def convert_ddl(ddl):
    ans = chat(
        [
            SystemMessage(content=system_settings),
            HumanMessage(content="""
                            以下のSQLは何のterraform resourceと対応しますか?
                            対応するresourceのみ答えて
                            \"{ddl}\"
                            """.format(ddl=ddl))
        ]
    )

    pattern = r"snowflake_([a-zA-Z_]+)"

    result = re.search(pattern, ans.content)
    if result:
        resource = result.group(1)
    else:
        print("resourceを見つけられませんでした。")
        exit()
    # Webページを取得する
    response = requests.get(
        f"https://raw.githubusercontent.com/Snowflake-Labs/terraform-provider-snowflake/main/docs/resources/{resource}.md")

    example = re.search(r"```terraform(.+})\n```",
                        response.text, flags=re.DOTALL)
    schema = re.search(r"(## Schema.+)## Import",
                       response.text, flags=re.DOTALL)

    if result:
        example = example.group(1)
        schema = schema.group(1)
    else:
        print("exampleを見つけられませんでした。")
        exit()

    template = "{system_settings}"
    system_message_prompt = SystemMessagePromptTemplate.from_template(template)

    human_template = "SnowflakeのDDL\n{ddl}\nをTerraformのHCLに変換して。Schemaは以下の通りです。\n{schema}\n またHCLの例は以下の通りです。\n{example}"
    human_message_prompt = HumanMessagePromptTemplate.from_template(
        human_template)

    # print(human_template.format(ddl=ddl, schema=schema, example=example))

    chat_prompt = ChatPromptTemplate.from_messages(
        [system_message_prompt, human_message_prompt])

    res = chat(chat_prompt.format_prompt(
        system_settings=system_settings,
        ddl=ddl,
        schema=schema,
        example=example
    ).to_messages())

    print("-"*100)
    print(res.content)


if __name__ == '__main__':
    convert_ddl()

動作例

$ python ddl_to_tf.py
Enter DDL: create or replace row access policy dcr_samp_provider_db.shared_schema.data_firewall as (foo varchar) returns boolean -> exists ( select request_id from dcr_samp_provider_db.admin.request_log w where party_account = current_account() and approved = true and query_hash = sha2(current_statement()) );
----------------------------------------------------------------------------------------------------
以下がSnowflakeのDDLから変換されたTerraformのHCLです。

resource "snowflake_row_access_policy" "data_firewall" {
  name     = "data_firewall"
  database = "dcr_samp_provider_db"
  schema   = "shared_schema"
  row_access_expression = "exists(select request_id from dcr_samp_provider_db.admin.request_log w where party_account = current_account() and approved = true and query_hash = sha2(current_statement()))"
  signature = {
    foo = "VARCHAR"
  }
}

注意点として、`signature`のKeyの名前はDDLの引数名と一致する必要があるため、`foo`として定義しています。また、`snowflake_row_access_policy`のリソース名は任意のものを設定できます。この例では`data_firewall`としています。

いろんなDDLを試してみましたが8割方は変換できている印象でした。
正攻法でやる労力を考えるとすごーく楽にできちゃいました。

まとめ

ChatGPTでDDLをHCLに変換できるツールがいとも簡単にできてしまいました。
ChatGPTは神。

Discussion