[AWS] Integrating Amazon Personalize with OpenSearch
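Using the dataset from the Getting Started tutorial in the official documentation, I'll try re-ranking movie search results in OpenSearch with personalized rankings from Amazon Personalize.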
Previous article: https://zenn.dev/pupumaru/articles/c605470be3bc9e
Ingesting the CSV with OpenSearch Ingestion
The data could be loaded with the Bulk API, but since the option is there, I'll use OpenSearch Ingestion (Data Prepper) to ingest movies.csv.
version: "2"
csv-s3-pipeline:
  source:
    s3:
      notification_type: "sqs"
      codec:
        csv:
      compression: none
      sqs:
        queue_url: "https://sqs.us-west-2.amazonaws.com/123456789012/ingestion-test-queue"
      aws:
        region: "us-west-2"
        sts_role_arn: "arn:aws:iam::123456789012:role/OSSPipelineRole"
  sink:
    - opensearch:
        # Provide an AWS OpenSearch Service domain endpoint
        hosts: [ "https://xxxx.us-west-2.es.amazonaws.com" ]
        aws:
          # Provide a Role ARN with access to the domain. This role should have a trust relationship with osis-pipelines.amazonaws.com
          sts_role_arn: "arn:aws:iam::123456789012:role/OSSPipelineRole"
          # Provide the region of the domain.
          region: "us-west-2"
          serverless: false
        index: "movies"
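For comparison, the Bulk API route mentioned above would need movies.csv converted into an NDJSON `_bulk` request body. This is a minimal sketch that assumes the file has a header row with `movieId`, `title` and `genres` columns (the fields that appear in the search results later); the function name `csv_to_bulk_ndjson` is hypothetical. Like the pipeline, it leaves document IDs to OpenSearch.

```python
import csv
import io
import json


def csv_to_bulk_ndjson(csv_text: str, index: str) -> str:
    """Convert CSV text with a header row into an OpenSearch _bulk request body.

    Each row becomes an "index" action line followed by the row itself as a
    JSON document. No _id is set, so OpenSearch generates one per document.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(row))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


sample = 'movieId,title,genres\n1258,"Shining, The (1980)",Horror\n'
print(csv_to_bulk_ndjson(sample, "movies"), end="")
```

The resulting string would be sent as the body of `POST /_bulk` with `Content-Type: application/x-ndjson`.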
Associating Amazon_Personalize_Search_Ranking_Plugin with the domain
Configuring the plugin (creating a search pipeline)
Following the official documentation, create an IAM role that OpenSearch can assume.
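As a rough sketch of what that role involves: OpenSearch Service has to be able to assume it, and it needs permission to call `personalize:GetPersonalizedRanking` on the campaign. The policy documents below reflect my reading of the AWS documentation and should be checked against it; the helper name `build_role_policies` is hypothetical, and the campaign ARN is the placeholder used in this article.

```python
import json


def build_role_policies(campaign_arn: str) -> tuple[dict, dict]:
    """Return (trust_policy, permissions_policy) for the role OpenSearch assumes.

    Hypothetical helper; verify the statements against the AWS documentation.
    """
    # Trust policy: allows the OpenSearch Service principal to assume the role.
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "es.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    # Permissions policy: allows re-ranking calls against this campaign only.
    permissions_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "personalize:GetPersonalizedRanking",
                "Resource": campaign_arn,
            }
        ],
    }
    return trust_policy, permissions_policy


trust, perms = build_role_policies(
    "arn:aws:personalize:us-west-2:xxx:campaign/getting-started-campaign"
)
print(json.dumps(trust, indent=2))
print(json.dumps(perms, indent=2))
```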
The sample code in the documentation contained a couple of mistakes; the fixes are marked with comments in the code.
import requests
from requests_auth_aws_sigv4 import AWSSigV4

domain_endpoint = 'https://xxx.us-west-2.es.amazonaws.com'
pipeline_name = 'personalize-pipeline'
url = f'{domain_endpoint}/_search/pipeline/{pipeline_name}'

# Sign requests with SigV4 for the OpenSearch Service ("es") endpoint
auth = AWSSigV4('es')
headers = {'Content-Type': 'application/json'}

body = {
    "description": "A pipeline to apply custom re-ranking from Amazon Personalize",
    "response_processors": [
        {
            "personalized_search_ranking": {
                "campaign_arn": "arn:aws:personalize:us-west-2:xxx:campaign/getting-started-campaign",
                "item_id_field": "movieId",
                "recipe": "aws-personalized-ranking",
                "weight": "0.3",
                "tag": "personalize-processor",
                "iam_role_arn": "arn:aws:iam::xxx:role/OpenSearchPersonalizeRole",
                "aws_region": "us-west-2",
                "ignore_failure": True  # Python True (the docs have JSON-style true)
            }
        }  # This closing brace is missing in the docs
    ]
}

try:
    # Create (or overwrite) the search pipeline
    response = requests.put(url, auth=auth, json=body, headers=headers)
    print(response.text)
except requests.exceptions.RequestException as e:
    print(f"Error: {e}")
If the following is returned, the pipeline was created successfully.
{"acknowledged":true}
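The `True` fix above comes down to the body being a Python dict: `requests` serializes `json=` with a JSON encoder, so the Python literal `True` becomes JSON `true` on the wire, while a bare `true` is a `NameError` in Python. The following sketch builds the same body through a hypothetical `build_pipeline_body` helper (not part of any AWS SDK) and also checks `weight`, which, as I understand the AWS documentation, ranges from 0 (keep the OpenSearch order) to 1 (rank purely by Personalize).

```python
import json


def build_pipeline_body(campaign_arn: str, role_arn: str, region: str,
                        weight: float = 0.3) -> dict:
    """Build the search pipeline body used above (hypothetical helper)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0.0 and 1.0")
    return {
        "description": "A pipeline to apply custom re-ranking from Amazon Personalize",
        "response_processors": [
            {
                "personalized_search_ranking": {
                    "campaign_arn": campaign_arn,
                    "item_id_field": "movieId",
                    "recipe": "aws-personalized-ranking",
                    # The article passes weight as a string, so keep that format.
                    "weight": str(weight),
                    "tag": "personalize-processor",
                    "iam_role_arn": role_arn,
                    "aws_region": region,
                    "ignore_failure": True,  # serialized as JSON true
                }
            }
        ],
    }


body = build_pipeline_body(
    "arn:aws:personalize:us-west-2:xxx:campaign/getting-started-campaign",
    "arn:aws:iam::xxx:role/OpenSearchPersonalizeRole",
    "us-west-2",
)
print('"ignore_failure": true' in json.dumps(body))  # True
```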
Next, set this pipeline as the default search pipeline of the movies index so that it is applied to every search on that index.
import requests
from requests_auth_aws_sigv4 import AWSSigV4

domain_endpoint = 'https://xxx.us-west-2.es.amazonaws.com'
pipeline_name = 'personalize-pipeline'
index = 'movies'
url = f'{domain_endpoint}/{index}/_settings'

auth = AWSSigV4('es')
headers = {'Content-Type': 'application/json'}

# Make the pipeline the default search pipeline for this index
body = {
    "index.search.default_pipeline": pipeline_name
}

try:
    response = requests.put(url, auth=auth, json=body, headers=headers)
    print(response.text)
except requests.exceptions.RequestException as e:
    print(f"Error: {e}")
If the following is returned, the setting was applied successfully.
{"acknowledged":true}
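Setting `index.search.default_pipeline` applies the pipeline to every search on `movies`. OpenSearch also accepts a `search_pipeline` query parameter per request, and `search_pipeline=_none` skips the default pipeline, which is handy for comparing results with and without re-ranking. A minimal sketch of building those URLs; the function name `search_url` is hypothetical and the endpoint is the placeholder used above.

```python
from typing import Optional
from urllib.parse import urlencode


def search_url(domain_endpoint: str, index: str, pipeline: Optional[str] = None) -> str:
    """Build a _search URL, optionally choosing the search pipeline per request.

    pipeline=None     -> use the index's default pipeline (if one is set)
    pipeline="_none"  -> bypass the default pipeline for this request
    """
    url = f"{domain_endpoint}/{index}/_search"
    if pipeline is not None:
        url += "?" + urlencode({"search_pipeline": pipeline})
    return url


endpoint = "https://xxx.us-west-2.es.amazonaws.com"
print(search_url(endpoint, "movies"))
print(search_url(endpoint, "movies", "_none"))
print(search_url(endpoint, "movies", "personalize-pipeline"))
```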
Let's try it out in Dev Tools in OpenSearch Dashboards.
GET movies/_search
{
  "_source": ["movieId", "title", "genres"],
  "query": {
    "multi_match": {
      "query": "Horror",
      "fields": ["genres"]
    }
  }
}
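This query does not pass a user ID. As I understand the AWS documentation for the plugin, the user to personalize for is supplied in the search body under `ext.personalize_request_parameters` (a `user_id`, plus optional `context`). A hedged sketch of building such a body; the helper name `personalized_search_body` and the user ID `"1"` are illustrative only, so check the parameter names against the plugin documentation.

```python
import json
from typing import Optional


def personalized_search_body(query: str, user_id: str,
                             context: Optional[dict] = None) -> dict:
    """Build the multi_match body above plus Personalize request parameters."""
    params = {"user_id": user_id}
    if context:
        # Optional contextual metadata forwarded to Personalize (assumption:
        # only meaningful if the campaign was trained with context fields).
        params["context"] = context
    return {
        "_source": ["movieId", "title", "genres"],
        "query": {"multi_match": {"query": query, "fields": ["genres"]}},
        "ext": {"personalize_request_parameters": params},
    }


print(json.dumps(personalized_search_body("Horror", "1"), indent=2))
```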
{
"took": 38,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 978,
"relation": "eq"
},
"max_score": 3.0824862,
"hits": [
{
"_index": "movies",
"_id": "vFvlyZABrLcRtTgpUZZE",
"_score": 3.0824862,
"_source": {
"genres": "Horror",
"movieId": "1258",
"title": "Shining, The (1980)"
}
},
{
"_index": "movies",
"_id": "8lvlyZABrLcRtTgpUZZF",
"_score": 3.0824862,
"_source": {
"genres": "Horror",
"movieId": "1322",
"title": "Amityville 1992: It's About Time (1992)"
}
},
{
"_index": "movies",
"_id": "9lvlyZABrLcRtTgpUZZF",
"_score": 3.0824862,
"_source": {
"genres": "Horror",
"movieId": "1326",
"title": "Amityville II: The Possession (1982)"
}
},
{
"_index": "movies",
"_id": "-VvlyZABrLcRtTgpUZZF",
"_score": 3.0824862,
"_source": {
"genres": "Horror",
"movieId": "1329",
"title": "Blood for Dracula (Andy Warhol's Dracula) (1974)"
}
},
{
"_index": "movies",
"_id": "xVvlyZABrLcRtTgpUZdI",
"_score": 3.0824862,
"_source": {
"genres": "Horror",
"movieId": "1623",
"title": "Wishmaster (1997)"
}
},
{
"_index": "movies",
"_id": "qFvlyZABrLcRtTgpUZhK",
"_score": 3.0824862,
"_source": {
"genres": "Horror",
"movieId": "1972",
"title": "Nightmare on Elm Street 5: The Dream Child, A (1989)"
}
},
{
"_index": "movies",
"_id": "tFvlyZABrLcRtTgpUZhK",
"_score": 3.0824862,
"_source": {
"genres": "Horror",
"movieId": "1984",
"title": "Halloween III: Season of the Witch (1982)"
}
},
{
"_index": "movies",
"_id": "uFvlyZABrLcRtTgpUZhK",
"_score": 3.0824862,
"_source": {
"genres": "Horror",
"movieId": "1990",
"title": "Prom Night IV: Deliver Us From Evil (1992)"
}
},
{
"_index": "movies",
"_id": "v1vlyZABrLcRtTgpUZpf",
"_score": 3.0824862,
"_source": {
"genres": "Horror",
"movieId": "2634",
"title": "Mummy, The (1959)"
}
},
{
"_index": "movies",
"_id": "yFvlyZABrLcRtTgpUZpf",
"_score": 3.0824862,
"_source": {
"genres": "Horror",
"movieId": "2652",
"title": "Curse of Frankenstein, The (1957)"
}
}
]
}
}
The search results were identical with and without the pipeline, so I checked the search pipeline metrics.
GET /_nodes/stats/search_pipeline
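The personalized_search_ranking processor shows "failed": 19, meaning every one of the 19 responses it processed failed. Because ignore_failure is True, the failures did not surface as errors; the responses were simply returned without re-ranking.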
{
  "_nodes": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "cluster_name": "xxx:personalize-test",
  "nodes": {
    "RK2DqLuHQ06Hf909886-og": {
      "timestamp": 1721391344538,
      "name": "2c39d39b2a3172376168a87d64e3be08",
      "roles": [
        "data",
        "ingest",
        "master",
        "remote_cluster_client"
      ],
      "search_pipeline": {
        "total_request": {
          "count": 0,
          "time_in_millis": 0,
          "current": 0,
          "failed": 0
        },
        "total_response": {
          "count": 19,
          "time_in_millis": 8614,
          "current": 0,
          "failed": 0
        },
        "pipelines": {
          "personalize-pipeline": {
            "request": {
              "count": 0,
              "time_in_millis": 0,
              "current": 0,
              "failed": 0
            },
            "response": {
              "count": 19,
              "time_in_millis": 8614,
              "current": 0,
              "failed": 0
            },
            "request_processors": [],
            "response_processors": [
              {
                "personalized_search_ranking:personalize-processor": {
                  "type": "personalized_search_ranking",
                  "stats": {
                    "count": 19,
                    "time_in_millis": 8606,
                    "current": 0,
                    "failed": 19
                  }
                }
              }
            ]
          }
        }
      }
    }
  }
}
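Spotting the failing processor by eye gets tedious once there are several nodes or pipelines. A small sketch that walks the `_nodes/stats/search_pipeline` response shape shown above and reports every processor whose `failed` count is non-zero; `failing_processors` is a hypothetical helper, and the sample is trimmed from the response above.

```python
def failing_processors(stats: dict) -> list[tuple[str, str, str, int]]:
    """Return (node, pipeline, processor, failed) for every processor with failures."""
    found = []
    for node_id, node in stats.get("nodes", {}).items():
        pipelines = node.get("search_pipeline", {}).get("pipelines", {})
        for pipeline_name, pipeline in pipelines.items():
            for key in ("request_processors", "response_processors"):
                # Each list entry is a single-key dict: {"type:tag": {...}}.
                for entry in pipeline.get(key, []):
                    for proc_name, proc in entry.items():
                        failed = proc.get("stats", {}).get("failed", 0)
                        if failed:
                            found.append((node_id, pipeline_name, proc_name, failed))
    return found


sample = {
    "nodes": {
        "RK2DqLuHQ06Hf909886-og": {
            "search_pipeline": {
                "pipelines": {
                    "personalize-pipeline": {
                        "request_processors": [],
                        "response_processors": [
                            {
                                "personalized_search_ranking:personalize-processor": {
                                    "type": "personalized_search_ranking",
                                    "stats": {"count": 19, "failed": 19},
                                }
                            }
                        ],
                    }
                }
            }
        }
    }
}
print(failing_processors(sample))
```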
There was no way to trace the details of the failures, and even after enabling error log output on the OpenSearch side, nothing showed up.
Searching CloudTrail Lake for GetPersonalizedRanking events turned up the error message "This API does not support recipes of type USER_PERSONALIZATION". The plugin documentation states:
"You can use only the custom recipe Personalized-Ranking. For more information about this recipe, see Personalized-Ranking recipe."
Since the plugin re-ranks OpenSearch search results, that makes sense.
The campaign I created last time uses the User-Personalization recipe, so it can't be used here.
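To catch this mismatch before wiring a campaign into the pipeline, the recipe can be traced from the campaign: a campaign points to a solution version, and the solution version records its recipe ARN. The sketch below uses the boto3 Personalize client's `describe_campaign` and `describe_solution_version` calls; the client is passed in so that it can be exercised with a stub, and the helper names are hypothetical. The expected ARN is the Personalized-Ranking recipe as I understand it (other versions of the recipe would not match), so verify it with `list_recipes` in your account.

```python
# Assumed ARN of the Personalized-Ranking recipe; confirm with list_recipes().
PERSONALIZED_RANKING_RECIPE = "arn:aws:personalize:::recipe/aws-personalized-ranking"


def campaign_recipe_arn(personalize_client, campaign_arn: str) -> str:
    """Follow campaign -> solution version and return the recipe ARN it was trained with."""
    campaign = personalize_client.describe_campaign(campaignArn=campaign_arn)["campaign"]
    version = personalize_client.describe_solution_version(
        solutionVersionArn=campaign["solutionVersionArn"]
    )["solutionVersion"]
    return version["recipeArn"]


def usable_for_search_ranking(personalize_client, campaign_arn: str) -> bool:
    """True if the campaign was trained with the Personalized-Ranking recipe."""
    return campaign_recipe_arn(personalize_client, campaign_arn) == PERSONALIZED_RANKING_RECIPE


class StubClient:
    """Stand-in for boto3.client("personalize") that returns canned responses."""

    def __init__(self, recipe_arn: str):
        self.recipe_arn = recipe_arn

    def describe_campaign(self, campaignArn):
        return {"campaign": {"solutionVersionArn": campaignArn + "/sv"}}

    def describe_solution_version(self, solutionVersionArn):
        return {"solutionVersion": {"recipeArn": self.recipe_arn}}


old = StubClient("arn:aws:personalize:::recipe/aws-user-personalization")
print(usable_for_search_ranking(old, "arn:aws:personalize:us-west-2:xxx:campaign/getting-started-campaign"))
```

With real credentials, pass `boto3.client("personalize", region_name="us-west-2")` instead of the stub.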
I recreated the campaign with the Personalized-Ranking recipe and ran the query again.
{
"took": 115,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 978,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "movies",
"_id": "vFvlyZABrLcRtTgpUZZE",
"_score": 1,
"_source": {
"genres": "Horror",
"movieId": "1258",
"title": "Shining, The (1980)"
}
},
{
"_index": "movies",
"_id": "8lvlyZABrLcRtTgpUZZF",
"_score": 0.55770665,
"_source": {
"genres": "Horror",
"movieId": "1322",
"title": "Amityville 1992: It's About Time (1992)"
}
},
{
"_index": "movies",
"_id": "9lvlyZABrLcRtTgpUZZF",
"_score": 0.5,
"_source": {
"genres": "Horror",
"movieId": "1326",
"title": "Amityville II: The Possession (1982)"
}
},
{
"_index": "movies",
"_id": "qFvlyZABrLcRtTgpUZhK",
"_score": 0.43862396,
"_source": {
"genres": "Horror",
"movieId": "1972",
"title": "Nightmare on Elm Street 5: The Dream Child, A (1989)"
}
},
{
"_index": "movies",
"_id": "-VvlyZABrLcRtTgpUZZF",
"_score": 0.39178258,
"_source": {
"genres": "Horror",
"movieId": "1329",
"title": "Blood for Dracula (Andy Warhol's Dracula) (1974)"
}
},
{
"_index": "movies",
"_id": "xVvlyZABrLcRtTgpUZdI",
"_score": 0.36543643,
"_source": {
"genres": "Horror",
"movieId": "1623",
"title": "Wishmaster (1997)"
}
},
{
"_index": "movies",
"_id": "uFvlyZABrLcRtTgpUZhK",
"_score": 0.3500284,
"_source": {
"genres": "Horror",
"movieId": "1990",
"title": "Prom Night IV: Deliver Us From Evil (1992)"
}
},
{
"_index": "movies",
"_id": "tFvlyZABrLcRtTgpUZhK",
"_score": 0.33333334,
"_source": {
"genres": "Horror",
"movieId": "1984",
"title": "Halloween III: Season of the Witch (1982)"
}
},
{
"_index": "movies",
"_id": "v1vlyZABrLcRtTgpUZpf",
"_score": 0.31758314,
"_source": {
"genres": "Horror",
"movieId": "2634",
"title": "Mummy, The (1959)"
}
},
{
"_index": "movies",
"_id": "yFvlyZABrLcRtTgpUZpf",
"_score": 0.28906482,
"_source": {
"genres": "Horror",
"movieId": "2652",
"title": "Curse of Frankenstein, The (1957)"
}
}
]
},
"profile": {
"shards": []
}
}
Comparing the two results shows that the order of the hits has changed.
--- result.json 2024-07-22 12:20:24
+++ result_rerank.json 2024-07-22 12:20:30
@@ -1,5 +1,5 @@
{
- "took": 38,
+ "took": 115,
"timed_out": false,
"_shards": {
"total": 5,
@@ -12,12 +12,12 @@
"value": 978,
"relation": "eq"
},
- "max_score": 3.0824862,
+ "max_score": 1,
"hits": [
{
"_index": "movies",
"_id": "vFvlyZABrLcRtTgpUZZE",
- "_score": 3.0824862,
+ "_score": 1,
"_source": {
"genres": "Horror",
"movieId": "1258",
@@ -27,7 +27,7 @@
{
"_index": "movies",
"_id": "8lvlyZABrLcRtTgpUZZF",
- "_score": 3.0824862,
+ "_score": 0.55770665,
"_source": {
"genres": "Horror",
"movieId": "1322",
@@ -37,7 +37,7 @@
{
"_index": "movies",
"_id": "9lvlyZABrLcRtTgpUZZF",
- "_score": 3.0824862,
+ "_score": 0.5,
"_source": {
"genres": "Horror",
"movieId": "1326",
@@ -46,8 +46,18 @@
},
{
"_index": "movies",
+ "_id": "qFvlyZABrLcRtTgpUZhK",
+ "_score": 0.43862396,
+ "_source": {
+ "genres": "Horror",
+ "movieId": "1972",
+ "title": "Nightmare on Elm Street 5: The Dream Child, A (1989)"
+ }
+ },
+ {
+ "_index": "movies",
"_id": "-VvlyZABrLcRtTgpUZZF",
- "_score": 3.0824862,
+ "_score": 0.39178258,
"_source": {
"genres": "Horror",
"movieId": "1329",
@@ -57,7 +67,7 @@
{
"_index": "movies",
"_id": "xVvlyZABrLcRtTgpUZdI",
- "_score": 3.0824862,
+ "_score": 0.36543643,
"_source": {
"genres": "Horror",
"movieId": "1623",
@@ -66,18 +76,18 @@
},
{
"_index": "movies",
- "_id": "qFvlyZABrLcRtTgpUZhK",
- "_score": 3.0824862,
+ "_id": "uFvlyZABrLcRtTgpUZhK",
+ "_score": 0.3500284,
"_source": {
"genres": "Horror",
- "movieId": "1972",
- "title": "Nightmare on Elm Street 5: The Dream Child, A (1989)"
+ "movieId": "1990",
+ "title": "Prom Night IV: Deliver Us From Evil (1992)"
}
},
{
"_index": "movies",
"_id": "tFvlyZABrLcRtTgpUZhK",
- "_score": 3.0824862,
+ "_score": 0.33333334,
"_source": {
"genres": "Horror",
"movieId": "1984",
@@ -86,18 +96,8 @@
},
{
"_index": "movies",
- "_id": "uFvlyZABrLcRtTgpUZhK",
- "_score": 3.0824862,
- "_source": {
- "genres": "Horror",
- "movieId": "1990",
- "title": "Prom Night IV: Deliver Us From Evil (1992)"
- }
- },
- {
- "_index": "movies",
"_id": "v1vlyZABrLcRtTgpUZpf",
- "_score": 3.0824862,
+ "_score": 0.31758314,
"_source": {
"genres": "Horror",
"movieId": "2634",
@@ -107,7 +107,7 @@
{
"_index": "movies",
"_id": "yFvlyZABrLcRtTgpUZpf",
- "_score": 3.0824862,
+ "_score": 0.28906482,
"_source": {
"genres": "Horror",
"movieId": "2652",
@@ -115,5 +115,8 @@
}
}
]
+ },
+ "profile": {
+ "shards": []
}
}
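A line-oriented diff makes the ranking change somewhat hard to read, so it can help to compare positions directly. A sketch that lists each `movieId` whose rank changed, with both orders transcribed from the results above; `rank_changes` is a hypothetical helper.

```python
def rank_changes(before: list[str], after: list[str]) -> list[tuple[str, int, int]]:
    """Return (item_id, old_rank, new_rank) for items whose 1-based rank changed."""
    old_rank = {item: i for i, item in enumerate(before, start=1)}
    return [
        (item, old_rank[item], new_rank)
        for new_rank, item in enumerate(after, start=1)
        if item in old_rank and old_rank[item] != new_rank
    ]


before = ["1258", "1322", "1326", "1329", "1623", "1972", "1984", "1990", "2634", "2652"]
after = ["1258", "1322", "1326", "1972", "1329", "1623", "1990", "1984", "2634", "2652"]

for item, old, new in rank_changes(before, after):
    print(f"movieId {item}: {old} -> {new}")
```

For these results it reports that movieId 1972 moved up from 6th to 4th, while 1329, 1623, 1990 and 1984 each shifted by one place.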