
Solo MongoDB University / M121 Aggregation Framework (3)

Published on 2021/04/17

MongoDB

tech

This is a continuation of my study log for the MongoDB University courses, which I started in Advent-calendar style!

Current course

M121: The MongoDB Aggregation Framework
- https://university.mongodb.com/mercury/M121/2021_March_16/overview

This course takes a deep dive into aggregation.
The previous article was Solo MongoDB University / M121 Aggregation Framework (2).

Chapter 2: Basic Aggregation - Utility Stages

`$addFields` and how it is similar to `$project`

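Covered in the video: the $addFields stage used in aggregation pipelines.

It is similar to $project, but it keeps the existing fields and adds new ones to the output, and it can also add fields inside nested (embedded) documents.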

Ref. https://docs.mongodb.com/manual/reference/operator/aggregation/addFields/
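
To make the difference concrete for myself, here is a small plain-JavaScript imitation of the two stages. It is only a sketch of the behavior described above, not MongoDB code: projectStage, addFieldsStage, and the hasMoons field are names I made up for illustration.

```javascript
// Plain-JavaScript imitation of the two stages; NOT MongoDB's implementation.

// Inclusion-style $project: only the listed top-level fields survive.
function projectStage(doc, fieldNames) {
  const out = {};
  for (const name of fieldNames) {
    if (name in doc) out[name] = doc[name];
  }
  return out;
}

// $addFields: every existing field is kept, new ones are added (or overwritten).
function addFieldsStage(doc, newFields) {
  return { ...doc, ...newFields };
}

const planet = { name: "Earth", numberOfMoons: 1, type: "Terrestrial planet" };

const projected = projectStage(planet, ["name"]);
const extended = addFieldsStage(planet, { hasMoons: planet.numberOfMoons > 0 });

console.log(projected); // only "name" is left
console.log(extended);  // all original fields plus the hypothetical "hasMoons"
```

So $project makes you list everything you want to keep, while $addFields lets you add one computed field without repeating the rest.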

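What about $geoNear?

The $geoNear stage extracts the documents nearest to a specified point. To use it, it must be the first stage of the pipeline, the collection needs a geospatial index (2dsphere or 2d), and there are a few required parameters. The output includes each document's distance from the specified point.

By default, the _id field is a 12-byte ObjectId.
- $addFields can overwrite it in the output.
- For example, after something like $addFields: { _id: "$item" }, check the _id field when reviewing the results.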
Ref. https://docs.mongodb.com/manual/reference/operator/aggregation/geoNear/?jmp=university


// Fetch one document as a test
MongoDB Enterprise Cluster0-shard-0:PRIMARY> db.nycFacilities.findOne()
{
        "_id" : ObjectId("59a57f72ea2da4c51ef35c52"),
        "name" : "Owl Hollow Park",
        "address" : {
                "number" : "",
                "street" : "",
                "city" : "Staten Island",
                "zipcode" : "10312"
        },
        "borough" : "Staten Island",
        "location" : {
                "type" : "Point",
                "coordinates" : [
                        -74.196784,
                        40.561112
                ]
        },
        "domain" : "Parks, Gardens, and Historical Sites",
        "group" : "Parks and Plazas",
        "specialty" : "Parks",
        "type" : "Park"
}
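
As a rough sketch of what a $geoNear pipeline against this collection could look like, the snippet below only builds the pipeline object. It assumes a 2dsphere index already exists on the location field (these notes do not show one being created), and distanceFromPoint is just a field name I picked for the output distance.

```javascript
// Pipeline definition only; running it needs a live deployment and a
// 2dsphere index on "location" (assumed here, not shown in these notes).
const geoNearPipeline = [
  {
    $geoNear: {
      // GeoJSON point taken from the sample document above
      near: { type: "Point", coordinates: [-74.196784, 40.561112] },
      // Output field that receives the computed distance (hypothetical name)
      distanceField: "distanceFromPoint",
      spherical: true
    }
  },
  { $limit: 5 }
];

// In the mongo shell this would be run as:
//   db.nycFacilities.aggregate(geoNearPipeline).pretty()
console.log(JSON.stringify(geoNearPipeline, null, 2));
```

Note that $geoNear sits at index 0 of the array, matching the "first stage only" rule.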

Utility Stages / Cursor-like stages: Part 1

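This part covers the aggregation-stage counterparts of cursor methods that also work with find(), such as limit, skip, sort, and count.
First, connect to the sample database.

// From the MongoDB Atlas cluster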
MongoDB Enterprise Cluster0-shard-0:PRIMARY> show databases
100YWeatherSmall  0.128GB
admin             0.000GB
aggregations      0.068GB
citibike          0.367GB
city              0.002GB
config            0.015GB
coursera-agg      0.083GB
feedback          0.000GB
local             0.710GB
mflix             0.449GB
results           0.000GB
ships             0.001GB
video             0.513GB

// show databases and use <database name> work much like they do in MySQL!
MongoDB Enterprise Cluster0-shard-0:PRIMARY> use aggregations
switched to db aggregations

// MongoDB has collections rather than tables, so list the collections
MongoDB Enterprise Cluster0-shard-0:PRIMARY> show collections
air_airlines
air_alliances
air_routes
bronze_banking
child_reference
customers
employees
exoplanets
gold_banking
icecream_data
movies
nycFacilities
parent_reference
silver_banking
solarSystem
stocks
system.profile

Using find()

This time we use the solarSystem collection.
First, select fields with find() rather than with aggregation.

// Check the count: 9 bodies in the solar system
MongoDB Enterprise Cluster0-shard-0:PRIMARY> db.solarSystem.find().count()
9

// Return only the ``numberOfMoons`` and ``name`` fields
MongoDB Enterprise Cluster0-shard-0:PRIMARY> db.solarSystem.find({}, {"_id": 0, "name": 1, "numberOfMoons": 1}).pretty();
{ "name" : "Earth", "numberOfMoons" : 1 }
{ "name" : "Neptune", "numberOfMoons" : 14 }
{ "name" : "Uranus", "numberOfMoons" : 27 }
{ "name" : "Saturn", "numberOfMoons" : 62 }
{ "name" : "Jupiter", "numberOfMoons" : 67 }
{ "name" : "Venus", "numberOfMoons" : 0 }
{ "name" : "Mercury", "numberOfMoons" : 0 }
{ "name" : "Sun", "numberOfMoons" : 0 }
{ "name" : "Mars", "numberOfMoons" : 2 }

// skip(5) skips the first 5 documents (no sort is specified, so they come back in natural order)
MongoDB Enterprise Cluster0-shard-0:PRIMARY> db.solarSystem.find({}, {"_id": 0, "name": 1, "numberOfMoons": 1}).skip(5).pretty()
{ "name" : "Venus", "numberOfMoons" : 0 }
{ "name" : "Mercury", "numberOfMoons" : 0 }
{ "name" : "Sun", "numberOfMoons" : 0 }
{ "name" : "Mars", "numberOfMoons" : 2 }

// limit(5) returns only the first 5 documents (natural order): the same five that skip(5) skipped above
MongoDB Enterprise Cluster0-shard-0:PRIMARY> db.solarSystem.find({}, {"_id": 0, "name": 1, "numberOfMoons": 1}).limit(5).pretty();

{ "name" : "Earth", "numberOfMoons" : 1 }
{ "name" : "Neptune", "numberOfMoons" : 14 }
{ "name" : "Uranus", "numberOfMoons" : 27 }
{ "name" : "Saturn", "numberOfMoons" : 62 }
{ "name" : "Jupiter", "numberOfMoons" : 67 }

// Apply a sort
// Sort key: numberOfMoons (number of moons), descending
db.solarSystem.find({}, { "_id": 0, "name": 1, "numberOfMoons": 1 }).sort( {"numberOfMoons": -1 } ).pretty();

{ "name" : "Jupiter", "numberOfMoons" : 67 }
{ "name" : "Saturn", "numberOfMoons" : 62 }
{ "name" : "Uranus", "numberOfMoons" : 27 }
{ "name" : "Neptune", "numberOfMoons" : 14 }
{ "name" : "Mars", "numberOfMoons" : 2 }
{ "name" : "Earth", "numberOfMoons" : 1 }
{ "name" : "Venus", "numberOfMoons" : 0 }
{ "name" : "Mercury", "numberOfMoons" : 0 }
{ "name" : "Sun", "numberOfMoons" : 0 }

Using them as aggregation stages

// ``$limit``
db.solarSystem.aggregate([
  {
    "$project": {
      "_id": 0,
      "name": 1,
      "numberOfMoons": 1
    }
  },
  { "$limit": 5  }
]).pretty();

{ "name" : "Earth", "numberOfMoons" : 1 }
{ "name" : "Neptune", "numberOfMoons" : 14 }
{ "name" : "Uranus", "numberOfMoons" : 27 }
{ "name" : "Saturn", "numberOfMoons" : 62 }
{ "name" : "Jupiter", "numberOfMoons" : 67 }

// skip
db.solarSystem.aggregate([
  {
    "$project": {
      "_id": 0,
      "name": 1,
      "numberOfMoons": 1
    }
  },
  {
    "$skip": 1
  }
]).pretty()

// Only the first document is skipped
{ "name" : "Neptune", "numberOfMoons" : 14 }
{ "name" : "Uranus", "numberOfMoons" : 27 }
{ "name" : "Saturn", "numberOfMoons" : 62 }
{ "name" : "Jupiter", "numberOfMoons" : 67 }
{ "name" : "Venus", "numberOfMoons" : 0 }
{ "name" : "Mercury", "numberOfMoons" : 0 }
{ "name" : "Sun", "numberOfMoons" : 0 }
{ "name" : "Mars", "numberOfMoons" : 2 }

// $count outputs the count under the field name you specify
// "terrestrial planets": N
db.solarSystem.aggregate([{
  "$match": {
    "type": "Terrestrial planet"
  }
}, {
  "$project": {
    "_id": 0,
    "name": 1,
    "numberOfMoons": 1
  }
}, {
  "$count": "terrestrial planets"
}]).pretty();

{ "terrestrial planets" : 4 }

/*
// Terrestrial planets
{ "name" : "Earth", "numberOfMoons" : 1 }
{ "name" : "Venus", "numberOfMoons" : 0 }
{ "name" : "Mercury", "numberOfMoons" : 0 }
{ "name" : "Mars", "numberOfMoons" : 2 }
*/

// sort
// ``$sort`` stage
db.solarSystem.aggregate([{
  "$project": {
    "_id": 0,
    "name": 1,
    "numberOfMoons": 1
  }
}, {
  "$sort": { "numberOfMoons": -1 }
}]).pretty();

{ "name" : "Jupiter", "numberOfMoons" : 67 }
{ "name" : "Saturn", "numberOfMoons" : 62 }
{ "name" : "Uranus", "numberOfMoons" : 27 }
{ "name" : "Neptune", "numberOfMoons" : 14 }
{ "name" : "Mars", "numberOfMoons" : 2 }
{ "name" : "Earth", "numberOfMoons" : 1 }
{ "name" : "Venus", "numberOfMoons" : 0 }
{ "name" : "Mercury", "numberOfMoons" : 0 }
{ "name" : "Sun", "numberOfMoons" : 0 }
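
One thing worth keeping in mind with the stage versions is that stages run in the order they are written, so $sort followed by $limit is not the same as $limit followed by $sort. The sketch below imitates that in plain JavaScript using the documents shown above. sortDesc, limit, and runPipeline are stand-in helpers I wrote for illustration, not MongoDB functions.

```javascript
// Documents in the same order these notes show for an unsorted query.
const solarSystem = [
  { name: "Earth", numberOfMoons: 1 },
  { name: "Neptune", numberOfMoons: 14 },
  { name: "Uranus", numberOfMoons: 27 },
  { name: "Saturn", numberOfMoons: 62 },
  { name: "Jupiter", numberOfMoons: 67 },
  { name: "Venus", numberOfMoons: 0 },
  { name: "Mercury", numberOfMoons: 0 },
  { name: "Sun", numberOfMoons: 0 },
  { name: "Mars", numberOfMoons: 2 }
];

// Stand-ins for $sort (descending on one numeric field) and $limit.
const sortDesc = (field) => (docs) => [...docs].sort((a, b) => b[field] - a[field]);
const limit = (n) => (docs) => docs.slice(0, n);

// Apply the stages left to right, like an aggregation pipeline.
const runPipeline = (docs, stages) => stages.reduce((acc, stage) => stage(acc), docs);

const sortThenLimit = runPipeline(solarSystem, [sortDesc("numberOfMoons"), limit(2)]);
const limitThenSort = runPipeline(solarSystem, [limit(2), sortDesc("numberOfMoons")]);

console.log(sortThenLimit.map((d) => d.name)); // [ 'Jupiter', 'Saturn' ]
console.log(limitThenSort.map((d) => d.name)); // [ 'Neptune', 'Earth' ]
```

The second pipeline only ever sees the first two documents, so it sorts Earth and Neptune rather than finding the two bodies with the most moons.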


// setting ``allowDiskUse`` option
// hasMagneticField -> whether the body has a magnetic field
db.solarSystem.aggregate([{
  "$project": {
    "_id": 0,
    "name": 1,
    "hasMagneticField": 1,
    "numberOfMoons": 1
  }
}, {
  "$sort": { "hasMagneticField": -1, "numberOfMoons": -1 }
}], { "allowDiskUse": true }).pretty();

// Mars and Venus have no intrinsic magnetic field!
{ "name" : "Jupiter", "numberOfMoons" : 67, "hasMagneticField" : true }
{ "name" : "Saturn", "numberOfMoons" : 62, "hasMagneticField" : true }
{ "name" : "Uranus", "numberOfMoons" : 27, "hasMagneticField" : true }
{ "name" : "Neptune", "numberOfMoons" : 14, "hasMagneticField" : true }
{ "name" : "Earth", "numberOfMoons" : 1, "hasMagneticField" : true }
{ "name" : "Mercury", "numberOfMoons" : 0, "hasMagneticField" : true }
{ "name" : "Sun", "numberOfMoons" : 0, "hasMagneticField" : true }
{ "name" : "Mars", "numberOfMoons" : 2, "hasMagneticField" : false }
{ "name" : "Venus", "numberOfMoons" : 0, "hasMagneticField" : false }
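
To check my understanding of a sort on two keys, here is the same ordering written as a plain-JavaScript comparator: compare on hasMagneticField first (true before false when descending), and only fall back to numberOfMoons on a tie. The data is copied from the output above; byFieldThenMoonsDesc is a hypothetical helper name, and this is only an imitation of the stage.

```javascript
const bodies = [
  { name: "Earth", numberOfMoons: 1, hasMagneticField: true },
  { name: "Neptune", numberOfMoons: 14, hasMagneticField: true },
  { name: "Uranus", numberOfMoons: 27, hasMagneticField: true },
  { name: "Saturn", numberOfMoons: 62, hasMagneticField: true },
  { name: "Jupiter", numberOfMoons: 67, hasMagneticField: true },
  { name: "Venus", numberOfMoons: 0, hasMagneticField: false },
  { name: "Mercury", numberOfMoons: 0, hasMagneticField: true },
  { name: "Sun", numberOfMoons: 0, hasMagneticField: true },
  { name: "Mars", numberOfMoons: 2, hasMagneticField: false }
];

// Imitates { hasMagneticField: -1, numberOfMoons: -1 }:
// the second key is consulted only when the first key ties.
function byFieldThenMoonsDesc(a, b) {
  const fieldDiff = Number(b.hasMagneticField) - Number(a.hasMagneticField);
  if (fieldDiff !== 0) return fieldDiff;
  return b.numberOfMoons - a.numberOfMoons;
}

const sortedBodies = [...bodies].sort(byFieldThenMoonsDesc);
console.log(sortedBodies.map((d) => d.name).join(", "));
```

Mars has more moons than Earth, but it still lands near the bottom because the magnetic-field key is compared first.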

Utility Stages / Cursor-like stages: $sample Stage

$sample is handy when you want to randomly pick documents from a collection that holds a very large number of documents!

Ref. https://docs.mongodb.com/manual/reference/operator/aggregation/sample/

If all the following conditions are met, $sample uses a pseudo-random cursor to select documents:

$sample is the first stage of the pipeline
N is less than 5% of the total documents in the collection
The collection contains more than 100 documents

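In other words, the pseudo-random cursor is used only when $sample is the first stage of the pipeline, N is less than 5% of the total number of documents, and the collection contains more than 100 documents.

If even one of these conditions is not met, $sample picks the documents with a random sort, under the same restrictions as $sort.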
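
The three conditions are easy to mix up, so here they are as a tiny plain-JavaScript check. usesPseudoRandomCursor and its parameter names are made up for this sketch; it simply restates the documented conditions and does not ask the server anything.

```javascript
// Restates the three documented conditions; purely illustrative.
function usesPseudoRandomCursor({ isFirstStage, sampleSize, totalDocs }) {
  return (
    isFirstStage &&                 // $sample is the first stage of the pipeline
    sampleSize < totalDocs * 0.05 && // N is less than 5% of the total documents
    totalDocs > 100                 // the collection has more than 100 documents
  );
}

// movies has 44488 documents in these notes, so 5% is about 2224.
console.log(usesPseudoRandomCursor({ isFirstStage: true, sampleSize: 200, totalDocs: 44488 }));  // true
console.log(usesPseudoRandomCursor({ isFirstStage: true, sampleSize: 3000, totalDocs: 44488 })); // false
// solarSystem has only 9 documents, so it never qualifies.
console.log(usesPseudoRandomCursor({ isFirstStage: true, sampleSize: 1, totalDocs: 9 }));        // false
```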
In MongoDB, $sort is limited to 100 MB of memory.
By default, a sort that would exceed the 100 MB limit fails with an error!

Ref. https://docs.mongodb.com/manual/reference/operator/aggregation/sort/#std-label-sort-memory-limit
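If you need to sort a large amount of data, specify the allowDiskUse option to lift this limit.

Note: the allowDiskUse option for aggregation

When enabled, aggregation can write temporary files to the _tmp subdirectory.
Starting in MongoDB 4.2, when a stage exceeds the memory limit and writes to disk, that is recorded in the profiler log and diagnostic log messages.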

Optional. Enables writing to temporary files. When set to true,
aggregation operations can write data to the _tmp subdirectory in the
dbPath directory. See Perform Large Sort Operation with External Sort
for an example.

Starting in MongoDB 4.2, the profiler log messages and diagnostic log
messages includes a usedDisk indicator if any aggregation stage wrote
data to temporary files due to memory restrictions.

Chapter 2: Basic Aggregation / Lab: Using Cursor-like Stages

Practice exercise.

Problem

Our employees were surveyed at a movie screening.
The survey results for favorite actors are as follows.

favorites = [
  "Sandra Bullock",
  "Tom Hanks",
  "Julia Roberts",
  "Kevin Spacey",
  "George Clooney"]

For movies released in the USA with a tomatoes.viewer.rating of 3 or higher, add a new field (num_favs) that shows how many of the favorite actors above appear in each movie.
Then sort by num_favs, tomatoes.viewer.rating, and title, all in descending order.
What is the title of the 25th movie in the result?
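
Before touching the database, the num_favs idea can be tried out in plain JavaScript: intersect a movie's cast with the favorites list and count what is left. The cast below is the one recorded for Gravity in the results of this lab; favsInCast is a hypothetical helper, and the element order it returns is just the order of the favorites array, not something MongoDB promises.

```javascript
const favorites = [
  "Sandra Bullock",
  "Tom Hanks",
  "Julia Roberts",
  "Kevin Spacey",
  "George Clooney"
];

// Plain-JavaScript stand-in for intersecting "$cast" with favorites.
function favsInCast(cast) {
  const castSet = new Set(cast);
  return favorites.filter((name) => castSet.has(name));
}

const gravityCast = ["Sandra Bullock", "Orto Ignatiussen", "Ed Harris", "George Clooney"];
const gravityFavs = favsInCast(gravityCast);

console.log(gravityFavs);        // [ 'Sandra Bullock', 'George Clooney' ]
console.log(gravityFavs.length); // 2, which corresponds to num_favs
```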

Answer


// Check the document count
MongoDB Enterprise Cluster0-shard-0:PRIMARY> db.movies.count()
44488

// We want the titles
db.movies.aggregate([
  {
    $project: { _id:1, title: 1 }
  },
  { $limit: 2 }
])

// Trial run
{ "_id" : ObjectId("573a1390f29313caabcd4cf1"), "title" : "Ingeborg Holm" }
{ "_id" : ObjectId("573a1390f29313caabcd4136"), "title" : "Pauvre Pierrot" }

// Take only the documents that have tomatoes
// tomatoes: { $exists: true }
db.movies.aggregate([
  { $match: { tomatoes: { $exists: true } } },
  {
    $project: { _id:1, title: 1, tomatoes: 1 }
  },
  { $limit: 2 }
]).pretty()

// Check documents that have the field
{
 "_id" : ObjectId("573a1390f29313caabcd421c"),
 "title" : "A Turn of the Century Illusionist",
 "tomatoes" : {
  "viewer" : {
   "rating" : 3.8,
   "numReviews" : 32
  },
  "lastUpdated" : ISODate("2015-08-20T18:46:44Z")
 }
}
{
 "_id" : ObjectId("573a1390f29313caabcd4192"),
 "title" : "The Conjuring of a Woman at the House of Robert Houdin",
 "tomatoes" : {
  "viewer" : {
   "rating" : 3.7,
   "numReviews" : 59
  },
  "lastUpdated" : ISODate("2015-09-11T17:46:29Z")
 }
}

// First: tomatoes exists and tomatoes.viewer.rating >= 3
db.movies.aggregate([
  {
    $match: { $and: [
      { tomatoes: { $exists: true } },
      { "tomatoes.viewer.rating": { $gte: 3 } }
    ]}
  },
  {
    $project: {
      _id:1, title: 1, tomatoes: 1
      }
  },
  { $limit: 5 }
]).pretty()

// Add a new field (num_favs) showing how many of the favorite actors above
// appear in each movie
// Use $setIntersection to pick out the matching elements, then count them

// Sort by num_favs, tomatoes.viewer.rating, and title,
// in descending order
db.movies.aggregate([
  {
    $match: { $and: [
      { tomatoes: { $exists: true } },
      { "tomatoes.viewer.rating": { $gte: 3 } },
      { cast: { $elemMatch: { $exists: true } } }
    ]}
  },
  {
    $project: {
      _id:1,
      title: 1,
      viewer_rating: "$tomatoes.viewer.rating",
      cast: 1,
      favs: {
        $setIntersection: [ "$cast", favorites ]
      },
    }
  },
  {
    $addFields: {
      num_favs: { $size: "$favs" }
    }
  },
  {
  "$sort": { "num_favs": -1, viewer_rating: -1, "title": -1  }
  },
  { $limit: 3 }
]).pretty()

// Sample of 3 documents
{
 "_id" : ObjectId("573a13cbf29313caabd808d2"),
 "title" : "Gravity",
 "cast" : [
  "Sandra Bullock",
  "Orto Ignatiussen",
  "Ed Harris",
  "George Clooney"
 ],
 "viewer_rating" : 4,
 "favs" : [
  "George Clooney",
  "Sandra Bullock"
 ],
 "num_favs" : 2
}
{
 "_id" : ObjectId("573a139af29313caabcf0480"),
 "title" : "A Time to Kill",
 "cast" : [
  "Matthew McConaughey",
  "Sandra Bullock",
  "Samuel L. Jackson",
  "Kevin Spacey"
 ],
 "viewer_rating" : 3.6,
 "favs" : [
  "Kevin Spacey",
  "Sandra Bullock"
 ],
 "num_favs" : 2
}
{
 "_id" : ObjectId("573a13b5f29313caabd447ca"),
 "title" : "Extremely Loud & Incredibly Close",
 "cast" : [
  "Tom Hanks",
  "Thomas Horn",
  "Sandra Bullock",
  "Zoe Caldwell"
 ],
 "viewer_rating" : 3.5,
 "favs" : [
  "Sandra Bullock",
  "Tom Hanks"
 ],
 "num_favs" : 2
}
// Sorted by num_favs, tomatoes.viewer.rating, and title


db.movies.aggregate([
  {
    $match: { $and: [
      { tomatoes: { $exists: true } },
      { "tomatoes.viewer.rating": { $gte: 3 } },
      { cast: { $elemMatch: { $exists: true } } }
    ]}
  },
  {
    $project: {
      _id:1,
      title: 1,
      viewer_rating: "$tomatoes.viewer.rating",
      cast: 1,
      favs: {
        $setIntersection: [ "$cast", favorites ]
      },
    }
  },
  {
    $addFields: {
      num_favs: { $size: "$favs" }
    }
  },
  {
  "$sort": { "num_favs": -1, viewer_rating: -1, "title": -1  }
  },
  {
  "$count": "title"
}
])

{ "title" : 27654 }


// What is the title of the 25th film in the aggregation result?
// How do we get the Nth one? Skipping 24 documents should do it
db.movies.aggregate([
  {
    $match: { $and: [
      { tomatoes: { $exists: true } },
      { "tomatoes.viewer.rating": { $gte: 3 } },
      { cast: { $elemMatch: { $exists: true } } }
    ]}
  },
  {
    $project: {
      _id:1,
      title: 1,
      viewer_rating: "$tomatoes.viewer.rating",
      cast: 1,
      favs: {
        $setIntersection: [ "$cast", favorites ]
      },
    }
  },
  {
    $addFields: {
      num_favs: { $size: "$favs" }
    }
  },
  {
    $sort: { "num_favs": -1, viewer_rating: -1, "title": -1  }
  },
  {
    $skip: 24
  },
  {
    $limit: 2
  }
]).pretty()


{
 "_id" : ObjectId("573a13b2f29313caabd39eef"),
 "title" : "Fantastic Mr. Fox",
 "cast" : [
  "George Clooney",
  "Meryl Streep",
  "Jason Schwartzman",
  "Bill Murray"
 ],
 "viewer_rating" : 3.9,
 "favs" : [
  "George Clooney"
 ],
 "num_favs" : 1
}
{
 "_id" : ObjectId("573a13ddf29313caabdb320f"),
 "title" : "The Heat",
 "cast" : [
  "Sandra Bullock",
  "Melissa McCarthy",
  "Demian Bichir",
  "Marlon Wayans"
 ],
 "viewer_rating" : 3.8,
 "favs" : [
  "Sandra Bullock"
 ],
 "num_favs" : 1
}

// One more condition: countries includes USA
// Use $in against the array field
db.movies.aggregate([
  {
    $match: { $and: [
      { tomatoes: { $exists: true } },
      { "tomatoes.viewer.rating": { $gte: 3 } },
      { cast: { $elemMatch: { $exists: true } } },
      { countries: { $in: [ "USA" ] } }
    ]}
  },
  {
    $project: {
      _id:1,
      title: 1,
      viewer_rating: "$tomatoes.viewer.rating",
      cast: 1,
      favs: {
        $setIntersection: [ "$cast", favorites ]
      },
    }
  },
  {
    $addFields: {
      num_favs: { $size: "$favs" }
    }
  },
  {
    $sort: { "num_favs": -1, viewer_rating: -1, "title": -1  }
  },
  {
    $skip: 24
  },
  {
    $limit: 1
  },
  {
    $project: { _id: 0, title: 1 }
  }
]).pretty()

// This is the correct answer!
{ "title" : "The Heat" }

Chapter 2: Basic Aggregation / Lab: Bringing it all together

Practice exercise.

Problem

Calculate an average rating for each movie in our collection
where English is an available language,
the minimum imdb.rating is at least 1, the minimum imdb.votes is at
least 1, and it was released in 1990 or after. You'll be required to
rescale (or normalize) imdb.votes. The formula to rescale imdb.votes
and calculate normalized_rating is included as a handout.

What film has the lowest normalized_rating?

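English is an available language
imdb.rating is at least 1
imdb.votes is at least 1
Released in 1990 or later
Normalize votes and adjust the scale!

Note: for the scaling, normalize first and then convert so that the value falls in the range 1 to 10.
(The linked handout says which formula to use, so the calculation follows it.)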

// general scaling
// For scaling into the range 1 to 10, this becomes
// 1 + (10 - 1) * (normalized value)
min + (max - min) * ((x - x_min) / (x_max - x_min))
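
The formula is easier to trust after running it once by hand. This is a minimal plain-JavaScript sketch of it, using the constants given in the handout (x_min = 5, x_max = 1521105, range 1 to 10) and the vote count and rating of the movie that ends up being the answer in this lab (264 votes, rating 1.1). rescale is a helper name I chose.

```javascript
// min + (max - min) * ((x - x_min) / (x_max - x_min))
function rescale(x, xMin, xMax, min, max) {
  return min + (max - min) * ((x - xMin) / (xMax - xMin));
}

const scaledVotes = rescale(264, 5, 1521105, 1, 10);

// normalized_rating is the average of imdb.rating and the scaled votes.
const normalizedRating = (1.1 + scaledVotes) / 2;

console.log(scaledVotes);      // roughly 1.0015
console.log(normalizedRating); // roughly 1.0508
```

The smallest vote count maps to 1 and the largest to 10, so a movie with only 264 votes barely moves off the bottom of the scale.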

// First, a sample run with these conditions
/*
      { "imdb.rating": { $gte: 1 } },
      { "imdb.votes": { $gte: 1 } },
      { released: { $gte: ISODate("1990-01-01") } },
*/
db.movies.aggregate([
  {
    $match: { $and: [
      { languages: { $in: [ "English" ] } },
      { "imdb.rating": { $gte: 1 } },
      { "imdb.votes": { $gte: 1 } },
      { released: { $gte: ISODate("1990-01-01") } },
    ]}
  },
  {
    $project: {
      _id:1,
      title: 1,
      "imdb.rating": 1,
      "imdb.votes": 1,
      released: 1
    }
  }
])

/*
Use these numbers!
x_max = 1521105
x_min = 5
min = 1
max = 10
x = "imdb.votes"

x_new = (x - x_min) / (x_max - x_min)

{ $subtract: ["$imdb.votes", 5] } => corresponds to (x - x_min)
{ $subtract: [1521105, 5] } => corresponds to (x_max - x_min)

$divide performs the division, so this corresponds to (x - x_min) / (x_max - x_min)

$divide: [
  { $subtract: ["$imdb.votes", 5] },
  { $subtract: [1521105, 5] }
]

Multiply the normalized value above by 9 and add 1.
The normalized value lies in the range 0 to 1; this rescales it to the range 1 to 10.
0 -> maps to 1 (the start)
1 -> maps to 10 (the end)

1. Compute everything in the $project stage
*/
db.movies.aggregate([
  {
    $match: { $and: [
      { languages: { $in: [ "English" ] } },
      { "imdb.rating": { $gte: 1 } },
      { "imdb.votes": { $gte: 1 } },
      { released: { $gte: ISODate("1990-01-01") } },
    ]}
  },
  {
    $project: {
      _id:1,
      title: 1,
      normalized_value: {
        $divide: [
          { $subtract: ["$imdb.votes", 5] },
          { $subtract: [1521105, 5] }
        ]
      },
      "normalized_scaled_value": {
        $add: [
          1,
          { $multiply: [
              9,
              { $divide:
                [
                  { $subtract: ["$imdb.votes", 5] },
                   { $subtract: [1521105, 5] }
                ]
              }
            ]
          }
        ]
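      },
      "normalized_rating": {
        // A field defined in this same $project ("$normalized_value") cannot be
        // referenced here, so the scaled value is computed inline
        $avg: [ "$imdb.rating", {
            $add: [
              1, { $multiply: [ 9, { $divide: [
                { $subtract: ["$imdb.votes", 5] },
                { $subtract: [1521105, 5] }
              ] } ] }
            ]
          }
        ]
      }
    }
  },
  { $sort: { "normalized_rating": 1 } },
  { $limit: 1 }
]).pretty()

{
 "_id" : ObjectId("573a13ccf29313caabd837cb"),
 "title" : "The Christmas Tree",
 "normalized_value" : 0.00017027151403589506,
 "normalized_scaled_value" : 1.001532443626323,
 "normalized_rating" : 1.0507662218131615
}

// "title" : "The Christmas Tree" is the answer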

/*
2. The $project part is very long, so try moving part of it into $addFields
*/
db.movies.aggregate([
  {
    $match: { $and: [
      { languages: { $in: [ "English" ] } },
      { "imdb.rating": { $gte: 1 } },
      { "imdb.votes": { $gte: 1 } },
      { released: { $gte: ISODate("1990-01-01") } },
    ]}
  },
  {
    $project: {
      _id:1,
      title: 1,
      "imdb.votes": 1,
      "imdb.rating": 1,
      normalized_value: {
        $divide: [
          { $subtract: ["$imdb.votes", 5] },
          { $subtract: [1521105, 5] }
        ]
      }
    }
  },
  {
    $addFields: {
      "normalized_scaled_value": {
        $add: [
          1,
          { $multiply: [
              9,
              { $divide:
                [
                  { $subtract: ["$imdb.votes", 5] },
                   { $subtract: [1521105, 5] }
                ]
              }
            ]
          }
        ]
      },
      "normalized_rating": {
        $avg: [
          "$imdb.rating",
          {
            $add: [
              1, { $multiply: [ 9, "$normalized_value" ] }
            ]
          }
        ]
      }
    }
  },
  { $sort: { "normalized_rating": 1 } },
  { $limit: 1 }
]).pretty()

// The result is the same: "The Christmas Tree"
{
 "_id" : ObjectId("573a13ccf29313caabd837cb"),
 "title" : "The Christmas Tree",
 "imdb" : {
  "rating" : 1.1,
  "votes" : 264
 },
 "normalized_value" : 0.00017027151403589506,
 "normalized_scaled_value" : 1.001532443626323,
 "normalized_rating" : 1.0507662218131615
}

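This time $addFields produced the same result, but there is one point to watch when you split the work across stages: any field that an $addFields expression reads must be present in the output of the preceding stage.

In the stage below, if the preceding $project does not include "imdb.votes" among its fields, the calculation that reads it returns null, so be careful.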

  {
    $addFields: {
      "normalized_scaled_value": {
        $add: [
          1,
          { $multiply: [
              9,
              { $divide:
                [
                  { $subtract: ["$imdb.votes", 5] },
                   { $subtract: [1521105, 5] }
                ]
              }
            ]
          }
        ]
      },
      "normalized_rating": {
        $avg: [
          "$imdb.rating",
          {
            $add: [
              1, { $multiply: [ 9, "$normalized_value" ] }
            ]
          }
        ]
      }
    }
  }
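
To see why a missing field quietly turns into null rather than an error, I wrote a small plain-JavaScript imitation of the expressions in the stage above. The helpers (arith, subtract, divide, multiply, add, avg, normalizedRating) are hypothetical stand-ins. They copy two behaviors of the real operators: the arithmetic operators return null when an operand is null or missing, and $avg ignores non-numeric values.

```javascript
// Imitation helpers, NOT MongoDB code.
const isMissing = (v) => v === null || v === undefined;

// Arithmetic that yields null as soon as any operand is null or missing.
const arith = (fn) => (...args) => (args.some(isMissing) ? null : fn(...args));
const subtract = arith((a, b) => a - b);
const divide = arith((a, b) => a / b);
const multiply = arith((a, b) => a * b);
const add = arith((a, b) => a + b);

// Average over a list of values, ignoring anything that is not a number.
function avg(values) {
  const nums = values.filter((v) => typeof v === "number");
  return nums.length === 0 ? null : nums.reduce((sum, v) => sum + v, 0) / nums.length;
}

function normalizedRating(doc) {
  const votes = doc.imdb ? doc.imdb.votes : undefined;
  const rating = doc.imdb ? doc.imdb.rating : undefined;
  const scaled = add(1, multiply(9, divide(subtract(votes, 5), subtract(1521105, 5))));
  return { scaled, rating: avg([rating, scaled]) };
}

const withVotes = normalizedRating({ imdb: { rating: 1.1, votes: 264 } });
const withoutVotes = normalizedRating({ imdb: { rating: 1.1 } });

console.log(withVotes);    // scaled is a number, rating is the average of the two
console.log(withoutVotes); // scaled is null, rating falls back to 1.1 alone
```

The second case is the sneaky one: the average still returns a number, so nothing looks broken unless you compare it with the expected value.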

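Notes for this session

I'm gradually getting used to writing aggregate() pipelines, but complex queries still take time.
It's also hard work, with a lot of trial and error while reading the reference!
The problem above added a normalization step, which made it even tougher for me.

After working through everything, I suddenly remembered MongoDB Compass and gave it a try.
When I first opened this application I had no idea what its various tabs were for, but after progressing through the aggregation course I found a feature that made me think, "Ah, so that's what this is!"

MongoDB Compass has a tab for aggregation where you can write a query stage by stage and check, with previews, how the data is extracted and transformed!

This session's results displayed nicely there too.
Very handy!

Next up is Chapter 3, which covers operators that work like aggregate functions.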