🐰

Solo MongoDB University 12/03 - Indexing and Aggregation Basics (3)

Published 2020/12/03

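This is day 3 of my Advent-calendar-style log of working through the MongoDB University courses!
I'm still on the M001 course. My goal is to finish it by 12/15.
Details are here.
There are 7 left, so let's get through 4 of them!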

Advent Calendar / Preparing for Solo MongoDB University

Chapter 5: Indexing and Aggregation Pipeline

Introduction to Indexes

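First, a video about indexes.
MongoDB has indexes too.
For a collection that holds the actual data, an index stores the values of specific fields together with the location of each document, sorted in a specified order. It exists as a separate structure (itself a kind of "collection").

Indexes make queries more efficient!
Compound indexes are supported too!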
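
To make the idea concrete, here is a minimal sketch in plain JavaScript (runnable in Node, no database needed). It is a toy model under my own assumptions, not how MongoDB actually implements indexes: the sample documents and the helper names `buildIndex`, `lowerBound`, and `findByIndex` are all made up for illustration.

```javascript
// Toy in-memory model of a single-field ascending index (not MongoDB's real implementation).
const trips = [
  { _id: 0, "birth year": 1992 },
  { _id: 1, "birth year": 1989 },
  { _id: 2, "birth year": 1975 },
  { _id: 3, "birth year": 1989 },
];

// Build index entries: [field value, position in the collection], sorted by value.
function buildIndex(docs, field) {
  return docs
    .map((doc, position) => [doc[field], position])
    .sort((a, b) => a[0] - b[0] || a[1] - b[1]);
}

// Binary search for the first entry whose value is >= target.
function lowerBound(index, target) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Equality lookup: jump to the first match, then walk forward while values still match.
function findByIndex(docs, index, value) {
  const result = [];
  for (let i = lowerBound(index, value); i < index.length && index[i][0] === value; i++) {
    result.push(docs[index[i][1]]);
  }
  return result;
}

const birthYearIndex = buildIndex(trips, "birth year");
const matches = findByIndex(trips, birthYearIndex, 1989);
console.log(matches.map((d) => d._id)); // [ 1, 3 ]
```

The point is that the lookup only touches the entries that match, instead of scanning every document.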

use sample_training

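// At this point there is no index on "birth year"
db.trips.find({ "birth year": 1989 })

// With a more complex query, every document with start station id 476 is fetched first, then the result is sorted
db.trips.find({ "start station id": 476 }).sort( { "birth year": 1 } )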

Before creating the index

// Try creating an index
// For each field name, specify ascending (1) or descending (-1)
db.trips.createIndex({ "birth year": 1 })
// Output
{ createdCollectionAutomatically: false,
  numIndexesBefore: 1,    // before creation, only the _id index existed
  numIndexesAfter: 2,     // one index was added
  ok: 1,
  '$clusterTime':
   { clusterTime: { _bsontype: 'Timestamp', low_: 12, high_: 1606989182 },
     signature:
      { hash:
         { _bsontype: 'Binary',
           sub_type: 0,
           position: 20,
           buffer: <Buffer 4c b2 8e 8c b8 62 3f 3a 6e e0 5e b1 a5 ce 74 72 fb 73 b7 11> },
        keyId: { _bsontype: 'Long', low_: 3, high_: 1599325752 } } },
  operationTime: Timestamp("6901965981715791884") }

Checking in MongoDB Compass after adding the index.


// Add one more: a compound index
db.trips.createIndex({ "start station id": 1, "birth year": 1 })

// Running the same command again reports that the index already exists
db.trips.createIndex({ "start station id": 1, "birth year": 1 })
// Output
{ numIndexesBefore: 3,
  numIndexesAfter: 3,
  note: 'all indexes already exist', // already created
  ok: 1,
  '$clusterTime':
   { clusterTime: { _bsontype: 'Timestamp', low_: 11, high_: 1606989477 },
     signature:
      { hash:
         { _bsontype: 'Binary',
           sub_type: 0,
           position: 20,
           buffer: <Buffer 31 6d 71 42 f4 f5 66 83 68 1f 77 4d e0 6a aa 07 01 dc 81 a5> },
        keyId: { _bsontype: 'Long', low_: 3, high_: 1599325752 } } },
  operationTime: Timestamp("6901967248731144203") }
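
Why does a compound index on start station id and birth year help the earlier find-then-sort query? Here is a rough sketch in plain JavaScript, again a toy model rather than MongoDB internals. The sample documents and the names `compoundIndex` and `findStationSortedByBirthYear` are assumptions for illustration.

```javascript
// Toy model of a compound index ordered by start station id, then birth year.
const sampleTrips = [
  { _id: 0, "start station id": 476, "birth year": 1992 },
  { _id: 1, "start station id": 300, "birth year": 1980 },
  { _id: 2, "start station id": 476, "birth year": 1975 },
  { _id: 3, "start station id": 476, "birth year": 1989 },
];

// Entries are sorted by station first, then by birth year within each station.
const compoundIndex = sampleTrips
  .map((doc, position) => ({
    station: doc["start station id"],
    birthYear: doc["birth year"],
    position,
  }))
  .sort((a, b) => a.station - b.station || a.birthYear - b.birthYear);

// All entries for one station are contiguous and already ordered by birth year,
// so the matching documents come back pre-sorted with no separate sort step.
function findStationSortedByBirthYear(stationId) {
  return compoundIndex
    .filter((entry) => entry.station === stationId) // a real index would seek here, not scan
    .map((entry) => sampleTrips[entry.position]);
}

const ordered = findStationSortedByBirthYear(476);
console.log(ordered.map((d) => d["birth year"])); // [ 1975, 1989, 1992 ]
```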

Quiz: Introduction to Indexes (practice problem)

Problem

Jameela often queries the sample_training.routes collection by the src_airport field like this:

db.routes.find({ "src_airport": "MUC" }).pretty()

Which command will create an index that will support this query?
(In short: which createIndex command supports queries that filter on src_airport?)

Answer

use sample_training
// An index on src_airport should do it
db.routes.createIndex({ "src_airport": 1 })

// Result: OK!
{ createdCollectionAutomatically: false,
  numIndexesBefore: 1,
  numIndexesAfter: 2,
  ok: 1,
  '$clusterTime':
   { clusterTime: { _bsontype: 'Timestamp', low_: 1, high_: 1606989823 },
     signature:
      { hash:
         { _bsontype: 'Binary',
           sub_type: 0,
           position: 20,
           buffer: <Buffer a1 73 f2 26 56 5b fd 90 f8 b0 a3 67 3c d3 97 f4 98 9d 25 e3> },
        keyId: { _bsontype: 'Long', low_: 3, high_: 1599325752 } } },
  operationTime: Timestamp("6901968734789828609") }

Introduction to Data Modeling

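On data modeling.

By default, MongoDB does not enforce any particular structure or constraints on your data (it is schemaless).

However, structuring the data you store still matters for application performance and efficiency.
Structuring data through "data modeling" also makes queries more efficient!
Key rules
- Data is stored in the way that it is used
- This principle determines what data each document holds, and even how many collections you need
Example: patient data
- Multiple phone numbers, visit history, prescriptions, allergies, drug side effects, and so on
- Some patients are on their first visit
- Some fields are required and others are not
  - Required: things like age, gender, name, and patient card number
- When data is shared across a medical network rather than kept within a single hospital, symptoms can become the important field
  - The examining doctor also needs the next appointment date and the prescription information
- In short, it is important to model the data around what you will use as the key, or criterion, when querying
- The unit in which the application retrieves data also matters when deciding on subdocuments and arrays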
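
As a rough illustration of the patient example, here is what such a document might look like as a plain JavaScript object. Every field name and value below is hypothetical (the course does not prescribe this shape); the sketch only shows arrays and subdocuments holding data that is read together with the patient.

```javascript
// Hypothetical patient document; all field names and values are illustrative only.
const patient = {
  patientId: "P-0001", // required fields
  name: "Example Patient",
  age: 42,
  gender: "F",
  phones: ["000-0000-0001", "000-0000-0002"], // several phone numbers -> array
  allergies: ["penicillin"],
  visits: [
    // visit history embedded as subdocuments, since it is read with the patient
    { date: "2020-12-03", doctor: "Dr. Example", prescriptions: ["drug A"] },
  ],
};

// A first-time patient has only the required fields; optional ones are simply absent.
const newPatient = { patientId: "P-0002", name: "New Patient", age: 30, gender: "M" };

// Small helper to check that the required fields are present.
const requiredFields = ["patientId", "name", "age", "gender"];
function hasRequiredFields(doc) {
  return requiredFields.every((field) => field in doc);
}

console.log(hasRequiredFields(patient), hasRequiredFields(newPatient)); // true true
```

Because there is no fixed schema, both documents can live in the same collection even though one has far fewer fields.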

These are worth a look too!

https://docs.mongodb.com/manual/core/data-modeling-introduction/ (data modeling documentation)
https://www.mongodb.com/blog/post/building-with-patterns-a-summary (Building with Patterns: A Summary)
- A blog post on design patterns for MongoDB data modeling

Quiz: Introduction to Data Modeling

Problem: What is data modeling?

Answer:
a way to organize fields in a document to support your application performance and querying capabilities

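Upsert - Update or Insert? (video)

MongoDB supports upsert (update, or insert a new document if nothing matches)
The default is { upsert: false }
- To add a document, use insert explicitly
With upsert: true, the operation either updates or inserts
- Useful, for example, when adding (push) data to an array in an existing document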


//
// Coding example
// Assumes r is an object holding the data to record, e.g. { sensor: 5, date: '2020-12-03' }
// plus the value and time fields used below
//
db.iot.updateOne(
  { "sensor": r.sensor, "date": r.date, "valcount": { "$lt": 48 } },
  {
    "$push": { "readings": { "v": r.value, "t": r.time } },
    "$inc": { "valcount": 1, "total": r.value }
  },
  { "upsert": true }
)

This looks up a document by r's sensor and date and updates it, inserting a new one if none matches.
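
To see what the updateOne call above does over time, here is a small in-memory simulation in plain JavaScript. It is only a sketch of the behavior as I understand it, not the MongoDB driver: the `iot` array stands in for the collection, and `recordReading` and `MAX_READINGS` are names I made up.

```javascript
// In-memory stand-in for the `iot` collection; this is not the MongoDB driver.
const iot = [];
const MAX_READINGS = 48; // taken from the "$lt": 48 condition in the filter

// Mimics updateOne(filter, { $push, $inc }, { upsert: true }) for this one shape of update.
function recordReading(r) {
  // Query predicate: same sensor and date, and the document still has room.
  let bucket = iot.find(
    (doc) => doc.sensor === r.sensor && doc.date === r.date && doc.valcount < MAX_READINGS
  );
  if (!bucket) {
    // Upsert path: the new document starts from the equality fields of the filter.
    // The $lt condition contributes no field of its own.
    bucket = { sensor: r.sensor, date: r.date, readings: [], valcount: 0, total: 0 };
    iot.push(bucket);
  }
  bucket.readings.push({ v: r.value, t: r.time }); // $push
  bucket.valcount += 1; // $inc valcount
  bucket.total += r.value; // $inc total
}

// Record 50 readings for one sensor on one day.
for (let i = 0; i < 50; i++) {
  recordReading({ sensor: 5, date: "2020-12-03", value: 1, time: i });
}
console.log(iot.map((doc) => doc.valcount)); // [ 48, 2 ]
```

Once a document holds 48 readings it no longer matches the filter, so the next write falls through to the upsert path and starts a fresh document.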

Quiz: Upsert

Problem:

How does the upsert option work?

Answer: (select all that apply)

It is used with the update operator, and needs to have its value specified every time that the update operator is called.
- Used together with the update operator; to enable upsert, true must be specified explicitly
By default upsert is set to false.
- The default is false
When upsert is set to true and the query predicate returns an empty cursor, the update operation creates a new document using the directive from the query predicate and the update predicate.
- When set to true, if no document matches the query, a new document that satisfies the query is inserted

Today's log

Finished through Chapter 5!

Progress through Chapter 5

Today's zenn

Created the article for day 3.

zenn-contents $ npx zenn new:article --slug 20201203-mongodb-univ
📄20201203-mongodb-univ.md created.

Another article I wrote was featured as a pick! Thank you!

https://twitter.com/zenn_dev/status/1334457604177711111