#SystemDesignInterview

System Design Interview
https://www.amazon.co.jp/dp/B08CMF2CQF

Chapter 1: Scale From Zero To Millions Of Users
Chapter 2: Back-of-the-envelope Estimation
Chapter 3: A Framework For System Design Interviews
Chapter 4: Design A Rate Limiter
Chapter 5: Design Consistent Hashing
Chapter 6: Design A Key-value Store
Chapter 7: Design A Unique Id Generator In Distributed Systems
Chapter 8: Design A Url Shortener
Chapter 9: Design A Web Crawler
Chapter 10: Design A Notification System
Chapter 11: Design A News Feed System
Chapter 12: Design A Chat System
Chapter 13: Design A Search Autocomplete System
Chapter 14: Design Youtube
Chapter 15: Design Google Drive
Chapter 16: The Learning Continues

cf. https://github.com/donnemartin/system-design-primer/blob/master/README-ja.md

gkz

Chapter 1: Scale From Zero To Millions Of Users

A single servers to multiple ones

Without load balancers to with ones

Vertical/Horizontal scaling

Vertical scaling: increase CPU, RAM, DISK
Horizontal scaling: add servers

Master-Slave combination of databases

Master Master, Master Slave, Master multiple Slaves
If the slave goes down, another is replaced.
If the master, one of the slave is promoted to master.

availability reliability and scalability -> tradeoff

cf. Plan for tradeoffs: You can’t optimize all software quality attributes

cf. 信頼性と可用性は何が違う？

Cashe/CDN is a double-edged sword

geo replication

pub/sub
logging

celebrity problem

celebrity problemなんてものがあるらしい。インフルエンサーがサービスの設計に影響を与える。。

2010: "At any moment, Justin Bieber uses 3% of our infrastructure.

In Twitter’s early days, only one celebrity could tweet at a time

gkz

Chapter 4: Design A Rate Limiter

クライアントサイドとサーバーサイド。どちらに設けるか？という選択肢が出てきところが印象的。潜在的にサーバーサイドに置くものだと思っていた。理由としては、本書で言うところの malicious actors を始めとするクライアント側の挙動を制御できないのでは？と。クライアントサイドに置くケースないですよね、、？

cf. 様々なrate limitアルゴリズム - Carpe Diem

cf.Ruby：Shopify GraphQL Admin API の使い方

cf. GitHubがRedisを使用してレートリミットをスケールアップ

cf. Web API 設計のベストプラクティス - Azure Architecture Center | Microsoft Docs

これにかぎらず、クラウドベンダーは一般的な技術用語の解説ページをフックに自社のサービスの会員登録やフリートライアルのページへ誘導させるようにしているケースは散見される印象がある

Chapter 5: Design Consistent Hashing

Consistent Hashing （コンシステントハッシュ法）

Chapter 6: Design A Key-value Store

Data partition
Data replication
レプリカはデータセンターレベルで冗長化を

Consisrency
Inconsistency resolution
Handling failures
System architecture diagram
Write path
Read path

cf. 自動障害回復システム月読の話 - Cybozu Inside Out | サイボウズエンジニアのブログ

いつどのような障害が発生するか分からない分散環境で、確実にデータを伝達する方法は多くはありません。月読ではゴシップ・プロトコルという仕組みで、データを定期的に交換しあうことで一時的な通信障害があっても確実に情報を伝達させています。

cf. ゴシッププロトコル - MOXBOX
cf. ブラウザとNode.jsで動く1kBのキーバリューストレージライブラリを書いた | Web Scratch

gkz

Chapter 7: Design A Unique Id Generator In Distributed Systems

so difficult to generate ids with auto increments
cf. https://www.w3schools.com/sql/sql_autoincrement.asp

Auto-increment allows a unique number to be generated automatically when a new record is inserted into a table.

Interviewで挙がったIDの技術要件

IDs must be unique. (ユニーク、ただひとつであること)
IDs are numrical values only. (数値型のみ)
IDs fit into 64-bit. (64bitにおさまること)
- cf. Snowflake形式のIDを採用した場合の苦労ポイント - yoskhdia’s diary
- JavaScriptでは数値は内部的に倍精度浮動小数点型数値で保持されるため、整数として表現可能な範囲は64bit Longよりも小さいです。そのため例えば、Snowflake方式で 398419938554941447 と採番されたIDは、数値のままJavaScriptに渡されると 398419938554941400 のように下2桁が 0 に切り捨てられてしまいます。
- cf. TwitterのユーザーID数が32bit整数の範囲を超えた模様 | TeraDas
IDs are ordered by date.
Ability to generate over 10.000 unique IDs / second.　（秒速10,000のIDを発行できること）

How to generate IDs in distributed systems

auto_inrements(Multi-master replication, DBが提供する自動採番)
UUID
Snowflake/Katsubushi

auto_inrements(DBが提供する自動採番)の課題

Hard to scale up with multiple data centers(Only writer is responsible for generating IDs)
IDs do not go up with time across multiple svs
It does not scale well when a sv is added/removed.

UUID

Pros: Easy to scale because each Web SV is responsible for generating IDs.
Cons: 128 bit long, requirement is 64 bit.

Snowflake/Katsubushi

cf. Designing a URL Shortening service like TinyURL - Grokking the System Design Interview

gkz

Chapter 8: Design A Url Shortener

301 redirect v.s. 302 redirect
permanently v.s. temporarily
URL shortening 's example: Figure 8-3
hash function's requirements are as follows:

Each longURL must be hashed to one hashValue.
Each hashValwe can be mapped back to the longURL.
cf. 自社で使えるURL短縮サービスを低コストにサーバーレスで構築した話 - Retty Tech Blog
参考になるところ
- もともとのURLと短縮URLから成るデータモデル、テーブル構造について
- 短縮URLにどこまでユニーク性を求めるか？という観点で考える短縮URLの文字数とその構成要素（数字だけ？アルファベットも？）

well-known hash functions:

CRC32, MD5, SHA-1

cf. What’s the difference between md5(), crc32() and sha1() crypto on PHP? - Stack Overflow

How can we make it shorter?
~- hash colisions が起きなくなるまで（DBに問い合わせて、hash colisions が起きるか確認して、hash colisionが起きれば）再帰的にURLから一部の文字列をインプット値としてhash化~
URL全文をhash functionsにかけて、hash colisions が起きれば、あらかじめ決めたランダム文字列を付与してもういっかいhash functionsにかける、以下ループ。

gkz

Chapter 9: Design A Web Crawler

Web Crawlers' reuqirements

Scalabiliry
- 膨大な量をクローリングできるか
Robustness
- 悪質なリンクや壊れたリンクを踏んだときにハンドリングできるか
Politeness
- 必要以上にリクエストを送って負荷を与えない≒DOS攻撃をしないように配慮できているか
  - ダウンローダースレッドごとに個別にFIFOキューがある
    - Each downloader thread has a separate FIFO queue and only downloads URLs obtained from that queue.
Extensibiliry
- クローラーのリデザインをすることなく、クローリング対象のWebサービスの仕様変更に追随することができるか

クローラーの仕組み、コンポーネントは、Figure 9-4

cf. 11. 収集ロボット Heritrix | ウェブアーカイブのしくみ｜国立国会図書館インターネット資料収集保存事業

Heritrix で採用されている仕組みについてだが、本書と同じようなことが書かれている

cf. ウェブアーカイブを支える技術

ウェブアーカイブ 全般について扱っている。今後の課題として技術構成が刷新できていない点を挙げている。

gkz

Chapter 10: Design A Notification System

Chapter 11: Design A News Feed System

Chapter 12: Design A Chat System

High-level design より、チャットシステムは stateless services, stateful services, and third-parry integration の3つのコンポーネントを組み合わせたものだという。また、技術選定の点で肝になるのがデータベースの設計（RDB/NoSQL）であり、選定する際に考慮するべき点が the data rypes and read/write patterns

memo

『マイクロサービスパターン実践的システムデザインのためのコード解説 - インプレスブックス』
- 「13.5.4 配達サービスのインテグレーショングルーの設計」

ゼクシィ恋結びに実装　検索基盤と機械学習を併用した自動応答システムの紹介