🐈

# データサイエンス100本ノック（SQL）31~40

2021/11/30に公開

## 新しく出てきた内容

標準偏差[stddev_samp()]
パーセンテージ[percentile_cont() within group (order by )]

with句を使用した副問い合わせ（サブクエリ）の方法

COALESCEを使用した欠損値も含めた値の置換方法

S-031:レシート明細テーブル（receipt）に対し、店舗コード（store_cd）ごとに
売上金額（amount）の標本標準偏差を計算し、降順にTOP5を表示せよ。

``````PostgreSQL
select store_cd,
stddev_samp(amount) as std_amount
from receipt
group by store_cd
order by std_amount desc
limit 5;

BigQuery
select store_cd,
stddev_samp(amount) as std_amount
from [データセット名].receipt
group by store_cd
order by std_amount desc
limit 5;
``````

stddev_samp() = stddev()　どちらでも良い

S-032:レシート明細テーブル（receipt）に対し、
売上金額（amount）について25％刻みでパーセンタイル値を求めよ。

``````PostgreSQL
select percentile_cont(0.25) within group (order by amount) as per_25,
percentile_cont(0.5) within group (order by amount) as per_50,
percentile_cont(0.75) within group (order by amount) as per_75,
percentile_cont(1.0) within group (order by amount) as per_100
from receipt;

BigQuery
select distinct percentile_cont(amount, 0) over() as min_,
percentile_cont(amount, 0.25) over() as per_25,
percentile_cont(amount, 0.5) over() as median,
percentile_cont(amount, 0.75) over() as per_75,
percentile_cont(amount, 1) over() as max_,
from [データセット名].receipt;
``````

S-033:レシート明細テーブル（receipt）に対し、店舗コード（store_cd）ごとに
売上金額（amount）の平均を計算し、330以上のものを抽出せよ。

``````PostgreSQL
select store_cd,
avg(amount) as avg_amount
from receipt
group by store_cd
having avg(amount) >= 330;

BigQuery
select store_cd,
avg(amount) as avg_amount
from [データセット名].receipt
group by store_cd
having avg(amount) >= 330;
``````

S-034:レシート明細テーブル（receipt）に対し、顧客ID（customer_id）ごとに
売上金額（amount）を合計して全顧客の平均を求めよ。
ただし、顧客IDが"Z"から始まるのものは非会員を表すため、除外して計算すること。

``````PostgreSQL
with customer_amount as (
select customer_id,
sum(amount) as sum_amount
from receipt
where customer_id not like 'Z%'
group by customer_id
)
select avg(sum_amount)
from customer_amount;

BigQuery
with customer_amount as (
select customer_id,
sum(amount) as sum_amount
from [データセット名].receipt
where customer_id not like 'Z%'
group by customer_id
)
select avg(sum_amount) as avg_
from customer_amount;
``````

--with句を使ってサブクエリ（副問い合わせ）を作ると読みやすくて良い。

S-035:レシート明細テーブル（receipt）に対し、顧客ID（customer_id）ごとに
販売金額（amount）を合計して全顧客の平均を求め、
平均以上に買い物をしている顧客を抽出せよ。
ただし、顧客IDが"Z"から始まるのものは非会員を表すため、除外して計算すること。
なお、データは10件だけ表示させれば良い。

``````PostgreSQL
with customer_amount as (
select customer_id,
sum(amount) as sum_amount
from receipt
where customer_id not like 'Z%'
group by customer_id
)
select customer_id,
sum_amount
from customer_amount
where sum_amount >= (select avg(sum_amount) from customer_amount)
limit 10;

BigQuery
with customer_amount as (
select customer_id,
sum(amount) as sum_amount
from [データセット名].receipt
where customer_id not like 'Z%'
group by customer_id
)
select customer_id,
sum_amount
from customer_amount
where sum_amount >= (select avg(sum_amount) from customer_amount)
limit 10;
``````

S-036:レシート明細テーブル（receipt）と店舗テーブル（store）を内部結合し、
レシート明細テーブルの全項目と店舗テーブルの店舗名(store_name)を10件表示せよ。

``````PostgreSQL
select r.*,
s.store_name
from receipt as r
inner join store as s
on r.store_cd = s.store_cd
limit 10;

BigQuery
select r.*,
s.store_name
from [データセット名].receipt as r
inner join [データセット名].store as s
on r.store_cd = s.store_cd
limit 10;
``````

S-037:商品テーブル（product）とカテゴリテーブル（category）を内部結合し、
商品テーブルの全項目と
カテゴリテーブルの小区分名（category_small_name）を10件表示させよ。

``````PostgreSQL
select p.*,
c.category_small_name
from product as p
inner join category as c
on p.category_small_cd = c.category_small_cd
limit 10;

BigQuery
select p.*,
c.category_small_name
from [データセット名].product as p
inner join [データセット名].category as c
on p.category_small_cd = c.category_small_cd
limit 10;
``````

S-038:顧客テーブル（customer）とレシート明細テーブル（receipt）から、
各顧客ごとの売上金額合計を求めよ。
ただし、買い物の実績がない顧客については売上金額を0として表示させること。
また、顧客は性別コード（gender_cd）が女性（1）であるものを対象とし、
非会員（顧客IDが'Z'から始まるもの）は除外すること。
なお、結果は10件だけ表示させれば良い。

``````PostgreSQL
with customer_sum_amount as (
select customer_id,
sum(amount) as sum_amount
from receipt
group by customer_id
)
select cu.customer_id,
coalesce(csa.sum_amount, '0')
from customer as cu
left join customer_sum_amount as csa
on cu.customer_id = csa.customer_id
where cu.customer_id not like 'Z%' and cu.gender_cd = '1'
limit 10;

BigQuery
with customer_sum_amount as (
select customer_id,
sum(amount) as sum_amount
from [データセット名].receipt
group by customer_id
)
select cu.customer_id,
coalesce(csa.sum_amount, 0) as customer_sum_amount
from [データセット名].customer as cu
left join customer_sum_amount as csa
on cu.customer_id = csa.customer_id
where cu.customer_id not like 'Z%' and cu.gender_cd = '1'
limit 10;
``````

coalesce関数で欠損値を置換できる。

S-039:レシート明細テーブル（receipt）から売上日数の多い顧客の上位20件と、売上金額合計の多い顧客の上位20件を抽出し、完全外部結合せよ。
ただし、非会員（顧客IDが'Z'から始まるもの）は除外すること。

``````PostgreSQL
with customer_days as (
select customer_id,
count(distinct sales_ymd) as come_days
from receipt
where customer_id not like 'Z%'
group by customer_id
order by come_days desc
limit 20
),
customer_amount as (
select customer_id,
from receipt
where customer_id not like 'Z%'
group by customer_id
limit 20
)
select c_d.*,
from customer_days as c_d
full outer join customer_amount as c_a
on c_d.customer_id = c_a.customer_id;

BigQuery
with customer_days as (
select customer_id,
count(distinct sales_ymd) as come_days
from [データセット名].receipt
where customer_id not like 'Z%'
group by customer_id
order by come_days desc
limit 20
),
customer_amount as (
select customer_id,
from [データセット名].receipt
where customer_id not like 'Z%'
group by customer_id
limit 20
)
select c_d.*,
from customer_days as c_d
full outer join customer_amount as c_a
on c_d.customer_id = c_a.customer_id
order by c_d.come_days desc;
``````

S-040:全ての店舗と全ての商品を組み合わせると何件のデータとなるか調査したい。店舗（store）と商品（product）を直積した件数を計算せよ。

``````PostgreSQL
select count(*)
from store as s
cross join product as p;

BigQuery
select count(*) as count
from [データセット名].store as s
cross join [データセット名].product as p;
``````